
Foreign-literature translation: Conceptual construction on incomplete survey data (English version).pdf


Conceptual construction on incomplete survey data

Shouhong Wang a,*, Hai Wang b

a Department of Marketing/Business Information Systems, Charlton College of Business, University of Massachusetts Dartmouth, 285 Old Westport Road, North Dartmouth, MA 02747-2300, USA
b Department of Computer Science, University of Toronto, Toronto, ON, Canada M5S 3G4

Received 22 March 2003; received in revised form 9 September 2003; accepted 20 October 2003
Available online 26 November 2003

Data & Knowledge Engineering 49 (2004) 311–323
www.elsevier.com/locate/datak
* Corresponding author. E-mail addresses: swang@umassd.edu (S. Wang), hai@cs.toronto.edu (H. Wang).
0169-023X/$ - see front matter © 2003 Elsevier B.V. All rights reserved.
doi:10.1016/j.datak.2003.10.007

Abstract

The raw survey data for data mining are often incomplete. The issues of missing data in knowledge discovery are often ignored in data mining. This article presents the conceptual foundations of data mining with incomplete survey data, and proposes query processing for knowledge discovery and a set of query functions for the conceptual construction in survey data mining. Through a case, this paper demonstrates that conceptual construction on incomplete data can be accomplished by using artificial intelligence tools such as self-organizing maps.
© 2003 Elsevier B.V. All rights reserved.

Keywords: Incomplete survey data; Survey data mining; Conceptual construction; Self-organizing maps; Cluster analysis; Knowledge discovery; Query processing

1. Introduction

Data mining is the process of trawling through data in the hope of identifying interpretable patterns. Data mining is different from traditional statistical analysis in that it is aimed at finding unsuspected relationships which are of interest or value to the database owners, or data miners [9]. Due to the large number of dimensions and the huge volume of data, traditional statistical methods have their limitations in data mining. To meet the challenge of data mining, artificial intelligence based human–computer interactive techniques have been widely used in data mining [3,16].

There have been many non-statistical techniques for data mining. The self-organizing maps (SOM) method based on the Kohonen neural network [12] is one of the promising techniques. SOM-based cluster techniques have advantages over other methods for data mining. Data mining typically deals with very high-dimensional data. That is, an observation in the database for data mining is typically described by a large number of variables. The curse of dimensionality turns statistical correlations of data insignificant, and thus makes statistical methods powerless. The SOM method, however, does not rely on any assumptions of statistical tests, and is considered as an effective method in dealing with high-dimensional data [6,12]. More importantly, the SOM method provides a base for the visibility of clusters of high-dimensional data. This feature is not available in any other data analysis methods. It allows the data miner to analyze clusters based on the problem domain.

Survey is one of the common data acquisition methods for data mining [4]. In data mining, one can rarely find a survey data set that contains complete entries of each observation for all of the variables. Commonly, surveys and questionnaires are only partially completed by respondents. The extent of damage of missing data is unknown when it is virtually impossible to return the survey or questionnaires to the data source for completion, but is one of the most important parts of knowledge for data mining to discover. In fact, missing data is an important debatable issue in the knowledge engineering field [15].

In mining a survey database with incomplete data through cluster analysis, patterns of the missing data as well as the potential impacts of these missing data on the mining results are knowledge. For instance, a data miner often wishes to know how reliable a cluster analysis is; when and why certain types of values are often missing; and what variables are correlated in terms of having missing values at the same time. These valuable pieces of knowledge can be discovered only after the missing part of the data set is fully explored.

This paper discusses the issue of missing data in mining survey databases for knowledge discovery, presents the conceptual foundations of conceptual construction, and proposes a set of query functions for conceptual construction in SOM-based data mining. The rest of the paper is organized as follows. Section 2 discusses the issues of missing data related to data mining. Section 3 introduces SOM for conceptual construction on incomplete data. Section 4 suggests four concepts as knowledge discovery in data mining with incomplete data. It provides a scheme of conceptual construction on incomplete data using SOM. Section 5 proposes a query tool that is used to manipulate SOM for conceptual construction. Section 6 presents a case study that applies the query tool to manipulate the SOM for the conceptual construction on a student opinion survey data set. Finally, Section 7 offers concluding remarks.

2. Issues of missing data

Incomplete data sets are ubiquitous in data mining. There have been many treatments of missing data. One of the convenient solutions to incomplete data is to eliminate from the data set those records that are missing values. This, however, ignores potentially useful information in those records. In cases where the proportion of missing data is large, the conclusions drawn from the screened data set are more likely biased or misleading.

Another simple approach of dealing with missing data is to use a generic "unknown" for all missing data items. In data mining, an unspecified "unknown" for all missing data items often causes confusion and misinterpretation.

The third solution to dealing with missing data is to estimate the missing value in the data field. In the case of time-series data, interpolation based on two adjacent data points that are observed is possible. In general cases, one may use some expected value in the data field based on statistical measures [7]. However, in data mining, survey data are commonly of the types of ranking, category, multiple choices, and binary. Interpolation and use of an expected value for a particular missing data variable in these cases are generally inadequate. More importantly, research [2] indicates that a meaningful treatment of missing data shall always be independent of the problem being investigated.

More recently, there have been mathematical methods for finding the aggregate conceptual directions of a data set with missing data (e.g., [1,10]). These methods make themselves distinct from the traditional approaches of treating missing data by focusing on the collective effects of the missing data instead of individual missing values. This superior feature of these methods can be best built up for data mining on incomplete data. However, these statistical methods have limitations. First, it is assumed that missing values occur in a random fashion or follow certain distribution functions. Their strong assumptions about the distributions of data are often invalid, especially for cases of surveys with incomplete data. Second, these mathematical models are data-driven, instead of problem-domain driven. In fact, a single generic conceptual construction algorithm is insufficient to handle a variety of goals of data mining, since a goal of data mining is often related to its specific problem domain.

Knowledge discovery in databases is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns of data [8]. Following this definition, this research emphasizes two aspects of concept construction in data mining with incomplete data. First, the criteria of validity, novelty, and usefulness of the concepts to be constructed in data mining with incomplete data could be problem dependent. That is, the interest of a data pattern depends on the data miner and does not solely depend on the estimated statistical strength of the pattern [14]. Second, the conceptual construction based on the incomplete data is accomplished through heuristic search in combinatorial spaces built on computer and human cognitive theories [13]. Human–computer collaboration concept construction is the interactive process between the data miner and computer to extract novel, plausible, useful, relevant, and interesting knowledge associated with the missing data.

In our view, data mining differs from traditional statistics in dealing with missing data in many ways.
(1) Data mining attempts to extract unsuspected and potentially useful patterns from the data for the data miners with novel goals related to the missing data, rather than to estimate the individual values of the missing data.
(2) Data mining is a human-centered process implemented through knowledge discovery loops coupled with human–computer interaction to perceive the impact of the missing data at an aggregate level, rather than a one-way mathematical derivation based on unverified assumptions.

3. Tool for conceptual construction: self-organizing maps (SOM)

Given a large set of high-dimensional survey samples, there is usually a significant number of observations that have missing values; however, not all missing data are relevant to the data miner's interest. Hence, any simple brute-force search method for missing data is not only infeasible for a huge amount of data, but also helpless when the data miner is to identify problems, or develop concepts, through data mining. To identify problems or develop concepts, the data miner needs a tool to observe unsuspected patterns of the available data and the missing parts.

Self-organizing maps (SOM) [12] have been widely used for clustering, since SOM are more computationally efficient than the popular k-means clustering algorithm. More importantly, SOM provide data visualization for the data miner to view high-dimensional data [11]. Research [14,16] indicates that SOM are effective in data mining for the identification of unsuspected patterns of the data. Specifically, SOM can be used for cluster analysis on multivariate survey data. This study takes one step further and uses SOM as a tool for concept construction related to missing data. Conceptual construction on incomplete data is to investigate the patterns of the missing data as well as the potential impacts of these missing data on the mining results based only on the complete data. As seen later in our illustrative examples, SOM provide a mechanism for human–computer collaboration to construct concepts from the data with missing values.

SOM can learn certain useful features found in their input patterns through the unsupervised competitive learning process, and map the high-dimensional data onto low-dimensional pictures, allowing the data miner to view the map with clusters. The neural network depicted in Fig. 1 is the two-layer SOM used in this study. The nodes at the lower layer (input nodes) receive inputs presented by the sample data points. The nodes at the upper layer (output nodes) will represent the organization map of the input patterns after the unsupervised learning process. Every lower-layer node is connected to every upper-layer node via a variable connection weight.

The unsupervised learning process in SOM can be briefly described as follows. The connection weights are assigned small random numbers at the beginning. The incoming input vector presented by a sample data point is received by the input nodes. The input vector is transmitted to the output nodes via the connections. The activation of the output nodes depends upon the input. In a winner-take-all competition, the output node with the weights most similar to the input vector becomes active. In the learning stage, the weights are updated following Kohonen learning

Fig. 1. Self-organizing maps.
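The unsupervised learning process described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the preview breaks off before the update rule is stated, so the code assumes the commonly used Kohonen rule w <- w + lr * h * (x - w), squared Euclidean distance as the similarity measure, a Gaussian neighborhood h on the output grid, and linearly decaying learning rate and radius. The names train_som and best_matching_unit and all parameter values are hypothetical.

```python
import math
import random


def best_matching_unit(weights, x):
    """Winner-take-all: grid position of the output node whose weight
    vector is most similar to x (assumed: smallest squared Euclidean
    distance)."""
    best, best_d = (0, 0), float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((a - b) ** 2 for a, b in zip(x, w))
            if d < best_d:
                best, best_d = (r, c), d
    return best


def train_som(samples, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a two-layer SOM; returns weights[r][c] as a list of floats."""
    rng = random.Random(seed)
    dim = len(samples[0])
    radius0 = max(rows, cols) / 2.0
    # Connection weights are assigned small random numbers at the beginning.
    weights = [[[rng.uniform(0.0, 0.1) for _ in range(dim)]
                for _ in range(cols)] for _ in range(rows)]
    for epoch in range(epochs):
        # Assumed schedule: learning rate and radius decay linearly.
        frac = 1.0 - epoch / epochs
        lr = lr0 * frac
        radius = max(radius0 * frac, 0.5)
        order = list(samples)
        rng.shuffle(order)
        for x in order:
            # Present the input vector and find the winning output node.
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    # Assumed Gaussian neighborhood around the winner.
                    d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-d2 / (2.0 * radius ** 2))
                    w = weights[r][c]
                    # Assumed Kohonen update: move weights toward the input.
                    for i in range(dim):
                        w[i] += lr * h * (x[i] - w[i])
    return weights
```

After training on survey records encoded as numeric vectors, calling best_matching_unit(weights, record) gives the map cell a record falls into, which is the kind of cluster view the data miner would then inspect. How records with missing values are presented to the map is not covered in this excerpt and is deliberately left out of the sketch.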