
Conceptual construction on incomplete survey data

Shouhong Wang a,*, Hai Wang b

a Department of Marketing/Business Information Systems, Charlton College of Business, University of Massachusetts Dartmouth, 285 Old Westport Road, North Dartmouth, MA 02747-2300, USA
b Department of Computer Science, University of Toronto, Toronto, ON, Canada M5S 3G4

Received 22 March 2003; received in revised form 9 September 2003; accepted 20 October 2003
Available online 26 November 2003

Data & Knowledge Engineering 49 (2004) 311–323
www.elsevier.com/locate/datak
doi:10.1016/j.datak.2003.10.007

* Corresponding author. E-mail addresses: swang@umassd.edu (S. Wang), hai@cs.toronto.edu (H. Wang).
0169-023X/$ - see front matter © 2003 Elsevier B.V. All rights reserved.

Abstract

The raw survey data for data mining are often incomplete. The issues of missing data in knowledge discovery are often ignored in data mining. This article presents the conceptual foundations of data mining with incomplete survey data, and proposes query processing for knowledge discovery and a set of query functions for the conceptual construction in survey data mining. Through a case, this paper demonstrates that conceptual construction on incomplete data can be accomplished by using artificial intelligence tools such as self-organizing maps.
© 2003 Elsevier B.V. All rights reserved.

Keywords: Incomplete survey data; Survey data mining; Conceptual construction; Self-organizing maps; Cluster analysis; Knowledge discovery; Query processing

1. Introduction

Data mining is the process of trawling through data in the hope of identifying interpretable patterns. Data mining is different from traditional statistical analysis in that it is aimed at finding unsuspected relationships which are of interest or value to the database owners, or data miners [9]. Due to the high dimensionality and the huge volume of data, traditional statistical methods have their limitations in data mining. To meet the challenge of data mining, artificial intelligence based human–computer interactive techniques have been widely used in data mining [3,16].

There have been many non-statistical techniques for data mining. The self-organizing maps (SOM) method based on the Kohonen neural network [12] is one of the promising techniques. SOM-based cluster techniques have advantages over other methods for data mining. Data mining typically deals with very high dimensional data. That is, an observation in the database for data mining is typically described by a large number of variables. The curse of dimensionality turns statistical correlations of data insignificant, and thus makes statistical methods powerless. The SOM method, however, does not rely on any assumptions of statistical tests, and is considered an effective method in dealing with high dimensional data [6,12]. More importantly, the SOM method provides a base for the visibility of clusters of high dimensional data. This feature is not available in any other data analysis methods. It allows the data miner to analyze clusters based on the problem domain.

Survey is one of the common data acquisition methods for data mining [4]. In data mining, one can rarely find a survey data set that contains complete entries of each observation for all of the variables. Surveys and questionnaires are often only partially completed by respondents. The extent of damage of missing data is unknown when it is virtually impossible to return the survey or questionnaires to the data source for completion, but is one of the most important parts of knowledge for data mining to discover. In fact, missing data is an important debatable issue in the knowledge engineering field [15].

In mining a survey database with incomplete data through cluster analysis, patterns of the missing data as well as the potential impacts of these missing data on the mining results are themselves knowledge. For instance, a data miner often wishes to know how reliable a cluster analysis is; when and why certain types of values are often missing; and what variables are correlated in terms of having missing values at the same time. These valuable pieces of knowledge can be discovered only after the missing part of the data set is fully explored.

This paper discusses the issue of missing data in mining survey databases for knowledge discovery, presents the conceptual foundations of conceptual construction, and proposes a set of query functions for conceptual construction in SOM-based data mining. The rest of the paper is organized as follows. Section 2 discusses the issues of missing data related to data mining. Section 3 introduces SOM for conceptual construction on incomplete data. Section 4 suggests four concepts as knowledge discovery in data mining with incomplete data. It provides a scheme of conceptual construction on incomplete data using SOM. Section 5 proposes a query tool that is used to manipulate SOM for conceptual construction. Section 6 presents a case study that applies the query tool to manipulate the SOM for the conceptual construction on a student opinion survey data set. Finally, Section 7 offers concluding remarks.

2. Issues of missing data

Incomplete data sets are ubiquitous in data mining. There have been many treatments of missing data. One of the convenient solutions to incomplete data is to eliminate from the data set those records that are missing values. This, however, ignores potentially useful information in those records. In cases where the proportion of missing data is large, the conclusions drawn from the screened data set are more likely to be biased or misleading.

Another simple approach to dealing with missing data is to use a generic "unknown" for all missing data items. In data mining, an unspecified "unknown" for all missing data items often causes confusion and misinterpretation.

The third solution to dealing with missing data is to estimate the missing value in the data field. In the case of time series data, interpolation based on two adjacent data points that are observed is possible. In general cases, one may use some expected value in the data field based on statistical measures [7]. However, in data mining, survey data are commonly of the types of ranking, category, multiple choices, and binary. Interpolation and use of an expected value for a particular missing data variable in these cases are generally inadequate. More importantly, research [2] indicates that a meaningful treatment of missing data shall always be independent of the problem being investigated.

More recently, there have been mathematical methods for finding the aggregate conceptual directions of a data set with missing data (e.g., [1,10]). These methods make themselves distinct from the traditional approaches of treating missing data by focusing on the collective effects of the missing data instead of individual missing values. This superior feature of these methods can be best built upon for data mining on incomplete data. However, these statistical methods have limitations. First, it is assumed that missing values occur in a random fashion or follow certain distribution functions. Their strong assumptions about the distributions of data are often invalid, especially for cases of surveys with incomplete data. Second, these mathematical models are data driven, instead of problem-domain driven. In fact, a single generic conceptual construction algorithm is insufficient to handle a variety of goals of data mining, since a goal of data mining is often related to its specific problem domain.

Knowledge discovery in databases is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns of data [8]. Following this definition, this research emphasizes two aspects of concept construction in data mining with incomplete data. First, the criteria of validity, novelty, and usefulness of the concepts to be constructed in data mining with incomplete data could be problem dependent. That is, the interest of a data pattern depends on the data miner and does not solely depend on the estimated statistical strength of the pattern [14]. Second, the conceptual construction based on the incomplete data is accomplished through heuristic search in combinatorial spaces built on computer and human cognitive theories [13]. Human–computer collaboration concept construction is the interactive process between the data miner and computer to extract novel, plausible, useful, relevant, and interesting knowledge associated with the missing data.

In our view, data mining differs from traditional statistics in dealing with missing data in many ways. (1) Data mining attempts to extract unsuspected and potentially useful patterns from the data for the data miners with novel goals related to the missing data, rather than to estimate the individual values of the missing data. (2) Data mining is a human-centered process implemented through knowledge discovery loops coupled with human–computer interaction to perceive the impact of the missing data at an aggregate level, rather than a one-way mathematical derivation based on unverified assumptions.

3. Tool for conceptual construction: self-organizing maps (SOM)

Given a large set of high dimensional survey samples, there are usually a significant number of observations that have missing values; however, not all missing data are relevant to the data miner's interest. Hence, any simple brute-force search method for missing data is not only infeasible for a huge amount of data, but also helpless when the data miner is to identify problems, or develop concepts, through data mining. To identify problems or develop concepts, the data miner needs a tool to observe unsuspected patterns of the available data and the missing parts.

Self-organizing maps (SOM) [12] have been widely used for clustering, since SOM are more computationally efficient than the popular k-means clustering algorithm. More importantly, SOM provide data visualization for the data miner to view high dimensional data [11]. Research [14,16] indicates that SOM are effective in data mining for the identification of unsuspected patterns of the data. Specifically, SOM can be used for cluster analysis on multivariate survey data. This study takes one step further and uses SOM as a tool for concept construction related to missing data. Conceptual construction on incomplete data is to investigate the patterns of the missing data as well as the potential impacts of these missing data on the mining results based only on the complete data. As seen later in our illustrative examples, SOM provide a mechanism for human–computer collaboration to construct concepts from the data with missing values.

SOM can learn certain useful features found in their input patterns through the unsupervised competitive learning process, and map the high dimensional data onto low dimensional pictures, allowing the data miner to view the map with clusters. The neural network depicted in Fig. 1 is the two-layer SOM used in this study. The nodes at the lower layer (input nodes) receive inputs presented by the sample data points. The nodes at the upper layer (output nodes) will represent the organization map of the input patterns after the unsupervised learning process. Every lower layer node is connected to every upper layer node via a variable connection weight.

The unsupervised learning process in SOM can be briefly described as follows. The connection weights are assigned small random numbers at the beginning. The incoming input vector presented by a sample data point is received by the input nodes. The input vector is transmitted to the output nodes via the connections. The activation of the output nodes depends upon the input. In a winner-take-all competition, the output node with the weights most similar to the input vector becomes active. In the learning stage, the weights are updated following Kohonen learning

Fig. 1. Self-organizing maps.
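
The unsupervised learning process described above (small random initial weights, winner-take-all competition, Kohonen weight update) can be sketched as follows. This is a minimal illustration under stated assumptions and is not the authors' implementation: the function names, the Euclidean similarity measure, the Gaussian neighbourhood, and the linear decay of learning rate and radius are all choices made here for illustration. The update itself is the standard Kohonen rule, w <- w + lr * h * (x - w).

```python
import math
import random


def best_matching_unit(weights, x):
    """Winner-take-all: grid position of the output node closest to x."""
    best, best_d2 = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            # Squared Euclidean distance as the (assumed) similarity measure.
            d2 = sum((xi - wi) ** 2 for xi, wi in zip(x, w))
            if d2 < best_d2:
                best, best_d2 = (r, c), d2
    return best


def train_som(samples, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols map; returns weights[r][c] as a list of floats."""
    rng = random.Random(seed)
    dim = len(samples[0])
    radius0 = max(rows, cols) / 2.0
    # Connection weights are assigned small random numbers at the beginning.
    weights = [[[rng.uniform(0.0, 0.1) for _ in range(dim)]
                for _ in range(cols)] for _ in range(rows)]
    for epoch in range(epochs):
        # Assumed schedule: learning rate and radius shrink linearly.
        frac = 1.0 - epoch / epochs
        lr = lr0 * frac
        radius = max(radius0 * frac, 0.5)
        order = list(samples)
        rng.shuffle(order)
        for x in order:
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    # Gaussian neighbourhood around the winning node.
                    grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-grid_d2 / (2.0 * radius ** 2))
                    w = weights[r][c]
                    for i in range(dim):
                        # Kohonen rule: move the weight toward the input.
                        w[i] += lr * h * (x[i] - w[i])
    return weights
```

After training, calling best_matching_unit(weights, x) for each observation gives its position on the map; observations that land on the same or neighbouring nodes can then be inspected as a cluster. Handling of missing values is deliberately left out of this sketch.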
