Professional English Course Final Paper
Student ID: **********
Name: **********
Paper title: The Relationship and Distinction Between Big Data and Data Mining
Instructor: ************
Major: Computer Technology
School: School of Computer Science and Engineering
Graduate School, Guilin University of Electronic Technology
Date: ** (year) * (month) * (day)

The Relationship and Distinction Between Big Data and Data Mining
Student ID: *
Name: *
Adviser: *
Guilin University of Electronic Technology **, *

Abstract: In this paper, data mining is discussed in the context of big data. First, we show that big data has attracted major attention from the academic community, industry, and governments. Second, we discuss the drawbacks of big data, such as large amounts of garbage data, heavy contamination, and difficulty of use. Finally, we analyze the value of big data, describe techniques for discovering knowledge from big data, and investigate the transformation of knowledge into data intelligence.
Keywords: big data; data mining; data intelligence

1 Introduction
As data volumes continue to increase exponentially, the data tsunami can easily overwhelm traditional analytics tools and platforms designed to ingest, analyze, and report on data.
Every day, 2.5 quintillion bytes of data are created, and 90 percent of the data in the world today were produced within the past two years [1]. The challenge we face is not only how to store and manage diverse data but also how to analyze it effectively to gain insight and make smarter decisions.
A number of works have already been presented on this topic. These studies address big data, data mining, and analysis from different aspects, such as the status quo, ideas, and implementations. For example, one work introduces the “Lambda Architecture”, which provides a general-purpose approach to implementing arbitrary functions on massive datasets in real time, and a scalable deep analytics platform has also been implemented. Because of the complexity involved, there is no single tool or one-size-fits-all solution for deeply mining and analyzing big data. Moreover, extracting valuable knowledge from massive datasets requires further studies and experiments, as well as scalable and smart services, programming tools, and applications. The remainder of this paper is structured as follows.
Section 2 shows that big data plays a primary role in every field. Section 3 then discusses the drawbacks of big data. After analyzing the value of big data in Section 4, we introduce the related knowledge and development of data mining in Section 5. Section 6 introduces the effectiveness of data mining. Finally, the conclusion follows.

2 About big data
Big data is a complex dataset with four main characteristics: Volume, Variety, Velocity, and Veracity [2][3].
These characteristics make big data difficult to manage and manipulate with existing tools. Of all the data produced today, big data accounts for the vast majority. Big data is the data foundation and the source of wisdom that allow people to understand the real world through the information world. Big data is closely related to applications [4][5], and big data mining is its principal application.

2.1 From understanding the real world to creating the information world
Human civilization is a process that leads from understanding the real world to creating the information world. It has gone through the following stages: preliminary sensing of the world, aiding memory with information, recording and passing down knowledge through information, exchanging and communicating through information, and understanding the world once again through information. Initially, humans used stones and shells to count according to the one-to-one principle, and they tied knots to help them remember. Later, humans drew simple graphics as notes and passed down more accurate memories prompted by their own emotions. When these graphics became relatively fixed common symbols associated with the words of a spoken language, writing emerged. Written texts abstract and generalize the world, promote cultural understanding, and lay the necessary foundation for the development of science. To break through the restriction that written symbols depended on manual copying or engraving, humans used machines for mass mechanized production after the Industrial Revolution, which improved the efficiency of cultural transmission.

The computer centers on high-speed computing and separates software from hardware, enabling information to be disseminated “electronically” and “automatically”. The Internet centers on the network: it interconnects computers and breaks the restriction of local information. Mobile communication centers on users: it lets machines follow users' movements and frees humans from being tied to the machine. The Internet of Things centers on applications: it automatically identifies objects to enable information sharing between humans and things. Cloud computing centers on services, consolidating expertise and optimizing the allocation of resources.
Big data centers on data: it mines knowledge from the entire dataset rather than from a random sample, breaking free of the randomness of sampling [6][7], and it is embodied in both big data centers and mobile terminals.
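The difference between mining the entire dataset and relying on a random sample can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: a hypothetical dataset of 1,000,000 records containing 5 rare anomalies, a 1% simple random sample, and an arbitrary anomaly threshold of 10.0; none of these numbers come from the cited works. For sampling without replacement, the exact probability that a sample of size n from N records misses all k anomalies is the product over i = 0..k-1 of (N - n - i) / (N - i), which the helper `miss_probability` (a name introduced here for illustration) computes.

```python
import random


def miss_probability(population: int, sample_size: int, anomalies: int) -> float:
    """Probability that a simple random sample (without replacement)
    contains none of the anomalous records."""
    p = 1.0
    for i in range(anomalies):
        p *= (population - sample_size - i) / (population - i)
    return p


# Hypothetical data: 1,000,000 normal readings (1.0) with 5 rare anomalies (100.0).
N, K, n = 1_000_000, 5, 10_000
rng = random.Random(42)
anomaly_ids = set(rng.sample(range(N), K))
records = [100.0 if i in anomaly_ids else 1.0 for i in range(N)]

# Full-data view: scan every record.
found_full = sum(1 for value in records if value > 10.0)

# Sampling view: inspect only a 1% random sample of record indices.
sample_ids = rng.sample(range(N), n)
found_sample = sum(1 for i in sample_ids if records[i] > 10.0)

print(f"anomalies found by full scan: {found_full}")
print(f"anomalies found in 1% sample: {found_sample}")
print(f"P(sample misses all anomalies) = {miss_probability(N, n, K):.3f}")
```

Under these assumptions the exact miss probability is about 0.95, so the sample will usually report no anomalies at all, while the full scan always reports all five.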
All of these information technologies serve to understand and transform the real world.

2.2 Big data is attracting much attention
Just as humans explore the real world through scientific research, they unravel the mysteries of the information world through big data and data mining, which are attracting much attention from academia. In May 2011, McKinsey published “Big data: the next frontier for innovation, competition, and productivity”, which analyzed the application potential of big data in different industries from economic and commercial perspectives and spelled out development policies on big data for government and industry decision makers.
In January 2012, the “Wall Street Journal” argued that big data, smart production, and wireless networks would lead to new economic prosperity [8].
In March 2012, the United States government released the “Big Data Research and Development Initiative”, which raised the development and application of big data from business practice to a national strategic deployment, in order to improve the ability to extract knowledge from large and complex data and help solve some of the nation's most pressing challenges.
In April 2012, “Nature Biotechnology” invited eight biologists to evaluate, in a paper titled “Finding correlations in big data”, the article “Detecting Novel Associations in Large Data Sets” published in “Science” in December 2011.
In July 2012, Gartner released its first big data survey report, “Hype Cycle for Big Data, 2012”, which examined big data in depth [9].

In China [10], big data attracts as much attention as it does around the world. Baidu has used Hadoop for offline processing since 2007. Currently, Baidu has over 10,000 Hadoop servers, more than Yahoo and Facebook, and it plans to reach 20,000 in 2013. Among these servers, 80% of the Hadoop clusters process a total of 6 TB of data every day for log analysis. Tencent, Taobao, and Alipay also use Hadoop to build data warehouses and handle big data. In April 2010, Taobao launched a data mining platform, “data cube”, based on a hundred-billion-record-level database named OceanBase, which supports 4 to 5 million update operations and handles more than 2 billion records and more than 2.5 TB of data in one day. In May 2010, China Mobile established a massive distributed system and a structured mass-data management system on the cloud. Huawei analyzes data from mobile terminals and stores massive data in the cloud to obtain valuable information. Alibaba analyzes business transaction data with big data technology to support credit approval.

3 Big data disaster
Big data is closely related to human daily life and has permeated all walks of life. Its quantity, size, and complexity are all increasing sharply.
A large amount of data has been stored in databases and data warehouses in the form of text, graphics, images, and multimedia [11]. Research by International Data Corporation (IDC) has shown that by 2003 humans had created a total of 5 EB of data, while in 2011 alone the amount of data copied and produced exceeded 1.8 ZB. It is expected that by 2020 global data usage will reach 35.2 ZB, which would require 37.6 billion hard drives of 1 TB capacity to store. On the one hand, these data broaden the scope of big data available for humans to gain wisdom. On the other hand, the value of each single unit of data is rapidly declining. Humans are submerged in an ocean of data yet thirsty for knowledge.

3.1 Garbage
Big data is voluminous and grows quickly, but its value density is very low, which means it contains a lot of junk data [12]. Experiments at an electron-positron collider can capture 40 million images per second, but only a few thousand of them are useful. The Romanian Internet security company BitDefender pointed out that spam and phishing messages in social network games have increased by more than 50%. Compared with other online communication environments, social network users are more likely to accept and load garbage information unknowingly.

Big data and applications are closely related, and professional labeling of data is the basic prerequisite for rational analysis and sound judgment. Both scientific experimental data and observational data need to be labeled by experts in the field. According to IDC statistics, in 2012 only 23% of all information was potentially useful, only 3% of that potentially useful information had been labeled, and an even smaller proportion had actually been analyzed. With the development of modern measurement techniques and digital recording methods, traditional manual, experience-based screening and analysis methods have become powerless in the face of such huge amounts of information.

3.2 Contamination
Data collected from the real world is contaminated. As early as 1992, the Massachusetts Institute of Technology found that data contamination problems are not isolated: in the 50 units and agencies sampled in the survey, most data had an accuracy below 95%.
Regardless of how spatial data are accessed, there are some inevitable problems or errors [13][14], such as incomplete content, precision errors, data redundancy, contradictory formats, inconsistent types, structural uncertainty, differing scales, differing standards, outdated data, abnormal errors, dynamic changes, and local sparsity. Moreover, each issue has a number of causes. For example, noise can be periodic noise, stripe noise, isolated noise, or random noise. Further, these data are often affected by gross errors, systematic errors, and random errors, individually or jointly.
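Of these three error types, gross errors are the ones most amenable to automatic screening. As one common technique, chosen here for illustration rather than taken from the cited works, the sketch below flags gross errors with a robust modified z-score built from the median and the median absolute deviation (MAD); the 0.6745 scaling constant and the 3.5 cut-off follow a widely used rule of thumb, and the readings are hypothetical. Systematic errors shift all values together and cannot be detected this way without reference data, while random errors are what the MAD itself measures.

```python
from statistics import median


def flag_gross_errors(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds `threshold`.

    The modified z-score is 0.6745 * |x - median| / MAD, where MAD is the
    median absolute deviation from the median. Unlike the mean and standard
    deviation, the median and MAD are barely moved by a few gross errors.
    """
    center = median(values)
    deviations = [abs(v - center) for v in values]
    mad = median(deviations)
    if mad == 0:
        # Degenerate case: more than half the values are identical, so any
        # deviation from the median is treated as suspicious.
        return [i for i, d in enumerate(deviations) if d > 0]
    return [i for i, d in enumerate(deviations) if 0.6745 * d / mad > threshold]


# Hypothetical repeated measurements of one quantity: small random noise
# around 10.0 plus two gross errors (e.g. mis-keyed or corrupted readings).
readings = [10.1, 9.9, 10.0, 10.2, 9.8, 10.1, 55.0, 10.0, 9.9, -20.0]
suspects = flag_gross_errors(readings)
print("gross-error candidates:", [(i, readings[i]) for i in suspects])
# prints: gross-error candidates: [(6, 55.0), (9, -20.0)]
```

The flagged values would then be re-measured or removed before adjustment, so that they do not distort the estimate of the remaining random errors.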
If these three kinds of error cannot be correctly detected and eliminated during adjustment, the expected data accuracy is bound to suffer.

3.3 Difficult to use
Data is not only contaminated but also difficult to use. The production, transmission, replication, and accumulation of data have gone far beyond people's capacity to analyze, understand, and apply it. Because of its sheer volume, “big data” is difficult to collect, store, search, share, analyze, and materialize.
Commercial image processing software (ERDAS IMAGINE, PCI, ENVI, etc.) has difficulty completing tasks such as mixed-pixel decomposition, automatic image matching, automatic target extraction, and other automatic processing tasks, because of the lack of new theories and methods. One newspaper published the same article by the same author on two different pages, “Legal Community” and “Youth Topics”. Another newspaper published three articles on the same day in its “Home Appliances”, “Lifestyle”, and “Science and Technology” editions, all comparing VCD, CVD, and DVD, and reached three different conclusions, yet the editor did not even notice. Over time, all walks of life become submerged in contaminated data garbage, which can lead big data into “garbage in, garbage out” and turn “big data” into useless “big garbage”. Useful data is buried, and its implied value is hidden in big data.
In this predicament, the following problems are the bottlenecks that big data research must break through: how to understand spatial data, how to extract information from data, how to turn data into usable knowledge, and finally how to realize the value of data.

4 The value of data
Big data is collected from numerous interconnected sources. Its greatest value lies in its real usefulness. The generally accepted rule of big data is “decisions based on data”. The first prerequisite is to keep data always useful and active. The ultimate value of big data is to gain human intelligence.

4.1 Overall cognition of the original appearance
Big data provides an unprecedented opportunity to observe the real world in full view rather than through partial samples. Without big data, probability statistics can only be produced from random samples of the real world, because spatial data are constrained by collection, storage, computing, and transmission. Like the proverbial blind men grasping an elephant, each of whom takes a part for the whole, such a view is limited. Incomplete sampling and dispersed sample data make it difficult to understand overall trends or to notice abnormal changes.

4.2 Basic resources
McKinsey believes that data is a basic resource comparable to physical assets and human capital: it creates significant value for the world economy, improves the productivity and competitiveness of enterprises and the public sector, and creates a large economic surplus for consumers. In 2011, the World Economic Forum called big data a new form of wealth. In 2012, the Davos Forum report “Big Data, Big Impact” treated data as an economic asset like currency or gold. In 2012, Gartner asserted that “Big data is big money”.
The U.S. government regards big data as the “new oil”, tied to the country's economic restructuring and industrial upgrading [3].

5 Data mining
Data mining refers to the basic technologies for realizing the value of big data, reallocating data assets, and using them effectively. Spatial data mining can be used to extract information from data, mine knowledge from information, and extract data intelligence from knowledge, improving the capacity for self-learning and self-feedback adaptation, and finally realizing human-machine intelligence.

5.1 Basic big data technology
The basic techniques of big data include data collection, storage, processing, expression, and quality evaluation. Big data can be generated by mobile devices, tracking systems, radio frequency identification (RFID) devices, sensor networks, social networks, Internet search, automatic recording systems, video archives, and e-commerce, as well as by the process of analyzing those data. Big data storage technology is the basis of data mining. It is designed to meet the growing need for data storage and aims to provide scalable, highly reliable, high-performance solutions for data storage, access, and management, such as distributed data storage, multi-level caching, load balancing, and fault-tolerance mechanisms. Conventional methods are not adequate for these tasks; a large data platform must be built in software to provide places to store data and interfaces to access it.
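Distributed storage, load balancing, and fault tolerance are often combined by partitioning keys across nodes and replicating each key. The sketch below uses consistent hashing, one common technique chosen here for illustration rather than one named in the paper; the class `ConsistentHashRing`, the node names, the 100 virtual nodes per server, and the replication factor of 2 are all hypothetical. Each key is stored on the first `replicas` distinct nodes found clockwise from its hash on the ring.

```python
import bisect
import hashlib
from collections import Counter


class ConsistentHashRing:
    """Maps keys to storage nodes; each key is kept on `replicas` distinct nodes."""

    def __init__(self, nodes, vnodes=100, replicas=2):
        self.vnodes = vnodes        # virtual points per node, smooths the load
        self.replicas = replicas    # copies per key, for fault tolerance
        self._ring = []             # sorted list of (hash, node) pairs
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(text):
        # MD5 is used only to spread keys evenly, not for security.
        return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        # Simulates a node failure or decommissioning.
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def nodes_for(self, key):
        """Return the nodes that store `key`; the first one is the primary."""
        distinct = {n for _, n in self._ring}
        wanted = min(self.replicas, len(distinct))
        i = bisect.bisect(self._ring, (self._hash(key),))
        result = []
        while len(result) < wanted:
            _, node = self._ring[i % len(self._ring)]  # wrap around the ring
            if node not in result:
                result.append(node)
            i += 1
        return result


ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
keys = [f"record:{i}" for i in range(3000)]
before = {k: ring.nodes_for(k)[0] for k in keys}  # primary node per key
load = Counter(before.values())
print("primary copies per node:", dict(load))

ring.remove_node("node-b")  # simulate a failed storage node
after = {k: ring.nodes_for(k)[0] for k in keys}
moved = sum(1 for k in keys if before[k] != after[k])
print("keys whose primary moved:", moved)  # equals load["node-b"]
```

The virtual nodes keep the per-node counts roughly balanced; with a single point per node, the arcs of the ring (and hence the loads) can differ widely. When a node fails, only the keys whose primary copy lived on it move, and their second replica already sits on a surviving node.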
Big data processing implements the transitions from data to information, from information to knowledge, and from knowledge to wisdom.
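The first two of these transitions can be illustrated with a toy example. Assuming a handful of hypothetical shopping transactions as the raw data, counting item and item-pair occurrences turns data into information, and keeping only rules A -> B whose support (share of all transactions containing both items) and confidence (share of transactions containing A that also contain B) reach chosen thresholds turns that information into candidate knowledge. This is a brute-force, pair-only form of association-rule mining, a classic data mining technique, not the paper's own method; the thresholds and function names are illustrative, and the step from knowledge to wisdom (acting on the rules) is not modeled.

```python
from collections import Counter
from itertools import combinations

# Data: raw transactions, each a set of purchased items (hypothetical).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def count_items(txns):
    """Data -> information: occurrence counts of single items and item pairs."""
    singles, pairs = Counter(), Counter()
    for t in txns:
        singles.update(t)
        pairs.update(combinations(sorted(t), 2))
    return singles, pairs


def mine_pair_rules(txns, min_support=0.4, min_confidence=0.6):
    """Information -> knowledge: rules lhs -> rhs that pass both thresholds."""
    n = len(txns)
    singles, pairs = count_items(txns)
    rules = []
    for (a, b), both in pairs.items():
        support = both / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = both / singles[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


for lhs, rhs, support, confidence in mine_pair_rules(transactions):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

With these five transactions the sketch reports bread -> milk and milk -> bread (support 0.60, confidence 0.75) and butter -> bread (support 0.40, confidence 1.00); bread -> butter is dropped because its confidence is only 0.50.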
Big data expression technology is designed to represent data in a clear and effective way that reveals meaningful information to the user or offers the user a new perspective. It includes digital elevation models, digital terrain models, flat maps, three-dimensional maps, and digital city maps. Big data quality assessment technology aims to avoid the risks of big data collection and high-density measurement. It includes logical assessment methods, exception-value-based assessment methods, and accounting-based assessment methods.

5.2 Discovering knowledge
Knowledge discovery is the technology that uses data mining methods to extract previously unknown, potentially useful, and ultimately comprehensible rules. It is also a process of gradual, step-by-step sublimation from data to information and then to knowledge. Data mining systems aim to summarize data gradually into knowledge; through the integration of data, they can extract knowledge in depth. Using such new knowledge, data can be processed in real time in order to understand and apply it and to make intelligent judgments and well-informed decisions. Knowledge can be self-learning, self-enhancing, universal, and easily recognized, and it can serve as a basis for decision support. If businesses take full advantage of knowledge, people will be able to learn, work, and live more precisely and dynamically and achieve a state of wisdom. Knowledge will help improve resource utilization and productivity. Moreover, it will also help respond to the economic crisis, the energy crisis, environmental deterioration, and many other global issues.

5.3 Extracting data intelligence
Data intelligence is the ability to obtain more innovative, systematic, and comprehensive knowledge for solving a particular problem through in-depth analysis of collected data. It is the ability to understand and solve problems quickly, flexibly, and correctly. Spatial data intelligence has three features: more thorough perception, more extensive interoperability, and deeper intelligence.
These three features aim to obtain bigger and more comprehensive data, to share data and cooperate on it via the Internet, to analyze and mine data with a variety of advanced techniques, and to build a hierarchy of spatial data intelligence (Fig. 1).

Figure 1. The hierarchy of spatial data intelligences

Big data intelligence does not mean simply overlaying different data mining techniques; it means an industry-oriented organization with a reasonable structure, smooth operation, and a powerful wisdom system. The more reasonable the industry structure becomes, the smaller the internal friction, the greater the effectiveness, and the higher the level of the wisdom system. Each time a person interacts with the data, he or she becomes more efficient and more productive, which means a better way to analyze, summarize, and calculate takes shape. Through the consolidation and analysis of trans-regional and trans-sector data, with knowledge applied to specific industries, scenarios, and solutions, big data intelligence can better support decision-making and action.
More in-depth data intelligence creates new value from data. On the one hand, when spatial data knowledge is fully used in all walks of life, it can produce secondary knowledge. To form a mechanism for mining knowledge from knowledge, primary knowledge needs to be brought together into an intelligent form of expression, so that the target knowledge can ultimately be obtained. On the other hand, based on a general industrial or socio-ecological system, data intelligence can redefine the interaction mode among governments, companies, and individuals, improving the clarity, efficiency, flexibility, and response speed of their interactions. It shifts interaction from traditional single-dimension relationships, such as production and consumption, managing and being managed, or planning and execution, to a new multi-dimensional collaborative relationship. In this new relationship, both individuals and organizations can freely contribute and obtain information and expertise accurately and in a timely manner, and the participants exert a positive influence on one another, producing the macro-effect of smart operation.

6 Effectiveness
When we possess the necessary knowledge and the ability to control it, data becomes a valuable asset that leads to market dominance and huge economic returns. Big data technology providers offer technologies that allow users to process structured, semi-structured, and unstructured data. Big data applications are increasingly ubiquitous on the Internet, richly interfaced, and fragmented. Because big data is vertically integrated into application industries, businesses closer to end users tend to have greater influence in the industry chain. Morgan Stanley's report insists that “Big Data is soon to become Any Data” [15]. To win the future, the rational choice is “giving customers the technologies they need to store and analyze ‘any’ data set - any type of data, any size of data, for any type of user, and in any time frame.”

7 Conclusion
The development of big data extends the scope of human activities and demands proper attention from academia, industry, and government. The world has been cooperating and integrating on a global scale, and humans are compelled to shift from a local to a global mode in their everyday life and work. It redefines the relationship among individuals, businesses, organizations, gov