
Professional English Course Final Paper
Student ID: **********
Name: **********
Paper title: The Relationship and Distinction Between Big Data and Data Mining
Instructor: ************
Major: Computer Technology
School: School of Computer Science and Engineering
Graduate School, Guilin University of Electronic Technology
Date: year **, month *, day *

The Relationship and Distinction Between Big Data and Data Mining

Student ID: *   Name: *   Adviser: *
Guilin University of Electronic Technology **, *

Abstract: In this paper, data mining is discussed in the context of big data. First, we show that big data has become a major focus for the academic community, industry and governments. Second, we discuss the adverse aspects of big data, such as large amounts of garbage data, heavy contamination, and difficulty of use. Finally, we analyze the value of big data, describe techniques for discovering knowledge from big data, and investigate the transformation of knowledge into data intelligence.

Keywords: big data; data mining; data intelligence

1 Introduction

As data volumes continue to increase exponentially, the data tsunami can easily overwhelm traditional analytics tools and platforms designed to ingest, analyze and report on data.

Every day, 2.5 quintillion bytes of data are created, and 90 percent of the data in the world today was produced within the past two years [1]. The challenge we face is not only how to store and manage diverse data, but also how to analyze it effectively to gain the insight and knowledge needed to make smarter decisions.
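To put the daily figure above into more familiar units, the short sketch below converts it. It assumes decimal (SI) byte prefixes, the short-scale meaning of “quintillion” (10^18), and, purely for illustration, a constant daily rate over a 365-day year; the name `daily_to_yearly` is hypothetical. Since volumes are growing rather than constant, the yearly total is only an order-of-magnitude indication.

```python
# Decimal (SI) byte units, as typically used in data-volume statistics.
EXABYTE = 10**18
ZETTABYTE = 10**21


def daily_to_yearly(bytes_per_day: int, days: int = 365) -> int:
    """Total bytes over `days`, assuming a constant daily rate (illustrative only)."""
    return bytes_per_day * days


# "2.5 quintillion bytes" per day, with quintillion = 10**18 (short scale).
daily = 25 * 10**17
yearly = daily_to_yearly(daily)

print(f"{daily / EXABYTE:.1f} EB per day")       # 2.5 EB per day
print(f"{yearly / ZETTABYTE:.4f} ZB per year")   # 0.9125 ZB per year
```

Under these assumptions the cited rate corresponds to roughly 0.9 ZB per year, the same order of magnitude as the 1.8 ZB that IDC reports for 2011.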

Currently, a number of works have been presented. These studies introduce big data mining and analysis from different aspects, such as the status quo, ideas and implementations. For example, one work introduces the “Lambda Architecture”, which provides a general-purpose approach to implementing arbitrary functions on massive datasets in real time; elsewhere, a scalable deep analytics platform has been implemented. Because of the complexity involved, there is no single tool or one-size-fits-all solution for deeply mining and analyzing big data. Moreover, extracting valuable knowledge from massive datasets requires further studies and experiments, as well as scalable and smart services, programming tools and applications. The remainder of this paper is structured as follows.

Section 2 shows that big data plays a primary role in every field. The adverse aspects of big data are then discussed in Section 3. After analyzing the value of big data in Section 4, we introduce the related knowledge and development of data mining in Section 5. In Section 6, the effectiveness of data mining is discussed. Finally, the conclusion follows.

2 About big data

Big data is a complex dataset with the following main characteristics: Volume, Variety, Velocity and Veracity [2][3].

These characteristics make big data difficult to manage and manipulate with existing tools. Among the data produced today, such big data accounts for the vast majority.

Big data is the data basis and the source of wisdom that allows people to understand the real world through the information world.

Big data is closely related to applications [4][5], and big data mining is its principal application.

2.1 From understanding the real world to creating the information world

Human civilization is a process that runs from understanding the real world to creating the information world. It has gone through the following stages: preliminary sensing of the world, aiding memory with information, recording and inheriting through information, exchanging and communicating through information, and understanding the world once again through information. Initially, humans used stones and shells to count according to the one-to-one principle, and tied knots to aid memory. Later, humans used simple graphics and drawn notes, and passed on more accurate memories prompted by their own emotions. When the graphics became relatively fixed common symbols and were associated with the words of spoken language, text was produced. Text abstracts and generalizes the world, promotes cultural understanding, and lays the necessary foundation for the development of science. To break through the restriction that written symbols depended on manual copying or engraving, humans used machines after the Industrial Revolution for mechanized mass production, which improved the efficiency of cultural transmission. Computers center on high-speed computing and separate software from hardware, contributing to the dissemination of information “electronically” and “automatically”. The Internet centers on the network, interconnecting computers and breaking local restrictions on information. Mobile communication centers on users, making machines follow users' movements and freeing humans from the machine. The Internet of Things centers on applications, automatically identifying objects to enable information sharing between humans and things. Cloud computing centers on services, consolidating expertise and optimizing the allocation of resources.

Big data centers on data and mines knowledge from the entire dataset, breaking through the randomness of sampling [6][7], and it is demonstrated in big data centers and on mobile terminals.

These information technologies serve the understanding and transformation of the real world.

2.2 Big data is attracting much attention

Just as humans explore the real world through scientific research, they unravel the mysteries of the information world through big data and data mining, which are attracting much attention from academia. In May 2011, McKinsey published “Big data: The next frontier for innovation, competition, and productivity”, which analyzed the application potential of big data in different industries from economic and commercial perspectives and spelled out development policies for government and industry decision makers dealing with big data.

In January 2012, the Wall Street Journal argued that big data, smart production and wireless networks will lead to new economic prosperity [8].

In March 2012, the United States government released the “Big Data Research and Development Initiative”, which raised the development and application of big data from business conduct to a national strategic deployment, in order to improve the ability to extract knowledge from large and complex data and to help solve some of the nation's most pressing challenges.

In April 2012, Nature Biotechnology invited eight biologists to evaluate, in a paper titled “Finding correlations in big data”, an article titled “Detecting Novel Associations in Large Data Sets” that had been published in Science in December 2011.

In July 2012, Gartner released its first report on big data, “Hype Cycle for Big Data, 2012”, which examined big data in depth [9]. In China [10], big data attracts as much attention as it does around the world. Baidu has used Hadoop for offline processing since 2007. Currently, Baidu has over 10,000 Hadoop servers, more than Yahoo and Facebook, and plans to reach 20,000 in 2013. Of these servers, 80% of the Hadoop clusters process a total of 6 TB of data every day for log analysis. Tencent, Taobao and Alipay are also using Hadoop to build data warehouses and handle big data. In April 2010, Taobao launched a data mining platform, “Data Cube”, based on a hundred-billion-record-level database named OceanBase, which supports 4 to 5 million update operations in one day over more than 2 billion records containing more than 2.5 TB of data. In May 2010, China Mobile established a massive distributed system and a structured mass data management system on the cloud. Huawei analyzes data from mobile terminals and stores massive data in the cloud to obtain valuable information. Alibaba analyzes business transaction data with big data technology for credit approval.

3 Big data disaster

Big data is closely related to daily human life and has permeated all walks of life. The number, size and complexity of datasets are all increasing sharply.

A large amount of data has been stored in databases and data warehouses in the form of text, graphics, images and multimedia [11].

Research from International Data Corporation (IDC) has shown that, as of 2003, humans had created a total of 5 EB of data, while in 2011 the amount of data copied and produced exceeded 1.8 ZB. It is expected that by 2020 global data usage will reach 35.2 ZB, which would need 37.6 billion hard drives of 1 TB capacity to store. On the one hand, these data broaden the scope of big data available for humans to gain wisdom. On the other hand, the value of a single unit of data is rapidly declining. Humans are submerged in an ocean of data but thirsty for knowledge.

3.1 Garbage

Big data is voluminous and grows quickly, but it has a very low value density, which means there is a lot of junk data [12]. Studies at electron-positron colliders can capture 40 million images per second, but only a few thousand of them are useful. The Romanian Internet security company BitDefender pointed out that spam and phishing information in social network games has increased by more than 50%. Compared with other online communication environments, social network users are more likely to unknowingly accept and load junk information. Big data and applications are closely related, and professional labeling of the data is the basis for rational analysis and sound judgment.

Both scientific experimental data and observational data need to be labeled by experts in the field.

According to IDC statistics, in 2012 only 23% of all information was potentially useful; of this potentially useful information only 3% had been labeled, and the proportion of data that had actually been analyzed was much smaller. With the development of modern measurement techniques and digital recording methods, traditional manual, experience-based screening and analysis methods have become powerless in the face of such huge amounts of information.

3.2 Contamination

Data collected from the real world is contaminated. Moreover, as early as 1992, the Massachusetts Institute of Technology found that data contamination problems are not isolated: among the 50 units and agencies sampled for the survey, the accuracy of most of the data was below 95%.
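An accuracy figure such as the 95% above presupposes some way of deciding which records are correct. As a rough illustration, and not the procedure used in the 1992 MIT survey (which the text does not describe), the sketch below estimates the share of records that pass a set of logical-consistency rules. The rules, field names and sample records are hypothetical. Passing such rules shows only internal consistency; true accuracy would require comparison against ground truth.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Rule = Callable[[Record], bool]

# Hypothetical logical-consistency rules for a survey record.
RULES: List[Rule] = [
    lambda r: r.get("id") is not None,                                 # completeness
    lambda r: isinstance(r.get("age"), int) and 0 <= r["age"] <= 120,  # valid range
    lambda r: r.get("unit") in {"m", "km"},                            # consistent format
]


def accuracy(records: List[Record], rules: List[Rule] = RULES) -> float:
    """Fraction of records that satisfy every rule (0.0 for an empty list)."""
    if not records:
        return 0.0
    passed = sum(all(rule(r) for rule in rules) for r in records)
    return passed / len(records)


sample = [
    {"id": 1, "age": 34, "unit": "m"},
    {"id": 2, "age": 250, "unit": "m"},     # out-of-range age
    {"id": None, "age": 41, "unit": "km"},  # missing identifier
    {"id": 4, "age": 29, "unit": "km"},
]
rate = accuracy(sample)
print(f"accuracy = {rate:.0%}, meets 95% target: {rate >= 0.95}")
# accuracy = 50%, meets 95% target: False
```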

However spatial data is acquired, there are some inevitable problems or errors [13][14], such as incomplete content, precision errors, data redundancy, contradictory formats, differing types, structural uncertainty, differing scales, differing standards, outdated data, abnormal errors, dynamic changes and local sparsity. Moreover, each issue has a number of causes. For example, noise can be periodic noise, stripe noise, isolated noise or random noise. Further, these data are often affected by gross errors, systematic errors and random errors, individually or in combination.
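The paragraph above lists gross errors among the error types but does not say how they are found. One common screening approach, swapped in here for illustration and not claimed to be the method of [13][14], flags values whose robust (median and MAD based) z-score is extreme. The function name `flag_gross_errors`, the 3.5 threshold and the sample readings are assumptions for this sketch. It targets gross errors only: a systematic error that shifts every reading equally would pass unnoticed, and random errors are normally handled by the adjustment itself.

```python
from statistics import median


def flag_gross_errors(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds `threshold`.

    The median and the median absolute deviation (MAD) are used instead of the
    mean and standard deviation because they are barely distorted by the gross
    errors being searched for.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Most values identical: flag anything that differs from the median.
        return [i for i, v in enumerate(values) if v != med]
    # 0.6745 rescales the MAD so the score is comparable to an ordinary
    # z-score for normally distributed data.
    return [i for i, v in enumerate(values)
            if abs(0.6745 * (v - med) / mad) > threshold]


readings = [10.1, 9.9, 10.0, 10.2, 9.8, 57.3, 10.1, 9.9]  # hypothetical measurements
print(flag_gross_errors(readings))  # [5]
```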

If gross, systematic and random errors cannot be correctly detected and eliminated during adjustment, the expected data accuracy is bound to suffer.

3.3 Difficult to use

Data is not only contaminated but also difficult to use. The production, transmission, replication and accumulation of data have gone far beyond people's capacity to analyze, understand and apply it. Because of the sheer volume of big data, it is difficult to collect, store, search, share, analyze and materialize.

Commercial image processing software (ERDAS IMAGINE, PCI, ENVI, etc.) struggles with tasks such as mixed-pixel decomposition, automatic image matching, automatic target extraction and other automatic processing tasks, because of a lack of new theories and methods. One newspaper published the same article by the same author on two different pages, “Legal Community” and “Youth Topics”. Another newspaper published three articles in its “Home Appliances”, “Lifestyle” and “Science and Technology” editions on the same day, all comparing VCD, CVD and DVD, and reached three different conclusions, yet the editor did not even notice. Over time, all walks of life become submerged in contaminated data garbage, which can lead big data into “garbage in, garbage out” and turn “big data” into useless “big garbage”. Useful data is now buried, and its implied value is hidden within big data.

Faced with this predicament, the following problems are the bottlenecks that big data research must break through: how to understand spatial data, how to extract information from the data, how to turn data into usable knowledge, and finally how to realize the value of data.

4 The value of data

Big data is collected from numerous interconnected sources. Real usefulness is its greatest value. The generally accepted rule of big data is “decisions based on data”. The first prerequisite is to keep data always useful and active. The ultimate value of big data is to gain human intelligence.

4.1 Overall cognition of the original appearance

Big data provides an unprecedented opportunity to observe the real world in full view rather than through partial samples. Without big data, probability statistics can only be produced from random samples of the real world, because spatial data is constrained by collection, storage, computation and transmission. Like the proverbial blind men feeling an elephant, each taking a part for the whole, such an approach gives only a limited view. Incomplete sampling and dispersed sample data make it difficult to understand overall trends or to notice abnormal changes.

4.2 Basic resources

McKinsey believes that data is a basic resource comparable with physical assets and human capital: it creates significant value for the world economy, improves the productivity and competitiveness of enterprises and the public sector, and creates a large economic surplus for consumers. In 2011, the World Economic Forum called big data a new form of wealth. In 2012, the Davos Forum report “Big Data, Big Impact” treated data as an economic asset, like currency or gold. In 2012, Gartner stated that “Big data is big money”.

The U.S. government regards big data as the “new oil”, linked to the country's economic restructuring and industrial upgrading [3].

5 Data mining

Data mining refers to the basic technologies for realizing the value of big data, reallocating data assets, and using them effectively. Spatial data mining can be used to extract information from data, mine knowledge from information, extract data intelligence from knowledge, and improve the abilities of self-learning and self-feedback adaptation, finally realizing human-machine intelligence.

5.1 Basic big data technology

The basic techniques of big data include data collection, storage, processing, expression and quality assessment. Big data can be generated by mobile devices, tracking systems, radio frequency identification (RFID) devices, sensor networks, social networks, Internet search, automatic recording systems, video archives and e-commerce, as well as by the process of analyzing those data. Big data storage technology is the basis for data mining. It is designed to meet the growing need for data storage and aims to provide scalable, highly reliable, high-performance solutions for data storage, access and management, such as distributed data storage, multi-level caching, load balancing and fault-tolerance mechanisms. Conventional methods are not adequate for these tasks; a large data platform must be built in software to provide places to store data and interfaces to access it.
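The storage paragraph names distributed storage and load balancing without saying how data is assigned to machines. One widely used technique for this, swapped in here for illustration rather than taken from the paper, is consistent hashing: keys and storage nodes are hashed onto the same ring, and each key belongs to the next node clockwise. The class `HashRing`, the node names and the choice of 100 virtual points per node are assumptions for this sketch.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring that maps data keys to storage nodes (sketch)."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas  # virtual points per node, to even out the load
        self._ring = []           # sorted list of (hash, node) pairs
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key: str) -> int:
        # Any stable hash works; SHA-256 is used here only for its good spread.
        return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node: str) -> None:
        for i in range(self.replicas):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def node_for(self, key: str) -> str:
        if not self._ring:
            raise ValueError("no storage nodes in the ring")
        # First virtual point at or after the key's hash, wrapping to the start.
        idx = bisect.bisect_left(self._ring, (self._hash(key), "")) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"record-{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")
moved = [k for k in keys if ring.node_for(k) != before[k]]
# Only some keys move, and every one of them moves to the new node.
print(len(moved), all(ring.node_for(k) == "node-d" for k in moved))
```

The design choice this illustrates is scalability: adding a node relocates only the keys that now fall to it (on average about a quarter here), instead of reshuffling everything as a plain `hash(key) % number_of_nodes` scheme would.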

Big data processing implements the transitions from data to information, from information to knowledge, and from knowledge to wisdom.
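Section 2.2 notes that Baidu runs Hadoop clusters for log analysis. The first transition above, from data to information, can be illustrated with the map, shuffle and reduce steps of that programming model, simulated here in a single Python process. This is not Hadoop's API; the three-field log format (`client path status`), the function names and the sample lines are hypothetical. Raw log lines (data) become per-status request counts (information).

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit (status_code, 1) for each well-formed log line; skip malformed ones."""
    parts = line.split()
    if len(parts) >= 3:
        yield parts[2], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework would between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


logs = [
    "10.0.0.1 /index.html 200",
    "10.0.0.2 /login 302",
    "10.0.0.1 /missing 404",
    "10.0.0.3 /index.html 200",
    "garbled",
]
print(run_job(logs))  # {'200': 2, '302': 1, '404': 1}
```

In a real cluster the map and reduce functions run in parallel on many machines and the shuffle moves data across the network; the sketch keeps only the logical structure.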

Big data expression technology is designed to represent data in a clear and effective way that reveals meaningful information to the user or offers the user a new perspective. It includes digital elevation models, digital terrain models, flat maps, three-dimensional maps and digital city maps. Big data quality assessment technology aims to avoid the risks of big data collection and high-density measurement. It includes logical assessment methods, exception-value-based assessment methods and accounting-based assessment methods.

5.2 Knowledge discovery

Knowledge discovery is the technology that uses data mining methods to extract previously unknown, potentially useful and ultimately comprehensible rules. It is also a process of gradual sublimation, step by step, from data to information and then to knowledge. Data mining systems aim to summarize data gradually into knowledge. Through the integration of data, knowledge can be extracted in depth. Using such new knowledge, data can be processed in real time in order to understand and apply it, and to make intelligent judgments and well-informed decisions. Knowledge can be self-learning, self-enhancing, universal and easily recognized, and it can serve as a basis for decision support. If businesses take full advantage of knowledge, humans will learn, work and live more precisely and dynamically, and approach a state of wisdom. Knowledge will help improve resource utilization and productivity. Moreover, it will also help respond to the economic crisis, the energy crisis, environmental deterioration and many other global issues.

5.3 Extracting data intelligence

Data intelligence is the ability to obtain more innovative, systematic and comprehensive knowledge for solving a particular problem through in-depth analysis of the collected data. It is the ability to understand and solve problems quickly, flexibly and correctly. Spatial data intelligence has three features: more thorough perception, more extensive interoperability, and deeper intelligence.

The three features aim to obtain bigger and more comprehensive data, to share and cooperate on data via the Internet, to perform data analysis and data mining with a variety of advanced techniques, and to constitute a hierarchy of spatial data intelligence (Fig. 1).

Figure 1. The hierarchy of spatial data intelligences

Big data intelligence does not mean simply overlaying different data mining techniques; it is a reasonably structured, industry-oriented organization with good operation and a powerful wisdom system. The more reasonable the industry structure becomes, the smaller the internal friction, the greater the effectiveness, and the higher the level of the wisdom system. Every time a person interacts with the data, he or she becomes more efficient and more productive, which means a better way to analyze, summarize and calculate takes shape. Through the consolidation and analysis of trans-regional and trans-sector data, with knowledge applied to specific industries, specific scenarios and specific solutions, big data intelligence can better support decision-making and action.

More in-depth data intelligence creates new value from data. On the one hand, when spatial data knowledge is fully used in all walks of life, it can produce secondary knowledge. To form a mechanism that mines knowledge from knowledge, primary knowledge needs to be brought together into an intelligent form of expression, so that the destination knowledge can ultimately be achieved. On the other hand, based on a general industrial or socio-ecological system, it can redefine the interaction mode among government, companies and individuals, improving the clarity, efficiency, flexibility and response speed of their interaction. It changes traditional single-dimensional relationships, such as production and consumption, managing and being managed, or planning and execution, into a new multi-dimensional collaborative relationship. In this new relationship, both individuals and organizations can freely contribute and obtain information and expertise accurately and in a timely manner. The parties in this new relationship exert a positive influence on each other, achieving the macro-level effect of smart operation.

6 Effectiveness

When we possess the necessary knowledge and the ability to control it, data becomes a valuable asset that leads to market dominance and huge economic returns.

Big data technology providers supply users with technology for processing structured, semi-structured and unstructured data. Big data applications are increasingly ubiquitous on the Internet, richly interfaced and fragmented. Because this is vertical integration within the application industry, businesses that are closer to end users tend to have greater influence in the industry chain. A Morgan Stanley report insists that “Big Data is soon to become Any Data” [15]. To win the future, the rational choice is “giving customers the technologies they need to store and analyze ‘any’ data set: any type of data, any size of data, for any type of user, and in any time frame.”

7 Conclusion

The development of big data extends the scope of human activities. It demands proper attention from academia, industry and government. The world has been cooperating and integrating on a global scale. Humans are forced to shift their mode of everyday life and work from the local to the global. It redefines the relationships among individuals, businesses, organizations, gov
