




Spinnaker Labs, Inc.

Google Cluster Computing Faculty Training Workshop
Module V: Hadoop Technical Review

Overview
- Hadoop Technical Walkthrough
- HDFS
- Databases
- Using Hadoop in an Academic Environment
- Performance tips and other tools

You Say, "tomato..."

Some MapReduce Terminology
- Job: a "full program"; an execution of a Mapper and Reducer across a data set
- Task: an execution of a Mapper or a Reducer on a slice of data, a.k.a. Task-In-Progress (TIP)
- Task Attempt: a particular instance of an attempt to execute a task on a machine

Terminology Example
- Running "Word Count" across 20 files is one job
- 20 files to be mapped imply 20 map tasks + some number of reduce tasks
- At least 20 map task attempts will be performed; more if a machine crashes, etc.

Task Attempts
- A particular task will be attempted at least once, possibly more times if it crashes
- If the same input causes crashes over and over, that input will eventually be abandoned
- Multiple attempts at one task may occur in parallel with speculative execution turned on
- Task ID from TaskInProgress is not a unique identifier; don't use it that way

MapReduce: High Level

Node-to-Node Communication
- Hadoop uses its own RPC protocol
- All communication begins in slave nodes
- Prevents circular-wait deadlock
- Slaves periodically poll for "status" message
- Classes must provide explicit serialization

Nodes, Trackers, Tasks
- Master node runs JobTracker instance, which accepts Job requests from clients
- TaskTracker instances run on slave nodes
- TaskTracker forks separate Java process for task instances

Job Distribution
- MapReduce programs are contained in a Java "jar" file + an XML file containing serialized program configuration options
- Running a MapReduce job places these files into the HDFS and notifies TaskTrackers where to retrieve the relevant program code
- ... Where's the data distribution?

Data Distribution
- Implicit in design of MapReduce!
- All mappers are equivalent, so map whatever data is local to a particular node in HDFS
- If lots of data does happen to pile up on the same node, nearby nodes will map instead
- Data transfer is handled implicitly by HDFS

Configuring With JobConf
- MR programs have many configurable options
- JobConf objects hold (key, value) components mapping String → 'a, e.g., "mapred.map.tasks" → 20
- JobConf is serialized and distributed before running the job
- Objects implementing JobConfigurable can retrieve elements from a JobConf

What Happens In MapReduce? Depth First

Job Launch Process: Client
- Client program creates a JobConf
- Identify classes implementing Mapper and Reducer interfaces: JobConf.setMapperClass(), setReducerClass()
- Specify inputs, outputs: JobConf.setInputPath(), setOutputPath()
- Optionally, other options too: JobConf.setNumReduceTasks(), JobConf.setOutputFormat()...

Job Launch Process: JobClient
- Pass JobConf to JobClient.runJob() or submitJob()
- runJob() blocks, submitJob() does not
- JobClient:
  - Determines proper division of input into InputSplits
  - Sends job data to master JobTracker server

Job Launch Process: JobTracker
- JobTracker:
  - Inserts jar and JobConf (serialized to XML) in shared location
  - Posts a JobInProgress to its run queue

Job Launch Process: TaskTracker
- TaskTrackers running on slave nodes periodically query JobTracker for work
- Retrieve job-specific jar and config
- Launch task in separate instance of Java; main() is provided by Hadoop

Job Launch Process: Task
- TaskTracker.Child.main():
  - Sets up the child TaskInProgress attempt
  - Reads XML configuration
  - Connects back to necessary MapReduce components via RPC
  - Uses TaskRunner to launch user process

Job Launch Process: TaskRunner
- TaskRunner, MapTaskRunner, MapRunner work in a daisy-chain to launch your Mapper
- Task knows ahead of time which InputSplits it should be mapping
- Calls Mapper once for each record retrieved from the InputSplit
- Running the Reducer is much the same

Creating the Mapper
- You provide the instance of Mapper
- Should extend MapReduceBase
- One instance of your Mapper is initialized by the MapTaskRunner for a TaskInProgress
- Exists in separate process from all other instances of Mapper; no data sharing!

Mapper
- void map(WritableComparable key, Writable value, OutputCollector output, Reporter reporter)

What is Writable?
- Hadoop defines its own "box"
  classes for strings (Text), integers (IntWritable), etc.
- All values are instances of Writable
- All keys are instances of WritableComparable

Writing For Cache Coherency

    while (more input exists) {
        myIntermediate = new intermediate(input);
        myIntermediate.process();
        export outputs;
    }

Writing For Cache Coherency

    myIntermediate = new intermediate(junk);
    while (more input exists) {
        myIntermediate.setupState(input);
        myIntermediate.process();
        export outputs;
    }

Writing For Cache Coherency
- Running the GC takes time
- Reusing locations allows better cache usage
- Speedup can be as much as two-fold
- All serializable types must be Writable anyway, so make use of the interface

Getting Data To The Mapper

Reading Data
- Data sets are specified by InputFormats
  - Defines input data (e.g., a directory)
  - Identifies partitions of the data that form an InputSplit
  - Factory for RecordReader objects to extract (k, v) records from the input source

FileInputFormat and Friends
- TextInputFormat: treats each '\n'-terminated line of a file as a value
- KeyValueTextInputFormat: maps '\n'-terminated text lines of "k SEP v"
- SequenceFileInputFormat: binary file of (k, v) pairs with some add'l metadata
- SequenceFileAsTextInputFormat: same, but maps (k.toString(), v.toString())

Filtering File Inputs
- FileInputFormat will read all files out of a specified directory and send them to the mapper
- Delegates filtering this file list to a method subclasses may override
  - e.g., create your own "xyzFileInputFormat" to read *.xyz from directory list

Record Readers
- Each InputFormat provides its own RecordReader implementation
  - Provides (unused?) capability multiplexing
- LineRecordReader: reads a line from a text file
- KeyValueRecordReader: used by KeyValueTextInputFormat

Input Split Size
- FileInputFormat will divide large files into chunks
  - Exact size controlled by mapred.min.split.size
- RecordReaders receive file, offset, and length of chunk
- Custom InputFormat implementations may override split size, e.g., "NeverChunkFile"

Sending Data To Reducers
- Map function receives OutputCollector object
  - OutputCollector.collect() takes (k, v) elements
- Any (WritableComparable,
  Writable) can be used

WritableComparator
- Compares WritableComparable data
  - Will call WritableComparable.compareTo()
  - Can provide fast path for serialized data
- JobConf.setOutputValueGroupingComparator()

Sending Data To The Client
- Reporter object sent to Mapper allows simple asynchronous feedback
  - incrCounter(Enum key, long amount)
  - setStatus(String msg)
- Allows self-identification of input
  - InputSplit getInputSplit()

Partition And Shuffle

Partitioner
- int getPartition(key, val, numPartitions)
  - Outputs the partition number for a given key
  - One partition == values sent to one Reduce task
- HashPartitioner used by default
  - Uses key.hashCode() to return partition num
- JobConf sets Partitioner implementation

Reduction
- reduce(WritableComparable key, Iterator values, OutputCollector output, Reporter reporter)
- Keys and values sent to one partition all go to the same reduce task
- Calls are sorted by key: "earlier" keys are reduced and output before "later" keys

Finally: Writing The Output

OutputFormat
- Analogous to InputFormat
- TextOutputFormat: writes "key val\n" strings to output file
- SequenceFileOutputFormat: uses a binary format to pack (k, v) pairs
- NullOutputFormat: discards output

HDFS

HDFS Limitations
- "Almost" GFS
  - No file update options (record append, etc.); all files are write-once
- Does not implement demand replication
- Designed for streaming
  - Random seeks devastate performance

NameNode
- "Head" interface to HDFS cluster
- Records all global metadata

Secondary NameNode
- Not a failover NameNode!
- Records metadata snapshots from "real" NameNode
  - Can merge update logs in flight
  - Can upload snapshot back to primary

NameNode Death
- No new requests can be served while NameNode is down
  - Secondary will not fail over as new primary
- So why have a secondary at all?

NameNode Death, cont'd
- If NameNode dies from software glitch, just reboot
- But if machine is hosed, metadata for cluster is irretrievable!

Bringing the Cluster Back
- If original NameNode can be restored, secondary can re-establish the most current metadata snapshot
- If not, create a new NameNode,
  use secondary to copy metadata to new primary, restart whole cluster
- Is there another way...?

Keeping the Cluster Up
- Problem: DataNodes "fix" the address of the NameNode in memory, can't switch in flight
- Solution: bring new NameNode up, but use DNS to make cluster believe it's the original one
  - Secondary can be the "new" one

Further Reliability Measures
- Namenode can output multiple copies of metadata files to different directories
  - Including an NFS-mounted one
  - May degrade performance; watch for NFS locks

Databases

Life After GFS
- Straight GFS files are not the only storage option
- HBase (on top of GFS) provides column-oriented storage
- mySQL and other db engines still relevant

HBase
- Can interface directly with Hadoop
- Provides its own Input- and OutputFormat classes; sends rows directly to mapper, receives new rows from reducer
- ... But might not be ready for classroom use (least stable component)

MySQL Clustering
- MySQL database can be sharded on multiple servers
  - For fast IO, use same machines as Hadoop
- Tables can be split across machines by row key range
  - Multiple replicas can serve same table

Sharding & Hadoop Partitioners
- For best performance, Reducer should go straight to local mysql instance
  - Get all data in the right machine in one copy
- Implement custom Partitioner to ensure particular key range goes to mysql-aware Reducer

Academic Hadoop Requirements

Server Profile
- UW cluster:
  - 40 nodes, 80 processors total
  - 2 GB RAM / processor
  - 24 TB raw storage space (8 TB replicated)
- One node reserved for JobTracker/NameNode
- Two more wouldn't cooperate
- ... But still vastly overpowered

Setup & Maintenance
- Took about two days to set up and configure
  - Mostly hardware-related issues
  - Hadoop setup was only a couple hours
- Maintenance: only a few hours/week
  - Mostly rebooting the cluster when jobs got stuck

Total Usage
- About 15,000 CPU-hours consumed by 20 students
  - Out of 130,000 available over quarter
- Average load is about 12%

Analyzing student usage patterns

Not Quite the Whole Story
- Realistically,
  students did most work very close to deadline
  - Cluster sat unused for a few days, followed by overloading for two days straight

Analyzing student usage patterns
- Lesson: resource demands are NOT constant!

Hadoop Job Scheduling
- FIFO queue matches incoming jobs to available nodes
  - No notion of fairness
  - Never switches out running job
- Run-away tasks could starve other student jobs

Hadoop Security
- But on the bright side...
  - No security system for jobs
  - Anyone can start a job, but they can also cancel other jobs
- Realistically, students did not cancel other student jobs, even when they should

Hadoop Security: The Dark Side
- No permissions in HDFS either
  - Just now added in 0.16
- One student deleted the common data set for a project
  - Email subject: "Oops..."
  - No students could test their code until data set restored from backup

Job Scheduling Lessons
- Getting students to "play nice" is hard
  - No incentive
  - Just plain bad/buggy code
- Cluster contention caused problems at deadlines
  - Work in groups
  - Stagger deadlines

Another Possibility
- Amazon EC2 provides on-demand servers
- May be able to have students use these for jobs
  - "Lab fee" would be $150/student
- Simple web-based interfaces exist
- Hadoop On Demand (HOD) coming soon
  - Injects new nodes into live clusters

More Performance & Scalability

Number of Tasks
- Mappers = 10 * nodes (or 3/2 * cores)
- Reducers = 2 * nodes (or 1.05 * cores)
- Two degrees of freedom in mapper run time: number of tasks/node, and size of InputSplits
- See http://wiki.apache.org/lucene-hadoop/HowManyMapsAndReduces

More Performance Tweaks
- Hadoop defaults to heap cap of 200 MB
  - Set: mapred.child.java.opts = -Xmx512m
  - 1024 MB / process may also be appropriate
- DFS block size is 64 MB
  - For huge files, set dfs.block.size = 134217728
- mapred.reduce.parallel.copies
  - Set to 15-50; more data => more copies

Dead Tasks
- Student jobs would "run away", admin restart needed
- Very often stuck in huge shuffle process
  - Students did not know about Partitioner class, may have
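
Since the Partitioner comes up in both the shuffle and the sharding slides, a small sketch may help make getPartition(key, val, numPartitions) concrete. The code below is plain Java with no Hadoop dependency and does not implement Hadoop's Partitioner interface. The class name PartitionSketch, the methods hashPartition and rangePartition, and the example bounds are all hypothetical names chosen for illustration. The hash variant mirrors the idea of deriving the partition number from key.hashCode(); the range variant shows one possible way to send a particular key range to a particular reducer, as the sharding slide suggests.

```java
import java.util.Arrays;

/** Plain-Java sketch of two partitioning strategies (illustrative, not Hadoop code). */
class PartitionSketch {

    /** Hash partitioning: derive the partition number from key.hashCode(). */
    static int hashPartition(Object key, int numPartitions) {
        // Clear the sign bit so a negative hashCode() cannot give a negative index.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    /**
     * Range partitioning over sorted bounds (hypothetical layout):
     * partition 0 owns keys below upperBounds[0], partition i owns keys in
     * [upperBounds[i-1], upperBounds[i]), and the last partition owns the rest.
     * There are upperBounds.length + 1 partitions in total.
     */
    static int rangePartition(String key, String[] upperBounds) {
        int pos = Arrays.binarySearch(upperBounds, key);
        // When the key is absent, binarySearch returns -(insertionPoint) - 1.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    public static void main(String[] args) {
        String[] bounds = {"h", "p"}; // assumed split points, giving 3 partitions
        for (String key : new String[] {"apple", "hadoop", "zebra"}) {
            System.out.println(key + " -> hash partition " + hashPartition(key, 3)
                    + ", range partition " + rangePartition(key, bounds));
        }
    }
}
```

With hash partitioning, equal keys always land in the same partition but neighbouring keys scatter. With range partitioning, contiguous keys stay together, which is what a reducer writing to a local database shard would want; the cost is that badly chosen bounds can leave one reducer with most of the data.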