56-The Future of Microprocessors.pdf

QUEUE, September 2005 (rants: feedback@acmqueue.com; more queue: www.acmqueue.com)
FOCUS: Multiprocessors

The Future of Microprocessors
Chip multiprocessors' promise of huge performance gains is now a reality.
KUNLE OLUKOTUN AND LANCE HAMMOND, STANFORD UNIVERSITY

The performance of microprocessors that power modern computers has continued to increase exponentially over the years for two main reasons. First, the transistors that are the heart of the circuits in all processors and memory chips have simply become faster over time on a course described by Moore's law,[1] and this directly affects the performance of processors built with those transistors. Moreover, actual processor performance has increased faster than Moore's law would predict,[2] because processor designers have been able to harness the increasing numbers of transistors available on modern chips to extract more parallelism from software. This is depicted in figure 1 for Intel's processors.

An interesting aspect of this continual quest for more parallelism is that it has been pursued in a way that has been virtually invisible to software programmers. Since they were invented in the 1970s, microprocessors have continued to implement the conventional von Neumann computational model, with very few exceptions or modifications. To a programmer, each computer consists of a single processor executing a stream of sequential instructions and connected to a monolithic memory that holds all of the program's data. Because the economic benefits of backward compatibility with earlier generations of processors are so strong, hardware designers have essentially been limited to enhancements that have maintained this abstraction for decades. On the memory side, this has resulted in processors with larger cache memories, to keep frequently accessed portions of the conceptual memory in small, fast memories that are physically closer to the processor, and large register files to hold more active data values in an extremely small, fast, and compiler-managed region of memory.

Within processors, this has resulted in a variety of modifications designed to achieve one of two goals: increasing the number of instructions from the processor's instruction sequence that can be issued on every cycle, or increasing the
clock frequency of the processor faster than Moore's law would normally allow. Pipelining of individual instruction execution into a sequence of stages has allowed designers to increase clock rates as instructions have been sliced into larger numbers of increasingly small steps, which are designed to reduce the amount of logic that needs to switch during every clock cycle. Instructions that once took a few cycles to execute in the 1980s now often take 20 or more in today's leading-edge processors, allowing a nearly proportional increase in the possible clock rate.

Meanwhile, superscalar processors were developed to execute multiple instructions from a single, conventional instruction stream on each cycle. These function by dynamically examining sets of instructions from the instruction stream to find ones capable of parallel execution on each cycle, and then executing them, often out of order with respect to the original program.

Both techniques have flourished because they allow instructions to execute more quickly while maintaining the key illusion for programmers that all instructions are actually being executed sequentially and in order, instead of overlapped and out of order. Of course, this illusion is not absolute. Performance can often be improved if programmers or compilers adjust their instruction scheduling and data layout to map more efficiently to the underlying pipelined or parallel architecture and cache memories, but the important point is that old or untuned code will still execute correctly on the architecture, albeit at less than peak speeds.

Unfortunately, it is becoming increasingly difficult for processor designers to continue using these techniques to enhance the speed of modern processors. Typical instruction streams have only a limited amount of usable parallelism among instructions,[3] so superscalar processors that can issue more than about four instructions per cycle achieve very little additional benefit on most applications. Figure 2 shows how effective real Intel processors have been at extracting instruction parallelism over time. There is a flat region before instruction-level parallelism was pursued intensely, then a steep rise as parallelism was utilized usefully, followed by a tapering off in recent years as the available parallelism has become fully exploited.

Complicating matters further, building superscalar processor cores that can exploit more than a few instructions per cycle becomes very expensive, because the complexity of all the additional logic required to find parallel instructions dynamically is approximately proportional to the square of the number of instructions that can be issued simultaneously. Similarly, pipelining past about 10-20 stages is difficult because each pipeline stage becomes too short to perform even a minimal amount of logic, such as adding two integers together, beyond which the design of the pipeline is significantly more complex. In addition, the circuitry overhead from adding pipeline registers and bypass path multiplexers to the existing logic combines with performance losses from events that cause pipeline state to be flushed, primarily branches. This overwhelms any potential performance gain from deeper pipelining after about 30 stages.

[Figure 1: Intel Performance Over Time. Relative performance (log scale, 0.10 to 10000.00) versus year, 1985 to 2003.]

Further advances in both superscalar issue and pipelining are also limited by the fact that they require ever-larger numbers of transistors to be integrated into the high-speed central logic within each processor core; so many, in fact, that few companies can afford to hire enough engineers to design and verify these processor cores in reasonable amounts of time. These trends have slowed the advance in processor performance somewhat and have forced many smaller vendors to forsake the high-end processor business, as they could no longer afford to compete effectively.

Today, however, all progress in conventional processor core development has essentially stopped because of a simple physical limit: power. As processors were pipelined and made increasingly superscalar over the course of the past two decades, typical high-end microprocessor power went from less than a watt to over 100 watts. Even though each silicon process generation promised a reduction in power, as the ever-smaller transistors required less power to switch, this was true in practice only when existing designs were simply shrunk to use the new process technology. Processor designers, however, kept using more transistors in their cores to add pipelining and superscalar issue, and switching them at higher and higher frequencies. The overall effect was that exponentially more power was required by each subsequent processor generation (as illustrated in figure 3). Unfortunately, cooling technology does not scale exponentially nearly as easily. As a result, processors went from needing no heat sinks in the 1980s, to moderate-size heat sinks in the 1990s, to today's monstrous heat sinks, often with one or more dedicated fans to increase airflow over the processor. If these trends were to continue, the next generation of microprocessors would require very exotic cooling solutions, such as dedicated water cooling, that are economically impractical in all but the most expensive systems.

The combination of limited instruction parallelism suitable for superscalar issue, practical limits to pipelining, and a power ceiling limited by practical cooling limitations has limited future speed increases within conventional processor cores to the basic Moore's law improvement rate of the underlying transistors. This limitation is already causing major processor manufacturers such as Intel and AMD to adjust their marketing focus away from simple core clock rate. Although larger cache memories will continue to improve performance somewhat, by speeding access to the single memory in the conventional model, the simple fact is that without more radical changes in processor design, microprocessor performance increases will slow dramatically in the future. Processor designers must find new ways to effectively utilize the increasing transistor budgets in high-end silicon chips to improve performance in ways that minimize both additional power usage and design complexity. The market for microprocessors has become stratified into areas with different performance requirements, so it is useful to examine the problem from the point of view of these different performance requirements.

[Figure 2: Intel Performance from ILP. Relative performance per cycle (0 to 0.45) versus year, 1985 to 2003.]

THROUGHPUT PERFORMANCE IMPROVEMENT

With the rise of the Internet, the need for servers capable of handling a multitude of independent requests arriving rapidly over the network has increased dramatically. Since individual network requests are typically completely independent tasks, whether those requests are for Web pages, database access, or file service, they are typically spread across many separate computers built using high-performance conventional microprocessors (figure 4a), a technique that has been used at places like Google for years to match the overall computation throughput to the input request rate.[4]

As the number of requests increased over time, more servers were added to the collection. It has also been possible to replace some or all of the separate servers with multiprocessors. Most existing multiprocessors consist of two or more separate processors connected using a common bus, switch hub, or network to shared memory and I/O devices. The overall system can usually be physically smaller and use less power than an equivalent set of uniprocessor systems because physically large components such as memory, hard drives, and power supplies can be shared by some or all of the processors.

Pressure has increased over time to achieve more performance per unit volume of data-center space and per watt, since data centers have finite room for servers and their electric bills can be staggering. In response, the server manufacturers have tried to save space by adopting denser server packaging solutions, such as blade servers, and switching to multiprocessors that can share components. Some power reduction has also occurred through the sharing of more power-hungry components in these systems. These short-term solutions are reaching their practical limits, however, as systems are reaching the maximum component density that can still be effectively air-cooled. As a result, the next stage of development for these systems involves a new step: the CMP (chip multiprocessor).[5]

The first CMPs targeted toward the server market implement two or more conventional superscalar processors together on a single die.[6, 7, 8, 9] The primary motivation for this is reduced volume: multiple processors can now fit in the space where formerly only one could, so overall performance per unit volume can be increased. Some savings in power also occurs because all of the processors on a single die can share a single connection to the rest of the system, reducing the amount of high-speed communication infrastructure required, in addition to the sharing possible with a conventional multiprocessor. Some CMPs, such as the first ones announced from AMD and Intel, share only the system interface between processor cores (illustrated in figure 4b), but others share one or more levels of on-chip cache (figure 4c), which allows interprocessor communication between the CMP cores without off-chip accesses.

Further savings in power can be achieved by taking advantage of the fact that while server workloads require high throughput, the latency of each request is generally not as critical.[10] Most users will not be bothered if their Web pages take a fraction of a second longer to load, but they will complain if the Web site drops page requests because it does not have enough throughput capacity. A CMP-based system can be designed to take advantage of this situation.

[Figure 3: Intel Power Over Time. Power (watts) versus year, 1985 to 2003.]

When a two-way CMP replaces a uniprocessor, it is possible to achieve essentially the same or better throughput on server-oriented workloads with just half of the original clock speed. Each request may take up to twice as long to process because of the reduced clock rate. With many of these applications, however, the slowdown will be much less, because request processing time is more often limited by memory or disk performance than by processor performance. Since two requests can now be processed simultaneously, however, the overall throughput will now be the same or better, unless there is serious contention for the same memory or disk resources.

Overall, even though performance is the same or only a little better, this adjustment is still advantageous at the system level. The lower clock rate allows us to design the system with a significantly lower power supply voltage, often a nearly linear reduction. Since power is proportional to the square of the voltage, however, the power required to obtain the original performance is much lower, usually about half (half of the voltage squared = a quarter of the power, per processor, so the power required for both processors together is about half), although the potential savings could be limited by static power dissipation and any minimum voltage levels required by the underlying transistors.

For throughput-oriented workloads, even more power/performance and performance/chip area can be achieved by taking the "latency is unimportant" idea to its extreme and building the CMP with many small cores instead of a few large ones. Because typical server workloads have very low amounts of instruction-level parallelism and many memory stalls, most of the hardware associated with superscalar instruction issue is essentially wasted for these applications. A typical server will have tens or hundreds of requests in flight at once, however, so there is enough work available to keep many processors busy simultaneously.

Therefore, replacing each large, superscalar processor in a CMP with several small ones, as has been demonstrated successfully with the Sun Niagara,[11] is a winning policy. Each small processor will process its request more slowly than a larger, superscalar processor, but this latency slowdown is more than compensated for by the fact that the same chip area can be occupied by a much larger number of processors: about four times as many, in the case of Niagara, which has eight single-issue SPARC processor cores in a technology that can hold only a pair of superscalar UltraSPARC cores.

[Figure 4: CMP Implementation Options. (a) Conventional microprocessor; (b) simple chip multiprocessor; (c) shared-cache chip multiprocessor; (d) multithreaded, shared-cache chip multiprocessor. Each panel shows CPU cores with registers and L1 instruction and data caches, L2 cache, main memory, and I/O.]

Taking this idea one step further, still more latency can be traded for higher throughput with the inclusion of multithreading logic within each of the cores.[12, 13, 14] Because each core tends to spend a fair amount of time waiting for memory requests to be satisfied, it makes sense to assign each core several threads by including multiple register files, one per thread, within each core (figure 4d). While some of the threads are waiting for memory to respond, the processor may still execute instructions from the others.

Larger numbers of threads can also allow each processor to send more requests off to memory in parallel, increasing the utilization of the highly pipelined memory systems on today's processors. Overall, threads will typically have a slightly longer latency, because there are times when all are active and competing for the use of the processor core. The gain from performing computation during memory stalls and the ability to launch numerous memory accesses simultaneously more than compensates for this longer latency on systems such as Niagara, which has four threads per processor or 32 for the entire chip, and Pentium chips with Intel's Hyperthreading, which allows two threads to share a Pentium 4 core.

LATENCY PERFORMANCE IMPROVEMENT

The performance of many important applications is measured in terms of the execution latency of individual tasks instead of high overall throughput of many essentially unrelated tasks. Most desktop processor applications still fall in this category, as users are generally more concerned with their computers responding to their commands as quickly as possible than they are with their computers' ability to handle many commands simultaneously, although this situation is changing slowly over time as more applications are written to include many background tasks. Users of many other computation-bound applications, such as most simulations and compilations, are typically also more interested in how long the programs take to execute than in executing many in parallel.

Multiprocessors can speed up these types of applications, but it requires effort on the part of programmers to break up each long-latency thread of execution into a large number of smaller threads that can be executed on many processors in parallel, since automatic parallelization technology has typically functioned only on Fortran programs describing dense-matrix numerical computations. Historically, communication between processors was generally slow in relation to the speed of individual processors, so it was critical for programmers to ensure that threads running on separate processors required only minimal communication with each other.

Because communication reduction is often difficult, only a small minority of users bothered to invest the time and effort required to parallelize their programs in a way that could achieve speedup, so these techniques were taught only in advanced, graduate-level computer science courses. Instead, in most cases programmers found that it was just easier to wait for the next generation of uniprocessors to appear and speed up their applications for free instead of investing the effort required to parallelize their programs. As a result, multiprocessors had a hard time competing against uniprocessors except in very large systems, where the target performance simply exceeded the power of the fastest uniprocessors available.

With the exhaustion of essentially all performance gains that can be achieved for free with technologies such as superscalar dispatch and pipelining, we are now entering an era where programmers must switch to more parallel programming models in order to exploit multiprocessors effectively, if they desire improved single-program performance. This is because there are only three real dimensions to processor performance increases beyond Moore's law: clock frequency, superscalar instruction issue, and multiprocessing. We have pushed the first two to their logical limits and must now embrace multiprocessing, even if it means that programmers will be forced to change to a parallel programming model to achieve the highest possible performance.

Conveniently, the transition from multiple-chip systems to chip multiprocessors greatly simplifies the problems traditionally associated with parallel programming. Previously it was necessary to minimize communication between independent threads to an extremely low level, because each communication could require hundreds or even thousands of processor cycles. Within any CMP with a shared on-chip cache memory, however, each communication event typically takes just a handful of processor cycles. With latencies like these, communication delays have a much smaller impact on overall system performance. Programmers must still divide their work into parallel threads, but do not need to worry nearly as much about ensuring that these threads are highly independent, since communication is relatively cheap. This is not a complete panacea, however, because programmers must still structure their inter-thread synchronization correctly, or the program may generate incorrect results or deadlock, but at least the performance impact of communication delays is minimized.

Parallel threads can also be much smaller and still be effective: threads that are only hundreds or a few thousand cycles long can often be used to extract parallelism with these systems, instead of the millions-of-cycles-long threads typically necessary with conventional parallel machines. Researchers have shown that parallelization of applications can be made even easier with several schemes involving the addition of transactional hardware to a CMP.[15, 16, 17, 18, 19] These systems add buffering logic that lets threads attempt to execute in parallel, and then dynamically determines whether they are actually parallel at runtime. If no inter-thread dependencies are detected at runtime, then the threads complete normally. If dependencies exist, then the buffers of some threads are cleared and those threads are restarted, dynamically serializing the threads in the process. Such hardware, which is only practical on tightly coupled parallel machines such as CMPs, eliminates the need for programmers to determine whether threads are parallel as they parallelize their programs; they need only choose potentially parallel threads.

Overall, the shift from conventional processors to CMPs should be less traumatic for programmers than the shift from conventional processors to multichip multiprocessors, because of the short CMP communication latencies and enhancements such as transactional memory, which should be commercially available within the next few years. As a result, this paradigm shift should be within the range of what is feasible for "typical" programmers, instead of being limited to graduate-level computer science topics.

HARDWARE ADVANTAGES

In addition to the software advantages now and in the future, CMPs have major advantages over conventional uniprocessors for hardware designers. CMPs require only a fairly modest engineering effort for each generation of processors. Each member of a family of processors just requires the stamping down of additional copies of the core processor and then making some modifications to relatively slow logic connecting the processors together to accommodate the additional processors in each generation, and not a complete redesign of the high-speed processor core logic. Moreover, the system board design typically needs only minor tweaks from generation to generation, since externally a CMP looks essentially the same from generation to generation, even as the number of processors within it increases.

The only real difference is that the board will need to deal with higher I/O bandwidth requirements as the CMPs scale. Over several silicon process generations, the savings in engineering costs can be significant, because it is relatively easy to stamp down a few more cores each time. Also, the same engineering effort can be amortized across a large family of related processors. Simply varying the numbers and clock frequencies of processors can allow essentially the same hardware to function at many different price/performance points.

AN INEVITABLE TRANSITION

As a result of these trends, we are at a point where chip multiprocessors are making significant inroads into the marketplace. Throughput computing is the first and most pressing area where CMPs are having an impact. This is because they can improve power/performance results right out of the box, without any software changes, thanks to the large numbers of independent threads that are available in these already multithreaded applications. In the near future, CMPs should also have an impact in the more common area of latency-critical computations. Although it is necessary to parallelize most latency-critical software into multiple parallel threads of execution to really take advantage of a chip multiprocessor, CMPs make this process easier than with conventional multiprocessors, because of their short interprocessor communication latencies.

Viewed another way, the transition to CMPs is inevitable because past efforts to speed up processor architectures with techniques that do not modify the basic von Neumann computing model, such as pipelining and superscalar issue, are encountering hard limits. As a result, the microprocessor industry is leading the way to multicore architectures; however, the full benefit of these architectures will not be harnessed until the software industry fully embraces parallel programming. The art of multiprocessor programming, currently mastered by only a small minority of programmers, is more complex than programming uniprocessor machines and requires an understanding of new computational principles, algorithms, and programming tools.

REFERENCES

1. Moore, G. E. 1965. Cramming more components onto integrated circuits. Electronics (April): 114–117.
2. Hennessy, J. L., and Patterson, D. A. 2003. Computer Architecture: A Quantitative Approach, 3rd Edition. San Francisco, CA: Morgan Kaufmann Publishers.
3. Wall, D. W. 1993. Limits of Instruction-Level Parallelism. WRL Research Report 93/6, Digital Western Research Laboratory, Palo Alto, CA.
4. Barroso, L., Dean, J., and Hoezle, U. 2003. Web search for a planet: the architecture of the Google cluster. IEEE Micro 23 (2): 22–28.
5. Olukotun, K., Nayfeh, B. A., Hammond, L., Wilson, K., and Chang, K. 1996. The case for a single chip multiprocessor. Proceedings of the 7th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VII): 2–11.
6. Kapil, S. 2003. UltraSPARC Gemini: Dual CPU Processor. In Hot Chips 15 (August), Stanford, CA; http://www.hotchips.org/archives/.
7. Maruyama, T. 2003. SPARC64 VI: Fujitsu's next generation processor. In Microprocessor Forum (October), San Jose, CA.
8. McNairy, C., and Bhatia, R. 2004. Montecito: the next product in the Itanium processor family. In Hot Chips 16 (August), Stanford, CA; http://www.hotchips.org/archives/.
9. Moore, C. 2000. POWER4 system microarchitecture. In Microprocessor Forum (October), San Jose, CA.
10. Barroso, L. A., Gharachorloo, K., McNamara, R., Nowatzyk, A., Qadeer, S., Sano, B., Smith, S., Stets, R., and Verghese, B. 2000. Piranha: a scalable architecture based on single-chip multiprocessing. In Proceedings of the 27th International Symposium on Computer Architecture (June): 282–293.
11. Kongetira, P., Aingaran, K., and Olukotun, K. 2005. Niagara: a 32-way multithreaded SPARC processor. IEEE Micro 25 (2): 21–29.
12. Alverson, R., Callahan, D., Cummings, D., Koblenz, B., Porterfield, A., and Smith, B. 1990. The Tera computer system. In Proceedings of the 1990 International Conference on Supercomputing (June): 1–6.
13. Laudon, J., Gupta, A., and Horowitz, M. 1994. Interleaving: a multithreading technique targeting multiprocessors and workstations. Proceedings of the 6th International Conference on Architectural Support for Programming Languages and Operating Systems: 308–316.
14. Tullsen, D. M., Eggers, S. J., and Levy, H. M. 1995. Simultaneous multithreading: maximizing on-chip parallelism. In Proceedings of the 22nd International Symposium on Computer Architecture (June): 392–403.
15. Hammond, L., Carlstrom, B. D., Wong, V., Chen, M., Kozyrakis, C., and Olukotun, K. 2004. Transactional coherence and consistency: simplifying parallel hardware and software. IEEE Micro 24 (6): 92–103.
16. Hammond, L., Hubbert, B., Siu, M., Prabhu, M., Chen, M., and Olukotun, K. 2000. The Stanford Hydra CMP. IEEE Micro 20 (2): 71–84.
17. Krishnan, V., and Torrellas, J. 1999. A chip multiprocessor architecture with speculative multithreading. IEEE Transactions on Computers 48 (9): 866–880.
18. Sohi, G., Breach, S., and Vijaykumar, T. 1995. Multiscalar processors. In Proceedings of the 22nd International Symposium on Computer Architecture (June): 414–425.
19. Steffan, J. G., and Mowry, T. 1998. The potential for using thread-level data speculation to facilitate automatic parallelization. In Proceedings of the 4th International Symposium on High-Performance Computer Architecture (February): 2–13.

LOVE IT, HATE IT? LET US KNOW: feedback@acmqueue.com or www.acmqueue.com/forums

KUNLE OLUKOTUN is an associate professor of electrical engineering and computer science at Stanford University, where he led the Stanford Hydra single-chip multiprocessor research project, which pioneered multiple processors on a single silicon chip. He founded Afara Websystems to develop commercial server systems with chip multiprocessor technology. Afara was acquired by Sun Microsystems, and the Afara microprocessor technology is now called Niagara. Olukotun is involved in research in computer architecture, parallel programming environments, and scalable parallel systems.

LANCE HAMMOND is a postdoctoral fellow at Stanford University. As a Ph.D. student, Hammond was the lead architect and implementer of the Hydra chip multiprocessor. The goal of Hammond's recent work on transactional coherence and consistency is to make parallel programming accessible to the average programmer.

© 2005 ACM 1542-7730/05/0900 $5.00
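
As a worked illustration of the power argument in THROUGHPUT PERFORMANCE IMPROVEMENT (a two-way CMP running at half the clock speed and roughly half the supply voltage), the arithmetic can be sketched in Python. This is a minimal sketch and not part of the original article: the function name `relative_cmp_power` and the `include_frequency` flag are hypothetical, voltage is assumed to scale linearly with clock rate, and static power and minimum-voltage limits are ignored. By default it reproduces the article's estimate (power proportional to the square of the voltage); the optional flag applies the common textbook model in which dynamic power also scales with clock frequency.

```python
def relative_cmp_power(cores: int = 2, include_frequency: bool = False) -> float:
    """Total dynamic power of an n-core CMP relative to one full-speed core.

    Assumption (illustrative only): each core runs at 1/cores of the
    original clock, and supply voltage scales down linearly with it.
    """
    scale = 1.0 / cores        # clock and voltage scaling factor per core
    per_core = scale ** 2      # article's estimate: power ~ V^2
    if include_frequency:
        per_core *= scale      # textbook model: power ~ C * V^2 * f
    return cores * per_core    # sum over all cores on the die


if __name__ == "__main__":
    # Two half-speed cores versus one full-speed core.
    print(relative_cmp_power(2))
    print(relative_cmp_power(2, include_frequency=True))
```

Under these assumptions the default call gives the "about half" figure quoted in the text. Including the frequency term suggests a larger saving, which real designs may not reach for the reasons the article gives (static power and minimum voltage levels).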