The Stanford DASH Multiprocessor

Daniel Lenoski, James Laudon, Kourosh Gharachorloo, Wolf-Dietrich Weber, Anoop Gupta, John Hennessy, Mark Horowitz, and Monica S. Lam
Stanford University

Computer, March 1992

Directory-based cache coherence gives DASH the ease of use of shared-memory architectures while maintaining the scalability of message-passing machines.

The Computer Systems Laboratory at Stanford University is developing a shared-memory multiprocessor called DASH (an abbreviation for Directory Architecture for Shared Memory). The fundamental premise behind the architecture is that it is possible to build a scalable high-performance machine with a single address space and coherent caches.

The DASH architecture is scalable in that it achieves linear or near-linear performance growth as the number of processors increases from a few to a few thousand. This performance results from distributing the memory among processing nodes and using a network with scalable bandwidth to connect the nodes. The architecture allows shared data to be cached, thereby significantly reducing the latency of memory accesses and yielding higher processor utilization and higher overall performance. A distributed directory-based protocol provides cache coherence without compromising scalability.

The DASH prototype system is the first operational machine to include a scalable cache-coherence mechanism. The prototype incorporates up to 64 high-performance RISC microprocessors to yield performance up to 1.6 billion instructions per second and 600 million scalar floating-point operations per second. The design of the prototype has provided deeper insight into the architectural and implementation challenges that arise in a large-scale machine with a single address space. The prototype will also serve as a platform for studying real applications and software on a large parallel system.

This article begins by describing the overall goals for DASH, the major features of the architecture, and the methods for achieving scalability. Next, we describe the directory-based coherence protocol in detail. We then provide an overview of the prototype machine and the corresponding software support, followed by some preliminary performance numbers. The article concludes with a discussion of related work and the current status of the DASH hardware and software.

DASH project overview

The overall goal of the DASH project is to investigate highly parallel architectures. For these architectures to achieve widespread use, they must run a variety of applications efficiently without imposing excessive programming difficulty. To achieve both high performance and wide applicability, we believe a parallel architecture must provide scalability to support hundreds to thousands of processors, high-performance individual processors, and a single shared address space.

The gap between the computing power of microprocessors and that of the largest supercomputers is shrinking, while the price/performance advantage of microprocessors is increasing. This clearly points to using microprocessors as the compute engines in a multiprocessor. The challenge lies in building a machine that can scale up its performance while maintaining the initial price/performance advantage of the individual processors. Scalability allows a parallel architecture to leverage commodity microprocessors and small-scale multiprocessors to build larger scale machines. These larger machines offer substantially higher performance, which provides the impetus for programmers to port their sequential applications to parallel architectures instead of waiting for the next higher performance uniprocessor.

High-performance processors are important to achieve both high total system performance and general applicability. Using the fastest microprocessors reduces the impact of limited or uneven parallelism inherent in some applications. It also allows a wider set of applications to exhibit acceptable performance with less effort from the programmer.

A single address space enhances the programmability of a parallel machine by reducing the problems of data partitioning and dynamic load distribution, two of the toughest problems in programming parallel machines. The shared address space also improves support for automatically parallelizing compilers, standard operating systems, multiprogramming, and incremental tuning of parallel applications, features that make a single-address-space machine much easier to use than a message-passing machine.

Caching of memory, including shared writable data, allows multiprocessors with a single address space to achieve high performance through reduced memory latency. Unfortunately, caching shared data introduces the problem of cache coherence (see the sidebar and accompanying figure).

While hardware support for cache coherence has its costs, it also offers many benefits. Without hardware support, the responsibility for coherence falls to the user or the compiler. Exposing the issue of coherence to the user would lead to a complex programming model, where users might well avoid caching to ease the programming burden. Handling the coherence problem in the compiler is attractive but currently cannot be done in a way that is competitive with hardware. With hardware-supported cache coherence, the compiler can aggressively optimize programs to reduce latency without having to rely purely on a conservative static dependence analysis.

The major problem with existing cache-coherent shared-address machines is that they have not demonstrated the ability to scale effectively beyond a few high-performance processors. To date, only message-passing machines have shown this ability. We believe that using a directory-based coherence mechanism will permit single-address-space machines to scale as well as message-passing machines, while providing a more flexible and general programming model.

DASH system organization

Most existing multiprocessors with cache coherence rely on snooping to maintain coherence. Unfortunately, snooping schemes distribute the information about which processors are caching which data items among the caches. Thus, straightforward snooping schemes require that all caches see every memory request from every processor. This inherently limits the scalability of these machines because the common bus and the individual processor caches eventually saturate. With today’s high-performance RISC processors this saturation can occur with just a few processors.

Directory structures avoid the scalability problems of snoopy schemes by removing the need to broadcast every memory request to all processor caches. The directory maintains pointers to the processor caches holding a copy of each memory block. Only the caches with copies can be affected by an access to the memory block, and only those caches need be notified of the access. Thus, the processor caches and interconnect will not saturate due to coherence requests. Furthermore, directory-based coherence is not dependent on any specific interconnection network like the bus used by most snooping schemes. The same scalable, low-latency networks such as Omega networks or k-ary n-cubes used by non-cache-coherent and message-passing machines can be employed.

The concept of directory-based cache coherence is not new. It was first proposed in the late 1970s. However, the original directory structures were not scalable because they used a centralized directory that quickly became a bottleneck. The DASH architecture overcomes this limitation by partitioning and distributing the directory and main memory, and by using a new coherence protocol that can suitably exploit distributed directories. In addition, DASH provides several other mechanisms to reduce and hide the latency of memory operations.

Figure 1 shows DASH’s high-level organization. The architecture consists of a number of processing nodes connected through directory controllers to a low-latency interconnection network. Each processing node, or cluster, consists of a small number of high-performance processors and a portion of the shared memory interconnected by a bus. Multiprocessing within the cluster can be viewed either as increasing the power of each processing node or as reducing the cost of the directory and network interface by amortizing it over a larger number of processors.

Distributing memory with the proc

Sidebar: The DASH team

Many graduate students and faculty members contributed to the DASH project. The PhD students are Daniel Lenoski and James Laudon (DASH architecture and hardware design); Kourosh Gharachorloo (DASH architecture and consistency models); Wolf-Dietrich Weber (DASH simulator and scalable directories); Truman Joe (DASH hardware and protocol verification tools); Luis Stevens (operating system); Helen Davis and Stephen Goldschmidt (trace generation tools, synchronization patterns, locality studies); Todd Mowry (evaluation of prefetch operations); Aaron Goldberg and Margaret Martonosi (performance debugging tools); Tom Chanak (mesh routing chip design); Richard Simoni (synthetic load generator and directory studies); Josep Torrellas (sharing patterns in applications); Edward Rothberg, Jaswinder Pal Singh, and Larry Soule (applications and algorithm development). Staff research engineer David Nakahira contributed to the hardware design. The faculty associated with the project are Anoop Gupta, John Hennessy, Mark Horowitz, and Monica Lam.

Sidebar: Cache coherence

Cache coherence problems can arise in shared-memory multiprocessors when more than one processor cache holds a copy of a data item (a). Upon a write, these copies must be updated or invalidated (b). Most systems use invalidation since this allows the writing processor to gain exclusive access to the cache line and complete further writes into the cache line without generating external traffic (c). This further complicates coherence since this dirty cache must respond instead of memory on subsequent accesses by other processors (d).

Small-scale multiprocessors frequently use a snoopy cache-coherence protocol, which relies on all caches monitoring the common bus that connects the processors to memory. This monitoring allows caches to independently determine when to invalidate cache lines (b), and when to intervene because they contain the most up-to-date copy of a given location (d). Snoopy schemes do not scale to a large number of processors because the common bus or individual processor caches eventually saturate, since they must process every memory request from every processor.

The directory relieves the processor caches from snooping on memory requests by keeping track of which caches hold each memory block. A simple directory structure first proposed by Censier and Feautrier has one directory entry per block of memory (e). Each entry contains one presence bit per processor cache. In addition, a state bit indicates whether the block is uncached, shared in multiple caches, or held exclusively by one cache (that is, whether the block is dirty). Using the state and presence bits, the memory can tell which caches need to be invalidated when a location is written (b). Likewise, the directory indicates whether memory's copy of the block is up to date or which cache holds the most recent copy (d). If the memory and directory are partitioned into independent units and connected to the processors by a scalable interconnect, the memory system can provide scalable memory bandwidth.

[Figure: cache coherence example, panels (a) through (e). Recoverable labels: Store, Load, Cache, Data, State bit, Presence bits.]

References

1. J. Archibald and J.-L. Baer, "Cache Coherence Protocols: Evaluation Using a Multiprocessor Simulation Model," ACM Trans. Computer Systems, Vol. 4, No. 4, Nov. 1986, pp. 273-298.
2. L. Censier and P. Feautrier, "A New Solution to Coherence Problems in Multicache Systems," IEEE Trans. Computers, Vol. C-27, No. 12, Dec. 1978, pp. 1,112-1,118.
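
The simple directory structure described in the cache coherence sidebar (one entry per memory block, one presence bit per processor cache, plus a state bit marking the block dirty) can be illustrated with a short Python sketch. This is not the DASH protocol or the prototype's hardware layout; the class name DirectoryEntry, its methods, and the choice to leave a former owner as a sharer after a read are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class DirectoryEntry:
    """Hypothetical directory entry for one memory block."""
    num_caches: int
    presence: int = 0    # bit i set => cache i holds a copy of the block
    dirty: bool = False  # state bit: True => one cache holds a modified copy

    def sharers(self):
        """Return the ids of all caches whose presence bit is set."""
        return [i for i in range(self.num_caches) if (self.presence >> i) & 1]

    def read(self, cache_id):
        """Record a load. Return the cache that must supply the data,
        or None when memory's copy is up to date."""
        if self.dirty:
            owner = self.sharers()[0]
            if owner == cache_id:
                return None  # requester already owns the block
            # Assumption: the owner writes the block back and keeps a clean copy.
            self.dirty = False
            self.presence |= 1 << cache_id
            return owner
        self.presence |= 1 << cache_id
        return None

    def write(self, cache_id):
        """Record a store. Return the caches whose copies must be invalidated."""
        to_invalidate = [i for i in self.sharers() if i != cache_id]
        self.presence = 1 << cache_id  # writer becomes the only holder
        self.dirty = True
        return to_invalidate


entry = DirectoryEntry(num_caches=4)
entry.read(0)                 # cache 0 loads the block from memory
entry.read(2)                 # cache 2 loads it too; block is now shared
invalidated = entry.write(1)  # cache 1 stores; caches 0 and 2 must invalidate
supplier = entry.read(3)      # cache 3 loads; dirty cache 1 must supply the data
```

Only the caches named in the presence bits are contacted on a write, which is the property that lets a directory avoid the broadcast that snooping requires.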