Using Cache Memory to Reduce Processor-Memory Traffic

James R. Goodman
Department of Computer Sciences
University of Wisconsin-Madison
Madison, WI 53706

Abstract

The importance of reducing processor-memory bandwidth is recognized in two distinct situations: single-board computer systems and microprocessors of the future. Cache memory is investigated as a way to reduce the memory-processor traffic. We show that traditional caches, which depend heavily on spatial locality (look-ahead) for their performance, are inappropriate in these environments because they generate large bursts of bus traffic. A cache exploiting primarily temporal locality (look-behind) is then proposed and demonstrated to be effective in an environment where process switches are infrequent. We argue that such an environment is possible if the traffic to backing store is small enough that many processors can share a common memory, and if the cache data consistency problem is solved. We demonstrate that such a cache can indeed reduce traffic to memory greatly, and introduce an elegant solution to the cache coherency problem.

1. Introduction

Because there are straightforward ways to construct powerful, cost-effective systems using random-access memories and single-chip microprocessors, semiconductor technology has, until now, had the greatest impact through these components. High-performance processors, however, are still beyond the capability of a single-chip implementation and are not easily partitioned in a way which can effectively exploit the technology and economies of VLSI.

An interesting phenomenon has occurred in the previous decade as a result of this disparity. Memory costs have dropped radically and consistently for computer systems of all sizes. While the component cost of a CPU (single-chip implementations excluded) has declined significantly over the same period, the reduction has been less dramatic. A result is that the amount of memory thought to be appropriate for a processor of a given speed has grown dramatically in recent years. Today, small minicomputers have memory as large as that of the most expensive machines of a decade ago.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
© 1983 ACM 0149-7111/83/0600/0124 $01.00

The impact of VLSI has been very different in microprocessor applications. Here memory is still regarded as an expensive component in the system, and those familiar primarily with a minicomputer or mainframe environment are often scornful of the trouble to which microprocessor users go to conserve memory. The reason, of course, is that even the small memory in a microprocessor system is a much larger portion of the total system cost than the much larger memory on a typical mainframe system. This results from the fact that memory and processors are implemented in the same technology.

1.1 A Super CPU

With the advances in VLSI occurring now and continuing over the next few years, it will become possible to fabricate circuits that are one to two orders of magnitude more complex than currently available microprocessors. It will soon be possible to fabricate an extremely high-performance CPU on a single chip. Devoting the entire chip to the CPU, however, is not a good idea.

Extrapolating historical trends to predict future component densities, we might expect that within a few years we should be able to purchase a single-chip processor containing at least ten times as many transistors as occur in, say, the MC68000. For the empirical rule known as Grosch's law [Grosch53], $P = K C^{g}$, where $P$ is some measure of performance, $C$ is the cost, and $K$ and $g$ are constants, Knight [Knight66] concluded that $g$ is at least 2, and Solomon [Solomon66] has suggested that $g = 1.47$ for the IBM System/370 family. Siewiorek determined that $g = 1.8$ [Siewiorek82]. While Grosch's law breaks down in the comparison of processors using different technologies or architectures, it is realistic for predicting improvements within a single technology; Siewiorek, in fact, suggests that it holds by definition. Assuming $g = 1.5$ and using processor-memory bandwidth as our measure of performance, Grosch's law predicts that a processor containing 10 times as many transistors as a current microprocessor would require about 30 times the memory bandwidth, since $10^{1.5} \approx 31.6$.¹

¹ This is a conservative estimate, in fact, because it ignores predictable decreases in gate delays.

The Motorola MC68000, running at 10 MHz, accesses data from memory at a maximum rate of 5 million bytes per second, using more than half its pins to achieve this rate. Although packaging technology is rapidly increasing the number of pins available to a chip, it is unlikely that the increase will be 30-fold (the 68000 has 64 pins); we would suggest that a factor of two is realistic. Although some techniques are clearly possible to increase the transfer rate into and out of the 68000, supplying such a processor with data as fast as needed is a severe constraint. One of the designers of the 68000 has stated that all modern microprocessors (the 68000 included) are already bus limited [Tredennick82].

1.2 On-Chip Memory

One alternative for increased performance without proportionately increasing processor-memory bandwidth is to introduce memory on the same chip with the CPU. With the ability to fabricate chips containing one to two million transistors, it should be possible, using only a portion of the chip, to build a processor significantly more powerful than any currently available single-chip CPU. While devoting the entire chip to the CPU could result in a still more powerful processor, introducing on-chip memory offers a reduction in memory access time due to the inherently smaller delays as compared to inter-chip data transfers. If most accesses were on-chip, the processor with on-chip memory might actually perform as fast as the more powerful processor.

Ideally, the chip should contain as much memory as the processor needs for main storage. Conventional wisdom today says that a processor of the speed of current microprocessors needs at least 1/… megabytes of memory [Lindsay81]. This is certainly more than is feasible on-chip, and a high-performance processor could probably use substantially more than that. Clearly, all the primary memory for the processor cannot be placed on the same chip with a powerful CPU. What is needed is the top element of a memory hierarchy.

1.3 Cache Memory

The use of cache memory, however, has often aggravated the bandwidth problem rather than reduced it. Smith [Smith82] says that optimizing the design has four general aspects: (1) maximizing the hit ratio, (2) minimizing the access time to data in the cache, (3) minimizing the delay due to a miss, and (4) minimizing the overheads of updating main memory, maintaining multicache consistency, etc. The result is often a larger burst bandwidth requirement from main storage to the cache than would be necessary without a cache. For example, although the cache on the IBM System/370 Model 168 is capable of receiving data from main memory at a rate of 100 megabytes per second [IBM76], it supplies data to the CPU at less than 1/3 that rate. The reason is that, to exploit the spatial locality in memory references, the data transferred from backing store into the cache is fetched in large blocks, resulting in requirements for very-high-bandwidth bursts of data. We have measured the average bandwidth on an IBM System/370 Model 155 and concluded that the average backing-store-to-cache traffic is less than the cache-to-CPU traffic.

The design of cache memory for minicomputers demanded greater concern for bus bandwidth. The designers of the PDP-11 Models 60 and 70 clearly recognized that small block sizes were necessary to keep main memory traffic to a minimum [Bell78].

Lowering the bandwidth from backing store to the cache can be accomplished in one of two ways: (1) small blocks of data are brought from backing store to the cache, or (2) long delays occur while a block is being brought in, independent of and in addition to the access time of the backing store. While it is possible to bring in the requested word first (read-through), thus reducing the wait on a given reference, the low-bandwidth memory interface will remain busy long after the initial transfer is completed, resulting in long delays if a second backing-store operation is required. We therefore have explored the effectiveness of a cache which exploits primarily or exclusively temporal locality, i.e., the blocks fetched from backing store are only the size needed by the CPU (or possibly slightly larger).

In considering ways to evaluate this strategy, we identified a commercial environment that contained many of the same constraints and seemed amenable to the same kinds of solutions. This environment is the marketplace of the single-board computer running on a standard bus such as Multibus or VERSAbus. We have chosen to study this environment in an attempt to gain insight into the original, general scheme.

2. The Single-Board Computer Application

A single-board computer typically contains a microprocessor and a substantial amount of memory, though small enough that it must be used carefully. If needed, access to additional random-access memory is through the bus, which is designed for generality and simplicity, not for high performance. Multibus, in particular, was defined in the early '70s to offer an inexpensive means of communication among a variety of subsystems. Although originally introduced by Intel Corporation, it has found wide acceptance, having been proposed in a slightly modified form as the IEEE P796 bus standard [IEEE80]. Currently, several hundred vendors offer Multibus-compatible cards. While the market for products using this bus has developed rapidly, its applications are limited by the severe constraint imposed by the bandwidth of Multibus. Clearly the bus bandwidth could be increased by increasing the number of pins and by modifying the protocol. Its broad popularity and the availability of components to implement its protocol mean, however, that it is likely to survive many years in its present form. Thus a large market exists for a computer on a card which, much as if it were all on a single chip, has severe limitations on its communications with the rest of the system.

We decided to determine if a cache memory system could be implemented effectively in the Multibus environment. To that end, we have designed a cache to be used with a current-generation microprocessor. In addition, we have done extensive simulation of cache performance, driven by memory trace data. We have identified a new component which is particularly suited for VLSI implementation and have demonstrated its feasibility by designing it [Ravishankar83]. This component, which implements the tag memory for a dynamic RAM cache intended for a microprocessor, is similar in many respects to the recently announced TMS2150 [TREE].

Multibus systems have generally dealt with the problem of limited bus bandwidth by removing most of the processor-memory accesses from the bus. Each processor card has its own local memory, which may be addressable by others through the Multibus. While this approach has much in common with ours, we believe that the allocation of memory (local or remote) should be handled by the system, freeing the programmer of this task. In typical Multibus applications, considerable effort is expended guaranteeing that the running program is primarily resident on-board. This approach is viable for a static partitioning of tasks. Results to date have been much less satisfactory, however, for the more general situation where a number of processors are dynamically allocated for efficiency reasons. It also precludes the use of shared code seg
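The bandwidth argument in Section 1.1 reduces to two small calculations, sketched below under stated assumptions. Transistor count is used as a stand-in for the cost $C$ in Grosch's law (the constant $K$ cancels when two processors are compared), and the MC68000 is assumed to move one 16-bit word per four-clock bus cycle; that bus timing is a property of the 68000 rather than something stated in the text. The helper names `grosch_scale` and `peak_bus_bandwidth` are hypothetical.

```python
"""Back-of-envelope bandwidth estimates from Section 1.1 (illustrative sketch)."""


def grosch_scale(cost_ratio: float, g: float) -> float:
    """Performance ratio predicted by Grosch's law P = K * C**g.

    K cancels when comparing two machines, so only the cost ratio matters.
    Assumption: transistor count is used as a proxy for cost C.
    """
    return cost_ratio ** g


def peak_bus_bandwidth(clock_hz: float, clocks_per_bus_cycle: int,
                       bytes_per_transfer: int) -> float:
    """Peak bytes/second for a bus that moves one word per bus cycle."""
    return clock_hz / clocks_per_bus_cycle * bytes_per_transfer


if __name__ == "__main__":
    # 10x the transistors, with g = 1.5 as assumed in the text.
    ratio = grosch_scale(10, 1.5)
    print(f"bandwidth scale factor: {ratio:.1f}")  # prints 31.6, i.e. roughly 30x

    # MC68000 at 10 MHz: assumed 4-clock bus cycle, 16-bit (2-byte) data bus.
    bw = peak_bus_bandwidth(10e6, 4, 2)
    print(f"68000 peak bandwidth: {bw / 1e6:.1f} MB/s")  # prints 5.0 MB/s

    # Bandwidth a 10x-transistor successor would need under the same assumptions.
    print(f"successor needs: {bw * ratio / 1e6:.0f} MB/s")  # prints 158 MB/s
```

With Knight's more aggressive exponent ($g = 2$), the same helper gives a 100-fold increase, which only strengthens the conclusion that pin count cannot keep pace.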
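Section 1.3 argues that fetching large blocks to exploit spatial locality can raise backing-store traffic even as it raises the hit ratio, and the paper evaluates caches by trace-driven simulation. The toy simulator below illustrates that trade-off; it is a minimal sketch, not the author's simulator. It assumes a direct-mapped cache, a read-only trace of word-aligned byte addresses (writes are ignored), and whole-block fetches on every miss. The synthetic trace, the 2 KB cache size, and the names `simulate` and `synthetic_trace` are all hypothetical.

```python
"""Toy trace-driven cache simulation: bus traffic vs. block size (hypothetical)."""
import random


def simulate(trace, cache_bytes, block_bytes, word_bytes=2):
    """Simulate a direct-mapped cache on a read-only trace of byte addresses.

    Returns (hit_ratio, traffic_ratio), where traffic_ratio is bytes fetched
    from backing store divided by bytes the CPU requested (one word per
    reference). A ratio below 1.0 means the cache reduces bus traffic.
    """
    num_lines = cache_bytes // block_bytes
    tags = [None] * num_lines          # one tag per cache line; None = empty
    hits = 0
    fetched = 0
    for addr in trace:
        block = addr // block_bytes    # block number in memory
        index = block % num_lines      # direct-mapped placement
        tag = block // num_lines
        if tags[index] == tag:
            hits += 1
        else:
            tags[index] = tag          # miss: fetch the whole block
            fetched += block_bytes
    requested = len(trace) * word_bytes
    return hits / len(trace), fetched / requested


def synthetic_trace(n, seed=1):
    """Hypothetical trace: a small hot loop plus scattered word reads."""
    rng = random.Random(seed)
    trace = []
    for i in range(n):
        if rng.random() < 0.7:
            trace.append(0x1000 + 2 * (i % 64))          # reuse: temporal locality
        else:
            trace.append(2 * rng.randrange(0, 1 << 15))  # scattered word reads
    return trace


if __name__ == "__main__":
    trace = synthetic_trace(50_000)
    for block in (2, 4, 8, 16, 32, 64):
        hit, traffic = simulate(trace, cache_bytes=2048, block_bytes=block)
        print(f"block={block:3d}B  hit ratio={hit:.3f}  traffic ratio={traffic:.2f}")
```

Because each miss costs a full block on the bus, the traffic ratio equals the miss ratio multiplied by the block size in words; a larger block pays off only if it cuts the miss ratio by at least as large a factor, which is the condition that reference streams with little spatial locality fail to meet.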