Deloitte.
Together makes progress
The pivot to tokenomics: Navigating AI's new spend dynamics
A note from the authors:
AI economics affect most organizations and the C-suite uniquely. This paper guides those familiar with AI tokens in making strategic choices. If you're just beginning your exploration of tokenomics, look for additional research soon.
Traditional total-cost-of-ownership frameworks miss the reality of AI
Volatile workloads, new infrastructure demands, and tokens as the practical unit of cost
Across industries, Generative AI (GenAI) has become the fastest-growing line item in most corporate technology budgets, already consuming up to half of IT spend in some firms.¹ Cloud bills are rising nearly 20% year over year, driven by AI workloads.² At the same time, geopolitical uncertainties are intensifying calls for data sovereignty and technology infrastructure independence, making many enterprises think about AI sovereignty and gaining greater control over their infrastructure.³ This is no longer a CIO operational issue; it is a CFO-and-board capital question about how to responsibly manage an investment of this scale and volatility.
Unlike prior technology waves governed by licenses or virtual machines, AI spend often scales in nonlinear and unpredictable ways. AI capabilities run on tokens: small chunks of data (text, image, or audio) that AI systems process in training, inference, and reasoning. Every AI interaction consumes tokens, and every token carries a cost.
The complexity of AI's economics hides within these tokens. Costs rise not only with user adoption but with workload design, algorithmic complexity, and infrastructure intensity. What exactly are the thresholds to move across different consumption choices? It depends on the organization. Roughly a quarter of respondents in a Deloitte 2025 survey⁴ of data center and power executives say they or their clients are ready to move off cloud to alternatives as soon as cloud costs reach just 26% to 50% of the cost of those alternatives, showing high sensitivity to even modest price changes, while others plan to wait until cloud costs exceed 150% of the cost of alternatives. The decision point remains unclear given the highly variable consumption patterns of AI technologies. For example, advanced reasoning models that keep context across multiple steps can consume much more compute than basic one-shot responses.
As NVIDIA projects a billion-fold surge in AI computing and Google now processes 1.3 quadrillion tokens a month⁵ (a 130-fold leap in just a year), the capital and energy implications are profound.
Traditional total cost of ownership (TCO) approaches are no longer the best way to manage AI economics. Leaders may be better served by precision economics: the ability to track, predict, and optimize spend at the token level. Tokens translate opaque infrastructure choices into tangible financial terms: the true cost of generating a dollar of revenue, margin, or productivity.
The competitive divide will not likely hinge on who adopts AI first, but on who manages its cost structure with discipline. AI spend will likely separate value creators from value eroders. The former convert tokens into measurable enterprise output; the latter accumulate ungoverned cost that compounds quietly across the stack.
The elusive AI ROI
Despite rising investment, many leaders appear to still be chasing measurable return on investment (ROI) from AI initiatives.
• Nearly half (45%) of 500 leaders surveyed in Deloitte's 2025 US Tech Value survey expect it will take up to three years to see return on investment from basic AI automation.⁶
• Six in 10 of those completing Deloitte's 2025 Tech Value survey believe more advanced AI automation will take even longer to reach ROI.
• Of the 1,326 global finance leaders surveyed for Deloitte Global's inaugural Finance Trends report, fielded May 2025, 28% said AI investments are delivering clear, measurable value.⁷
But the issue isn't whether AI will deliver value; it's how to measure and manage that value in a way ROI frameworks cannot. For many organizations, adopting AI is no longer optional; it's a strategic response to competitive or existential pressure.
That makes understanding the economics of AI (how costs, workloads, and returns flow through tokens) the new imperative for leaders.
Tokens: The new currency of AI
Unlike traditional pricing based on compute time, which is relatively static, token-based pricing ties cost directly to the actual work AI performs. Each token represents both a unit of computation and a unit of cost. In that sense, tokens are the true currency of AI economics, as indispensable to machine intelligence as kilowatt-hours are to electricity. The difference is that token demand is far harder to predict or control, making AI spend inherently volatile.
• Nonlinear demand: Complex reasoning models improve performance but can consume more tokens than simple inference tasks.
• Fluctuating token use: Token use fluctuates with experimentation levels, workload design, model choice, and even prompt engineering.
• Varying pricing: Token price keeps changing based on AI model capabilities and the efficiency of the underlying infrastructure.⁸
While this volatility appears to stem from usage patterns, its roots are in the tech stack. The compute, storage, and networking decisions that power AI models determine how efficiently tokens are processed, and how costly each one becomes.
A token is not just a technical measure; it is an economic signal. Each token carries the compound effect of GPU design, storage, throughput, network latency, and facility economics. The discipline lies in tracing lineage, from infrastructure to the AI model to outcome, and aligning those decisions so token costs stay proportional to business value.
How tokens are bought
AI spending is not a single market; it fractures into different economic realities depending on how organizations consume intelligence. Some leaders experience AI costs only as a software-as-a-service (SaaS) line item, others as metered application programming interface (API) calls, and a growing cohort manages it directly through infrastructure ownership, balancing GPUs, storage, networking, and energy.
Buying patterns
• Generating through packaged software abstracts tokens almost entirely. Leaders see a predictable subscription or per-seat fee, but little transparency into token consumption efficiency. The trade-off is less control in exchange for more simplicity.
• Consuming through APIs makes tokens explicit. Every query is metered, billed, and exposed. This brings transparency, but also volatility: Costs rise based on workload design, prompt length, and hidden choices of infrastructure providers, because the token meter runs in real time.
• Running on owned infrastructure brings token economics fully in-house. Tokens become the outcome of decisions about GPUs, storage tiers, networking, and energy contracts. This approach demands high capital and technical capability but offers the greatest control over long-term cost structure and data sovereignty. The emerging shorthand for this strategy: the AI factory.
Each of these choices is grounded in existing and future technical and operating decisions given system cost, latency, security, and other needs, which change how tokens flow into enterprise profit and loss (P&L).⁹
What is an AI factory, and when does one make sense?
Deloitte defines an AI factory as a specialized infrastructure (compute, network, and storage) along with optimized software and services that enables the entire AI lifecycle at high-performance scale. The primary product is intelligence, measured by token throughput, which drives decisions, automation, and new AI solutions.
One of the hardest decisions enterprises face is whether to continue paying for tokens off premises (off-prem), through APIs or traditional SaaS companies, or to build an AI factory and self-manage the infrastructure. The economics vary sharply depending on scale, sensitivity, and predictability of demand:
• Off-prem (API or traditional SaaS companies): May be most efficient for early pilots, spiky or seasonal workloads, or use cases with low data sensitivity. Costs are typically higher per token but predictable and flexible, with no up-front capital expense (capex).
• AI factory: Can become attractive when workloads are large, predictable, latency-sensitive, and cross a threshold where building and operating infrastructure delivers lower effective token economics than continuing to rent it. Although capex investment may be needed, per-token costs fall as infrastructure is fully utilized, and sovereignty risks are controlled. Beyond the traditional on-premises (on-prem) or colocation (co-lo) providers, an AI factory can also be stood up using fast-growing cloud alternatives (neoclouds) to manage workload redistribution trends, as detailed in a recent Deloitte survey.¹⁰
The decision is not binary. For most global enterprises, the reality is hybrid. Smaller, less predictable, and exploratory workloads may stay in API form, while scaled, high-value workloads may run on an AI factory as applications scale and economics stabilize. AI model preference and selection may also drive enterprise decision-making.
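The threshold at which an AI factory overtakes off-prem consumption can be illustrated with a simple break-even sketch. The cost model here is an assumption made for illustration (a fixed annual cost for owned infrastructure plus a small per-token variable cost, against a flat API price), and every number is a hypothetical placeholder rather than a Deloitte figure.

```python
from typing import Optional


def api_cost(tokens_millions: float, api_price_per_million: float) -> float:
    """Annual cost when every token is bought through a metered API."""
    return tokens_millions * api_price_per_million


def factory_cost(tokens_millions: float, fixed_annual_cost: float,
                 variable_cost_per_million: float) -> float:
    """Annual cost of owned infrastructure: fixed cost plus per-token variable cost."""
    return fixed_annual_cost + tokens_millions * variable_cost_per_million


def break_even_tokens_millions(fixed_annual_cost: float,
                               api_price_per_million: float,
                               variable_cost_per_million: float) -> Optional[float]:
    """Annual volume (in millions of tokens) above which the factory is cheaper.

    Returns None when the API price does not exceed the factory's variable
    cost, because in that case no volume makes ownership cheaper.
    """
    margin = api_price_per_million - variable_cost_per_million
    if margin <= 0:
        return None
    return fixed_annual_cost / margin


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
FIXED = 500_000.0  # USD per year for owned infrastructure
API_PRICE = 5.0    # USD per million tokens via API
VARIABLE = 1.0     # USD per million tokens on owned infrastructure

threshold = break_even_tokens_millions(FIXED, API_PRICE, VARIABLE)
print(f"Break-even volume: {threshold:,.0f} million tokens per year")
```

With these placeholder inputs the two options cost the same at 125,000 million (125 billion) tokens per year. Below that volume the API is cheaper; above it the fixed cost is spread thinly enough for ownership to win. A real model would also need utilization, refresh cycles, and staffing, which this sketch leaves out.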
How tokens are priced
Once leaders understand their buyer type (generate, consume, run), the next challenge is to see how tokens are priced. The same AI model could be billed as a seat license, a token meter, or GPU-hours, depending on how it is consumed. There are three major constituents to token pricing:
1. The underlying tech stack
2. How it is hosted and consumed
3. What type of AI model and level of customization is required to power the solution
The AI tech stack
Every token processed by an AI model reflects a cascade of infrastructure decisions.
For packaged buyers, in most cases and at least for now, these costs are hidden. Costs are abstracted, bundled into familiar enterprise contracts, and vendor-managed across every layer of the tech stack, which makes unpacking TCO challenging.
For API consumers, every element of the AI tech stack shows up indirectly as per-token fees or throughput charges. Price varies by AI model accessed, with different input and output rates, usually quoted per million tokens. Discounted pricing options such as reserved token capacity, prompt caching, or batch execution rates are usually offered, while in some cases enterprise customers may also get user-based pricing. Additionally, storage or egress charges may further add to TCO.
For self-hosted solutions, tokens are not purchased at all; they emerge from explicit capex and operating expense (opex) decisions related to infrastructure choices (figures 1 and 2).
What changes across buyer types is not whether these costs exist (they always do) but who sees them, controls them, and pays for them.
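For API consumers, the pricing elements named above (separate input and output rates per million tokens, prompt caching, and batch rates) combine into a per-request cost. The following is a minimal sketch of that arithmetic. The `ApiRates` type, the discount mechanics, and all rate values are hypothetical; actual providers define their own schedules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiRates:
    """Hypothetical rate card; every value is USD per million tokens or a fraction."""
    input_per_million: float
    output_per_million: float
    cached_input_discount: float = 0.0  # fraction taken off cached input tokens
    batch_discount: float = 0.0         # fraction taken off the whole request


def request_cost(rates: ApiRates, input_tokens: int, output_tokens: int,
                 cached_input_tokens: int = 0, batch: bool = False) -> float:
    """Cost in USD of one request under the assumed rate card."""
    if cached_input_tokens > input_tokens:
        raise ValueError("cached_input_tokens cannot exceed input_tokens")
    fresh_tokens = input_tokens - cached_input_tokens
    # Cached tokens are billed at a reduced share of the input rate.
    billable_input = fresh_tokens + cached_input_tokens * (1 - rates.cached_input_discount)
    total = (billable_input * rates.input_per_million
             + output_tokens * rates.output_per_million) / 1_000_000
    if batch:
        total *= 1 - rates.batch_discount
    return total


# Placeholder rates, not taken from any provider's price list.
rates = ApiRates(input_per_million=2.0, output_per_million=8.0,
                 cached_input_discount=0.5, batch_discount=0.5)
print(f"Standard request: ${request_cost(rates, 10_000, 1_000):.4f}")
print(f"With 8,000 cached input tokens: "
      f"${request_cost(rates, 10_000, 1_000, cached_input_tokens=8_000):.4f}")
```

Even in this toy version, the same request costs noticeably less when most of the prompt is cached, which is one reason prompt design shows up directly in the bill.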
Figure 1. How technical decisions can drive token costs and implications for an AI factory
| Stack component | Token implications | Self-hosted AI factory |
|---|---|---|
| Compute: graphics processing units (GPUs) and accelerators | Modern GPUs and high-bandwidth memory shorten time per token but come with higher acquisition or rental cost. | Largest direct cost; direct infrastructure spend; rapid release cycles |
| Storage: high-speed data access | AI workloads stream terabytes using nonvolatile memory and parallel file systems to sustain performance and manage cost. Legacy storage inflates per-token costs by adding latency as GPUs wait for data. | Nonvolatile memory, parallel file systems, vector databases; heavy investment |
| Networking: GPU interconnects (InfiniBand, NVLink, PCIe Gen5) | Training across thousands of GPUs requires ultra-low-latency interconnects to cut idle cycles and lower cost per token, while traditional approaches often drive token costs higher. | Direct spend |
| Power and cooling: energy intensity of AI racks | A single next-generation GPU rack can draw between 250 and 300 kW, compared with 10–15 kW for non-AI servers. Whether billed directly (on-prem) or embedded in cloud pricing, this power use shows up in every token consumed. | High opex (250–300 kW racks); liquid cooling requirements |
| Facilities: physical infrastructure requirements | Heavier racks (up to 3,000 lb,¹¹ nearly 40% more than traditional) may need reinforced flooring and advanced cooling, costs that end up embedded in every token. | Direct capex (reinforced floors, racks) |
| Operational costs | Related to staffing and operations: IT ops and management; software and licensing; application development and integration; data management and governance; inference and serving; security and compliance; user training and change management | Full machine learning operations (MLOps) costs; full center of excellence (COE) and upskilling; orchestration frameworks and MLOps tools (data, orchestration, security); direct compliance spend, etc. |
Source: Deloitte analysis based on project experience
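To give a sense of scale for the power figures in Figure 1, the sketch below converts a continuous power draw into an annual electricity bill. The 10–15 kW and 250–300 kW values come from the figure; the electricity price, the load factor, and the power usage effectiveness (PUE) default are assumptions chosen for illustration only.

```python
HOURS_PER_YEAR = 8760  # 365 days x 24 hours


def annual_energy_cost(power_kw: float, price_per_kwh: float,
                       load_factor: float = 1.0, pue: float = 1.0) -> float:
    """Annual electricity cost in USD for a given continuous power draw.

    load_factor scales the nameplate draw to average draw; pue adds facility
    overhead such as cooling. Both default to 1.0, i.e. no adjustment.
    """
    return power_kw * load_factor * pue * HOURS_PER_YEAR * price_per_kwh


PRICE_PER_KWH = 0.10  # hypothetical tariff in USD; real tariffs vary widely

for label, kw in [("non-AI, 10 kW", 10), ("non-AI, 15 kW", 15),
                  ("AI rack, 250 kW", 250), ("AI rack, 300 kW", 300)]:
    cost = annual_energy_cost(kw, PRICE_PER_KWH)
    print(f"{label}: ${cost:,.0f} per year")
```

At the assumed tariff, a 250 kW rack running flat out costs roughly $219,000 a year in electricity alone, against under $9,000 for a 10 kW load. Dividing such a bill by the tokens a rack produces in a year would give the energy component of cost per token.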
Hosting models
How tokens are priced also depends on where and how AI models are hosted. The same large language model (LLM) can be deployed via on-prem, colocation, hyperscalers, or API access, with radically different economics. For a package buyer, this decision is again invisible and resides with the vendor. For the API consumer, it can vary based on which of the many models on the market is being consumed, and this explains why the same task may cost more depending on the provider. For self-hosted AI infrastructure users, all hosting types are possible, and the choice is often the most important determinant of unit economics.
Figure 2. GPU consumption models and cost structure
| | On-prem | Neocloud providers | Hyperscaler | API access |
|---|---|---|---|---|
| Capex vs. opex | High capex/low opex | Pure opex | Pure opex | Pure opex |
| Unit cost of compute (GPU/hour) | Lowest: ~$1–$2 | Medium: ~$1–$4 average, but high variability, on demand | High: ~$3–$7, region/model dependent | Very high: $0.40–$100 or more per million output tokens |
| Scalability | Medium: slow due to procurement, power, and setup | High: dynamic resource provisioning | Medium/high: dynamic scaling with near-infinite top end | Very high: 100% managed by the provider |
| Latency | Lowest: full control over hardware stack | Low: purpose-built for AI, but physical layout not controllable; with neoclouds, low physical proximity is manageable | Medium: near-zero control over physical layer and workload placement | Medium/high: no control over provider infrastructure/network, with long-distance communication |
| Control and customization | Full | Medium: no control over physical layer or maintenance; high control over what's hosted | Medium: treated identically to neocloud providers | Very low: no control over infrastructure layer and limited control over AI model tuning, format of response |
| Security and data sovereignty | Highest: complete control over data encryption, transit, storage | High: treated identically to co-lo; neoclouds offer higher data encryption | Medium: data leakage risk and low control over exact hosting location | Low: no control over provider architecture or governance practices |
| Deployment time | Long: multi-month procurement, delivery, and setup | Instant | Instant | Instant |
| Maintenance responsibility | Customer: managed services and shared responsibility model (e.g., facilities, energy, etc.) | Shared: physical infrastructure: provider; all other layers: customer | Shared: physical infrastructure: provider; all other layers: customer | AI model provider |
| Best use cases | Stable, high-throughput workloads | Elastic compute, proofs of concept (POCs), cost-sensitive workloads; neoclouds may bring added functionality for data-sensitive workloads | Elastic compute, POCs | Fast experimentation, agents, retrieval-augmented generation (RAG) |
Source: Deloitte analysis based on public and proprietary estimations, including publicly available GPU pricing data, API pricing benchmarks, and hyperscaler cost calculator references. Indicative references include public GPU cost analysis and total-cost-of-ownership models (e.g., SemiAnalysis AI TCO framework); public API pricing benchmarks for Generative AI models (e.g., representative GPT-5 family rates); hyperscaler compute pricing estimates derived from standard cloud cost calculators.
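Figure 2 quotes self-managed options in dollars per GPU-hour but API access in dollars per million tokens, so comparing them requires a conversion. The sketch below shows one way to do it, assuming a sustained throughput per GPU and a utilization rate. Both are hypothetical inputs that depend heavily on model, hardware, and workload, and the function ignores every cost other than the GPU-hour price.

```python
def cost_per_million_tokens(gpu_hour_price: float,
                            tokens_per_second_per_gpu: float,
                            utilization: float = 1.0) -> float:
    """Convert a GPU-hour price (USD) into USD per million tokens.

    utilization is the fraction of each hour the GPU spends producing tokens;
    idle time still costs money, so lower utilization raises the unit cost.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the interval (0, 1]")
    if tokens_per_second_per_gpu <= 0:
        raise ValueError("throughput must be positive")
    tokens_per_hour = tokens_per_second_per_gpu * 3600 * utilization
    return gpu_hour_price / tokens_per_hour * 1_000_000


THROUGHPUT = 1_000.0  # hypothetical tokens per second per GPU

# GPU-hour prices taken from the ranges in Figure 2; utilization is assumed.
for label, price, util in [("on-prem at $2/h, 80% utilized", 2.0, 0.8),
                           ("hyperscaler at $7/h, 80% utilized", 7.0, 0.8),
                           ("on-prem at $2/h, 10% utilized", 2.0, 0.1)]:
    unit_cost = cost_per_million_tokens(price, THROUGHPUT, util)
    print(f"{label}: ${unit_cost:.2f} per million tokens")
```

The last case illustrates why utilization matters so much for owned infrastructure: the same hardware at a low GPU-hour price can still yield an expensive token if it sits idle most of the time.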
Ultimately, the cost structure follows the architecture. Compute density, network proximity, and storage throughput each influence how efficiently tokens are processed and, therefore, where a model should live. The decision isn't about speed or preference; it's about matching workload physics to business economics. In our experience, we've found hybrid architectures sustain performance without inflating token costs.
AI model selection
AI model strategy is a second decision point: open-source or closed (proprietary) AI models. Package buyers inherit whatever the vendor builds. API users can choose providers but not the models' economics. Only self-hosted AI factory users control the full trade-off across cost, flexibility, and sovereignty.¹²
Open-source AI models
Open-source models are generally free and typically run in self-hosted environments, giving enterprises greater control, customization, and data sovereignty. They are well suited for fine-tuning on proprietary or sensitive data, minimizing vendor lock-in, and lowering token costs over time.
Examples include Meta Llama, Mistral, and others. Emerging frameworks such as NVIDIA NIM Microservices illustrate how vendors are packaging open-source models into standardized, secure deployment units, bringing operational discipline to what was once bespoke integration work.
Proprietary (closed) AI models
These are consume-as-you-go models, typically billed per token. They let users get started quickly with no up-front investment, are pretrained, have strong out-of-the-box functionality, and come with access to vendor support. Examples of such AI models include Anthropic Claude, Google Gemini, OpenAI GPTs, xAI Grok, and others. However, this typically comes with higher per-token cost, lower cost predictability due to fluctuating token usage, lack of customization, open concerns around data storage, and risk of vendor lock-in.
Decoding the AI cost curve
AI economics follow Jevons' paradox: As efficiency improves, total consumption rises.¹³ Token prices are falling fast (what once cost dollars per thousand now costs pennies per million), and Deloitte projects the average inference cost will drop from $0.04 per million tokens in 2025 to about $0.01 by 2030.¹⁴
Yet enterprise spending continues to surge.¹⁵ As agentic systems and multiagent workflows proliferate, token demand grows exponentially, often faster than infrastructure efficiency gains can offset. The paradox isn't that AI is becoming cheaper; it's that efficiency itself is driving expansion. Without disciplined cost governance, total costs grow.
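The interplay between falling prices and rising volume can be checked with simple arithmetic. The sketch below uses the projected prices quoted above ($0.04 per million tokens in 2025, about $0.01 by 2030). The volume growth multiples are hypothetical, and treating spend as price times volume is a simplifying assumption.

```python
def volume_multiple_to_hold_spend(price_start: float, price_end: float) -> float:
    """Volume growth multiple at which total spend stays flat despite the price drop."""
    return price_start / price_end


def annualized_change(start: float, end: float, years: int) -> float:
    """Constant yearly rate of change that takes `start` to `end` over `years`."""
    return (end / start) ** (1 / years) - 1


def spend_multiple(price_start: float, price_end: float, volume_multiple: float) -> float:
    """How total spend changes when price and volume both move."""
    return (price_end / price_start) * volume_multiple


PRICE_2025 = 0.04  # USD per million tokens, projection cited in the text
PRICE_2030 = 0.01  # USD per million tokens, projection cited in the text

flat = volume_multiple_to_hold_spend(PRICE_2025, PRICE_2030)
rate = annualized_change(PRICE_2025, PRICE_2030, years=5)
print(f"Spend stays flat if volume grows {flat:.0f}x")
print(f"Implied price change: {rate:.1%} per year")

# Hypothetical volume growth multiples over the same period.
for growth in (2, 4, 100):
    print(f"{growth}x volume -> {spend_multiple(PRICE_2025, PRICE_2030, growth):.2f}x spend")
```

Under these projections, a fourfold rise in token volume is enough to cancel the entire price decline; any growth beyond that means total spend goes up even as each token gets cheaper.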
Who pays the bill?
The cost curve doesn't affect every participant the same way. As token consumption accelerates, the question becomes who ultimately absorbs that spend (the enterprise, the vendor, or the end user) and how those dynamics evolve as workloads scale and grow more complex. Deloitte's TCO analysis examines exactly where and when those costs shift.
The token TCO estimation and scenario analysis
To quantify these dynamics, Deloitte conducted a detailed token TCO analysis designed to capture how AI's underlying economics shift across the full tech stack. The analysis tested how total cost of ownership evolves along three critical dimensions that shape token pricing:
1. Technology stack: The GPUs, AI models, and architectures powering AI workloads.
2. Hosting approach: Comparisons as usage and complexity scale over time.
3. Usage scaling: Increase in the overall token consumption driven by increase in user count or the complexity/depth of reasoning each use case demands.
The objective was to understand how these factors interact to redefine organizational strategy based on what the key drivers of AI TCO are, how costs evolve as usage scales, and where the inflection points emerge in cost per token. Before presenting the outcomes, the next section outlines the key assumptions and configurations underpinning the model used in our tests.
Model assumptions
The model was built to test realistic, enterprise-scale conditions rather than idealized lab settings.¹⁶ While it can accommodate a wide range of configurations, the version summarized here reflects a representative scenario across common enterprise workloads.
The baseline configuration included:
• Compute stack: NVIDIA HGX B200 GPU Server (NVLink/NVSwitch enabled) | CPU: AMD EPYC 9654.
• LLM: Llama 3.3 70B FP8 TP2, GPT-4o, selected because a variety of common configurations were being tested.
• Hosting models: On-prem, API access, specialized neocloud providers (NCPs). NCPs offer hourly rates as well as reserved contracting for different periods. In this model, we assumed hourly and not reserved pricing.
This setup enabled Deloitte to isolate how hosting choices, AI model selection, and usage maturity interact to drive token consumption and total cost. The following analysis highlights the resulting cost curves and inflection points that emerge as usage scales. The analysis simulates growth scaling in increments of 8 GPUs (figure 3).
Figure 3. Scenario complexity and token assumptions driving four-year TCO dynamics
| Token scenarios | Example scenario description/use case |
|---|---|
| Year 1: Pilot stage | Initial deployment of simple use cases such as chatbot or FAQ assistant: A lightweight conversational AI used for customer service, HR inquiries, or basic IT help desk support. Handles short, structured Q&A with minimal context retention. |
| Year 2: POC/lightweight adoption | Scaling to include knowledge-driven use cases such as document summarization and knowledge search: Internal enterprise assistant that retrieves and summarizes policy documents, proposals, or contracts. Includes semantic search and multiturn conversations. |
| Year 3: Inferencing at scale | Maturing to drive decision-support use cases such as an analytics co-pilot: Assists consultants, analysts, or auditors in generating insights, drafting reports, or performing data analysis across multiple data sources. Includes reasoning, structured output, and integration with enterprise systems. |
Source: Deloitte analysis
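The model assumptions state that capacity grows in increments of 8 GPUs. A rough sketch of what that stepwise provisioning looks like follows. The per-GPU throughput and utilization are hypothetical placeholders, not parameters from the Deloitte model; the three annual volumes are the ones shown with Figure 4.

```python
import math

GPU_INCREMENT = 8  # scaling step stated in the model assumptions
SECONDS_PER_YEAR = 365 * 24 * 3600


def required_gpus(tokens_per_year: float, tokens_per_second_per_gpu: float,
                  utilization: float) -> float:
    """Fractional number of GPUs needed to serve a yearly token volume."""
    tokens_per_gpu_year = tokens_per_second_per_gpu * utilization * SECONDS_PER_YEAR
    return tokens_per_year / tokens_per_gpu_year


def gpus_to_provision(gpus_needed: float) -> int:
    """Round a fractional GPU requirement up to the next multiple of GPU_INCREMENT."""
    if gpus_needed <= 0:
        return 0
    return math.ceil(gpus_needed / GPU_INCREMENT) * GPU_INCREMENT


THROUGHPUT = 1_000.0  # hypothetical tokens per second per GPU
UTILIZATION = 0.5     # hypothetical share of time spent serving tokens

for year, tokens in [(1, 10e9), (2, 300e9), (3, 1_000e9)]:
    needed = required_gpus(tokens, THROUGHPUT, UTILIZATION)
    print(f"Year {year}: {needed:.1f} GPUs needed, "
          f"{gpus_to_provision(needed)} provisioned")
```

Because capacity arrives in blocks, small workloads leave most of the first block idle, which is consistent with the idle capacity cost that favors pay-as-you-go options at low volume.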
Navigating the economics of an accelerating technology environment
The rapid pace of AI hardware advancement has created obsolescence cycles that far outpace traditional depreciation schedules, with GPU generations now refreshing rapidly. For example, recent model releases quickly outgrew the capabilities of previously leading GPUs to unlock features, while legacy support for older hardware diminishes. Newer GPUs that switch to an annual release cycle further accelerate these refresh demands, challenging enterprises to continually balance the benefits of faster upgrades with the risk of falling behind.
Such recent advances in GPU technology have enabled AI applications requiring larger context lengths, such as reasoning models, summarizing extensive text corpora, and high-fidelity multimodal tasks like analyzing hour-long videos. These use cases, including agentic reasoning, demand substantial GPU memory and the latest hardware to accurately process such complex or large-scale data.
However, adoption of multimodality and agentic reasoning at the enterprise level is in its early stages, and inference tasks often run well on older GPUs, especially for midsize models.
As token pricing for AI models declines and the economics of "build vs. buy" shift rapidly, enterprises cannot rely on static assumptions and should develop forward-looking infrastructure strategies: carefully planning upgrades, assessing costs, and ensuring investments remain viable as the market stabilizes over time.
Analysis outcome
The TCO simulation incorporated real-world parameters across the full AI value chain, from hardware utilization and energy costs to facilities expenses. Each variable was calibrated to reflect current market conditions and operational realities rather than theoretical efficiency.
This approach ensured a holistic view of cost behavior: how GPU utilization rates, power efficiency, and AI model complexity combine to shape effective cost per token. The resulting analysis surfaced the underlying mechanics of a new AI economy, one where technical decisions directly dictate financial outcomes.
1. Usage scaling and complexity drive hosting advantage.
In our TCO modeling, in the first year at 10 billion tokens, workloads favor the API access approach, because pay-as-you-go pricing minimizes idle capacity costs. As the number of tokens rises in year two, the economics flip. At higher reasoning loads more tokens are consumed, and self-hosted AI factories outperform APIs as fixed infrastructure costs are absorbed and utilization increases. After four years, the simulation projected that cumulative TCO for API hosting would be twice that of an AI factory, given the same configuration and token scaling (figure 4).
Figure 4. Over 3 years, an AI factory is ~2.1x more cost-effective than API-based solutions
AI factory averages ~150% annual TCO growth vs. >1,000% (API) and >800% (NCP), ensuring more stable, predictable, and manageable costs
AI factory sees >90% drop in $/B tokens from Y1 to Y3 ($24K to $1.45K) vs. 64% (API) and 84% (NCP), becoming most cost-efficient at high scale
Over a 3-year TCO, AI factory on-prem delivers more than 50% cost savings compared to both API-based and NCP solutions
[Chart: Annual total cost of ownership. Vertical axis: annual TCO (USD in millions), 0.0M to 4.0M. Horizontal axis: Year 1 (10 billion tokens), Year 2 (300 billion tokens), Year 3 (1,000 billion tokens, i.e., 1 trillion). Series: AI factory, neocloud (NCP), API. Data labels as extracted, with series assignment not recoverable: 0.24M, 0.17M, 0.04M; 0.97M, 1.06M, 0.49M; 3.50M, 2.72M, 1.45M.]
Source: Deloitte simulation
Pay-as-you-go APIs and NCP are more suited to simple, low-volume workloads, while AI factory (self-hosted) is co
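The AI factory callout in Figure 4 can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated with the figure ($24K and $1.45K per billion tokens; 10 billion and 1,000 billion tokens). Reading the 0.24M and 1.45M chart labels as the AI factory's Year 1 and Year 3 bars is an inference from this arithmetic, not something the extracted figure states.

```python
def fractional_drop(start: float, end: float) -> float:
    """Share by which a value falls from `start` to `end`."""
    return (start - end) / start


def annual_tco(cost_per_billion_tokens: float, tokens_billions: float) -> float:
    """Annual TCO in USD implied by a unit cost and a yearly token volume."""
    return cost_per_billion_tokens * tokens_billions


Y1_COST_PER_B = 24_000.0  # USD per billion tokens, Year 1 (from Figure 4)
Y3_COST_PER_B = 1_450.0   # USD per billion tokens, Year 3 (from Figure 4)

drop = fractional_drop(Y1_COST_PER_B, Y3_COST_PER_B)
print(f"Drop in $/B tokens: {drop:.1%}")
print(f"Implied Year 1 TCO: ${annual_tco(Y1_COST_PER_B, 10):,.0f}")
print(f"Implied Year 3 TCO: ${annual_tco(Y3_COST_PER_B, 1_000):,.0f}")
```

The drop works out to about 94%, consistent with the ">90%" callout, and the implied annual totals of $240,000 and $1,450,000 line up with two of the chart's data labels.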