Deloitte.

Together makes progress

The pivot to tokenomics: Navigating AI's new spend dynamics


A note from the authors:

AI economics affect most organizations and the C-suite uniquely. This paper guides those familiar with AI tokens in making strategic choices. If you're just beginning your exploration of tokenomics, look for additional research soon.


Traditional total-cost-of-ownership frameworks miss the reality of AI

Volatile workloads, new infrastructure demands, and tokens as the practical unit of cost

Across industries, Generative AI (GenAI) has become the fastest-growing line item in most corporate technology budgets, already consuming up to half of IT spend in some firms.[1] Cloud bills are rising nearly 20% year over year, driven by AI workloads.[2] At the same time, geopolitical uncertainties are intensifying calls for data sovereignty and technology infrastructure independence, making many enterprises think about AI sovereignty and gaining greater control over their infrastructure.[3] This is no longer a CIO operational issue; it is a CFO-and-board capital question about how to responsibly manage an investment of this scale and volatility.

Unlike prior technology waves governed by licenses or virtual machines, AI spend often scales in nonlinear and unpredictable ways. AI capabilities run on tokens: small chunks of data (text, image, or audio) that AI systems process in training, inference, and reasoning. Every AI interaction consumes tokens, and every token carries a cost.

The complexity of AI's economics hides within these tokens. Costs rise not only with user adoption but with workload design, algorithmic complexity, and infrastructure intensity. What exactly are the thresholds to move across different consumption choices? It depends on the organization. Roughly a quarter of respondents in a Deloitte 2025 survey[4] of data center and power executives say they or their clients are ready to make the move off of cloud to alternatives as soon as costs reach just 26% to 50% of those alternatives, showing high sensitivity to even modest price changes, while others plan to wait until cloud costs exceed 150% of the cost of alternatives. The decision point remains unclear given the high variability patterns of AI technologies. For example, advanced reasoning models that keep context across multiple steps can consume much more compute than basic one-shot responses.
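The switching thresholds described above can be read as a simple ratio test. The sketch below is illustrative only: the function names, the monthly cost figures, and the two trigger values (50% and 150%, taken from the survey ranges quoted above) are assumptions chosen for the example, not part of the survey methodology.

```python
def cloud_cost_ratio(cloud_cost: float, alternative_cost: float) -> float:
    """Cloud cost expressed as a fraction of the alternative's cost."""
    if alternative_cost <= 0:
        raise ValueError("alternative_cost must be positive")
    return cloud_cost / alternative_cost


def should_move_off_cloud(cloud_cost: float, alternative_cost: float,
                          trigger_ratio: float) -> bool:
    """True once the cloud-to-alternative ratio reaches the organization's trigger."""
    return cloud_cost_ratio(cloud_cost, alternative_cost) >= trigger_ratio


# Hypothetical monthly costs in USD, for illustration only.
cloud, alternative = 120_000.0, 100_000.0
for label, trigger in [("price-sensitive (50%)", 0.50), ("patient (150%)", 1.50)]:
    print(label, should_move_off_cloud(cloud, alternative, trigger))
```

With the same cost data, the two hypothetical organizations reach opposite decisions, which is the point the survey result makes: the threshold is a policy choice, not a property of the workload.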

As NVIDIA projects a billion-fold surge in AI computing and Google now processes 1.3 quadrillion tokens a month[5] (a 130-fold leap in just a year), the capital and energy implications are profound.

Traditional total cost of ownership (TCO) approaches are no longer the best way to manage AI economics. Leaders may be better served by precision economics: the ability to track, predict, and optimize spend at the token level. Tokens translate opaque infrastructure choices into tangible financial terms: the true cost of generating a dollar of revenue, margin, or productivity.
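One way to make "the true cost of generating a dollar of revenue" concrete is to divide a workload's token spend by the revenue attributed to it. The following is a minimal sketch under stated assumptions: the `Workload` record, the workload names, the token volumes, the revenue figures, and the blended price of $5 per million tokens are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    tokens: int          # total tokens consumed in the period
    revenue_usd: float   # revenue attributed to the workload (hypothetical)


def token_cost_usd(tokens: int, blended_price_per_million: float) -> float:
    """Token spend at a single blended price per million tokens."""
    return tokens / 1_000_000 * blended_price_per_million


def cost_per_revenue_dollar(w: Workload, blended_price_per_million: float) -> float:
    """Token spend needed to generate one dollar of attributed revenue."""
    if w.revenue_usd <= 0:
        raise ValueError("revenue_usd must be positive")
    return token_cost_usd(w.tokens, blended_price_per_million) / w.revenue_usd


# Hypothetical workloads and a hypothetical blended price of $5 per million tokens.
workloads = [
    Workload("support-chatbot", 4_000_000_000, 50_000.0),
    Workload("analytics-copilot", 20_000_000_000, 400_000.0),
]
for w in workloads:
    print(f"{w.name}: ${cost_per_revenue_dollar(w, 5.0):.2f} of tokens per revenue dollar")
```

In this illustration the workload with five times the token volume is the cheaper one per dollar of revenue, which is the kind of comparison an aggregate TCO figure would hide.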

The competitive divide will not likely hinge on who adopts AI first, but on who manages its cost structure with discipline. AI spend will likely separate value creators from value eroders. The former convert tokens into measurable enterprise output; the latter accumulate ungoverned cost that compounds quietly across the stack.


The elusive AI ROI

Despite rising investment, many leaders appear to still be chasing measurable return on investment (ROI) from AI initiatives.

• Nearly half (45%) of 500 leaders surveyed in Deloitte's 2025 US Tech Value survey expect it will take up to three years to see return on investment from basic AI automation.[6]

• Six in 10 of those completing Deloitte's 2025 Tech Value survey believe more advanced AI automation will take even longer to reach ROI.

• Of the 1,326 global finance leaders surveyed for Deloitte Global's inaugural Finance Trends report, fielded May 2025, 28% said AI investments are delivering clear, measurable value.[7]

But the issue isn't whether AI will deliver value; it's how to measure and manage that value in a way ROI frameworks cannot. For many organizations, adopting AI is no longer optional; it's a strategic response to competitive or existential pressure.

That makes understanding the economics of AI (how costs, workloads, and returns flow through tokens) the new imperative for leaders.


Tokens: The new currency of AI

Unlike traditional pricing based on compute time, which is relatively static, token-based pricing ties cost directly to the actual work AI performs. Each token represents both a unit of computation and a unit of cost. In that sense, tokens are the true currency of AI economics, as indispensable to machine intelligence as kilowatt hours are to electricity. The difference is that token demand is far harder to predict or control, making AI spend inherently volatile.

• Nonlinear demand: Complex reasoning models improve performance but can consume more tokens than simple inference tasks.

• Fluctuating token use: Token use fluctuates with experimentation levels, workload design, model choice, and even prompt engineering.

• Varying pricing: Token price keeps changing based on AI model capabilities and the efficiency of the underlying infrastructure.[8]

While this volatility appears to stem from usage patterns, its roots are in the tech stack. The compute, storage, and networking decisions that power AI models determine how efficiently tokens are processed and how costly each one becomes.

A token is not just a technical measure; it is an economic signal. Each token carries the compound effect of GPU design, storage, throughput, network latency, and facility economics. The discipline lies in tracing lineage, from infrastructure to the AI model to outcome, and aligning those decisions so token costs stay proportional to business value.


How tokens are bought

AI spending is not a single market; it fractures into different economic realities depending on how organizations consume intelligence. Some leaders experience AI costs only as a software-as-a-service (SaaS) line item, others as metered application programming interface (API) calls, and a growing cohort manages it directly through infrastructure ownership, balancing GPUs, storage, networking, and energy.

Buying patterns

• Generating through packaged software abstracts tokens almost entirely. Leaders see a predictable subscription or per-seat fee, but little transparency into token consumption efficiency. The trade-off is less control in exchange for more simplicity.

• Consuming through APIs makes tokens explicit. Every query is metered, billed, and exposed. This brings transparency, but also volatility: Costs rise based on workload design, prompt length, and hidden choices of infrastructure providers, because the token meter runs in real time.

• Running on owned infrastructure brings token economics fully in-house. Tokens become the outcome of decisions about GPUs, storage tiers, networking, and energy contracts. This approach demands high capital and technical capability but offers the greatest control over long-term cost structure and data sovereignty. The emerging shorthand for this strategy: the AI factory.

Each of these choices is grounded in existing and future technical and operating decisions given system cost, latency, security, and other needs, which change how tokens flow into enterprise profit and loss (P&L).[9]


What is an AI factory, and when does one make sense?

Deloitte defines an AI factory as a specialized infrastructure (compute, network, and storage) along with optimized software and services that enables the entire AI life cycle at high-performance scale. The primary product is intelligence, measured by token throughput, which drives decisions, automation, and new AI solutions.

One of the hardest decisions enterprises face is whether to continue paying for tokens off premises (off-prem), through APIs or traditional SaaS companies, or to build an AI factory and self-manage the infrastructure. The economics vary sharply depending on scale, sensitivity, and predictability of demand:

• Off-prem (API or traditional SaaS companies): May be most efficient for early pilots, spiky or seasonal workloads, or use cases with low data sensitivity. Costs are typically higher per token but predictable and flexible, with no up-front capital expense (capex).

• AI factory: Can become attractive when workloads are large, predictable, latency-sensitive, and cross a threshold where building and operating infrastructure delivers lower effective token economics than continuing to rent them. Although capex investment may be needed, per-token costs fall as infrastructure is fully utilized, and sovereignty risks are controlled. Beyond the traditional on-premises (on-prem) or colocation (co-lo) providers, an AI factory can also be stood up using fast-growing cloud alternatives (neoclouds) to manage workload redistribution trends, as detailed in a recent Deloitte survey.[10]

The decision is not binary. For most global enterprises, the reality is hybrid. Smaller, less predictable, and exploratory workloads may stay in API form, while scaled, high-value workloads may run on an AI factory as applications scale and economics stabilize. AI model preference and selection may also drive enterprise decision-making.
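The "threshold" at which an AI factory becomes attractive can be approximated as a break-even volume: the point where the fixed cost of owned infrastructure is recovered by the per-token saving over API pricing. This is a deliberately simplified sketch, not the model used in this paper; the function name and all dollar figures are hypothetical, and it ignores utilization ramps, refresh cycles, and financing.

```python
from typing import Optional


def breakeven_million_tokens(fixed_cost_usd: float,
                             api_price_per_million: float,
                             self_hosted_marginal_per_million: float) -> Optional[float]:
    """Token volume (in millions) at which self-hosting matches API spend.

    Returns None when self-hosting has no per-token saving, in which case
    the fixed cost is never recovered.
    """
    saving = api_price_per_million - self_hosted_marginal_per_million
    if saving <= 0:
        return None
    return fixed_cost_usd / saving


# Hypothetical inputs: $1M annual fixed cost, $5/M tokens via API,
# $1/M tokens marginal cost (power, operations) when self-hosted.
volume = breakeven_million_tokens(1_000_000.0, 5.0, 1.0)
print(f"break-even at {volume:,.0f} million tokens per year")
```

Below the break-even volume the pay-as-you-go option is cheaper; above it, each additional token widens the advantage of owned infrastructure, which is consistent with the hybrid pattern described above.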

How tokens are priced

Once leaders understand their buyer type (generate, consume, run), the next challenge is to see how tokens are priced. The same AI model could be billed as a seat license, a token meter, or GPU-hours, depending on how it is consumed. There are three major constituents to token pricing:

1. The underlying tech stack

2. How it is hosted and consumed

3. What type of AI model and level of customization is required to power the solution

The AI tech stack

Every token processed by an AI model reflects a cascade of infrastructure decisions.

For packaged buyers, in most cases and at least for now, these costs are hidden. Costs are abstracted, bundled into familiar enterprise contracts, and vendor managed across every layer of the tech stack, which makes unpacking TCO challenging.

For API consumers, every element of the AI tech stack shows up indirectly as per-token fees or throughput charges. Price varies by AI model accessed, with different input and output rates, usually reported per million tokens. Discounted pricing options such as reserved token capacity, prompt caching, or batch execution rates are usually offered, while in some cases enterprise customers may also get user-based pricing. Additionally, storage or egress charges may further add to TCO.
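The API pricing structure described above (separate input and output rates per million tokens, with optional caching and batch discounts) can be sketched as a small calculator. All rates and discount levels below are hypothetical placeholders rather than any provider's actual price list, and the way discounts combine is an assumption made for illustration.

```python
def api_cost_usd(input_tokens: int, output_tokens: int,
                 input_price_per_million: float, output_price_per_million: float,
                 cached_input_tokens: int = 0, cache_discount: float = 0.0,
                 batch_discount: float = 0.0) -> float:
    """Estimate an API bill from token counts and per-million-token rates.

    cache_discount applies only to the cached share of input tokens;
    batch_discount applies to the whole bill (an assumed, simplified rule).
    """
    if cached_input_tokens > input_tokens:
        raise ValueError("cached_input_tokens cannot exceed input_tokens")
    fresh_input = input_tokens - cached_input_tokens
    billable_input = fresh_input + cached_input_tokens * (1 - cache_discount)
    input_cost = billable_input * input_price_per_million / 1_000_000
    output_cost = output_tokens * output_price_per_million / 1_000_000
    return (input_cost + output_cost) * (1 - batch_discount)


# Hypothetical rates: $2 per million input tokens, $10 per million output tokens.
base = api_cost_usd(2_000_000, 500_000, 2.0, 10.0)
cached = api_cost_usd(2_000_000, 500_000, 2.0, 10.0,
                      cached_input_tokens=1_000_000, cache_discount=0.5)
batched = api_cost_usd(2_000_000, 500_000, 2.0, 10.0,
                       cached_input_tokens=1_000_000, cache_discount=0.5,
                       batch_discount=0.5)
print(base, cached, batched)
```

Even in this toy case, output tokens account for more than half of the undiscounted bill despite being a fifth of the volume, which is why prompt and response design matter as much as user counts.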

For self-hosted solutions, tokens are not purchased at all; they emerge from explicit capex and operating expense (opex) decisions related to infrastructure choices (figures 1 and 2).

What changes across buyer types is not whether these costs exist (they always do) but who sees them, controls them, and pays for them.


Figure 1. How technical decisions can drive token costs and implications for an AI factory

| STACK COMPONENT | TOKEN IMPLICATIONS | SELF-HOSTED AI FACTORY |
| --- | --- | --- |
| Compute: Graphical processing units (GPUs) and accelerators | Modern GPUs and high-bandwidth memory shorten time per token but come with higher acquisition or rental cost. | Largest direct cost; direct infrastructure spend; rapid release cycles |
| Storage: High-speed data access | AI workloads stream terabytes using nonvolatile memory and parallel file systems to sustain performance and manage cost. Legacy storage inflates per-token costs by adding latency as GPUs wait for data. | Nonvolatile memory, parallel file systems, vector databases; heavy investment |
| Networking: GPU interconnects (InfiniBand, NVLink, PCIe Gen5) | Training across thousands of GPUs requires ultra-low-latency interconnects to cut idle cycles and lower cost per token, while traditional approaches often drive token costs higher. | Direct spend |
| Power and cooling: Energy intensity of AI racks | A single next-generation GPU rack can draw between 250–300 kW, compared with 10–15 kW for non-AI servers. Whether billed directly (on-prem) or embedded in cloud pricing, this power use shows up in every token consumed. | High opex (250–300 kW racks); liquid cooling requirements |
| Facilities: Physical infrastructure requirements | Heavier racks (up to 3,000 lb,[11] nearly 40% more than traditional) may need reinforced flooring and advanced cooling to be embedded in the cost of every token. | Direct capex (reinforced floors, racks) |
| Operational costs | Related to staffing and operations: IT ops and management; software and licensing; application development and integration; data management and governance; inference and serving; security and compliance; user training and change management | Full machine learning operations (MLOps) costs; full center of excellence (COE) and upskilling; orchestration frameworks and MLOps tools (data, orchestration, security); direct compliance spend, etc. |

Source: Deloitte analysis based on project experience
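To illustrate how the rack power draw in figure 1 "shows up in every token consumed," the sketch below converts a rack's power draw into an energy cost per million tokens. Only the 300 kW figure comes from figure 1; the electricity price, the rack-level token throughput, and the facility overhead multiplier (`pue`, power usage effectiveness) are hypothetical assumptions.

```python
def energy_cost_per_million_tokens(rack_power_kw: float,
                                   electricity_usd_per_kwh: float,
                                   rack_tokens_per_second: float,
                                   pue: float = 1.0) -> float:
    """Energy cost (USD) attributable to one million tokens on one rack.

    pue scales IT power up to total facility power (cooling, distribution).
    """
    if rack_tokens_per_second <= 0:
        raise ValueError("rack_tokens_per_second must be positive")
    cost_per_hour = rack_power_kw * pue * electricity_usd_per_kwh
    tokens_per_hour = rack_tokens_per_second * 3600
    return cost_per_hour / tokens_per_hour * 1_000_000


# 300 kW rack (upper end of figure 1); hypothetical $0.10/kWh and
# hypothetical sustained throughput of 100,000 tokens per second.
cost = energy_cost_per_million_tokens(300.0, 0.10, 100_000.0)
print(f"energy cost: ${cost:.4f} per million tokens")
```

Because the rack draws power whether or not it is serving requests, halving the sustained throughput in this sketch doubles the energy cost per token, which is one reason utilization dominates self-hosted economics.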


Hosting models

How tokens are priced also depends on where and how AI models are hosted. The same large language model (LLM) can be deployed via on-prem, colocation, hyperscalers, or API access, with radically different economics. For a package buyer, this decision is again invisible and resides with the vendor. For the API consumer, it can vary based on which of the many models on the market is being consumed, and this explains why the same task may cost more depending on the provider. For self-hosted AI infrastructure users, all hosting types are possible, and it is often the most important determinant of unit economics.


Figure 2. GPU consumption models and cost structure

| | ON-PREM | NEOCLOUD PROVIDERS | HYPERSCALER | API ACCESS |
| --- | --- | --- | --- | --- |
| Capex vs. opex | High capex/low opex | Pure opex | Pure opex | Pure opex |
| Unit cost of compute (GPU/hour) | Lowest: ~$1–$2 | Medium: ~$1–$4 average, but high variability, on demand | High: ~$3–$7, region/model dependent | Very high: $0.40–$100 or more per million output tokens |
| Scalability | Medium: Slow due to procurement, power, and setup | High: Dynamic resource provisioning | Medium/high: Dynamic scaling with near-infinite top-end | Very high: 100% managed by the provider |
| Latency | Lowest: Full control over hardware stack | Low: Purpose-built for AI, but physical layout not controllable; with neoclouds, low physical proximity is manageable | Medium: Near-zero control over physical layer and workload placement | Medium/high: No control over provider infrastructure/network, with long-distance communication |
| Control and customization | Full | Medium: No control over physical layer or maintenance; high control over what's hosted | Medium: Treated identically to neocloud providers | Very low: No control over infrastructure layer and limited control over AI model tuning, format of response |
| Security and data sovereignty | Highest: Complete control over data encryption, transit, storage | High: Treated identically to co-lo; neoclouds offer higher data encryption | Medium: Data leakage risk and low control over exact hosting location | Low: No control over provider architecture or governance practices |
| Deployment time | Long: Multi-month procurement, delivery, and setup | Instant | Instant | Instant |
| Maintenance responsibility | Customer: Managed services and shared responsibility model (e.g., facilities, energy, etc.) | Shared: Physical infrastructure: provider; all other layers: customer | Shared: Physical infrastructure: provider; all other layers: customer | AI model provider |
| Best use cases | Stable, high-throughput workloads | Elastic compute, proofs of concept (POCs), cost-sensitive workloads; neoclouds may bring added functionality for data-sensitive workloads | Elastic compute, POCs | Fast experimentation, agents, retrieval-augmented generation (RAG) |

Source: Deloitte analysis based on public and proprietary estimations, including publicly available GPU pricing data, API pricing benchmarks, and hyperscaler cost calculator references. Indicative references include public GPU cost analysis and total-cost-of-ownership models (e.g., semi-analysis AI TCO framework); public API pricing benchmarks for Generative AI models (e.g., representative GPT-5 family rates); hyperscaler compute pricing estimates derived from standard cloud cost calculators.
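Figure 2 quotes self-managed options in dollars per GPU-hour and API access in dollars per million tokens, so comparing them requires a conversion. The sketch below shows one way to do it; the GPU-hour ranges come from figure 2, while the per-GPU throughput and the utilization rate are hypothetical assumptions that would need to be measured for a real workload.

```python
def usd_per_million_tokens(gpu_hour_usd: float,
                           tokens_per_second_per_gpu: float,
                           utilization: float) -> float:
    """Convert a GPU-hour price into an effective cost per million tokens."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    if tokens_per_second_per_gpu <= 0:
        raise ValueError("tokens_per_second_per_gpu must be positive")
    effective_tokens_per_hour = tokens_per_second_per_gpu * 3600 * utilization
    return gpu_hour_usd / effective_tokens_per_hour * 1_000_000


# GPU-hour ranges from figure 2; throughput (1,000 tokens/s per GPU) and
# utilization (50%) are hypothetical.
gpu_hour_ranges = {
    "on-prem": (1.0, 2.0),
    "neocloud": (1.0, 4.0),
    "hyperscaler": (3.0, 7.0),
}
for hosting, (low, high) in gpu_hour_ranges.items():
    lo_cost = usd_per_million_tokens(low, 1_000.0, 0.5)
    hi_cost = usd_per_million_tokens(high, 1_000.0, 0.5)
    print(f"{hosting}: ${lo_cost:.2f} to ${hi_cost:.2f} per million tokens")
```

The result is highly sensitive to utilization: an idle GPU still accrues its hourly cost, so the same hardware price can imply very different token prices depending on how fully the capacity is used.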


Ultimately, the cost structure follows the architecture. Compute density, network proximity, and storage throughput each influence how efficiently tokens are processed and, therefore, where a model should live. The decision isn't about speed or preference; it's about matching workload physics to business economics. In our experience, we've found hybrid architectures sustain performance without inflating token costs.

AI model selection

AI model strategy is a second decision point: open-source or closed (proprietary) AI models. Package buyers inherit whatever the vendor builds. API users can choose providers but not the models' economics. Only self-hosted AI factory users control the full trade-off across cost, flexibility, and sovereignty.[12]

Open-source AI models

Open-source models are generally free and typically run in self-hosted environments, giving enterprises greater control, customization, and data sovereignty. They are well suited for fine-tuning on proprietary or sensitive data, minimizing vendor lock-in, and lowering token costs over time.

Examples include Meta Llama, Mistral, and others. Emerging frameworks such as NVIDIA NIM Microservices illustrate how vendors are packaging open-source models into standardized, secure deployment units, bringing operational discipline to what was once bespoke integration work.

Proprietary (closed) AI models

These are consume-as-you-go and typically billed per token. They let users get started quickly with no up-front investment, are pretrained, have strong out-of-the-box functionality, and provide access to vendor support for operations. Examples of such AI models include Anthropic Claude, Google Gemini, OpenAI GPTs, xAI Grok, and others. However, this typically comes with higher per-token cost, lower cost predictability due to fluctuating token usage, lack of customization, open concerns around data storage, and risk of vendor lock-in.


Decoding the AI cost curve

AI economics follow Jevons' paradox: As efficiency improves, total consumption rises.[13] Token prices are falling fast (what once cost dollars per thousand now costs pennies per million), and Deloitte projects the average inference cost will drop from $0.04 per million tokens in 2025 to about $0.01 by 2030.[14]

Yet enterprise spending continues to surge.[15] As agentic systems and multiagent workflows proliferate, token demand grows exponentially, often faster than infrastructure efficiency gains can offset. The paradox isn't that AI is becoming cheaper; it's that efficiency itself is driving expansion. Without disciplined cost governance, total costs grow.
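The projection above implies an annual rate of price decline, and from it the rate of demand growth beyond which total spend still rises. The sketch below works this out under one explicit assumption that is not in the projection itself: that the price falls at a constant compound rate between 2025 and 2030.

```python
def annual_rate(start: float, end: float, years: int) -> float:
    """Compound annual rate of change from start to end over `years`."""
    if start <= 0 or years <= 0:
        raise ValueError("start and years must be positive")
    return (end / start) ** (1 / years) - 1


def breakeven_demand_growth(price_rate: float) -> float:
    """Annual token-volume growth at which total spend stays flat."""
    return 1 / (1 + price_rate) - 1


# Projected average inference cost: $0.04 (2025) to $0.01 (2030) per million tokens.
price_rate = annual_rate(0.04, 0.01, years=5)
print(f"implied annual price change: {price_rate:.1%}")
print(f"demand growth that keeps spend flat: {breakeven_demand_growth(price_rate):.1%}")
```

Under that assumption, prices fall by roughly a quarter each year, so any workload whose token volume grows by more than about a third per year will see its bill increase despite cheaper tokens.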

Who pays the bill?

The cost curve doesn't affect every participant the same way. As token consumption accelerates, the question becomes who ultimately absorbs that spend (the enterprise, the vendor, or the end user) and how those dynamics evolve as workloads scale and grow more complex. Deloitte's TCO analysis examines exactly where and when those costs shift.


The token TCO estimation and scenario analysis

To quantify these dynamics, Deloitte conducted a detailed token TCO analysis designed to capture how AI's underlying economics shift across the full tech stack. The analysis tested how total cost of ownership evolves along three critical dimensions that shape token pricing:

1. Technology stack: The GPUs, AI models, and architectures powering AI workloads.

2. Hosting approach: Comparisons as usage and complexity scale over time.

3. Usage scaling: Increase in the overall token consumption driven by increase in user count or the complexity/depth of reasoning each use case demands.

The objective was to understand how these factors interact to redefine organizational strategy based on what the key drivers of AI TCO are, how costs evolve as usage scales, and where the inflection points emerge in cost per token. Before presenting the outcomes, the next section outlines the key assumptions and configurations underpinning the model used in our tests.


Model assumptions

The model was built to test realistic, enterprise-scale conditions rather than idealized lab settings.[16] While it can accommodate a wide range of configurations, the version summarized here reflects a representative scenario across common enterprise workloads.

The baseline configuration included:

• Compute stack: NVIDIA HGX B200 GPU Server (NVLink/NVSwitch Enabled) | CPU – AMD EPYC 9654.

• LLM: Llama 3.3 70B FP8 TP2, GPT-4o selected because a variety of common configurations were being tested.

• Hosting models: On-prem, API access, specialized neocloud providers (NCPs). NCPs offer hourly rates as well as reserved contracting for different periods. In this model, we assumed hourly and not reserved pricing.

This setup enabled Deloitte to isolate how hosting choices, AI model selection, and usage maturity interact to drive token consumption and total cost. The following analysis highlights the resulting cost curves and inflection points that emerge as usage scales. The analysis simulates growth scaling in increments of 8 GPUs (figure 3).
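Scaling in increments of 8 GPUs means capacity grows in steps rather than smoothly with demand. The sketch below shows the rounding logic only; it is not the simulation used in this paper. The per-GPU throughput (500 tokens per second) and the utilization rate (50%) are hypothetical, and the annual token volumes are the 10 billion, 300 billion, and 1 trillion levels used in the scenario years.

```python
import math

SECONDS_PER_YEAR = 365 * 24 * 3600


def gpus_required(tokens_per_year: float,
                  tokens_per_second_per_gpu: float,
                  utilization: float,
                  node_size: int = 8) -> int:
    """GPUs needed for an annual token volume, rounded up to whole nodes."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    capacity_per_gpu = tokens_per_second_per_gpu * utilization * SECONDS_PER_YEAR
    raw_gpus = tokens_per_year / capacity_per_gpu
    nodes = max(1, math.ceil(raw_gpus / node_size))
    return nodes * node_size


# Hypothetical throughput and utilization; volumes from the scenario years.
for year, tokens in [(1, 10e9), (2, 300e9), (3, 1_000e9)]:
    print(f"Year {year}: {gpus_required(tokens, 500.0, 0.5)} GPUs")
```

The step function matters most at low volume: in the first year a full 8-GPU node is provisioned even though the workload needs only a fraction of it, which is one source of the idle-capacity cost that favors pay-as-you-go access early on.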

Figure 3. Scenario complexity and token assumptions driving four-year TCO dynamics

| TOKEN SCENARIOS | EXAMPLE SCENARIO DESCRIPTION/USE CASE |
| --- | --- |
| YEAR 1: Pilot stage | Initial deployment of simple use cases such as chatbot or FAQ assistant: A lightweight conversational AI used for customer service, HR inquiries, or basic IT help desk support. Handles short, structured Q&A with minimal context retention. |
| YEAR 2: POC/lightweight adoption | Scaling to include knowledge-driven use cases such as document summarization and knowledge search: Internal enterprise assistant that retrieves and summarizes policy documents, proposals, or contracts. Includes semantic search and multiturn conversations. |
| YEAR 3: Inferencing at scale | Maturing to drive decision-support use cases such as an analytics co-pilot: Assists consultants, analysts, or auditors in generating insights, drafting reports, or performing data analysis across multiple data sources. Includes reasoning, structured output, and integration with enterprise systems. |

Source: Deloitte analysis


Navigating the economics of an accelerating technology environment

The rapid pace of AI hardware advancement has created obsolescence cycles that far outpace traditional depreciation schedules, with GPU generations now refreshing rapidly. For example, recent model releases quickly outgrew the capabilities of previously leading GPUs to unlock features, while legacy support for older hardware diminishes. Newer GPUs that switch to an annual release cycle further accelerate these refresh demands, challenging enterprises to continually balance the benefits of faster upgrades with the risk of falling behind.

Such recent advances in GPU technology have enabled AI applications requiring larger context lengths, such as reasoning models, summarizing extensive text corpora, and high-fidelity multimodal tasks like analyzing hour-long videos. These use cases, including agentic reasoning, demand substantial GPU memory and the latest hardware to accurately process such complex or large-scale data.

However, adoption of multimodality and agentic reasoning at the enterprise level is in its early stages, and inference tasks often run well on older GPUs, especially for midsize models.

As token pricing for AI models declines and the economics of "build vs. buy" shift rapidly, enterprises cannot rely on static assumptions and should develop forward-looking infrastructure strategies: carefully planning upgrades, assessing costs, and ensuring investments remain viable as the market stabilizes over time.


Analysis outcome

The TCO simulation incorporated real-world parameters across the full AI value chain, from hardware utilization and energy costs to facilities expenses. Each variable was calibrated to reflect current market conditions and operational realities rather than theoretical efficiency.

This approach ensured a holistic view of cost behavior: how GPU utilization rates, power efficiency, and AI model complexity combine to shape effective cost per token. The resulting analysis surfaced the underlying mechanics of a new AI economy, one where technical decisions directly dictate financial outcomes.

1. Usage scaling and complexity drive hosting advantage.

In our TCO modeling, in the first year at 10 billion tokens, workloads favor the API access approach, because pay-as-you-go approaches minimize idle capacity costs. As the number of tokens rises in year two, the economics flip. At higher reasoning loads more tokens are consumed, and self-hosted AI factories outperform APIs as fixed infrastructure costs are absorbed and utilization increases. After four years, the simulation projected cumulative TCO for API hosting is twice what it would be for an AI factory, given the same configuration and token scaling (figure 4).

Figure 4. Over 3 years, an AI factory is ~2.1x more cost-effective than API-based solutions

AI factory averages ~150% annual TCO growth vs. >1,000% (API) and >800% (NCP), ensuring more stable, predictable, and manageable costs.

AI factory sees >90% drop in $/B tokens from Y1 to Y3 ($24K to $1.45K) vs. 64% (API) and 84% (NCP), becoming most cost-efficient at high scale.

Over a 3-year TCO, AI factory on-prem delivers more than 50% cost savings compared to both API-based and NCP solutions.

[Chart: Annual total cost of ownership. Y-axis: annual TCO (USD in millions), 0.0M to 4.0M. X-axis: Year 1 (10 billion tokens), Year 2 (300 billion tokens), Year 3 (1,000 billion tokens, or 1 trillion). Series: AI factory, neocloud (NCP), API. Bar labels as extracted (series attribution not recoverable): 3.50M, 2.72M, 1.45M, 0.97M, 1.06M, 0.49M, 0.24M, 0.17M, 0.04M.]

Source: Deloitte simulation
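The unit-cost callout in figure 4 can be checked with simple arithmetic. The sketch below uses only numbers stated in the figure: the AI factory's $24K and $1.45K per billion tokens, and the Year 1 and Year 3 volumes of 10 billion and 1,000 billion tokens. The function names are illustrative.

```python
def annual_tco_usd(usd_per_billion_tokens: float, tokens_billions: float) -> float:
    """Annual TCO implied by a unit cost and a token volume."""
    return usd_per_billion_tokens * tokens_billions


def percent_drop(start: float, end: float) -> float:
    """Percentage decline from start to end."""
    return (start - end) / start * 100


# AI factory unit costs and token volumes as stated in figure 4.
y1_unit, y3_unit = 24_000.0, 1_450.0        # USD per billion tokens
y1_tokens, y3_tokens = 10.0, 1_000.0        # billions of tokens

print(f"Year 1 implied TCO: ${annual_tco_usd(y1_unit, y1_tokens) / 1e6:.2f}M")
print(f"Year 3 implied TCO: ${annual_tco_usd(y3_unit, y3_tokens) / 1e6:.2f}M")
print(f"unit-cost drop: {percent_drop(y1_unit, y3_unit):.1f}%")
```

The implied totals match the 0.24M and 1.45M bar labels in the chart, and the unit-cost decline of roughly 94% is consistent with the ">90%" callout: total cost rises about sixfold while token volume rises a hundredfold.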

Pay-as-you-go APIs and NCP are more suited to simple, low-volume workloads, while AI factory (self-hosted) is co
