Orchestrating AI Infrastructure in China:
Emerging Leaders in a Fragmented and High-Value Market
2026 AI Infrastructure Orchestration Platform White Paper
Jun, 2026
LeadLeo
© 2026 Frost & Sullivan. All Rights Reserved
1 From Fragmentation to Orchestration
Enabling Scalable AI in a Multi-Chip World
o AI Workloads Are Scaling Beyond Hardware Efficiency Gains
o AI scaling is shifting from chip-level performance to cluster-scale infrastructure
o Multi-chip coexistence is becoming a structural characteristic of China's AI ecosystem
o Three-Layer Problem Framework: Heterogeneity + Efficiency + SLA
o From Resource Management to System-Level Coordination
o vGPU and Model Hub as Complementary Layers Forming Integrated AI Infrastructure Orchestration
AI Workloads Are Scaling Beyond Hardware Efficiency Gains
Driven by foundation model breakthroughs and enterprise adoption, AI workloads are expanding at a pace that exceeds improvements in single-chip performance.
[Figure: Token Consumption Growth, 2024.1 to 2025.12. Chart labels as extracted: "140 times", "140,000", "billion token\100". Drivers shown: foundation model breakthroughs, inference explosion, enterprise AI adoption, agent-based workflows.]
AI demand is increasingly driven by inference rather than training, leading to sustained and scalable compute consumption.
Shift Toward Reasoning Workloads
AI enters the inference-driven era
[Figure: Reasoning vs. Non-Reasoning Token Trends, 2024.1 to 2025.12. Series: Non-reasoning, Reasoning. Chart label: 50%.]
Reasoning vs. Non-Reasoning Token Trends. The share of all tokens routed through reasoning-optimized models has risen steadily since early 2025. The metric reflects the proportion of all tokens served by reasoning models, not the share of "reasoning tokens" within model outputs.
The data points to a clear conclusion: reasoning-oriented models are becoming the default path for real workloads, and the share of tokens flowing through them is now a leading indicator of how users want to interact with AI systems.
Source: Frost & Sullivan
AI scaling is shifting from chip-level performance to cluster-scale infrastructure
As inference workloads become continuously active, massively concurrent and increasingly distributed, AI scaling is increasingly dependent on large-scale GPU cluster infrastructure rather than standalone chip performance.
[Figure: GPU cluster scale across three stages, driven by the inference explosion]
PAST (~10K GPUs): training-centric workloads; relatively centralized clusters; chip performance still dominant
CURRENT (~100K GPUs): inference demand accelerates; distributed serving expands; multi-node coordination complexity rises
FUTURE (~1000K GPUs): continuously active inference; large-scale AI factories; cluster-scale infrastructure expansion; system-level orchestration becomes critical
AI infrastructure is evolving into a continuously active, distributed and highly coordinated GPU cluster system.
[Figure: China AI Compute Capacity (EFLOPS), +61% CAGR]
2021: 155.2
2022: 259.9
2023: 416.7
2024: 725.3
2025: 1,037.3
[Figure: China-developed AI accelerators as a Share; Compute Capacity in China; chart label: 20%]
• Rapid growth in AI workloads is driving accelerated expansion of compute infrastructure capacity across China.
• AI infrastructure scaling increasingly relies on expanding distributed compute capacity rather than incremental chip-level performance improvements.
• As China's AI infrastructure scales rapidly, domestic accelerators are increasingly deployed alongside NVIDIA GPUs, further increasing infrastructure heterogeneity and system coordination complexity.
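As a quick check on the +61% CAGR label, the growth rate can be recomputed from the chart's endpoint values. The sketch below assumes the five extracted values map to 2021 through 2025 in ascending order; the function and variable names are illustrative, not from the report.

```python
def cagr(start_value: float, end_value: float, periods: int) -> float:
    """Compound annual growth rate over `periods` yearly intervals."""
    return (end_value / start_value) ** (1 / periods) - 1

# Assumed mapping of the extracted chart values (EFLOPS) to years.
capacity_eflops = {2021: 155.2, 2022: 259.9, 2023: 416.7, 2024: 725.3, 2025: 1037.3}

years = sorted(capacity_eflops)
rate = cagr(capacity_eflops[years[0]], capacity_eflops[years[-1]], years[-1] - years[0])
print(f"CAGR {years[0]}-{years[-1]}: {rate:.1%}")
```

Under that mapping the four-year rate rounds to the 61% shown on the chart.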
AI infrastructure scaling is increasingly becoming a systems coordination challenge rather than a hardware scaling problem.
Multi-chip coexistence is becoming a structural characteristic of China's AI ecosystem
Unlike traditional computing environments built around relatively unified hardware ecosystems, China's AI infrastructure is increasingly characterized by the coexistence of multiple accelerator architectures, software stacks and deployment environments.
NVIDIA ecosystem
Hardware: H100/H200/B200, A100/B800/L40s
Software Stack: CUDA, cuDNN, NCCL, TensorRT, LLM, vLLM
CUDA-based infrastructure provides relatively standardized runtimes, operators and deployment workflows across training and inference environments.
Utilization Rate: 100%
China's domestic accelerator ecosystems
Lack of a unified software stack: different accelerator vendors often rely on distinct runtimes, compilers and operator adaptation frameworks, increasing cross-platform deployment complexity.
Utilization Rate: 30%
Drivers of multi-chip coexistence
• Supply chain diversification: export restrictions and domestic ecosystem development are accelerating multi-chip deployment across China's AI infrastructure.
• Workload-specific optimization: different accelerators are increasingly optimized for different AI workloads, driving long-term multi-chip coexistence.
• Cost and deployment flexibility: enterprises are adopting mixed-chip deployment strategies to improve cost efficiency and infrastructure flexibility.
[Figure: Heterogeneous Infrastructure Reality. Mixed deployment of NVIDIA GPUs, domestic accelerators and other accelerators across inference and training.]
Three-Layer Problem Framework: Heterogeneity + Efficiency + SLA
As China's AI industry continues to evolve, its compute ecosystem is shaped by the parallel presence of international GPUs like NVIDIA, alongside a growing range of domestic AI chips. Rather than converging toward a single dominant platform, this dual-track development has led to increasing fragmentation across chip architectures, instruction sets and software stacks.
[Figure labels as extracted: Compute Resource Management; Model Adaptation Difference; "10k times"; "<5%"]
The Core Problem
Diverse chip assets and fragmented software stacks prevent unified compute pooling, scheduling and model portability, leading to repeated adaptation, low utilization and high deployment complexity.
1 Fragmented Compute Resource Management
[Figure: Team A, Team B and Team C each run their own chips, stack and solution (Chips/Stack/Solution A, B, C). No unified pooling/scheduling.]
Compute resources are fragmented across multiple chip architectures, vendors and management systems, preventing unified pooling and scheduling.
• Multiple chip architectures and vendors (NVIDIA, Ascend, Cambricon, etc.)
• Independent management tools and platforms
• Lack of unified control plane
<5% utilization due to fragmented resource pools
2 Fragmented Model Adaptation
Model adaptation cycle (weeks): NVIDIA <1; Domestic (Average) 2-8
Number of available models: NVIDIA 2,000,000; Domestic (Average) 100+
Massive gap in available models; longer adaptation cycles
Models are typically developed in unified frameworks (e.g., PyTorch), but must be repeatedly adapted across different chip ecosystems due to fragmented software stacks.
• NVIDIA: Built on a single dominant architecture with a long-established software ecosystem.
• Domestic: Multiple vendors independently developing software stacks with no unified standard.
Three-Layer Problem Framework: Heterogeneity + Efficiency + SLA
In real-world scenarios, metrics such as latency, concurrency and SLA compliance determine whether AI systems can be deployed at scale. While models may technically run on China-developed AI accelerators, they often fail to meet these production-level thresholds. As a result, workloads cannot be consolidated, leading to fragmented deployment, low utilization and high cost.
3 Poorly Orchestrated Heterogeneous Compute: Unstable, Inefficient and Uneconomical
SLA of Enterprise AI Workload: higher concurrency, higher availability, higher stability, lower latency
Workloads (all workloads use the same models)
Task A (Q&A): 200 req/s, 2k tokens
Task B (Code): 150 req/s, 4k tokens
Task C (Vision): 100 req/s, 1.5k tokens
Task D (Analytics): 120 req/s, 3k tokens
Workloads are dynamic and uneven both across tasks and over time.
NVIDIA Stack (Unified)
Task A: latency ~1.0 sec; SLA 99.5%; concurrency 200+; GPU utilization ~70%
Task B: latency ~1.1 sec; SLA 99%; concurrency 150+; GPU utilization ~65%
Task C: latency ~1.2 sec; SLA 99%; concurrency 100+; GPU utilization ~60%
Task D: latency ~1.0 sec; SLA 99%; concurrency 120+; GPU utilization ~55%
Unified Stack → Meets SLA, High Utilization
Heterogeneous (Unmanaged)
Task A: latency ~3.2 sec; SLA 99.5%; concurrency 40+; GPU utilization <5%
Task B: latency ~3.5 sec; SLA 99%; concurrency 30+; GPU utilization <5%
Task C: latency ~3.0 sec; SLA 99%; concurrency 15+; GPU utilization <5%
Task D: latency ~3.6 sec; SLA 99%; concurrency 25+; GPU utilization <5%
Unmanaged Stack → Fails SLA, Low Utilization
Inference workloads make orchestration of heterogeneous compute a structural necessity.
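To make the gap between the two stacks concrete, the sketch below compares each task's stated load with the concurrency each stack sustains. It rests on two assumptions that are not in the report: each "N+" figure is read as its lower bound N, and the concurrency column is treated as directly comparable to the req/s demand. All names are illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TaskMetrics:
    demand: int     # requested load from the workload panel (req/s)
    unified: int    # concurrency on the unified stack, "N+" read as N
    unmanaged: int  # concurrency on the unmanaged stack, "N+" read as N

tasks = {
    "A (Q&A)": TaskMetrics(200, 200, 40),
    "B (Code)": TaskMetrics(150, 150, 30),
    "C (Vision)": TaskMetrics(100, 100, 15),
    "D (Analytics)": TaskMetrics(120, 120, 25),
}

def coverage(sustained: int, demand: int) -> float:
    """Fraction of the demanded load that the stack sustains."""
    return sustained / demand

for name, m in tasks.items():
    print(
        f"Task {name}: unified {coverage(m.unified, m.demand):.0%}, "
        f"unmanaged {coverage(m.unmanaged, m.demand):.0%}"
    )
```

Read this way, the unmanaged stack covers roughly a fifth or less of each task's demand, which is why the workloads cannot be consolidated onto it.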
From Resource Management to System-Level Coordination
From Hardware Management To Compute Economics
Isolated pools (Workload 1, Workload 2, ... Workload N)
• Resources are tightly bound to specific hardware
• Allocation is static, utilization is low and fragmented
• No end-to-end coordination, hard to meet SLA
Unified orchestration (Workload 1, Workload 2, Workload 3, Workload 4)
• SDC Orchestration Layer (Unified Control Plane)
• Resource Abstraction Layer (decouples workloads from hardware)
• Heterogeneous Compute Resources
• End-to-end coordination to consistently meet SLA
• Workload-centric: focus on workload requirements and business outcomes.
• Dynamic allocation: resources are elastically allocated based on real-time
• Hardware abstraction: workloads are decoupled from underlying hardware differences
SDC Core Principles
• Decoupling: Decouple AI workloads from the constraints of specific hardware, enabling flexible deployment.
• Abstraction: Abstract heterogeneous compute resources into a unified pool, simplifying management.
• Orchestration: Manage and optimize compute as a unified, software-defined system resource to deliver performance and SLA.
vGPU and Model Hub as Complementary Layers Forming Integrated AI Infrastructure Orchestration
Model Hub (execution layer solution): model adaptation, optimization, cross-chip tuning, performance consistency and observability
• Model Compatibility: enable models to run seamlessly across multiple chips
• Portability: adapt and optimize models for different architectures
• Performance Consistency: reduce performance variance across heterogeneous chips
• Observability: monitor model performance and runtime issues
System Layer (Missing) / SLA Enforcement Layer
• Global Visibility: unified observability across resources, models, tasks and performance
• SLA Assurance: end-to-end SLA monitoring, prediction and guarantee
• Business Alignment: align compute delivery with business goals and cost efficiency
• Closed-Loop: real-time feedback driven optimization and auto remediation
Why the resource layer alone is not enough
• Scheduling Strategy Exists ≠ SLA Guarantee
• Resource Allocation Success ≠ Execution Stability
• GPU Utilization Improvement ≠ Business Performance Improvement
vGPU (resource layer solution): GPU virtualization, slicing, pooling, isolation, quota and admission control
• Resource Pooling: aggregate GPUs across clusters and types
• Partitioning & Isolation: vGPU slicing and tenant isolation
• Efficient Allocation: right-size allocation to maximize utilization
• Global Visibility: utilization monitoring and basic health check
2 Why vGPU?
The Control Plane Imperative
o vGPU as the Core Layer of AI Infrastructure Orchestration
  • The Maturity Hierarchy of Heterogeneous GPU Orchestration
o vGPU as the Control Plane: Beyond Abstraction
  • Abstraction Layer
  • Orchestration Layer
  • Optimization Layer
  • Deterministic SLA Enforcement
o Core vGPU Capabilities & How vGPU Creates Value
o Rise vGPU platform: Heterogeneous GPU Orchestration at Production Scale
o Differentiating from Kubernetes: Why Not Just Use K8s?
o Industry Adoption: From Hyperscalers to Emerging AI Cloud Providers
o Competitive Landscape
GPU Has Become Core AI Infrastructure, but Raw GPU Capacity Is Not the Same as Usable Compute
As AI moves from experiments to scaled deployment, GPU has become the core compute resource. However, different workloads place different demands on GPU resources in terms of duration, concurrency, and latency. Static and siloed management models increasingly lead to resource waste and delivery inefficiency.
Training
• Long-duration jobs with multi-GPU coordination
• High memory capacity and bandwidth requirements
• Sensitive to topology and communication efficiency
• Requires stable resource allocation and task isolation
Development / Testing
• Small, intermittent jobs
• Multi-user sharing and collaborative access
• Diverse environments with frequent changes
• Requires flexible allocation and fast recovery
Inference
• Short, high-frequency requests
• Fluctuating concurrency and bursting traffic
• Low-latency and high-throughput requirements
• Requires dynamic scheduling and elastic capacity
As AI workloads diversify, enterprises need not only more GPUs, but also a more efficient way to pool, slice, schedule, and operate GPU resources.
Different AI Workloads Place Differentiated Demands on GPU Resources
Large Model Training
  Resource usage pattern: long-duration, multi-GPU, high communication overhead
  Infrastructure requirement: stable allocation, topology-aware scheduling, task isolation
Fine-tuning
  Resource usage pattern: medium-duration, elastic demand, moderate resource usage
  Infrastructure requirement: dynamic allocation, resource reuse
Inference Services
  Resource usage pattern: short tasks, high frequency, fluctuating concurrency
  Infrastructure requirement: fine-grained slicing, elastic scheduling, SLA assurance
Development / Testing
  Resource usage pattern: small jobs, intermittent usage, multi-user access
  Infrastructure requirement: shared access, auto-recovery, cost metering
From owning GPUs to using GPUs efficiently is the next step in AI infrastructure evolution.
The Maturity Hierarchy of Heterogeneous GPU Orchestration
As AI workloads diversify, GPU resource management is evolving from fragmented single-brand clusters to advanced heterogeneous orchestration with virtualization and isolation.
[Figure: Capability Maturity, Tier 1 to Tier 4]
• Phancy's Rise vGPU: advanced heterogeneous orchestration with virtualization and compute abstraction
• HAMi vGPU: basic vGPU enablement and limited orchestration
• Western NeoCloud Projects: heterogeneous deployment, isolated resource pools
• Chinese NeoCloud Projects: heterogeneous deployment, project-based management
• Legacy Cloud Service Providers: single-brand clusters, static provisioning
Higher-maturity solutions increasingly combine multi-brand support with advanced virtualization and orchestration.
Representative Approaches by Maturity Level
Column headers as extracted: Approach; Virtualization Maturity; Multi-brand Support; Typical Characteristics
Phancy's Rise vGPU
  Ratings as extracted: High, Advanced
  Typical characteristics: heterogeneous pooling, slicing, scheduling, isolation
HAMi vGPU
  Ratings as extracted: Medium, Basic
  Typical characteristics: vGPU enablement with limited orchestration depth
Western NeoCloud Projects
  Ratings as extracted: Medium, Basic
  Typical characteristics: heterogeneous deployment with isolated resource pools
Chinese NeoCloud Projects
  Ratings as extracted: Basic, Medium
  Typical characteristics: heterogeneous deployment with project-level management
Legacy Cloud Service Providers
  Ratings as extracted: Low, Low
  Typical characteristics: single-brand clusters and static provisioning
vGPU as the Control Plane of AI Infrastructure
As heterogeneous GPU environments become more complex, enterprises need more than hardware virtualization. vGPU provides a control plane that standardizes resources, coordinates workloads, improves utilization and stabilizes AI service delivery.
Control Loop: Standardize → Schedule → Optimize → Stabilize
1 Abstraction Layer (Standardize): Transforms heterogeneous GPUs across vendors, architectures and locations into standardized, allocatable compute units.
2 Orchestration Layer (Schedule): Coordinates who should use GPU resources, when and where, based on priority, topology, real-time load and resource fit.
3 Optimization Layer (Optimize): Improves utilization through slicing, sharing, oversubscription and dynamic reuse, converting idle capacity into usable value.
4 Deterministic SLA Enforcement Layer (Stabilize): Stabilizes service delivery through isolation, contention control and SLA-oriented execution, making AI performance predictable and auditable.
[Figure: Inference Service, Training Job, Batch Job and Ad-hoc Job pass through the vGPU Scheduler and vGPU Manager]
vGPU turns raw GPU capacity into SLA-backed AI infrastructure through abstraction, orchestration, optimization and deterministic execution.
Abstraction Layer: Turning Heterogeneous GPUs into Standardized Compute Units
The abstraction layer is the foundation of the vGPU control plane. Its core mission is to transform GPUs from fixed physical devices into allocatable, combinable, and schedulable standardized compute units. By masking hardware differences, it creates a unified resource language for the orchestration, optimization, and deterministic execution layers that follow.
A Compute Virtualization (Compute Atom)
Instead of using a GPU only as a whole card, vGPU breaks raw GPU compute into schedulable compute atoms or slices, so multiple workloads can consume different portions of compute independently.
For example, one task may use 30% of compute, while another uses 50% on the same physical GPU.
B Memory Virtualization
GPU memory is allocated at a finer granularity, letting workloads request more precise amounts of memory instead of being constrained by coarse full-card allocation.
For example, workloads may request 6GB or 10GB, rather than being limited to the full memory of the card.
[Figure: a 16G physical GPU exposed as vGPU memory slices of 6GB and 10GB]
C Isolation
In multi-user and multi-workload environments, different teams can run workloads on the same physical GPU while compute and memory are isolated to avoid interference and preserve execution stability.
[Figure: a single physical GPU divided into isolated partitions]
The value of the abstraction layer is not simply to 'make GPUs smaller', but to translate heterogeneous hardware into a unified resource language that prepares the ground for orchestration, optimization, and SLA-backed execution.
• It decouples AI workloads from direct dependence on specific hardware forms, allowing jobs to declare resource needs without binding to a specific physical card.
• Additional benefits are explored further in the later sections on orchestration, optimization, and vGPU capabilities.
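The compute and memory examples above (30% and 50% of compute, 6GB and 10GB of a 16GB card) can be expressed as a small bookkeeping model. This is only a sketch of the accounting side; it does not enforce isolation, and the class and method names are hypothetical rather than part of any vGPU product API.

```python
class PhysicalGPU:
    """One physical card exposed as compute (percent) and memory (GB) slices."""

    def __init__(self, memory_gb: int):
        self.free_compute_pct = 100
        self.free_memory_gb = memory_gb
        self.slices = {}  # workload name -> (compute_pct, memory_gb)

    def allocate(self, workload: str, compute_pct: int, memory_gb: int) -> bool:
        """Grant a slice only if both dimensions fit; otherwise change nothing."""
        if compute_pct > self.free_compute_pct or memory_gb > self.free_memory_gb:
            return False
        self.free_compute_pct -= compute_pct
        self.free_memory_gb -= memory_gb
        self.slices[workload] = (compute_pct, memory_gb)
        return True

gpu = PhysicalGPU(memory_gb=16)
first = gpu.allocate("task-a", compute_pct=30, memory_gb=6)
second = gpu.allocate("task-b", compute_pct=50, memory_gb=10)
third = gpu.allocate("task-c", compute_pct=30, memory_gb=2)  # exceeds what is left
print(first, second, third, gpu.slices)
```

The first two requests share the card; the third is refused because neither the remaining compute nor the remaining memory covers it.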
Orchestration Layer: Scheduling GPU Resources by Priority, Topology, Load and Resource Fit
The orchestration layer transforms static allocation into dynamic, policy-aware coordination. By combining business priorities, topology awareness, real-time utilization and resource-fit analysis, vGPU enables more efficient and predictable workload placement across heterogeneous infrastructure.
A Priority-Aware Scheduling
When multiple workloads compete for the same GPU, the scheduler allocates resources according to business importance and SLA sensitivity. High-priority inference or mission-critical jobs receive more stable placement, while lower-priority jobs are scheduled on a best-effort basis.
[Figure: priority queue ordered from High to Low (Mission-Critical Inference, Training Job A, Batch Inference, Ad-hoc Job) feeding the vGPU Scheduler, which assigns GPU resources]
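A minimal sketch of priority-aware dispatch, assuming three priority tiers and a fixed number of free GPU slots. The tier names, the slot model and the class itself are illustrative assumptions; the report does not describe the scheduler's internals.

```python
import heapq
import itertools

PRIORITY = {"high": 0, "medium": 1, "low": 2}  # assumed tiers; lower value is served first

class PriorityScheduler:
    def __init__(self, free_slots: int):
        self.free_slots = free_slots
        self._queue = []
        self._order = itertools.count()  # tie-breaker keeps FIFO order within a tier

    def submit(self, job: str, priority: str) -> None:
        heapq.heappush(self._queue, (PRIORITY[priority], next(self._order), job))

    def dispatch(self) -> list:
        """Place the most important queued jobs while slots remain."""
        placed = []
        while self._queue and self.free_slots > 0:
            _, _, job = heapq.heappop(self._queue)
            placed.append(job)
            self.free_slots -= 1
        return placed

scheduler = PriorityScheduler(free_slots=2)
scheduler.submit("ad-hoc-job", "low")
scheduler.submit("batch-inference", "low")
scheduler.submit("training-job-a", "medium")
scheduler.submit("mission-critical-inference", "high")
placed = scheduler.dispatch()
print(placed)
```

With two free slots, the mission-critical inference job and the training job are placed first, and the two low-priority jobs stay queued on a best-effort basis.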
B Topology-Aware Scheduling
Different GPUs and interconnects have different communication characteristics. For distributed training and multi-card inference, the scheduler considers topology relationships such as NVLink, NVSwitch, HCCS or XPULink so that tasks are placed on GPUs with lower communication cost and better locality.
[Figure: the vGPU Scheduler reads NVLink/NVSwitch and HCCS/XPULink topology and produces a topology-aware placement result with lower communication cost and better locality]
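One simple way to express topology-aware placement is to keep a multi-GPU job inside a single interconnect domain whenever one is large enough. The sketch below assumes free devices are already grouped by domain and prefers the tightest domain that fits; the domain labels and the function are hypothetical.

```python
def place_by_topology(free_gpus_by_domain: dict, gpus_needed: int):
    """Return (domain, gpu_ids) for the tightest single domain that fits, else None."""
    candidates = [
        (len(gpus), domain)
        for domain, gpus in free_gpus_by_domain.items()
        if len(gpus) >= gpus_needed
    ]
    if not candidates:
        return None  # the caller would have to span domains at higher communication cost
    _, domain = min(candidates)
    return domain, sorted(free_gpus_by_domain[domain])[:gpus_needed]

free = {
    "node-1 (NVLink)": ["gpu0", "gpu1"],
    "node-2 (NVLink)": ["gpu0", "gpu1", "gpu2", "gpu3"],
    "node-3 (HCCS)": ["npu0", "npu1", "npu2", "npu3", "npu4", "npu5"],
}
placement = place_by_topology(free, gpus_needed=4)
print(placement)
```

Choosing the tightest fitting domain keeps all four devices on one interconnect and leaves the larger domain free for bigger jobs.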
C Load-Aware Scheduling
The scheduler monitors real-time GPU and node utilization and places new tasks on underutilized resources. Inference jobs are spread to improve availability and balance, while training/fine-tuning jobs are packed to optimize locality and utilization. Placement adapts to workload patterns, minimizing fragmentation and hotspot risks.
[Figure: Real-time GPU Utilization. GPU1 60%, GPU2 70%, GPU3 40%, GPU4 80%.]
[Figure: Best Fit Placement. The vGPU Scheduler uses real-time utilization (GPU1 50%, GPU2 70%, GPU3 40%, GPU4 80%) and available compute, bandwidth and memory.]
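The spread-versus-pack behaviour described above can be sketched with the utilization values from the figure. The single-number utilization model and the function name are assumptions made for illustration.

```python
def choose_gpu(utilization: dict, demand_pct: int, workload_type: str):
    """Spread inference onto the least-loaded GPU; pack other jobs onto the fullest GPU that still fits."""
    fits = {gpu: used for gpu, used in utilization.items() if used + demand_pct <= 100}
    if not fits:
        return None
    if workload_type == "inference":
        return min(fits, key=fits.get)  # spread: most headroom
    return max(fits, key=fits.get)      # pack: least leftover capacity

utilization = {"GPU1": 60, "GPU2": 70, "GPU3": 40, "GPU4": 80}
inference_target = choose_gpu(utilization, demand_pct=20, workload_type="inference")
training_target = choose_gpu(utilization, demand_pct=20, workload_type="training")
print(inference_target, training_target)
```

For the same 20% demand, the inference job lands on the least-loaded card and the training job on the fullest card that can still hold it.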
D Resource-Aware Scheduling
Scheduling decisions evaluate the actual usage versus allocated GPU resources. Instead of relying solely on reserved capacity, the scheduler considers real-time utilization, available headroom, and reclaimable resources to place workloads on the most suitable GPUs. This approach maximizes effective utilization while supporting both reserved and preemptible resource modes.
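The distinction between allocated and actually used capacity can be sketched as follows. The sketch assumes that reserved jobs may only use unallocated capacity, while preemptible jobs may also borrow capacity that is reserved but idle; this policy and all names are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GPUState:
    name: str
    allocated_pct: int  # capacity reserved by existing workloads
    used_pct: int       # capacity those workloads actually consume right now

    @property
    def unallocated_pct(self) -> int:
        return 100 - self.allocated_pct

    @property
    def reclaimable_pct(self) -> int:
        return self.allocated_pct - self.used_pct

def headroom(gpu: GPUState, mode: str) -> int:
    """Reserved jobs see only unallocated capacity; preemptible jobs may also borrow idle reservations."""
    if mode == "reserved":
        return gpu.unallocated_pct
    return gpu.unallocated_pct + gpu.reclaimable_pct

def place(gpus: list, demand_pct: int, mode: str):
    """Best fit: the GPU whose headroom exceeds the demand by the smallest margin."""
    candidates = [g for g in gpus if headroom(g, mode) >= demand_pct]
    if not candidates:
        return None
    return min(candidates, key=lambda g: headroom(g, mode) - demand_pct).name

fleet = [
    GPUState("gpu-a", allocated_pct=90, used_pct=30),
    GPUState("gpu-b", allocated_pct=60, used_pct=55),
]
reserved_choice = place(fleet, demand_pct=30, mode="reserved")
preemptible_choice = place(fleet, demand_pct=50, mode="preemptible")
print(reserved_choice, preemptible_choice)
```

Here gpu-a looks nearly full by reservation but is mostly idle, so it is the only card that can host the preemptible job.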
The orchestration layer is where abstracted GPU resources become business-aware, topology-aware and utilization-aware scheduling decisions.
• By making global scheduling decisions, the orchestration layer improves overall resource efficiency and workload predictability.
• Additional benefits are further explored in the later sections on optimization and vGPU capabilities.
Optimization Layer: Continuously Improving Resource Utilization and Turning Hardware Assets into Measurable, Operable Digital Assets
The optimization layer transforms GPU management from simple capacity allocation to efficiency optimization. Through sub-GPU slicing, oversubscription, time-space multiplexing and multi-tenant sharing, vGPU converts idle or underused hardware into usable, measurable and operable AI resources.
A Sub-GPU Level Slicing
Traditional GPU allocation is often based on whole-card units or coarse vendor-defined profiles. vGPU refines slicing down to smaller compute and memory fragments, allowing many lightweight inference or development jobs to share one physical GPU while consuming only what they actually need.
[Figure: a physical GPU divided into fine-grained slices]
B Memory and Compute Oversubscription
[Figure: Demand A, Demand B and Demand C are served by the vGPU Manager from exposed virtual capacity. More virtual capacity than physical at any moment.]
In practice, tasks do not peak at the same time. Oversubscription allows the system to expose more virtual capacity than the static physical limit while relying on runtime scheduling to keep actual usage within safe bounds. This improves resource utilization without requiring all workloads to be fully active simultaneously.
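A minimal sketch of the two checks that oversubscription implies: admission against the exposed virtual capacity, and a runtime check of live usage against the physical limit. The 1.5x ratio is an assumed policy value, and the class is hypothetical.

```python
class OversubscribedGPU:
    def __init__(self, physical_gb: int, ratio: float):
        self.physical_gb = physical_gb
        self.virtual_gb = physical_gb * ratio  # capacity exposed to tenants
        self.granted = {}                      # tenant -> GB promised

    def admit(self, tenant: str, request_gb: int) -> bool:
        """Admission control: promises may exceed physical memory, but not the virtual cap."""
        if sum(self.granted.values()) + request_gb > self.virtual_gb:
            return False
        self.granted[tenant] = request_gb
        return True

    def within_safe_bounds(self, live_usage_gb: dict) -> bool:
        """Runtime check: actual usage must still fit on the physical card."""
        return sum(live_usage_gb.values()) <= self.physical_gb

gpu = OversubscribedGPU(physical_gb=16, ratio=1.5)  # 24 GB virtual; the ratio is an assumed policy
admitted = [gpu.admit(t, 8) for t in ("demand-a", "demand-b", "demand-c")]
rejected = gpu.admit("demand-d", 8)
safe_now = gpu.within_safe_bounds({"demand-a": 7, "demand-b": 3, "demand-c": 4})
unsafe_peak = gpu.within_safe_bounds({"demand-a": 8, "demand-b": 8, "demand-c": 8})
print(admitted, rejected, safe_now, unsafe_peak)
```

The second check is the one a runtime scheduler would act on, for example by throttling or migrating a tenant when simultaneous peaks approach the physical limit.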
C Time-slicing and Space-slicing
Time-slicing lets multiple jobs share the same GPU in alternating time windows, which is suitable for bursty interactive tasks. Space-slicing allows multiple tasks to occupy different portions of GPU memory or compute at the same time, making it suitable for stable concurrent multi-tenant usage.
[Figure: Time-slicing (alternating in time): Job A, Job B, Job C take turns along the time axis. Space-slicing (concurrent in space): Job A, Job B, Job C run side by side.]
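The difference between the two modes can be shown with a toy model: time-slicing hands the whole GPU to one job per window, while space-slicing gives every job a fixed share for the whole period. The round-robin policy and the percentage shares are illustrative assumptions.

```python
from itertools import cycle, islice

def time_slice(jobs: list, windows: int) -> list:
    """Round-robin: each time window is owned by exactly one job."""
    return list(islice(cycle(jobs), windows))

def space_slice(shares: dict) -> dict:
    """Concurrent partitions: every job holds its share for the whole period."""
    if sum(shares.values()) > 100:
        raise ValueError("shares exceed one physical GPU")
    return dict(shares)

timeline = time_slice(["Job A", "Job B", "Job C"], windows=6)
partitions = space_slice({"Job A": 50, "Job B": 30, "Job C": 20})
print(timeline)
print(partitions)
```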
D Multi-tenant Sharing and Isolation
Scheduling decisions should consider the combined fit of compute, memory and bandwidth. Instead of relying on a single metric, the orchestration layer matches workloads to GPUs based on multidimensional resource requirements, improving placement accuracy and reducing contention.
[Figure: users/tenants sharing a physical GPU fleet]
The optimization layer shifts GPU management from 'capacity reservation' to 'efficiency activation', turning hardware resources into measurable, operable digital assets with higher ROI.
• AI infrastructure benefits from higher utilization, lower idle capacity and more flexible workload placement.
• Additional benefits are further explored in the later sections on vGPU capabilities and business value.
Deterministic Execution Layer: Ensuring Predictable, Guaranteed and Auditable AI Service Delivery
In AI infrastructure, successful resource allocation does not automatically guarantee stable business outcomes. Even when GPU resources are abstracted, orchestrated and optimized, AI workloads may still face performance fluctuation, contention and unpredictability unless execution is stabilized around SLA-oriented control.
A Guaranteed Compute Supply for Critical Workloads
[Figure: Protected Resource Pool. Guaranteed compute supply reserved for the critical task, alongside other tasks.]
Critical tasks receive stable compute quotas and are insulated from interference by other co-located tasks. Resource isolation establishes a hard lower bound for service delivery.
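The idea of a hard lower bound can be sketched as a pool with a reservation that non-critical tasks can never consume. The 40% reservation and the class below are assumptions for illustration, not a description of how any product enforces quotas.

```python
class ProtectedPool:
    def __init__(self, total_pct: int, reserved_for_critical_pct: int):
        self.reserved_free = reserved_for_critical_pct
        self.shared_free = total_pct - reserved_for_critical_pct

    def request(self, compute_pct: int, critical: bool) -> bool:
        if critical:
            # Critical tasks draw on the reservation first, then on shared capacity.
            if compute_pct <= self.reserved_free + self.shared_free:
                from_reserved = min(compute_pct, self.reserved_free)
                self.reserved_free -= from_reserved
                self.shared_free -= compute_pct - from_reserved
                return True
            return False
        # Other tasks can never consume the reservation.
        if compute_pct <= self.shared_free:
            self.shared_free -= compute_pct
            return True
        return False

pool = ProtectedPool(total_pct=100, reserved_for_critical_pct=40)
other_1 = pool.request(60, critical=False)  # fills the shared part
other_2 = pool.request(10, critical=False)  # refused: only the reservation is left
critical = pool.request(40, critical=True)  # still served from the reservation
print(other_1, other_2, critical)
```

Even after other tasks have filled the shared capacity, the critical task still receives its full quota.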
B Predictable Latency and Throughput
[Figure: Latency (ms) and Throughput (req/s)]
Topology-aware placement and priority-aware execution reduce unpredictable delay spikes, allowing inference latency and