Orchestrating AI Infrastructure in China:
Emerging Leaders in a Fragmented and High-Value Market

2026 AI Infrastructure Orchestration Platform White Paper

Jun, 2026

LeadLeo

© 2026 Frost & Sullivan. All Rights Reserved

1  From Fragmentation to Orchestration

Enabling Scalable AI in a Multi-Chip World

o AI Workloads Are Scaling Beyond Hardware Efficiency Gains
o AI scaling is shifting from chip-level performance to cluster-scale infrastructure
o Multi-chip coexistence is becoming a structural characteristic of China’s AI ecosystem
o Three-Layer Problem Framework: Heterogeneity + Efficiency + SLA
o From Resource Management to System-Level Coordination
o vGPU and Model Hub as Complementary Layers Forming Integrated AI Infrastructure Orchestration


AI Workloads Are Scaling Beyond Hardware Efficiency Gains

Driven by foundation model breakthroughs and enterprise adoption, AI workloads are expanding at a pace that exceeds improvements in single-chip performance.

[Figure: Token Consumption Growth, 2024.1 to 2025.12, in billion tokens. Labeled values: 100 and 140,000; annotation: "140 times". Drivers shown: foundation model breakthroughs, inference explosion, enterprise AI adoption, agent-based workflows.]

AI demand is increasingly driven by inference rather than training, leading to sustained and scalable compute consumption.

[Figure: Shift Toward Reasoning Workloads. Reasoning vs. non-reasoning token share, 2024.1 to 2025.12; labeled value: 50%.]

Reasoning vs. Non-Reasoning Token Trends. Share of all tokens routed through reasoning-optimized models has risen steadily since early 2025. The metric reflects the proportion of all tokens served by reasoning models, not the share of "reasoning tokens" within model outputs.

The data points to a clear conclusion: reasoning-oriented models are becoming the default path for real workloads, and the share of tokens flowing through them is now a leading indicator of how users want to interact with AI systems.

AI enters the inference-driven era

Source: Frost & Sullivan

AI scaling is shifting from chip-level performance to cluster-scale infrastructure

As inference workloads become continuously active, massively concurrent and increasingly distributed, AI scaling is increasingly dependent on large-scale GPU cluster infrastructure rather than standalone chip performance.

[Figure: GPU cluster scale versus inference explosion, showing cluster-scale infrastructure expansion across three stages.]

PAST (~10K GPUs)
• Training-centric workloads
• Relatively centralized clusters
• Chip performance still dominant

CURRENT (~100K GPUs)
• Inference demand accelerates
• Distributed serving expands
• Multi-node coordination complexity rises

FUTURE (~1000K GPUs)
• Continuously active inference
• Large-scale AI factories
• System-level orchestration becomes critical

AI infrastructure is evolving into a continuously active, distributed and highly coordinated GPU cluster system.

China AI Compute Capacity (EFLOPS), +61% CAGR

Year     2021    2022    2023    2024    2025
EFLOPS   155.2   259.9   416.7   725.3   1,037.3
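The +61% CAGR label can be sanity-checked from the first and last capacity values in the chart. The sketch below is only an arithmetic check; it assumes the five labeled values map to 2021 through 2025 in ascending order, and the function name `cagr` is introduced here for illustration.

```python
# Capacity figures (EFLOPS) as labeled in the chart. The year mapping is an
# assumption: values are taken to rise monotonically from 2021 to 2025.
capacity_eflops = {2021: 155.2, 2022: 259.9, 2023: 416.7, 2024: 725.3, 2025: 1037.3}


def cagr(start_value: float, end_value: float, periods: int) -> float:
    """Compound annual growth rate over `periods` yearly intervals."""
    return (end_value / start_value) ** (1 / periods) - 1


years = sorted(capacity_eflops)
rate = cagr(capacity_eflops[years[0]], capacity_eflops[years[-1]], years[-1] - years[0])
print(f"CAGR {years[0]}-{years[-1]}: {rate:.1%}")
```

Under that year mapping the result rounds to the 61% shown in the chart.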

China-developed AI accelerators as a share of compute capacity in China: 20%

• As China’s AI infrastructure scales rapidly, domestic accelerators are increasingly deployed alongside NVIDIA GPUs, further increasing infrastructure heterogeneity and system coordination complexity.
• Rapid growth in AI workloads is driving accelerated expansion of compute infrastructure capacity across China.
• AI infrastructure scaling increasingly relies on expanding distributed compute capacity rather than incremental chip-level performance improvements.

AI infrastructure scaling is increasingly becoming a systems coordination challenge rather than a hardware scaling problem.


Multi-chip coexistence is becoming a structural characteristic of China’s AI ecosystem

Unlike traditional computing environments built around relatively unified hardware ecosystems, China’s AI infrastructure is increasingly characterized by the coexistence of multiple accelerator architectures, software stacks and deployment environments.

NVIDIA ecosystem
• Hardware: H100/H200/B200, A100/B800/L40s
• Software stack: LLM, NCCL, vLLM, CUDA, cuDNN, TensorRT
• CUDA-based infrastructure provides relatively standardized runtimes, operators and deployment workflows across training and inference environments.
• Utilization rate: 100%

China’s domestic accelerator ecosystems
• Lack of a unified software stack
• Different accelerator vendors often rely on distinct runtimes, compilers and operator adaptation frameworks, increasing cross-platform deployment complexity.
• Utilization rate: 30%

Supply chain diversification
• Export restrictions and domestic ecosystem development are accelerating multi-chip deployment across China’s AI infrastructure.

Workload-specific optimization
• Different accelerators are increasingly optimized for different AI workloads, driving long-term multi-chip coexistence.

Cost and deployment flexibility
• Enterprises are adopting mixed-chip deployment strategies to improve cost efficiency and infrastructure flexibility.

[Figure: Heterogeneous Infrastructure Reality. Labels: NVIDIA GPUs, other domestic accelerators, mixed deployment accelerators; inference, training.]


Three-Layer Problem Framework: Heterogeneity + Efficiency + SLA

As China’s AI industry continues to evolve, its compute ecosystem is shaped by the parallel presence of international GPUs like NVIDIA, alongside a growing range of domestic AI chips. Rather than converging toward a single dominant platform, this dual-track development has led to increasing fragmentation across chip architectures, instruction sets and software stacks.

Key figures: <5% (Compute Resource Management); 10k times (Model Adaptation Difference)

The Core Problem

Diverse chip assets and fragmented software stacks prevent unified compute pooling, scheduling and model portability, leading to repeated adaptation, low utilization and high deployment complexity.

1  Fragmented Compute Resource Management

[Figure: Team A, Team B and Team C each run their own chips, stack (A, B, C) and solution (A, B, C). No unified pooling/scheduling.]

Compute resources are fragmented across multiple chip architectures, vendors and management systems, preventing unified pooling and scheduling.

• Multiple chip architectures and vendors (NVIDIA, Ascend, Cambricon, etc.)
• Independent management tools and platforms
• Lack of unified control plane

<5% utilization due to fragmented resource pools

2  Fragmented Model Adaptation

Metric                           NVIDIA      Domestic (Average)
Model adaptation cycle (weeks)   <1          2-8
# of available models            2,000,000   100+

Massive gap in available models; longer adaptation cycles.

Models are typically developed in unified frameworks (e.g., PyTorch), but must be repeatedly adapted across different chip ecosystems due to fragmented software stacks.

• NVIDIA: Built on a single dominant architecture with a long-established software ecosystem.
• Domestic: Multiple vendors independently developing software stacks with no unified standard.


In real-world scenarios, metrics such as latency, concurrency and SLA compliance determine whether AI systems can be deployed at scale. While models may technically run on China-developed AI accelerators, they often fail to meet these production-level thresholds. As a result, workloads cannot be consolidated, leading to fragmented deployment, low utilization and high cost.

3  Poorly Orchestrated Heterogeneous Compute: Unstable, Inefficient and Uneconomical

SLA of Enterprise AI Workload: higher concurrency, higher availability, higher stability, lower latency.

Workload demand (all workloads use the same models)

Task     Type        Request rate   Tokens
Task A   Q&A         200 req/s      2k
Task B   Code        150 req/s      4k
Task C   Vision      100 req/s      1.5k
Task D   Analytics   120 req/s      3k

Workloads are dynamic and uneven both across tasks and over time.

NVIDIA Stack (Unified)

Task     Latency    SLA     Concurrency   GPU Utilization
Task A   ~1.0 sec   99.5%   200+          ~70%
Task B   ~1.1 sec   99%     150+          ~65%
Task C   ~1.2 sec   99%     100+          ~60%
Task D   ~1.0 sec   99%     120+          ~55%

Unified Stack → Meets SLA, High Utilization

Heterogeneous (Unmanaged)

Task     Latency    SLA     Concurrency   GPU Utilization
Task A   ~3.2 sec   99.5%   40+           <5%
Task B   ~3.5 sec   99%     30+           <5%
Task C   ~3.0 sec   99%     15+           <5%
Task D   ~3.6 sec   99%     25+           <5%

Unmanaged Stack → Fails SLA, Low Utilization

Inference workloads make orchestration of heterogeneous compute a structural necessity.


From Resource Management to System-Level Coordination

Isolated pools (Workload 1, Workload 2, ... Workload N)
• Resources are tightly bound to specific hardware
• Allocation is static, utilization is low and fragmented
• No end-to-end coordination, hard to meet SLA

From Hardware Management To Compute Economics

Unified stack (Workload 1, Workload 2, Workload 3, Workload 4)
• SDC Orchestration Layer (Unified Control Plane)
• Resource Abstraction Layer (Decouples workloads from hardware)
• Heterogeneous Compute Resources

Unified orchestration
End-to-end coordination to consistently meet SLA

Workload-centric
Focus on workload requirements and business outcomes.

Dynamic allocation
Resources are elastically allocated based on real-time

Hardware abstraction
Workloads are decoupled from underlying hardware differences

SDC Core Principles

Decoupling: Decouple AI workloads from the constraints of specific hardware, enabling flexible deployment.

Abstraction: Abstract heterogeneous compute resources into a unified pool, simplifying management.

Orchestration: Manage and optimize compute as a unified, software-defined system resource to deliver performance and SLA.


vGPU and Model Hub as Complementary Layers Forming Integrated AI Infrastructure Orchestration

vGPU (Resource Layer Solution): GPU virtualization, slicing, pooling, isolation, quota and admission control

Model Hub (execution layer solutions): Model adaptation, optimization, cross-chip tuning, performance consistency and observability

Scheduling Strategy Exists → Resource Allocation Success → Execution Stability → GPU Utilization Improvement → Business Performance Improvement

Capabilities shown in the figure:
• Resource Pooling: Aggregate GPUs across clusters and types
• Partitioning & Isolation: vGPU slicing and tenant isolation
• Efficient Allocation: Right-size allocation to maximize utilization
• Global Visibility: Utilization monitoring and basic health check
• Model Compatibility: Enable models to run seamlessly across multiple chips
• Portability: Adapt and optimize models for different architectures
• Performance Consistency: Reduce performance variance across heterogeneous chips
• Observability: Monitor model performance and runtime issues
• Global Visibility: Unified observability across resources, models, tasks and performance
• SLA Assurance: End-to-end SLA monitoring, prediction and guarantee
• Closed-Loop: Real-time feedback driven optimization and auto remediation
• Business Alignment: Align compute delivery with business goals and cost efficiency

Other figure labels: System Layer (Missing), SLA Enforcement Layer, SLA Guarantee, Resource Layer


The Control Plane Imperative

2  Why vGPU?

o vGPU as the Core Layer of AI Infrastructure Orchestration
  • The Maturity Hierarchy of Heterogeneous GPU Orchestration
o vGPU as the Control Plane: Beyond Abstraction
  • Abstraction Layer
  • Orchestration Layer
  • Optimization Layer
  • Deterministic SLA Enforcement
o Core vGPU Capabilities & How vGPU Creates Value
o Rise vGPU platform: Heterogeneous GPU Orchestration at Production Scale
o Differentiating from Kubernetes: Why Not Just Use K8s?
o Industry Adoption From Hyperscalers to Emerging AI Cloud Providers
o Competitive Landscape


GPU Has Become Core AI Infrastructure, but Raw GPU Capacity Is Not the Same as Usable Compute

As AI moves from experiments to scaled deployment, GPU has become the core compute resource. However, different workloads place different demands on GPU resources in terms of duration, concurrency, and latency. Static and siloed management models increasingly lead to resource waste and delivery inefficiency.

Training
• Long-duration jobs with multi-GPU coordination
• High memory capacity and bandwidth requirements
• Sensitive to topology and communication efficiency
• Requires stable resource allocation and task isolation

Development / Testing
• Small, intermittent jobs
• Multi-user sharing and collaborative access
• Diverse environments with frequent changes
• Requires flexible allocation and fast recovery

Inference
• Short, high-frequency requests
• Fluctuating concurrency and bursting traffic
• Low-latency and high-throughput requirements
• Requires dynamic scheduling and elastic capacity

As AI workloads diversify, enterprises need not only more GPUs, but also a more efficient way to pool, slice, schedule, and operate GPU resources.

Different AI Workloads Place Differentiated Demands on GPU Resources

Workload               Resource Usage Pattern                                   Infrastructure Requirement
Large Model Training   Long-duration, multi-GPU, high communication overhead    Stable allocation, topology-aware scheduling, task isolation
Fine-tuning            Medium-duration, elastic demand, moderate resource usage Dynamic allocation, resource reuse
Inference Services     Short tasks, high frequency, fluctuating concurrency     Fine-grained slicing, elastic scheduling, SLA assurance
Development/Testing    Small jobs, intermittent usage, multi-user access        Shared access, auto-recovery, cost metering

From owning GPUs to using GPUs efficiently is the next step in AI infrastructure evolution.


The Maturity Hierarchy of Heterogeneous GPU Orchestration

As AI workloads diversify, GPU resource management is evolving from fragmented single-brand clusters to advanced heterogeneous orchestration with virtualization and isolation.

[Figure: Capability Maturity, Tier 1 to Tier 4.]
• Phancy's Rise vGPU: Advanced heterogeneous orchestration with virtualization and compute abstraction
• HAMi vGPU: Basic vGPU enablement and limited orchestration
• Western NeoCloud Projects: Heterogeneous deployment, isolated resource pools
• Chinese NeoCloud Projects: Heterogeneous deployment, project-based management
• Legacy Cloud Service Providers: Single-brand clusters, static provisioning

Higher-maturity solutions increasingly combine multi-brand support with advanced virtualization and orchestration.

Representative Approaches by Maturity Level

Approach                         Typical Characteristics                                  Virtualization Approach   Multi-brand Support
Phancy's Rise vGPU               Heterogeneous pooling, slicing, scheduling, isolation    High                      Advanced
HAMi vGPU                        vGPU enablement with limited orchestration depth         Medium                    Basic
Western NeoCloud Projects        Heterogeneous deployment with isolated resource pools    Medium                    Basic
Chinese NeoCloud Projects        Heterogeneous deployment with project-level management   Basic                     Medium
Legacy Cloud Service Providers   Single-brand clusters and static provisioning            Low                       Low


vGPU as the Control Plane of AI Infrastructure

As heterogeneous GPU environments become more complex, enterprises need more than hardware virtualization. vGPU provides a control plane that standardizes resources, coordinates workloads, improves utilization and stabilizes AI service delivery.

Control Loop: Standardize → Schedule → Optimize → Stabilize

1  Abstraction Layer (Standardize)
Transforms heterogeneous GPUs across vendors, architectures and locations into standardized, allocatable compute units.

2  Orchestration Layer (Schedule)
Coordinates who should use GPU resources, when and where, based on priority, topology, real-time load and resource fit.

3  Optimization Layer (Optimize)
Improves utilization through slicing, sharing, oversubscription and dynamic reuse, converting idle capacity into usable value.

4  Deterministic SLA Enforcement Layer (Stabilize)
Stabilizes service delivery through isolation, contention control and SLA-oriented execution, making AI performance predictable and auditable.

[Figure: Inference Service, Training Job, Batch Job and Ad-hoc Job are handled by the vGPU Scheduler and vGPU Manager.]

vGPU turns raw GPU capacity into SLA-backed AI infrastructure through abstraction, orchestration, optimization and deterministic execution.


Abstraction Layer: Turning Heterogeneous GPUs into Standardized Compute Units

The abstraction layer is the foundation of the vGPU control plane. Its core mission is to transform GPUs from fixed physical devices into allocatable, combinable, and schedulable standardized compute units. By masking hardware differences, it creates a unified resource language for the orchestration, optimization, and deterministic execution layers that follow.

A  Compute Virtualization (Compute Atom)

Instead of using a GPU only as a whole card, vGPU breaks raw GPU compute into schedulable compute atoms or slices, so multiple workloads can consume different portions of compute independently.

For example, one task may use 30% of compute, while another uses 50% on the same physical GPU.

B  Memory Virtualization

GPU memory is allocated at a finer granularity, letting workloads request more precise amounts of memory instead of being constrained by coarse full-card allocation.

For example, workloads may request 6GB or 10GB, rather than being limited to the full memory of the card.

[Figure: 16G of physical GPU memory exposed as vGPU memory slices of 6GB and 10GB.]

C  Isolation

In multi-user and multi-workload environments, different teams can run workloads on the same physical GPU while compute and memory are isolated to avoid interference and preserve execution stability.

The value of the abstraction layer is not simply to 'make GPUs smaller', but to translate heterogeneous hardware into a unified resource language that prepares the ground for orchestration, optimization, and SLA-backed execution.

• It decouples AI workloads from direct dependence on specific hardware forms, allowing jobs to declare resource needs without binding to a specific physical card.
• Additional benefits are explored further in the later sections on orchestration, optimization, and vGPU capabilities.
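The compute and memory examples above (30% and 50% of compute, 6GB and 10GB of memory on one 16G card) can be illustrated with a small admission check. This is a minimal sketch and not the vGPU product's API: the class `PhysicalGPU`, its methods and the job names are hypothetical, and real isolation would be enforced by the runtime rather than by bookkeeping like this.

```python
from dataclasses import dataclass, field


@dataclass
class PhysicalGPU:
    """Hypothetical model of one card; not an actual vGPU API."""
    name: str
    memory_gb: float
    slices: list = field(default_factory=list)  # (job, compute_fraction, memory_gb)

    def free_compute(self) -> float:
        return 1.0 - sum(c for _, c, _ in self.slices)

    def free_memory_gb(self) -> float:
        return self.memory_gb - sum(m for _, _, m in self.slices)

    def allocate(self, job: str, compute: float, memory_gb: float) -> bool:
        """Admit the slice only if both compute and memory still fit."""
        eps = 1e-9  # tolerance for floating-point rounding
        if compute <= self.free_compute() + eps and memory_gb <= self.free_memory_gb() + eps:
            self.slices.append((job, compute, memory_gb))
            return True
        return False


gpu = PhysicalGPU("gpu-0", memory_gb=16)
print(gpu.allocate("job-a", compute=0.30, memory_gb=6))   # fits
print(gpu.allocate("job-b", compute=0.50, memory_gb=10))  # fits; memory is now full
print(gpu.allocate("job-c", compute=0.10, memory_gb=1))   # rejected: no memory left
```

The point of the sketch is that jobs declare fractions and gigabytes rather than a specific card, which is what lets a scheduler place them on any device with enough headroom.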


Orchestration Layer: Scheduling GPU Resources by Priority, Topology, Load and Resource Fit

The orchestration layer transforms static allocation into dynamic, policy-aware coordination. By combining business priorities, topology awareness, real-time utilization and resource-fit analysis, vGPU enables more efficient and predictable workload placement across heterogeneous infrastructure.

A  Priority-Aware Scheduling

When multiple workloads compete for the same GPU, the scheduler allocates resources according to business importance and SLA sensitivity. High-priority inference or mission-critical jobs receive more stable placement, while lower-priority jobs are scheduled on a best-effort basis.

[Figure: Priority queue feeding the vGPU Scheduler and GPU resources, ordered from high to low: Mission-Critical Inference, Training Job A, Batch Inference, Ad-hoc Job.]
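Priority-aware placement can be sketched with a heap-based queue. This is an illustrative sketch under stated assumptions, not the scheduler described in the white paper: the class `PriorityScheduler`, the numeric priority levels and the whole-GPU accounting are all hypothetical.

```python
import heapq
from itertools import count

# Lower number = higher priority. These levels are illustrative assumptions.
PRIORITY = {"mission_critical_inference": 0, "training": 1, "batch_inference": 2, "ad_hoc": 3}


class PriorityScheduler:
    def __init__(self, free_gpus: int):
        self.free_gpus = free_gpus
        self._queue = []
        self._order = count()  # tie-breaker keeps FIFO order within a priority

    def submit(self, name: str, kind: str, gpus: int) -> None:
        heapq.heappush(self._queue, (PRIORITY[kind], next(self._order), name, gpus))

    def dispatch(self) -> list:
        """Place jobs in priority order; jobs that do not fit stay queued."""
        placed, waiting = [], []
        while self._queue:
            item = heapq.heappop(self._queue)
            _, _, name, gpus = item
            if gpus <= self.free_gpus:
                self.free_gpus -= gpus
                placed.append(name)
            else:
                waiting.append(item)
        for item in waiting:
            heapq.heappush(self._queue, item)
        return placed


sched = PriorityScheduler(free_gpus=4)
sched.submit("ad-hoc-job", "ad_hoc", 2)
sched.submit("training-job-a", "training", 2)
sched.submit("critical-inference", "mission_critical_inference", 2)
print(sched.dispatch())  # ['critical-inference', 'training-job-a']
```

The ad-hoc job was submitted first but waits, which is the best-effort behavior for lower-priority work; a production scheduler would also need preemption and starvation handling, which this sketch omits.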

B  Topology-Aware Scheduling

Different GPUs and interconnects have different communication characteristics. For distributed training and multi-card inference, the scheduler considers topology relationships such as NVLink, NVSwitch, HCCS or XPULink so that tasks are placed on GPUs with lower communication cost and better locality.

[Figure: The vGPU Scheduler reads NVLink/NVSwitch and HCCS/XPULink topology. Topology-aware placement result: lower communication cost, better locality.]
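One way to picture topology-aware placement is to score candidate GPU groups by their pairwise link cost and pick the cheapest. The sketch below assumes a made-up four-GPU topology and made-up relative costs; `LINK_COST`, `links` and `best_placement` are hypothetical names, and the exhaustive search is only practical for small groups.

```python
from itertools import combinations

# Hypothetical relative link costs; real values depend on the hardware.
LINK_COST = {"fast_link": 1, "pcie": 4}

# Assumed topology: GPUs 0-1 and 2-3 share a fast link (for example NVLink);
# every other pair falls back to PCIe.
links = {
    frozenset({0, 1}): "fast_link",
    frozenset({2, 3}): "fast_link",
}


def pair_cost(a: int, b: int) -> int:
    return LINK_COST[links.get(frozenset({a, b}), "pcie")]


def best_placement(free_gpus: list, k: int) -> tuple:
    """Pick the k free GPUs with the lowest total pairwise link cost."""
    return min(
        combinations(sorted(free_gpus), k),
        key=lambda group: sum(pair_cost(a, b) for a, b in combinations(group, 2)),
    )


# GPU 1 is busy, so the only fast-linked pair left is (2, 3).
print(best_placement([0, 2, 3], k=2))  # (2, 3)
```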

C  Load-Aware Scheduling

The scheduler monitors real-time GPU and node utilization and places new tasks on underutilized resources. Inference jobs are spread to improve availability and balance, while training/fine-tuning jobs are packed to optimize locality and utilization. Placement adapts to workload patterns, minimizing fragmentation and hotspot risks.

[Figure: Real-time GPU Utilization. GPU1 60%, GPU2 70%, GPU3 40%, GPU4 80%.]

[Figure: Best Fit Placement. The vGPU Scheduler uses real-time utilization (GPU1 50%, GPU2 70%, GPU3 40%, GPU4 80%) and available compute, bandwidth and memory.]
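The spread-versus-pack behavior described for load-aware scheduling can be sketched with the utilization values from the first figure. The job demand of 20 points, the 100% cap and the function `place` are assumptions made for illustration only.

```python
from typing import Optional

# Utilization values follow the figure; everything else is an assumption.
utilization = {"GPU1": 60, "GPU2": 70, "GPU3": 40, "GPU4": 80}


def place(job_kind: str, demand: int, util: dict) -> Optional[str]:
    """Spread inference onto the least-loaded GPU; pack training onto the
    most-loaded GPU that still has room for the job."""
    candidates = [g for g, u in util.items() if u + demand <= 100]
    if not candidates:
        return None
    if job_kind == "inference":
        chosen = min(candidates, key=lambda g: util[g])
    else:
        chosen = max(candidates, key=lambda g: util[g])
    util[chosen] += demand
    return chosen


print(place("inference", 20, utilization))  # GPU3
print(place("training", 20, utilization))   # GPU4
```

Spreading keeps headroom on each device for bursts, while packing leaves whole devices free for large jobs; which policy applies to which workload is a tuning decision rather than a fixed rule.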

D  Resource-Aware Scheduling

Scheduling decisions evaluate the actual usage versus allocated GPU resources. Instead of relying solely on reserved capacity, the scheduler considers real-time utilization, available headroom, and reclaimable resources to place workloads on the most suitable GPUs. This approach maximizes effective utilization while supporting both reserved and preemptible resource modes.
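A best-fit placement over measured headroom might look like the sketch below. All numbers, the `gpus` inventory and the helper names are hypothetical, and summing slack across dimensions with different units is a deliberate simplification; a real scheduler would normalize or weight each dimension.

```python
# All numbers are illustrative. "used" is measured usage, not the reserved
# amount, so reserved-but-idle capacity counts as reclaimable headroom.
gpus = {
    "gpu-a": {"capacity": {"compute": 100, "memory": 80, "bandwidth": 100},
              "used":     {"compute": 30, "memory": 20, "bandwidth": 10}},
    "gpu-b": {"capacity": {"compute": 100, "memory": 80, "bandwidth": 100},
              "used":     {"compute": 60, "memory": 50, "bandwidth": 40}},
}


def headroom(gpu: dict) -> dict:
    return {k: gpu["capacity"][k] - gpu["used"][k] for k in gpu["capacity"]}


def best_fit(request: dict, fleet: dict):
    """Return the GPU that fits the request on every dimension and leaves
    the least total slack, or None if nothing fits."""
    best_name, best_slack = None, None
    for name, gpu in fleet.items():
        free = headroom(gpu)
        if any(request[k] > free[k] for k in request):
            continue
        slack = sum(free[k] - request[k] for k in request)
        if best_slack is None or slack < best_slack:
            best_name, best_slack = name, slack
    return best_name


print(best_fit({"compute": 30, "memory": 25, "bandwidth": 20}, gpus))  # gpu-b
```

Choosing the tightest fit keeps the emptier GPU available for a larger request that arrives later.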

The orchestration layer is where abstracted GPU resources become business-aware, topology-aware and utilization-aware scheduling decisions.

• By making global scheduling decisions, the orchestration layer improves overall resource efficiency and workload predictability.
• Additional benefits are further explored in the later sections on optimization and vGPU capabilities.


Optimization Layer: Continuously Improving Resource Utilization and Turning Hardware Assets into Measurable, Operable Digital Assets

The optimization layer transforms GPU management from simple capacity allocation to efficiency optimization. Through sub-GPU slicing, oversubscription, time-space multiplexing and multi-tenant sharing, vGPU converts idle or underused hardware into usable, measurable and operable AI resources.

A  Sub-GPU Level Slicing

Traditional GPU allocation is often based on whole-card units or coarse vendor-defined profiles. vGPU refines slicing down to smaller compute and memory fragments, allowing many lightweight inference or development jobs to share one physical GPU while consuming only what they actually need.

[Figure: A physical GPU divided into fine-grained slices.]

B  Memory and Compute Oversubscription

[Figure: The vGPU Manager exposes virtual capacity to Demand A, Demand B and Demand C. More virtual capacity than physical at any moment.]

In practice, tasks do not peak at the same time. Oversubscription allows the system to expose more virtual capacity than the static physical limit while relying on runtime scheduling to keep actual usage within safe bounds. This improves resource utilization without requiring all workloads to be fully active simultaneously.
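Oversubscription combines two checks: a grant-time check against the exposed virtual capacity and a runtime check against the physical limit. The sketch below assumes a 1.5x ratio and a 16 GB card purely for illustration; `OversubscribedGPU` and its methods are hypothetical, and a real system would queue, throttle or migrate a refused request rather than simply return False.

```python
class OversubscribedGPU:
    """Hypothetical sketch: grants may exceed physical memory up to a ratio,
    while a runtime check keeps actual usage within the physical limit."""

    def __init__(self, physical_gb: float, ratio: float = 1.5):
        self.physical_gb = physical_gb
        self.virtual_gb = physical_gb * ratio  # capacity exposed to tenants
        self.granted = {}  # job -> virtual GB promised
        self.in_use = {}   # job -> GB actually resident right now

    def grant(self, job: str, gb: float) -> bool:
        if sum(self.granted.values()) + gb > self.virtual_gb:
            return False
        self.granted[job] = gb
        self.in_use[job] = 0.0
        return True

    def use(self, job: str, gb: float) -> bool:
        """Runtime admission: refuse usage beyond the grant or the physical card."""
        others = sum(v for j, v in self.in_use.items() if j != job)
        if gb > self.granted[job] or others + gb > self.physical_gb:
            return False
        self.in_use[job] = gb
        return True


gpu = OversubscribedGPU(physical_gb=16, ratio=1.5)
# 24 GB of grants on a 16 GB card: all three are accepted.
print([gpu.grant(j, 8) for j in ("demand-a", "demand-b", "demand-c")])
# Actual usage is admitted only while the physical limit holds.
print(gpu.use("demand-a", 8), gpu.use("demand-b", 6), gpu.use("demand-c", 4))
```

The last call is refused because 8 + 6 + 4 GB would exceed the physical 16 GB, which is the "safe bounds" condition the runtime has to enforce.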

C  Time-slicing and Space-slicing

Time-slicing lets multiple jobs share the same GPU in alternating time windows, which is suitable for bursty interactive tasks. Space-slicing allows multiple tasks to occupy different portions of GPU memory or compute at the same time, making it suitable for stable concurrent multi-tenant usage.

[Figure: Time-slicing (alternating in time): Job A, Job B, Job C take turns along the time axis. Space-slicing (concurrent in space): Job A, Job B, Job C run side by side.]
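Time-slicing can be sketched as a round-robin over fixed time windows. The quantum, the job durations and the function `time_slice` are assumptions for illustration; actual GPU time-slicing is handled by the driver or runtime and also involves context-switch cost, which this sketch ignores.

```python
from collections import deque


def time_slice(jobs: dict, quantum_ms: int) -> list:
    """Round-robin a single GPU among jobs.

    `jobs` maps job name -> remaining work in ms; the result is the ordered
    list of (job, run_ms) windows.
    """
    queue = deque(jobs.items())
    timeline = []
    while queue:
        name, remaining = queue.popleft()
        run = min(quantum_ms, remaining)
        timeline.append((name, run))
        if remaining > run:
            queue.append((name, remaining - run))
    return timeline


for window in time_slice({"job-a": 30, "job-b": 10, "job-c": 20}, quantum_ms=10):
    print(window)
```

Space-slicing is the complementary case: jobs hold disjoint fractions of compute and memory at the same moment instead of alternating windows.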

D  Multi-tenant Sharing and Isolation

Scheduling decisions should consider the combined fit of compute, memory and bandwidth. Instead of relying on a single metric, the orchestration layer matches workloads to GPUs based on multidimensional resource requirements, improving placement accuracy and reducing contention.

[Figure: Users/Tenants mapped onto the Physical GPU Fleet.]

The optimization layer shifts GPU management from 'capacity reservation' to 'efficiency activation', turning hardware resources into measurable, operable digital assets with higher ROI.

• AI infrastructure benefits from higher utilization, lower idle capacity and more flexible workload placement.
• Additional benefits are further explored in the later sections on vGPU capabilities and business value.


Deterministic Execution Layer: Ensuring Predictable, Guaranteed and Auditable AI Service Delivery

In AI infrastructure, successful resource allocation does not automatically guarantee stable business outcomes. Even when GPU resources are abstracted, orchestrated and optimized, AI workloads may still face performance fluctuation, contention and unpredictability unless execution is stabilized around SLA-oriented control.

A  Guaranteed Compute Supply for Critical Workloads

[Figure: Protected Resource Pool. Guaranteed compute supply is reserved for the critical task, separate from other tasks.]

Critical tasks receive stable compute quotas and are insulated from interference by other co-located tasks. Resource isolation establishes a hard lower bound for service delivery.
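A protected pool can be sketched as capacity split into a reserved share and a shared remainder. The class `ProtectedPool`, the unit counts and the rule that critical tasks may overflow into shared capacity are assumptions for illustration, not behavior confirmed by the white paper.

```python
class ProtectedPool:
    """Hypothetical sketch: capacity is split into a reserved share for
    critical tasks and a shared remainder for everything else."""

    def __init__(self, total_units: int, reserved_units: int):
        self.reserved_free = reserved_units
        self.shared_free = total_units - reserved_units

    def acquire(self, units: int, critical: bool) -> bool:
        if critical:
            # Critical tasks draw from the reserved share first, then
            # overflow into shared capacity if any is left.
            from_reserved = min(units, self.reserved_free)
            overflow = units - from_reserved
            if overflow > self.shared_free:
                return False
            self.reserved_free -= from_reserved
            self.shared_free -= overflow
            return True
        # Other tasks can never touch the reserved share.
        if units > self.shared_free:
            return False
        self.shared_free -= units
        return True


pool = ProtectedPool(total_units=8, reserved_units=3)
print(pool.acquire(5, critical=False))  # True: uses all shared capacity
print(pool.acquire(1, critical=False))  # False: only reserved capacity remains
print(pool.acquire(3, critical=True))   # True: served from the reserved share
```

The reserved share is what gives the critical task its lower bound: no amount of other traffic can consume it.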

B  Predictable Latency and Throughput

[Figure axes: Latency (ms), Throughput (req/s).]

Topology-aware placement and priority-aware execution reduce unpredictable delay spikes, allowing inference latency and
