Bridging the Language Gap

The Role of Mobile Network Operators in AI Ecosystems

The GSMA is a global organisation unifying the mobile ecosystem to discover, develop and deliver innovation foundational to positive business environments and societal change. Our vision is to unlock the full power of connectivity so that people, industry, and society thrive. Representing mobile operators and organisations across the mobile ecosystem and adjacent industries, the GSMA delivers for its members across three broad pillars: Connectivity for Good, Industry Services and Solutions, and Outreach. This activity includes advancing policy, tackling today’s biggest societal challenges, underpinning the technology and interoperability that make mobile work, and providing the world’s largest platform to convene the mobile ecosystem at the MWC and M360 series of events.

We invite you to find out more at

This material has been funded by UK International Development from the UK government and is supported by the GSMA and its members. The views expressed do not necessarily reflect the UK Government’s official policies.

GSMA Emerging Tech Programme

The GSMA Emerging Tech programme accelerates impact and climate action by fostering the adoption of AI and emerging technologies in low- and middle-income countries (LMICs) by working with public, private and third sector innovators to develop scalable and sustainable solutions that have inclusive and responsible AI at the core. The Emerging Tech programme works closely with the GSMA AI for Impact initiative to drive real-world, impact-focused implementation with telcos in LMICs.

To get in touch with the Emerging Tech team, please email: emergingtech@

Authors:

Eugénie Humeau, GSMA Mobile for Development
Zarah Udwadia, GSMA Mobile for Development

Contributors:

Kimberly Brown, GSMA Mobile for Development
Ibrahim Sajid, GSMA Mobile for Development
Maureen Imiegha, GSMA Mobile for Development (Marketing)

Acknowledgements:

We would like to thank the many individuals and organisations that contributed to this research. This includes Digital Umuganda, gheero, GIZ FAIR Forward – Artificial Intelligence for All, Indosat Ooredoo Hutchison, Karya, Mozilla Foundation, Pindo, Reverie Language Technologies and RAIght.ai.

Published: March 2026

Contents

Definitions
Acronyms and abbreviations
List of figures, spotlights and tables
Executive summary
1. Introduction
1.1. The language divide
1.2. The importance of cultural and linguistic diversity
1.3. The opportunity: models in local languages
1.4. Research objectives
2. Insights from the ecosystem
2.1. Existing initiatives and approaches
2.2. Challenges faced by local language initiatives
2.3. Implications for digital sovereignty
3. Mobile network operators and local language AI
3.1. AI adoption trends
3.2. Case studies
    1. Orange: Supporting Senegal’s customers in local languages
    2. Dialog Axiata: Creating inclusive digital services in Sri Lanka
    3. Beeline (VEON Group): Bridging the AI language gap in Kazakhstan
    4. Indosat: Building sovereign AI for Indonesia
4. Lessons and implications
4.1. Key lessons from the case studies
4.2. Pathways for MNOs to contribute to local language AI
4.3. Conclusion

Definitions

Artificial Intelligence (AI)
Artificial intelligence (AI) is comprised of widely different technologies that can be broadly defined as “self-learning, adaptive systems.”[1] AI has the capability to process language, solve problems, recognise pictures and learn by analysing patterns in large sets of data.

AI sovereignty
The control and autonomy a sovereign state has over the development, deployment and governance of all aspects of the AI ecosystem within its borders. Sometimes referred to as “AI nationalism”.

Benchmark
In the context of AI, a benchmark is a standardised dataset and evaluation task used to measure and compare the performance of language models on specific languages or tasks. Benchmarks are essential for assessing progress, identifying gaps and guiding model development, particularly for underrepresented languages.

Compute
Compute refers to the process of performing calculations or computations required for a specific task, such as training an AI model. It also encompasses the hardware components, like chips, that carry out these calculations, as well as the integrated systems of hardware and software used to perform computing tasks.[2]

Crowdsourcing
Crowdsourcing refers to the large-scale collection or annotation of data through open or semi-open participation, often involving many contributors performing small, discrete tasks such as recording speech, transcribing audio or validating translations.

Digital or technology sovereignty
A sovereign state’s ability to shape the digital transformation in a self-determined manner with regard to hardware, software, services and competencies. For digital technologies and applications, this means being able to decide independently to what extent one enters into or avoids dependence on providers and partners.

Fine-tuning
Fine-tuning refers to the process of continuing the training of a pre-existing AI model on a specific dataset to adapt it to a narrower domain, task or language. This process adjusts the model’s internal weights, allowing it to specialise and improve performance in that specific context.

Foundation model
A foundation model is a large, general-purpose AI model trained on broad datasets and designed to be adapted for multiple downstream tasks or languages through fine-tuning or other techniques. Examples include large multilingual language models that serve as a base for more specialised applications.

Generative AI (GenAI)
A type of AI that involves generating new data or content, including text, images or videos, based on user prompts and by learning from existing data patterns.

Language model
A language model is an AI system trained to understand and generate human language by learning patterns from large amounts of text and/or speech data. Language models can perform tasks such as text generation, translation, summarising, speech recognition and answering questions. They range from small, task-specific models – often referred to as small language models (SLMs) – to large, general-purpose models trained on multilingual data, referred to as large language models (LLMs).

Local language
A local language refers to a language that is spoken within a specific community, region or country, often distinct from the dominant or national language. It may or may not be officially recognised and is typically central to cultural and social identity.

Local language AI
In this report, local language AI refers to AI systems that are designed, trained or adapted to work in local languages. This includes tools and models that understand, generate or translate local languages, making AI more accessible and relevant to speakers of those languages.

1. Definition by the International Telecommunication Union (ITU).
2. AI Now Institute. (2023). Computational Power and AI.

Low-resource language
A low-resource language is one that has limited or no representation in AI research, datasets and digital products, in contrast with “high-resource” languages, which are well represented in AI systems. These languages often lack sufficient text or speech data, evaluation benchmarks and other computational resources. In some cases, existing materials may be fragmented, inaccessible or in unusable formats. In this report, we use the terms “low-resource” and “underrepresented” interchangeably.

Machine learning (ML)
A subfield of AI broadly defined as the capability of a machine to imitate intelligent human behaviour and learn from data without being explicitly programmed.[3]

Natural language processing (NLP)
A field of ML in which machines learn to understand natural language as spoken and written by humans, instead of the data and numbers normally used to program computers.

Retrieval augmented generation (RAG)
RAG is an AI technique that combines a language model with an external knowledge source. The model retrieves relevant information when it is queried and uses it to generate more accurate and informed responses. RAG can serve as a lightweight and more flexible alternative to fine-tuning, especially when working with limited data or changing knowledge sources.
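To make the RAG definition above more concrete, the following is a minimal sketch in Python of the retrieve-then-generate flow. It is an illustration under stated assumptions, not a description of any system covered in this report. The snippets in `DOCUMENTS`, the word-overlap scoring and the function names are all hypothetical, and `generate` is a placeholder that simply echoes the prompt where a real deployment would call a language model. Production systems would typically use a more capable retriever than word overlap.

```python
from collections import Counter

# Hypothetical knowledge source: a few support snippets an operator might hold.
DOCUMENTS = [
    "To check your data balance, dial the balance short code shown on your SIM pack.",
    "Mobile money transfers are confirmed by SMS within a few minutes.",
    "SIM registration requires a valid national identity document.",
]


def tokenize(text: str) -> Counter:
    """Lowercase the text and count its words, ignoring basic punctuation."""
    words = (w.strip(".,?!").lower() for w in text.split())
    return Counter(w for w in words if w)


def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents sharing the most words with the query."""
    query_words = tokenize(query)

    def overlap(doc: str) -> int:
        # Counter intersection keeps the minimum count of each shared word.
        return sum((query_words & tokenize(doc)).values())

    return sorted(documents, key=overlap, reverse=True)[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Place the retrieved passages ahead of the question for the model."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


def generate(prompt: str) -> str:
    """Placeholder for a language model call; it only echoes the prompt."""
    return prompt


query = "How do I check my data balance?"
answer = generate(build_prompt(query, retrieve(query, DOCUMENTS)))
```

Because the knowledge lives in `DOCUMENTS` rather than in model weights, updating what the system knows only requires editing the snippets, which is the flexibility the definition contrasts with fine-tuning.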

Acronyms and abbreviations

AI      Artificial Intelligence
API     Application Programming Interface
ASR     Automatic Speech Recognition
GPU     Graphics Processing Unit
HITL    Human-in-the-loop
IVR     Interactive Voice Response
LLM     Large Language Model
LMIC    Low- and Middle-Income Country
ML      Machine Learning
MNO     Mobile Network Operator
MT      Machine Translation
NLP     Natural Language Processing
RAG     Retrieval Augmented Generation
SLM     Small Language Model
TTS     Text-to-Speech

3. Definition by the MIT Sloan School of Management, based on the definition by AI pioneer Arthur Samuel.

List of figures

Figure 1: The AI ecosystem framework
Figure 2: Predominance of English in online content
Figure 3: Number of living languages across African countries
Figure 4: Map of local language AI initiatives
Figure 5: Percentage of telco AI deployments
Figure 6: Development process for Dialog’s LLM integration
Figure 7: KazLLM partnership ecosystem
Figure 8: KazLLM tech stack
Figure 9: Sahabat AI tech stack

List of spotlights

Spotlight 1: Understanding low-resource and underrepresented languages
Spotlight 2: The GSMA African AI Language Models initiative

List of tables

Table 1: Dimensions of AI sovereignty
Table 2: Technical adaptation approaches used in local language AI deployments
Table 3: MNO contribution pathways for local language AI

Executive summary

Language remains one of the biggest barriers to the equitable development of artificial intelligence (AI) in low- and middle-income countries (LMICs). The digital world is dominated by a small number of “high-resource” languages, particularly English, with abundant digital data resources available. The vast majority of the world’s languages, by contrast, are “low resource” and lack the machine-usable data that can be used for training natural language processing (NLP) models, particularly large language models (LLMs), which require massive amounts of data.

Models trained on data that does not represent the world’s vast linguistic and cultural diversity are not accessible, relevant, reliable or impactful for people who live their lives in low-resource languages. This risks widening existing digital divides while also threatening the preservation of languages across the world.

A growing number of efforts are addressing this linguistic imbalance. Startups, innovators, researchers and communities in LMICs are building and applying locally relevant AI models, curating and crowdsourcing linguistically diverse datasets and creating enabling environments for greater AI language inclusion. However, these efforts are operationally demanding, resource intensive and introduce several ethical considerations, particularly when they involve community crowdsourcing. They also lack the capacity to reach last-mile users at scale, creating datasets and models without distribution. These challenges are compounded by limitations in compute infrastructure and sustainable funding in LMICs.

Within the digital and AI ecosystem, mobile network operators (MNOs) play a strategically significant yet not well understood role in bridging the language divide. Through four case studies in LMICs – Orange in Senegal, Dialog Axiata in Sri Lanka, Beeline (VEON Group) in Kazakhstan and Indosat in Indonesia – this research explores how MNOs are advancing more inclusive AI through models in local languages. The case studies range from MNOs using language AI for customer support (the most common entry point) to building large-scale national AI infrastructure. In each of the case studies, the MNOs recognise the importance of language inclusion in the digital world and the opportunity to enable it through AI.

The case studies show three clear pathways for MNOs to support language inclusion and, increasingly, enable sovereign AI ambitions. First, as service providers and last-mile distributors, MNOs integrate language technologies into existing customer services, delivering support in local languages while creating real-world environments for testing, iteration and operational improvement. Second, as ecosystem conveners and bridges, MNOs leverage their institutional position to bring together governments, academia and technology providers in mutually beneficial partnerships, aligning incentives around shared objectives for language inclusion, national priorities and scale. Third, some MNOs are emerging as sovereign AI enablers, investing in compute, cloud platforms and model-hosting environments that position local language AI as part of broader national digital infrastructure. In this pathway, MNOs do not just deploy AI in their own services but provide the infrastructure layer that enables both private-sector innovation and public-sector digital transformation.

The four case studies illustrate these pathways in practice:

– Orange (Senegal) focuses on hybrid language systems to deliver customer support in Wolof through conversational interfaces, including speech-enabled channels.
– Dialog (Sri Lanka) uses prompt-based and hybrid language techniques to lower barriers to digital creation for women entrepreneurs, with no-code approaches.
– Beeline (Kazakhstan) leads a multi-stakeholder effort to build Kazakh language models anchored in open access and public-sector use.
– Indosat (Indonesia) invests in sovereign compute and open language models to support national AI capacity across public services and industry.

Taken together, the findings show that inclusive local language AI will not emerge from a single actor or technical approach. Instead, progress depends on complementary roles across the ecosystem. MNOs are most effective when they focus on their structural strengths – deploying language technologies at scale, convening partners and, in some cases, providing shared digital and AI infrastructure – while community-led initiatives continue to drive linguistic depth, cultural grounding and data creation.

Ultimately, closing the AI language gap in LMICs will depend on how effectively institutions align incentives, share risk and build partnerships that translate linguistic innovation into sustainable and large-scale impact.

1. Introduction

The potential of artificial intelligence (AI) to support social and economic development is well established. AI applications are increasingly seen as critical tools for improving service delivery, expanding access to information and supporting inclusive growth, particularly in low- and middle-income countries (LMICs) where development needs are most acute. AI is widely regarded as a general-purpose technology, and its adoption has been faster than any previous digital innovation.

In less than three years, more than 1.2 billion people have used AI-enabled tools, outpacing the early adoption of both the internet and smartphones.[4] However, AI adoption remains deeply unequal. Usage rates in high-income economies are roughly twice those observed in LMICs, with the gap widening sharply in countries with GDP per capita below USD 20,000.[5] These disparities reflect not only differences in access to technology, but also structural imbalances in how AI systems are developed and deployed.

The development of AI depends on three building blocks – data, compute and skills – which in turn rely on broader foundations such as digital infrastructure, human capital and enabling policy environments, as well as cross-cutting enablers including finance, partnerships and research and development.[6] Weaknesses in any of these layers can limit the adoption and diminish the impact of AI. Limited availability of data in local languages remains one of the most persistent barriers, affecting both the development of AI systems and their relevance, usability and trustworthiness among end users. Addressing language inclusion is therefore essential to ensure that AI delivers inclusive and locally relevant outcomes.[7]

Figure 1: The AI ecosystem framework

[Figure: three-layer framework. AI fundamentals: data, compute, AI skills. Digital economy foundations: digital infrastructure, human capital and skills, policy and regulation. Cross-cutting enablers: partnerships, research and development, financing mechanisms.]

Source: GSMA Mobile for Development[8]

4. Microsoft. (2025). AI Diffusion Report: Where AI is most used, developed, and built.
5. Ibid.
6. GSMA. (2024). AI for Africa: Use cases delivering impact.
7. World Bank Group. (2025). Strengthening Foundations: Digital Progress and Trends Report 2025.
8. GSMA. (2024). AI for Africa: Use cases delivering impact.

1.1. The language divide

Countries where low-resource languages dominate consistently show lower levels of AI adoption, reinforcing existing digital divides.[9] This challenge becomes most visible in the design and deployment of large language models (LLMs). Models such as ChatGPT, Llama and Claude are rapidly transforming how people access information, communicate and build digital tools. However, despite their transformative potential, LLMs remain largely inaccessible and not fit for purpose in countries where non-dominant languages are spoken, largely in LMICs. State-of-the-art LLMs still show a large and systematic gap in performance between English and low-resource and non-Latin script languages.[10]

LLMs are mostly trained on datasets from high-income countries (HICs), in dominant languages like English, French or Spanish. The internet, where a large part of the world’s knowledge is stored, serves as the single most important dataset for training AI. Yet, more than half of this content is in English, despite English being spoken natively by just 5% of the global population.[11,12]

The overwhelming majority of the world’s 7,000 languages lack the data, tools or techniques for natural language processing (NLP), making them “low-resource” in contrast to a handful of “high-resource” languages, including English, French, Spanish, German and Mandarin Chinese.[13]

9. Microsoft. (2025). AI Diffusion Report: Where AI is most used, developed, and built.
10. Ahuja, S. et al. (2024). MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks. Microsoft Corporation.
11. Britannica. “Languages by number of native speakers – List, Top, & Most Spoken”. Accessed 10 September 2025.
12. Common Crawl. “Statistics of Common Crawl Monthly Archives by commoncrawl”. Accessed 11 September 2025.
13. Ravindran, S. (20 July 2023). “AI often mangles African languages. Local scientists and volunteers are taking it back to school”. Science.

Figure 2: Predominance of English in online content

[Figure: four charts; the individual values are not recoverable from the extracted text.]
a. Distribution of leading languages spoken, 2025 (percentage of global population), split into native speakers and other speakers
b. Global URLs by language, 2025 (percentage)
c. Open-source datasets from Hugging Face by language, 2024 (percentage)
d. YouTube videos by language, 2022 (percentage)

Source: World Bank[14]

14. World Bank Group. (2025). Strengthening Foundations: Digital Progress and Trends Report 2025.

A recent analysis by Microsoft showed that low-resource language countries adopt AI at rates 20% lower than high-resource language countries, even those with similar GDP and connectivity conditions.[15] This indicates that lower adoption is not driven by income or infrastructure gaps, but by linguistic barriers – specifically, weaker model performance and higher adaptation costs in languages with limited training data.[16] These disparities are reflected in benchmark results. While state-of-the-art LLMs achieve around 80% accuracy in English, they drop below 55% for some low-resource languages such as Yoruba, one of Nigeria’s three major languages spoken by more than 50 million people across Africa.[17]

Spotlight 1: Understanding low-resource and underrepresented languages

Language is foundational to AI systems, and terms such as “local”, “indigenous”, “underrepresented” and “low resource” are often used to describe distinct but overlapping realities. Local languages are those used in everyday communication within a country or region. They may be official or non-official and can be spoken by millions of people or by much smaller communities. Some languages are also indigenous languages, meaning that they are closely tied to Indigenous Peoples, cultures and knowledge systems.

Many of these languages are described as “low resource”, “underrepresented” or “underserved” in AI and NLP ecosystems. These terms do not refer to the number of speakers, but to limited representation in AI research, datasets and commercial products. A language can have tens of millions of speakers and still be low-resource if it lacks digital text, speech data or evaluation benchmarks. By contrast, some languages spoken by relatively small populati
