2026 State of Clinical Artificial Intelligence (AI) Research Report (English Edition)


State of Clinical AI Report 2026

ARISE-AI.ORG, January 2026

About the Authors

Peter Brodeur

Dr. Peter Brodeur is a rising cardiology fellow at Harvard Medical School’s Beth Israel Deaconess Medical Center. Dr. Brodeur is an affiliate of ARISE, a reviewer for Nature Medicine and NEJM AI, and a former life sciences strategy consultant. His research focuses on human-computer interaction and LLM clinical reasoning.

Ethan Goh

Dr. Ethan Goh is the Executive Director of ARISE. His research has been featured in The New York Times, The Washington Post, and CNN. He directs the Stanford Healthcare AI Leadership Program and Harvard’s Agentic AI Executive Course. Dr. Goh is a Founding Editorial Board member and Associate Editor at BMJ Digital Health & AI.

Adam Rodman

Dr. Adam Rodman is an assistant professor at Harvard Medical School. He is the Director of AI Programs for the Carl J. Shapiro Center. Dr. Rodman is an Associate Editor at NEJM AI. He is also the host of the American College of Physicians podcast Bedside Rounds.

Jonathan H Chen

Dr. Jonathan H Chen is Stanford’s inaugural Director for Medical Education in AI in the Division of Computational Medicine. His expertise in combining human with artificial intelligence to provide better healthcare than either alone is featured in the popular press, with over 100 publications and awards.


Message From ARISE Leadership

“There are decades where nothing happens; and there are weeks when decades happen.” Recent deployments by technology companies, health systems, and regulators have made clinical AI more visible and ever more consequential. At the same time, it has become harder to keep up with emerging research. In some areas the literature is fragmented; in others, it simply doesn’t exist yet for the way these tools are being used today.

So what actually holds up in practice?

The State of Clinical AI Report (2026) was created to look beyond model performance alone to other critical factors that determine real-world impact: how systems are evaluated, how clinicians and AI work together, and where patient risks start to appear.

Frontier AI systems are already powerful. What’s needed now is to safely and effectively translate these tools into real-world care.

Ethan Goh, Adam Rodman, Jonathan H Chen

Investigators, ARISE Network


Engagement and Education

Stanford Computational Medicine Colloquia

● Healthcare AI seminars with Stanford/industry leaders
● Thursday 12pm PT, free

Get weekly invites

Stanford Healthcare AI Leadership & Strategy Program

● Application required. CME and accredited certificate
● May 2026

Apply now

Generative AI and Agentic AI Online Course

● Harvard/Stanford faculty, accredited certificate
● Summer 2026

Get early access


The Current Landscape

Clinical AI Is Widely Deployed But Poorly Evaluated

● AI is now embedded across healthcare: 1,200+ FDA-cleared tools and 350,000+ consumer apps have generated a $70B market. Only a minority underwent peer-reviewed evaluation.
● Of 691 FDA-cleared AI/ML medical devices (1995–2023), >95% went through the 510(k) clearance pathway, which is predicated on equivalency to existing devices, many of which were approved on suboptimal evidence.[2]
● ~50% of FDA device summaries omitted study design, 53% lacked sample size, and <1% reported patient outcomes.[2]
● 95% of device summaries did not report demographic data, and 91% lacked bias assessments, raising concerns about safety and equity in real-world use.[2]

Bridging the gap between adoption and evidence requires supporting clinicians, health system leaders, policymakers, and the public in interpreting available research.


Top Takeaways

1. Model capability is accelerating, but evidence of real clinical impact remains limited. Many studies show what models can do in controlled settings; what’s increasingly needed are prospective studies that show measurable effects on patient outcomes and care delivery.

2. Frontier LLMs show very uneven performance. They perform extremely well on complex reasoning tasks, yet break down when uncertainty, missing information, or changing context is introduced.

3. Clinicians value automation where it reduces administrative and workflow burden, but these use cases remain understudied. Tasks clinicians most want support with are often underrepresented in current benchmarks and evaluations.


4. Patient-facing AI has significant potential to reshape engagement and access, but raises distinct safety concerns. Direct interaction with patients requires much stronger guardrails and scalable oversight systems that do not currently exist.

5. Multimodal clinical AI applications are approaching practical usability. Improvements in base models are enabling applications that integrate unstructured text, images, and other clinical data to support prediction and decision-making in real-world settings.

6. FDA clearance is increasing, but near-term clinical adoption will favor narrow, task-specific systems. AI tools that are tightly scoped to specific domains and contexts are more likely to demonstrate value and be adopted in practice.


Acknowledgements

Reviewers

Supported By

Rebecca Handler, Kathleen Lacar, Jason Hom, Kameron Black, Eric Horvitz, Liam McCoy, Laura Zwaan, David Wu, Vishnu Ravi, Priyank Jain, Brian Han, Emily Tat, Kevin Schulman, Adrian Haimovich

Design & Accessibility

Emily Tat

The organization format of this report was inspired by Nathan Benaich’s State of AI Report.


How to Cite This Report

Peter G. Brodeur, Ethan Goh, Emily Tat, Liam McCoy, David Wu, Priyank Jain, Rebecca Handler, Jason Hom, Laura Zwaan, Vishnu Ravi, Brian Han, Kevin Schulman, Kathleen Lacar, Kameron Black, Adrian Haimovich, Eric Horvitz, Adam Rodman, Jonathan H. Chen. “State of Clinical AI 2026,” ARISE Network, January 2026.


Introduction


Executive Summary

Model Performance

● Frontier reasoning models (optimized for multi-step inference and chain of thought) showed marked improvement on challenging clinical reasoning tasks against human baselines, while prediction models crossed new thresholds in scalable prediction to enable actionable prevention.
● Dominant failure modes include model recognition of uncertainty, overconfidence, and pattern learning.

Benchmarks & Evaluation

● Multiple-choice benchmarks are saturated, and evaluations still underrepresent real clinical work: administrative tasks, conversational dialogue, real patient data, and bias/fairness.
● New benchmark suites (e.g., conversational, simulated EHR environments) are forcing models into more realistic, dynamic scenarios.

Foundational Methods

● Novel techniques such as converting medical data to tokens used for prediction bring a new era of screening and risk stratification.
● Clinical AI is being advanced by multi-agent systems, multimodal diagnostic support, and optimizing reasoning models.


AI in Clinical Workflows

● Across settings, AI can augment clinicians on reasoning and diagnostic interpretation tasks. However, collaboration isn’t yet optimized. How clinicians use AI is as important as what the model can do.
● Workflow tools like AI scribes feel transformative, yet objective gains are still modest. The addition of downstream workflow tasks will likely yield more productivity and efficiency impact.

Patient Facing AI

● Multi-turn conversational agents and AI-based coaching show promise, particularly as they are integrated with smart devices to support more personalized health assistance.
● In a space with competing vendor interests, overtrust and unsupervised use raise the bar for guardrails and for improving objective patient outcomes, not just engagement.

Applied AI & Demos

● The most immediately translatable progress can be seen at the individual task-specific level, with imaging remaining the dominant use case.
● We provide a sneak peek of the next wave of tools such as EHR chatbots, eConsults, and mental health chatbots.


Methods

Our Approach to a Targeted Review of Clinical AI

● Data sources & search strategy
○ Reviewed PubMed and preprint servers (e.g., medRxiv, arXiv) using a combination of terms such as “large language models in medicine,” “AI,” “diagnostic reasoning,” “management reasoning,” “diagnostic error,” “benchmarks,” and “patient-facing AI.”
○ Invited clinicians and AI researchers from academic institutions and issued an open call for submissions via social media (e.g., ) to identify high-quality studies across the six themes.

● Study selection
○ All studies were reviewed by the authors and reviewers of this presentation.
○ Included empirical studies that (1) used an AI model/LLM in a clinical context, (2) reported quantitative or qualitative outcomes (e.g., diagnostic accuracy, bias, calibration, workflow, user performance), and (3) were determined to be of high impact.
○ Excluded purely technical model papers without clinician- or patient-facing evaluation, editorials, and non-clinical AI (e.g., drug discovery, biotech).


Table of Contents

● Model Performance: How well models (trained AI systems) perform independently across prediction and reasoning tasks.
● Foundational Methods: Novel techniques that optimize clinical AI performance above off-the-shelf models.
● Benchmarks & Evaluations: The evolving metrics that define AI competence in medicine.
● AI in Clinical Workflows: How clinicians and AI systems collaborate in real or simulated environments.
● Applied AI & Demos: Demonstrating AI’s domain-specific applications and use cases.
● Patient Facing AI: How AI engages directly with patients to inform, support, and personalize their healthcare.

Model Performance

In 2025, frontier models made major leaps in autonomous clinical reasoning and prediction.

● Slides 18–20: Reasoning frontier models show large gains in autonomous clinical reasoning versus humans, including on historically difficult cases.
● Slides 21–22: Key weaknesses persist: poor performance in uncertainty-heavy scenarios, overconfidence, and pattern-based shortcut behavior.
● Slides 23–27: Models continue to show promise for scalable prediction across a wide variety of use cases such as patient deterioration, screening for insulin resistance, and aging.

Overall, model-only evaluations reveal that LLMs have achieved superhuman capability in controlled tasks but still require stronger metacognition, calibration, and stress testing before autonomous deployment.


Model Performance

Complex Reasoning

● Approaching superhuman reasoning
● AI vs MD
○ LLM vs Primary Care Physician
○ LLM as an expert case discussant
● Gaps
○ “None of the other answers”
○ Brittle overconfidence and uncertainty

Prediction

● Inpatient deterioration
● Biological age
● Insulin resistance
● Wearable time series data for diagnosis prediction
● Clinical risk calculator


o1-preview/o1: Reaching Superhuman Reasoning Performance

o1-preview and o1 consistently performed at or above the level of physicians across several reasoning evaluations: solving challenging NEJM cases at state-of-the-art levels, documenting superior reasoning quality, excelling in management tasks, and diagnosing real emergency room cases admitted to the hospital.

● On NEJM clinicopathological conference (CPC) cases, the model reached 78% diagnostic accuracy and selected the correct next test 87% of the time.
● o1-preview achieved a perfect score 99% of the time for clinical reasoning quality graded by physicians. This significantly outperformed GPT-4 (59%) and attending physicians (35%). Management reasoning for o1-preview (86%) was also superior to GPT-4 (42%) and physicians with GPT-4 (41%).
● In real ED cases, the model performed at or above the level of both attending physicians at three diagnostic touchpoints, with 66% exact/near-exact diagnoses vs. 48–54% for physicians at initial triage.
● Modern LLMs may now surpass physicians in general diagnostic and management reasoning in controlled environments, motivating the need for prospective clinical trials for real-world deployment.

Brodeur, Buckley, Manrai, Rodman et al., arXiv, Jul. 2025


Google’s AMIE Chatbot Matches PCPs at Multi-Visit Disease Management

Enhanced with a new management-reasoning agent, the Articulate Medical Intelligence Explorer (AMIE) was non-inferior to 21 primary care physicians across guideline-based decision-making, treatment planning, and longitudinal care. AMIE produced more precise, guideline-based plans and outperformed physicians on medication-reasoning questions.

● AMIE (Gemini-based) was designed as a two-part system with access to an agent state (current patient summary, differential, etc.): a fast Dialogue Agent to capture the relevant HPI and a slower Management Reasoning Agent using long-context reasoning grounded in clinical guidelines.
● The study compared AMIE to PCPs across 100 three-visit simulated scenarios spanning cardiology, pulmonology, neurology, OBGYN/urology, and GI, each grounded in NICE and BMJ Best Practice guidelines.
● Graded by subspecialists, AMIE’s recommendations for investigations and treatments were consistently more precise (Yes/No), especially for investigations in follow-up visits (visit 2: 99% vs. 84%, visit 3: 100% vs. 88%), and carried explicit citations to guideline sources. This suggests a possibility for AI agents to serve as a point of continuity in a fragmented system.
● On a novel medication reasoning (RxQA) benchmark, AMIE outperformed PCPs on harder questions (as determined by pharmacists) in both closed- and open-book conditions, demonstrating strong therapeutic reasoning.

Palepu, Schaekermann et al., arXiv, Mar. 2025


AI Outperforms Physicians as an Expert Case Discussant on Challenging Cases

Researchers developed Dr. CaBot, an AI discussant based on o3 that produces written and video CPC-style differentials. Dr. CaBot was evaluated on NEJM CPCs and NEJM Image Challenges, spanning ten tasks that test differential diagnosis, testing strategies, clinical reasoning, uncertainty handling, and multimodal interpretation. In blinded testing, physicians could not reliably distinguish Dr. CaBot from human experts and consistently rated its reasoning higher.

Buckley et al., arXiv, Sept. 2025

● Built from 7,102 NEJM CPCs (1923–2025) and 1,021 NEJM Image Challenges, CPC-Bench covers 10 reasoning tasks (DDx, testing plans, touchpoints, omission, VQA, literature search, etc.).
● Among eight frontier models, o3 achieved 60% top-1 and 84% top-10 accuracy on CPC differential diagnosis, outperforming a 20-physician baseline, with 98% accuracy selecting the next test.
● Dr. CaBot, based on o3, is a publicly available ( ) system that produces both written and video case presentations that outperform the originally presented expert case discussant.
● The study shows that AI is now capable of performing the entire CPC discussant role, with reasoning quality rated better than human experts.
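The top-1 and top-10 figures above are top-k accuracies over ranked differential diagnoses. A minimal sketch of that metric follows. It assumes correctness is decided by exact string match after lowercasing, which is a simplification made here for illustration; the study's own grading procedure is not described in this slide. The function name and the toy cases are hypothetical.

```python
def top_k_accuracy(ranked_differentials, final_diagnoses, k):
    """Fraction of cases whose final diagnosis appears among the first k
    entries of the ranked differential (exact match, an assumption)."""
    if len(ranked_differentials) != len(final_diagnoses):
        raise ValueError("one ranked differential is needed per case")
    hits = 0
    for ranked, truth in zip(ranked_differentials, final_diagnoses):
        top_k = [dx.strip().lower() for dx in ranked[:k]]
        if truth.strip().lower() in top_k:
            hits += 1
    return hits / len(final_diagnoses)


# Hypothetical toy data: four cases, each with a ranked differential.
cases = [
    ["sarcoidosis", "tuberculosis", "lymphoma"],
    ["lymphoma", "sarcoidosis", "tuberculosis"],
    ["pulmonary embolism", "pneumonia", "heart failure"],
    ["pneumonia", "heart failure", "asthma"],
]
truths = ["sarcoidosis", "tuberculosis", "pneumonia", "pulmonary embolism"]

top1 = top_k_accuracy(cases, truths, k=1)  # only the first case is a top-1 hit
top3 = top_k_accuracy(cases, truths, k=3)  # the last case misses entirely
```

Top-k accuracy can only rise as k grows, which is why a model can score far higher at top-10 than at top-1 on the same cases.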


“None of the other answers”: An LLM Weakness

Researchers tested whether LLMs could truly reason through medical questions by replacing the correct answer in multiple-choice questions with “None of the other answers” (NOTA). Frontier models showed significant drops in accuracy, revealing that strong multiple-choice performance is, in part, due to pattern recognition.

● Researchers modified 100 MedQA questions so that NOTA became the correct answer, creating a 68-item clinician-validated test of genuine reasoning. The pattern of answers changes, but the underlying clinical reasoning does not.
● DeepSeek-R1, o3-mini, Claude 3.5 Sonnet, Gemini 2.0 Flash, GPT-4o, and Llama 3.3-70B all performed worse on NOTA-modified questions. The decreases in performance were significant, ranging from 9% to 38%.
● A system that falls, for example, from 81% to 43% accuracy when a pattern changes would be unsafe for autonomous clinical use; rigorous benchmarks must test reasoning, not memorized answer distributions.

Bedi, Shah et al., JAMA Network Open, Aug. 2025
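The NOTA substitution described above can be sketched in a few lines. This is an illustration under stated assumptions, not the study's code: the dictionary layout of a question, the function names, and the example item are all hypothetical.

```python
NOTA = "None of the other answers"


def to_nota_variant(question):
    """Return a copy of a multiple-choice item in which the originally
    correct option text is replaced by NOTA, so NOTA becomes correct."""
    options = dict(question["options"])  # copy, so the original is untouched
    correct = question["answer"]
    if correct not in options:
        raise KeyError(f"answer key {correct!r} is not among the options")
    options[correct] = NOTA
    return {"stem": question["stem"], "options": options, "answer": correct}


def accuracy(items, predictions):
    """Share of items where the predicted option letter is the answer key."""
    correct = sum(1 for item, pred in zip(items, predictions)
                  if pred == item["answer"])
    return correct / len(items)


# Hypothetical example item (not taken from MedQA).
item = {
    "stem": "ST elevation in leads II, III, and aVF most often reflects "
            "occlusion of which artery?",
    "options": {
        "A": "Left anterior descending artery",
        "B": "Right coronary artery",
        "C": "Left main coronary artery",
        "D": "Posterior cerebral artery",
    },
    "answer": "B",
}
variant = to_nota_variant(item)
```

A model that recognizes the familiar answer string can no longer rely on it: in the variant it must rule out A, C, and D on clinical grounds to arrive at NOTA. The 81% to 43% example in the slide corresponds to a 38-point drop, which matches the upper end of the reported range.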


Script Concordance Testing Reveals Gaps in LLM Clinical Reasoning

A study compared 10 frontier models to 1,500+ clinicians on 750 Script Concordance Testing (SCT) questions, which measure the ability to revise clinical decisions when new information becomes available. Models matched medical students but underperformed relative to seasoned physicians, revealing consistent overconfidence and difficulty updating decisions under uncertainty.

● SCT measures the ability to revise diagnostic or management judgments when new information arrives, a core skill of clinical reasoning under uncertainty.
● This study established a benchmark assessing 750 SCT items from 10 datasets, including pediatrics, neurology, emergency medicine, internal medicine, and physiotherapy, most never previously published.
● OpenAI’s o3 (68%) led performance, followed by GPT-4o (64%), matching medical students but below residents and attending physicians. Many reasoning models performed surprisingly poorly (e.g., Gemini 2.5: 52%).
● LLMs overused extreme ratings (+2/-2), rarely selected neutrality (0), and showed miscalibrated confidence patterns unlike human experts, suggesting that chain-of-thought-optimized models may overcommit in uncertainty-rich tasks.

McCoy, Rodman et al., NEJM AI, Sept. 2025
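To make the +2/-2 rating scale concrete, the sketch below scores one SCT item with the aggregate method commonly used for such tests, where a response earns credit in proportion to how many expert panelists chose it relative to the most popular choice. Whether the study used exactly this scoring is an assumption; the panel responses and function names are hypothetical.

```python
from collections import Counter


def sct_item_credit(panel_responses, response):
    """Aggregate SCT scoring: credit = (panelists who chose this response)
    divided by (panelists who chose the modal response)."""
    counts = Counter(panel_responses)
    modal_count = max(counts.values())
    return counts.get(response, 0) / modal_count


def extreme_rate(responses):
    """Share of responses at the ends of the -2..+2 scale."""
    return sum(1 for r in responses if abs(r) == 2) / len(responses)


# Hypothetical 10-member expert panel for one item: six chose -1,
# three chose 0, and one chose -2.
panel = [-1, -1, -1, 0, 0, -2, -1, 0, -1, -1]

credit_modal = sct_item_credit(panel, -1)   # matches the modal answer
credit_neutral = sct_item_credit(panel, 0)  # partial credit
credit_extreme = sct_item_credit(panel, 2)  # no panelist chose +2

# Hypothetical model answers across five items, mostly at the extremes.
model_answers = [-2, 2, -2, 0, 2]
model_extreme_rate = extreme_rate(model_answers)
```

Under this kind of scoring, a respondent who habitually picks +2 or -2 loses credit whenever experts cluster around milder ratings, which is one way overconfidence shows up as a lower SCT score.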


Predicting Inpatient Deterioration Before It Happens

Researchers developed a deep-learning model using continuous wearable vital sign data from 888 hospitalized med-surg patients to predict clinical deterioration up to 8-24 hours before standard EHR alerts. The model generated more timely alerts than episodic vital checks and accurately predicted hard outcomes, including ICU transfer, cardiac arrest, and death.

● Outside of the ICU, inpatient vital signs are checked every 4-8 hours, which leaves gaps during which critical illness can go undetected.
● Researchers trained a recurrent neural network on a 5-hour sequence of continuous vital sign inputs (e.g., HR, RR) collected from a wearable chest device, together with demographics, from 888 non-ICU patients to detect early deterioration.
● The model predicted 9x more clinical alerts (Modified Early Warning Score (MEWS) >6 for >30 mins) 8-24 hours before EHR-based MEWS alerts, with AUROC 0.89 (retrospective) and AUROC 0.84-0.9 (prospective). It predicted 9 of 11 hard outcome events (cardiac arrests, death) up to 17 hours before MEWS.
● This enables faster recognition of physiologic decline and the potential to prevent avoidable deteriorations.

Scheid, Zanos et al., Nature Communications, Jul. 2025
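The alert definition quoted above (MEWS >6 for >30 mins) is a sustained-threshold rule. A minimal sketch of detecting such an alert in a series of timestamped scores is given below. It assumes the scores have already been computed, that samples are sorted by time, and that gaps between samples do not break a run; none of these details come from the study, and the function name and data are hypothetical.

```python
def first_sustained_alert(samples, threshold=6, min_minutes=30):
    """Return the minute at which the score has stayed above `threshold`
    for longer than `min_minutes`, or None if that never happens.

    `samples` is a list of (minute, score) pairs sorted by minute.
    """
    run_start = None
    for minute, score in samples:
        if score > threshold:
            if run_start is None:
                run_start = minute  # a new run above threshold begins
            if minute - run_start > min_minutes:
                return minute
        else:
            run_start = None  # the run is broken
    return None


# Hypothetical scores sampled every 10 minutes.
samples = [(0, 5), (10, 7), (20, 7), (30, 5), (40, 7),
           (50, 8), (60, 7), (70, 7), (80, 7)]
alert_minute = first_sustained_alert(samples)
```

In this toy series the first excursion above 6 is too short to count, and the second run, which starts at minute 40, only qualifies once it has lasted more than 30 minutes. With vitals checked every 4-8 hours, a run like this could begin and end between two checks, which is the gap continuous monitoring is meant to close.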


Predicting Biological Aging at Population Scale Using Large Language Models

This study introduces an LLM prompt-based framework that predicts biological age from routine health records, enabling scalable aging assessment across populations. Applied to >10 million individuals from six cohorts (e.g., UK Biobank), the LLM-derived biological age outperformed traditional aging clocks in predicting mortality and multiple age-related diseases.

● Using LLMs in the Llama and Qwen families, the researchers applied prompt learning without supervised learning on aging-related knowledge. After being fed health examination text reports, the LLMs integrate individualized clinical data to infer biological age without predefined biomarkers or labels.
● LLM-based biological age achieved a concordance index of 0.76 for all-cause mortality. It also outperformed epigenetic clocks, telomere length, frailty index, and conventional ML models. The difference between LLM-predicted age and chronological age (“age gap”) was strongly associated with all-cause mortality (HR 1.05).
● LLM-derived organ-specific biological ages better predicted corresponding organ diseases and enabled potential discovery of 316 aging-related protein biomarkers.
● The approach offers potential for scalable and cost-effective personalized and population aging assessment, with interpretability using chain-of-thought prompts.

Li, Di et al., Nature Medicine, Jul. 2025
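The concordance index reported above measures how often, among comparable pairs of people, the one who died earlier was also assigned the higher risk. The sketch below implements Harrell's C-index in plain Python on hypothetical follow-up data, using an age gap as the risk score. It is an illustration of the metric only and makes no claim about the study's exact evaluation code.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index. A pair (i, j) is comparable when subject i had an
    observed event before subject j's last follow-up time. The pair is
    concordant when subject i also has the higher risk score; ties in
    risk count as half."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical cohort of five people.
follow_up_years = [2, 4, 6, 8, 10]
died = [1, 1, 0, 1, 0]                 # 0 means censored (alive at last contact)
age_gap = [9.0, 3.0, 5.0, 4.0, 1.0]    # predicted minus chronological age

c_index = concordance_index(follow_up_years, died, age_gap)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so the reported 0.76 means roughly three out of four comparable pairs were ranked correctly.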


Predicting Insulin Resistance Using Wearables + Routine Labs at Scale

Researchers paired smartwatch-derived data (Fitbit/Pixel Watch) with demographics and routine blood biomarkers to predict insulin resistance using deep neural network models. The best-performing practical model (wearables + demographics + common labs) substantially outperformed single-source models and maintained similar performance in an independent validation cohort. Performance was strongest in high-risk groups (obesity + sedentary).

● Current methods for detecting early insulin resistance rely on snapshots in time (e.g., A1c), which can be insensitive in early stages.
● In 1,165 participants, using a Homeostatic Model Assessment of Insulin Resistance (HOMA-IR) >2.9 as ground truth and using only demographic variables and wearable data, the model achieved an AUROC of 0.7. Adding fasting glucose increased performance to AUROC 0.78.
● Combining wearables + demographics + fasting glucose + lipid/metabolic panels achieved AUROC = 0.80, 76% sensitivity, and 84% specificity. Performance was best in obese + sedentary participants, with 93% sensitivity and 95% adjusted specificity (minimizes misclassification of insulin sensitive as resistant). Performance was similar in a validation set of 72 participants.
● When these insulin resistance predictions were integrated into an LLM coaching agent, endocrinologists consistently rated it superior to a base LLM in head-to-head comparisons for personalization, comprehensiveness, and trustworthiness.

Metwally, Prieto et al., arXiv, Apr. 2025
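The ground-truth label and the two headline metrics above can be worked through in a short sketch. HOMA-IR is computed here with the standard formula for US units, fasting glucose (mg/dL) times fasting insulin (µU/mL) divided by 405, and the 2.9 cutoff is the one quoted in the slide. The patient values, labels, and predictions are hypothetical, and the sketch does not attempt the study's "adjusted specificity."

```python
def homa_ir(fasting_glucose_mg_dl, fasting_insulin_uU_ml):
    """HOMA-IR with glucose in mg/dL and insulin in microunits per mL."""
    return fasting_glucose_mg_dl * fasting_insulin_uU_ml / 405.0


def is_insulin_resistant(glucose_mg_dl, insulin_uU_ml, cutoff=2.9):
    """Label used as ground truth: HOMA-IR strictly above the cutoff."""
    return homa_ir(glucose_mg_dl, insulin_uU_ml) > cutoff


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical patients.
score_a = homa_ir(90, 9)                    # 810 / 405
resistant_b = is_insulin_resistant(100, 15)  # 1500 / 405 is about 3.7

# Hypothetical labels (1 = insulin resistant) and model predictions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
```

Because HOMA-IR needs a fasting insulin measurement, which is not part of most routine panels, a model that approximates the label from wearables and common labs is what makes screening at scale plausible.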


A Foundation Model for Wearable Behavioral Data with Individual-Level Diagnostic Prediction

Joint Embedding for Time Series (JETS) is a self-supervised joint-embedding model trained on ~3 million person-days of real-world wearable and behavioral data from 16,522 individuals. By learning robust latent representations from noisy, irregular time series, JETS improves downstream prediction of diagnoses and biomarkers compared with multiple baseline models.

● Many time series models rely on dense, regularly sampled, fixed-length inputs, which often do not match real-world data. JETS uses a joint-embedding predictive architecture (JEPA-style) with masking and learns to predict missing segments in latent space instead of reconstructing raw signals.
● Trained on 63 daily or low-resolution metrics (activity, sleep, HR, VO₂max, respiration, self-reports), covering ~3M person-days across 16,522 users.
● Outperformed MAE, PrimeNet, and transformer baselines on many diagnoses (e.g., AUROC for ME/CFS 0.81, HTN 0.87) and led biomarker prediction despite sparse labels.
● JETS shows that a foundation model trained on massive wearable time series can learn generalizable health representations that outperform existing approaches on real clinical prediction tasks.

Xie, Ballinger et al., OpenReview, Dec. 2025


AgentMD: Using LLM Agents to Run Clinical Risk Calculators for Risk Prediction at Scale

Clinical calculators are important medical tools but remain underutilized due to poor dissemination, workflow burden, and fragmented implementation. AgentMD is an AI agent that reads notes, determines which calculators apply, extracts inputs, and utilizes clinical calculators, enabling accurate and interpretable risk prediction.
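The select, extract, and compute steps described above can be illustrated with a toy calculator registry. This is a sketch under loud assumptions and not AgentMD's implementation: the LLM steps of reading notes and extracting inputs are replaced by an already structured patient dictionary, and the registry layout and function names are hypothetical. The one calculator included is the standard CHA2DS2-VASc stroke risk score for atrial fibrillation.

```python
def cha2ds2_vasc(age, female, chf, hypertension, diabetes,
                 stroke_or_tia, vascular_disease):
    """Standard CHA2DS2-VASc point total (0 to 9)."""
    score = 0
    score += 1 if chf else 0
    score += 1 if hypertension else 0
    score += 2 if age >= 75 else (1 if age >= 65 else 0)
    score += 1 if diabetes else 0
    score += 2 if stroke_or_tia else 0
    score += 1 if vascular_disease else 0
    score += 1 if female else 0
    return score


# Hypothetical registry: each entry states when it applies, which inputs it
# needs, and the function that computes it.
CALCULATORS = {
    "cha2ds2_vasc": {
        "applies_if": lambda patient: patient.get("atrial_fibrillation", False),
        "inputs": ["age", "female", "chf", "hypertension", "diabetes",
                   "stroke_or_tia", "vascular_disease"],
        "run": cha2ds2_vasc,
    },
}


def run_applicable_calculators(patient):
    """Select applicable calculators, gather their inputs, and compute."""
    results = {}
    for name, calc in CALCULATORS.items():
        if not calc["applies_if"](patient):
            continue
        missing = [key for key in calc["inputs"] if key not in patient]
        if missing:
            results[name] = {"error": f"missing inputs: {missing}"}
            continue
        kwargs = {key: patient[key] for key in calc["inputs"]}
        results[name] = {"score": calc["run"](**kwargs)}
    return results


# Hypothetical structured patient (stands in for facts extracted from a note).
patient = {
    "atrial_fibrillation": True, "age": 78, "female": True, "chf": False,
    "hypertension": True, "diabetes": True, "stroke_or_tia": False,
    "vascular_disease": False,
}
results = run_applicable_calculators(patient)
```

Keeping the arithmetic in a deterministic function, and leaving only selection and extraction to the language model, is one way such a design can stay interpretable: every score can be traced back to named inputs.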

● AgentMD automatically converted PubMed articles into 2,164 executable clinical calculators, achieving >85% accuracy on expert quality checks and >90% pass rates on unit testing.
● On a controlled benchmark (RiskQA, which requires selecting the correct calculator, computing, and interpretation), Agent
