
State of Clinical AI Report 2026

ARISE-AI.ORG, January 2026

About The Authors

Peter Brodeur

Dr. Peter Brodeur is a rising cardiology fellow at Harvard Medical School’s Beth Israel Deaconess Medical Center. Dr. Brodeur is an affiliate of ARISE, a reviewer for Nature Medicine & NEJM AI, and a former life sciences strategy consultant. His research focuses on human-computer interaction and LLM clinical reasoning.

Ethan Goh

Dr. Ethan Goh is the Executive Director of ARISE. His research has been featured in The New York Times, The Washington Post, and CNN. He directs the Stanford Healthcare AI Leadership Program and Harvard’s Agentic AI Executive Course. Dr. Goh is a Founding Editorial Board member and Associate Editor at BMJ Digital Health & AI.

Adam Rodman

Dr. Adam Rodman is an assistant professor at Harvard Medical School. He is the Director of AI Programs for the Carl J. Shapiro Center. Dr. Rodman is an Associate Editor at NEJM AI. He is also the host of the American College of Physicians podcast Bedside Rounds.

Jonathan H Chen

Dr. Jonathan H Chen is Stanford’s inaugural Director for Medical Education in AI in the Division of Computational Medicine. His expertise in combining human with artificial intelligence to provide better healthcare than either alone is featured in the popular press, with over 100 publications and awards.


Message From ARISE Leadership

“There are decades where nothing happens; and there are weeks when decades happen.” Recent deployments by technology companies, health systems, and regulators have made clinical AI more visible and ever more consequential. At the same time, it has become harder to keep up with emerging research. In some areas the literature is fragmented; in others, it simply doesn’t exist yet for the way these tools are being used today.

So what actually holds up in practice?

The State of Clinical AI Report (2026) was created to look beyond model performance alone to other critical factors that determine real-world impact: how systems are evaluated, how clinicians and AI work together, and where patient risks start to appear.

Frontier AI systems are already powerful. What’s needed now is to safely and effectively translate these tools into real-world care.

Ethan Goh, Adam Rodman, Jonathan H Chen

Investigators, ARISE Network

Engagement and Education

Stanford Computational Medicine Colloquia

● Healthcare AI seminars with Stanford/industry leaders

● Thursday 12pm PT, free

Get weekly invites

Stanford Healthcare AI Leadership & Strategy Program

● Application required. CME and accredited certificate

● May 2026

Apply now

Generative AI and Agentic AI Online Course

● Harvard/Stanford faculty, accredited certificate

● Summer 2026

Get early access

The Current Landscape

Clinical AI Is Widely Deployed But Poorly Evaluated

● AI is now embedded across healthcare: 1,200+ FDA-cleared tools and 350,000+ consumer apps have generated a $70B market [1]. Only a minority underwent peer-reviewed evaluation [2].

● Of 691 FDA-cleared AI/ML medical devices (1995–2023), >95% went through the 510(k) clearance pathway, which is predicated on equivalency to existing devices, many of which were approved on suboptimal evidence [2].

● ~50% of FDA device summaries omitted study design, 53% lacked sample size, and <1% reported patient outcomes [2].

● 95% of device summaries did not report demographic data, and 91% lacked bias assessments, raising concerns about safety and equity in real-world use [2].

Bridging the gap between adoption and evidence requires supporting clinicians, health system leaders, policymakers, and the public in interpreting available research.

Top Takeaways

1. Model capability is accelerating, but evidence of real clinical impact remains limited. Many studies show what models can do in controlled settings; what’s increasingly needed are prospective studies that show measurable effects on patient outcomes and care delivery.

2. Frontier LLM models show very uneven performance. They perform extremely well on complex reasoning tasks, yet break down when uncertainty, missing information, or changing context is introduced.

3. Clinicians value automation where it reduces administrative and workflow burden, but these use cases remain understudied. Tasks clinicians most want support with are often underrepresented in current benchmarks and evaluations.

4. Patient-facing AI has significant potential to reshape engagement and access, but raises distinct safety concerns. Direct interaction with patients requires much stronger guardrails and scalable oversight systems that do not currently exist.

5. Multimodal clinical AI applications are approaching practical usability. Improvements in base models are enabling applications that integrate unstructured text, images, and other clinical data to support prediction and decision-making in real-world settings.

6. FDA clearance is increasing, but near-term clinical adoption will favor narrow, task-specific systems. AI tools that are tightly scoped to specific domains and contexts are more likely to demonstrate value and be adopted in practice.

Acknowledgements

Reviewers

Supported By

Rebecca Handler

Kathleen Lacar

Jason Hom

Kameron Black

Eric Horvitz

Liam McCoy

Laura Zwaan

David Wu

Vishnu Ravi

Priyank Jain

Brian Han

Emily Tat

Kevin Schulman

Adrian Haimovich

Design & Accessibility

Emily Tat

The organization format of this report was inspired by Nathan Benaich’s State of AI Report.

How to Cite This Report

Peter G. Brodeur, Ethan Goh, Emily Tat, Liam McCoy, David Wu, Priyank Jain, Rebecca Handler, Jason Hom, Laura Zwaan, Vishnu Ravi, Brian Han, Kevin Schulman, Kathleen Lacar, Kameron Black, Adrian Haimovich, Eric Horvitz, Adam Rodman, Jonathan H. Chen. “State of Clinical AI 2026,” ARISE Network, January 2026.

Introduction

Executive Summary

Model Performance

Frontier reasoning models (optimized for multi-step inference and chain of thought) showed marked improvement on challenging clinical reasoning tasks against human baselines, while prediction models crossed new thresholds in scalable prediction to enable actionable prevention.

Dominant failure modes include poor recognition of uncertainty, overconfidence, and pattern learning.

Benchmarks & Evaluation

Multiple-choice benchmarks are saturated, and evaluations still underrepresent real clinical work: administrative tasks, conversational dialogue, real patient data, and bias/fairness.

New benchmark suites (e.g., conversational, simulated EHR environments) are forcing models into more realistic, dynamic scenarios.

Foundational Methods

Novel techniques such as converting medical data to tokens used for prediction bring a new era of screening and risk stratification.

Clinical AI is being advanced by multi-agent systems, multimodal diagnostic support, and the optimization of reasoning models.

AI in Clinical Workflows

Across settings, AI can augment clinicians on reasoning and diagnostic interpretation tasks. However, collaboration isn’t yet optimized. How clinicians use AI is as important as what the model can do.

Workflow tools like AI scribes feel transformative, yet objective gains are still modest. The addition of downstream workflow tasks will likely yield more productivity and efficiency impact.

Patient Facing AI

Multi-turn conversational agents and AI-based coaching show promise, particularly as they are integrated with smart devices to support more personalized health assistance.

In a space with competing vendor interests, overtrust and unsupervised use raise the bar for guardrails and for improving objective patient outcomes, not just engagement.

Applied AI & Demos

The most immediately translatable progress can be seen at the individual task-specific level, with imaging remaining the dominant use case.

We provide a sneak peek of the next wave of tools such as EHR chatbots, eConsults, and mental health chatbots.

Methods

Our Approach to a Targeted Review of Clinical AI

● Data sources & search strategy

○ Reviewed PubMed and preprint servers (e.g., medRxiv, arXiv) using a combination of terms such as “large language models in medicine,” “AI,” “diagnostic reasoning,” “management reasoning,” “diagnostic error,” “benchmarks,” and “patient-facing AI.”

○ Invited clinicians and AI researchers from academic institutions and issued an open call for submissions via social media (e.g., LinkedIn) to identify high-quality studies across the six themes.

● Study selection

○ All studies were reviewed by the authors and reviewers of this presentation.

○ Included empirical studies that (1) used an AI model/LLM in a clinical context, (2) reported quantitative or qualitative outcomes (e.g., diagnostic accuracy, bias, calibration, workflow, user performance), and (3) were determined to be of high impact.

○ Excluded purely technical model papers without clinician- or patient-facing evaluation, editorials, and non-clinical AI (e.g., drug discovery, biotech).

Table of Contents

Model Performance

How well models (trained AI systems) perform independently across prediction and reasoning tasks.

Foundational Methods

Novel techniques that optimize clinical AI performance above off-the-shelf models.

Benchmarks & Evaluations

The evolving metrics that define AI competence in medicine.

AI in Clinical Workflows

How clinicians and AI systems collaborate in real or simulated environments.

Applied AI & Demos

Demonstrating AI’s domain-specific applications and use cases.

Patient Facing AI

How AI engages directly with patients to inform, support, and personalize their healthcare.

Model Performance

In 2025, frontier models made major leaps in autonomous clinical reasoning and prediction.

● Slides 18–20: Reasoning frontier models show large gains in autonomous clinical reasoning versus humans, including on historically difficult cases.

● Slides 21–22: Key weaknesses persist: poor performance in uncertainty-heavy scenarios, overconfidence, and pattern-based shortcut behavior.

● Slides 23–27: Models continue to show promise for scalable prediction across a wide variety of use cases such as patient deterioration, screening for insulin resistance, and aging.

Overall, model-only evaluations reveal that LLMs have achieved superhuman capability in controlled tasks but still require stronger metacognition, calibration, and stress testing before autonomous deployment.

Model Performance

Complex Reasoning

Approaching superhuman reasoning

● AI vs MD

○ LLM vs Primary Care Physician

○ LLM as an expert case discussant

● Gaps

○ “None of the other answers”

○ Brittle overconfidence and uncertainty

Prediction

● Inpatient deterioration

● Biological age

● Insulin resistance

● Wearable time series data for diagnosis prediction

● Clinical risk calculator

Page 18

o1-preview/o1: Reaching Superhuman Reasoning Performance

o1-preview and o1 consistently performed at or above the level of physicians across several reasoning evaluations, solving challenging NEJM cases at state-of-the-art levels, documenting superior reasoning quality, excelling in management tasks, and diagnosing real emergency room cases admitted to the hospital.

● On NEJM clinicopathological conference (CPC) cases, the model reached 78% diagnostic accuracy and selected the correct next test 87% of the time.

● o1-preview achieved a perfect score 99% of the time for clinical reasoning quality graded by physicians. This significantly outperformed GPT-4 (59%) and attending physicians (35%). Management reasoning for o1-preview (86%) was also superior to GPT-4 (42%) and physicians with GPT-4 (41%).

● In real ED cases, the model performed at or above the level of both attending physicians at three diagnostic touchpoints, with 66% exact/near-exact diagnoses vs. 48–54% for physicians at initial triage.

● Modern LLMs may now surpass physicians in general diagnostic and management reasoning in controlled environments, motivating the need for prospective clinical trials for real-world deployment.

Brodeur, Buckley, Manrai, Rodman et al., arXiv, Jul. 2025

Page 19

Google’s AMIE Chatbot Matches PCPs at Multi-Visit Disease Management

Enhanced with a new management-reasoning agent, the Articulate Medical Intelligence Explorer (AMIE) was non-inferior to 21 primary care physicians across guideline-based decision-making, treatment planning, and longitudinal care. AMIE produced more precise, guideline-based plans and outperformed physicians on medication-reasoning questions.

AMIE (Gemini-based) was designed as a two-part system with access to an agent state (current patient summary, differential, etc.): a fast Dialogue Agent to capture relevant HPI and a slower Management Reasoning agent using long-context reasoning grounded in clinical guidelines.

The study compared AMIE to PCPs across 100 three-visit simulated scenarios spanning cardiology, pulmonology, neurology, OBGYN/urology, and GI, each grounded in NICE and BMJ Best Practice guidelines.

Graded by subspecialists, AMIE’s recommendations for investigations and treatments were consistently more precise (Yes/No), especially for investigations in follow-up visits (visit 2: 99% vs. 84%, visit 3: 100% vs. 88%), and carried explicit citations to guideline sources. This suggests a possibility for AI agents to serve as a point of continuity in a fragmented system.

On a novel medication reasoning (RxQA) benchmark, AMIE outperformed PCPs on harder questions (as determined by pharmacists) in both closed- and open-book conditions, demonstrating strong therapeutic reasoning.

Palepu, Schaekermann et al., arXiv, Mar. 2025

Page 20

AI Outperforms Physicians as an Expert Case Discussant on Challenging Cases

Researchers developed Dr. CaBot, an AI discussant based on o3 that produces written and video CPC-style differentials. Dr. CaBot was evaluated on NEJM CPCs and NEJM Image Challenges, spanning ten tasks that test differential diagnosis, testing strategies, clinical reasoning, uncertainty handling, and multimodal interpretation. In blinded testing, physicians could not reliably distinguish Dr. CaBot from human experts, and consistently rated its reasoning higher.

● Built from 7,102 NEJM CPCs (1923–2025) and 1,021 NEJM Image Challenges, CPC-Bench covers 10 reasoning tasks (DDx, testing plans, touchpoints, omission, VQA, literature search, etc.).

● Among eight frontier models, o3 achieved 60% top-1 and 84% top-10 accuracy on CPC differential diagnosis, outperforming a 20-physician baseline, with 98% accuracy selecting the next test.

● Dr. CaBot, based on o3, is a publicly available system that produces both written and video case presentations that outperform the originally presented expert case discussant.

● The study shows that AI is now capable of performing the entire CPC discussant role, with reasoning quality rated better than human experts.

Buckley et al., arXiv, Sept. 2025
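The top-1 and top-10 figures above are top-k accuracies: a case counts as a hit when the final diagnosis appears among the first k entries of the ranked differential. A minimal sketch follows, assuming exact (case-insensitive) string matching as a stand-in for whatever grading procedure the study used; the function name and the toy cases are hypothetical.

```python
from typing import Sequence


def top_k_accuracy(ranked_differentials: Sequence[Sequence[str]],
                   final_diagnoses: Sequence[str],
                   k: int) -> float:
    """Fraction of cases whose final diagnosis is in the top-k differential.

    Assumption: string equality decides a match. A real evaluation would
    need a clinician or model grader to judge equivalent diagnoses.
    """
    if len(ranked_differentials) != len(final_diagnoses):
        raise ValueError("one ranked differential is required per case")
    if not final_diagnoses:
        raise ValueError("at least one case is required")
    hits = 0
    for ranked, truth in zip(ranked_differentials, final_diagnoses):
        top_k = [dx.lower() for dx in ranked[:k]]
        if truth.lower() in top_k:
            hits += 1
    return hits / len(final_diagnoses)


# Hypothetical toy data: three cases, each with a ranked differential.
toy_differentials = [
    ["sarcoidosis", "tuberculosis", "lymphoma"],
    ["lymphoma", "sarcoidosis"],
    ["gout", "septic arthritis"],
]
toy_truth = ["sarcoidosis", "sarcoidosis", "pseudogout"]
top1 = top_k_accuracy(toy_differentials, toy_truth, k=1)
top2 = top_k_accuracy(toy_differentials, toy_truth, k=2)
```

Top-k accuracy can only stay the same or rise as k grows, which is why a model can score much higher at top-10 than at top-1.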

Page 21

“None of the other answers”: An LLM Weakness

Researchers tested whether LLMs could truly reason through medical questions by replacing the correct answer in multiple-choice questions with “None of the other answers” (NOTA). Frontier models showed significant drops in accuracy, revealing that strong multiple-choice performance is, in part, due to pattern recognition.

● Researchers modified 100 MedQA questions so that NOTA became the correct answer, creating a 68-item clinician-validated test of genuine reasoning. The pattern of answers has changed but the underlying clinical reasoning has not.

● DeepSeek-R1, o3-mini, Claude 3.5 Sonnet, Gemini 2.0 Flash, GPT-4o, and Llama 3.3-70B all performed worse on NOTA-modified questions. Significant decreases in performance were exhibited, ranging from 9% to 38%.

● A system that falls, for example, from 81% to 43% accuracy when a pattern changes would be unsafe for autonomous clinical use; rigorous benchmarks must test reasoning, not memorized answer distributions.

Bedi, Shah et al., JAMA Network Open, Aug. 2025
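The NOTA substitution described above can be sketched in a few lines. This is an illustrative sketch only: the `MCQ` data structure, the function names, and the sample question are hypothetical, and the study's own pipeline (including its clinician validation step) is not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict

NOTA_TEXT = "None of the other answers"


@dataclass(frozen=True)
class MCQ:
    """Hypothetical container for one multiple-choice question."""
    stem: str
    options: Dict[str, str]  # option letter -> option text
    answer: str              # letter of the correct option


def to_nota_variant(question: MCQ) -> MCQ:
    """Replace the text of the correct option with NOTA.

    The correct letter is unchanged, so NOTA becomes the right answer while
    the stem and every distractor stay exactly as they were.
    """
    if question.answer not in question.options:
        raise ValueError("answer letter must be one of the option letters")
    new_options = dict(question.options)
    new_options[question.answer] = NOTA_TEXT
    return MCQ(question.stem, new_options, question.answer)


def accuracy_drop(before: float, after: float) -> float:
    """Absolute drop in accuracy between original and NOTA-modified items."""
    return before - after


# Hypothetical example item.
original = MCQ(
    stem="Which electrolyte disturbance most commonly causes peaked T waves?",
    options={"A": "Hypokalemia", "B": "Hyperkalemia", "C": "Hypocalcemia"},
    answer="B",
)
variant = to_nota_variant(original)
drop = accuracy_drop(0.81, 0.43)
```

Because only the correct option's text changes, a model that answers by recognizing a familiar answer string loses its cue, while a model that reasons from the stem can still eliminate the distractors.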

Page 22

Script Concordance Testing Reveals Gaps in LLM Clinical Reasoning

A study compared 10 frontier models to 1,500+ clinicians on 750 Script Concordance Testing (SCT) questions, which measure the ability to revise clinical decisions when new information becomes available. Models matched medical students but underperformed relative to seasoned physicians, revealing consistent overconfidence and difficulty updating decisions under uncertainty.

● SCT measures the ability to revise diagnostic or management judgments when new information arrives, a core skill of clinical reasoning under uncertainty.

● This study established a benchmark assessing 750 SCT items from 10 datasets, including pediatrics, neurology, emergency medicine, internal medicine, and physiotherapy, most never previously published.

● OpenAI’s o3 (68%) led performance, followed by GPT-4o (64%), matching medical students but below residents and attending physicians. Many reasoning models performed surprisingly poorly (e.g., Gemini 2.5: 52%).

● LLMs overused extreme ratings (+2/-2), rarely selected neutrality (0), and showed miscalibrated confidence patterns unlike human experts, suggesting that chain-of-thought–optimized models may overcommit in uncertainty-rich tasks.

McCoy, Rodman et al., NEJM AI, Sept. 2025
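To make the -2 to +2 rating scale concrete, the sketch below scores one SCT item with the aggregate scoring convention often used for script concordance tests, where a response earns credit in proportion to how many expert panelists chose it relative to the most popular choice. The slide does not state which scoring rule the study applied, so this is an assumption, and the panel data and function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence

LIKERT = (-2, -1, 0, 1, 2)


def sct_credit_table(panel_ratings: Sequence[int]) -> Dict[int, float]:
    """Credit per response: panelists choosing it / panelists choosing the mode."""
    if not panel_ratings:
        raise ValueError("the expert panel must contain at least one rating")
    if any(r not in LIKERT for r in panel_ratings):
        raise ValueError("ratings must be integers from -2 to +2")
    counts = Counter(panel_ratings)
    modal_count = max(counts.values())
    return {rating: counts.get(rating, 0) / modal_count for rating in LIKERT}


def score_item(response: int, panel_ratings: Sequence[int]) -> float:
    """Credit earned by one test-taker response on one item."""
    return sct_credit_table(panel_ratings)[response]


def extreme_rate(responses: Sequence[int]) -> float:
    """Share of responses at the ends of the scale (+2 or -2)."""
    if not responses:
        raise ValueError("at least one response is required")
    return sum(abs(r) == 2 for r in responses) / len(responses)


# Hypothetical panel: most experts think the new finding changes little.
panel = [0, 0, 0, 0, 1, 1, -1, 2]
credits = sct_credit_table(panel)
model_extreme_share = extreme_rate([2, -2, 0, 1])
```

Under this convention, a test-taker who habitually answers +2 or -2 earns little credit on items where the panel clusters around 0, which is one way the overuse of extreme ratings would lower a score.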

Page 23

Predicting Inpatient Deterioration Before It Happens

Researchers developed a deep-learning model using continuous wearable vital sign data from 888 hospitalized med-surg patients to predict clinical deterioration up to 8-24 hours before standard EHR alerts. The model generated more timely alerts than episodic vital checks and accurately predicted hard outcomes, including ICU transfer, cardiac arrest, and death.

● Outside of the ICU, inpatient vital signs are checked every 4-8 hours, which leaves time gaps of missed opportunity for detecting critical illness.

● Researchers trained a recurrent neural network with a 5-hour sequence of continuous vital sign inputs (e.g., HR, RR) collected from a wearable chest device, with demographics from 888 non-ICU patients, to detect early deterioration.

● The model predicted 9x more clinical alerts (Modified Early Warning Score (MEWS) > 6 for > 30 mins) 8-24 hours before EHR-based MEWS alerts, with AUROC 0.89 (retrospective) and AUROC 0.84-0.9 (prospective). It predicted 9 of 11 hard outcome events (cardiac arrests, death) up to 17 hours before MEWS.

● Enables faster recognition of physiologic decline and the potential to prevent avoidable deteriorations.

Scheid, Zanos et al., Nature Communications, Jul. 2025
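The alert definition quoted above (MEWS > 6 sustained for > 30 minutes) is a simple rule over a time series, sketched below. This illustrates only the alert criterion, not the recurrent network that predicts it; the function name, the sample layout, and the toy readings are hypothetical, and the sketch assumes the samples are already sorted by time.

```python
from typing import Optional, Sequence, Tuple


def first_sustained_alert(samples: Sequence[Tuple[float, float]],
                          threshold: float = 6.0,
                          min_duration_min: float = 30.0) -> Optional[float]:
    """Return the first timestamp (in minutes) at which the score has stayed
    above `threshold` for more than `min_duration_min`, or None if it never
    does.

    `samples` holds (minutes_since_start, score) pairs in time order.
    """
    run_start: Optional[float] = None
    for minute, score in samples:
        if score > threshold:
            if run_start is None:
                run_start = minute  # a new run above the threshold begins
            if minute - run_start > min_duration_min:
                return minute
        else:
            run_start = None  # any dip at or below the threshold resets the run
    return None


# Hypothetical scores sampled every 10 minutes.
steady_rise = [(0, 5), (10, 7), (20, 7), (30, 7), (40, 7), (50, 7)]
with_dip = [(0, 7), (10, 7), (20, 5), (30, 7), (40, 7), (50, 7), (60, 7), (70, 7)]
too_short = [(0, 7), (10, 7), (20, 7), (30, 5)]

alert_steady = first_sustained_alert(steady_rise)
alert_dip = first_sustained_alert(with_dip)
alert_short = first_sustained_alert(too_short)
```

With vitals checked only every 4-8 hours, a rule like this could not fire until long after a run began, which is the gap continuous wearable monitoring is meant to close.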

Page 24

Predicting Biological Aging at Population Scale Using Large Language Models

This study introduces an LLM prompt-based framework that predicts biological age from routine health records, enabling scalable aging assessment across populations. Applied to >10 million individuals from six cohorts (e.g., UK Biobank), the LLM-derived biological age outperformed traditional aging clocks in predicting mortality and multiple age-related diseases.

● Using LLMs in the Llama and Qwen families, the researchers applied prompt learning without supervised learning on aging-related knowledge. After being fed health examination text reports, LLMs integrate individualized clinical data to infer biological age without predefined biomarkers or labels.

● LLM-based biological age achieved a concordance index of 0.76 for all-cause mortality. It also outperformed epigenetic clocks, telomere length, frailty index, and conventional ML models. The difference between LLM-predicted age and chronological age (“age gap”) was strongly associated with all-cause mortality (HR 1.05).

● LLM-derived organ-specific biological ages better predicted corresponding organ diseases and enabled potential discovery of 316 aging-related protein biomarkers.

● Potential for scalable and cost-effective personalized and population aging assessment, with interpretability using chain-of-thought prompts.

Li, Di et al., Nature Medicine, Jul. 2025
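For readers unfamiliar with the concordance index reported above, the sketch below computes Harrell's C on a toy cohort: among pairs of people whose ordering of survival times is known, it is the fraction in which the person who died earlier was assigned the higher risk score. The cohort values are hypothetical, and using predicted biological age as the risk score is an assumption made for illustration rather than the study's exact procedure.

```python
from typing import Sequence


def concordance_index(times: Sequence[float],
                      events: Sequence[bool],
                      risk_scores: Sequence[float]) -> float:
    """Harrell's C for right-censored survival data.

    A pair (i, j) is comparable when person i had an observed event strictly
    before person j's follow-up time. It is concordant when i also has the
    higher risk score; tied scores count as half.
    """
    if not (len(times) == len(events) == len(risk_scores)):
        raise ValueError("inputs must have the same length")
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored person cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


# Hypothetical cohort: follow-up years, death observed?, predicted biological age.
follow_up_years = [2.0, 4.0, 6.0, 8.0]
death_observed = [True, True, False, True]
predicted_bio_age = [81.0, 74.0, 66.0, 59.0]

c_perfect = concordance_index(follow_up_years, death_observed, predicted_bio_age)
c_reversed = concordance_index(follow_up_years, death_observed,
                               list(reversed(predicted_bio_age)))
c_uninformative = concordance_index(follow_up_years, death_observed,
                                    [70.0, 70.0, 70.0, 70.0])
```

A value of 0.5 corresponds to an uninformative score and 1.0 to a perfect ranking, so the reported 0.76 sits between the two.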

Page 25

Predicting Insulin Resistance Using Wearables + Routine Labs at Scale

Researchers paired smartwatch-derived data (Fitbit/Pixel Watch) with demographics and routine blood biomarkers to predict insulin resistance using deep neural network models. The best-performing practical model (wearables + demographics + common labs) substantially outperformed single-source models and maintained similar performance in an independent validation cohort. Performance was strongest in high-risk groups (obesity + sedentary).

● Current methods for detecting early insulin resistance rely on snapshots in time (e.g., A1c), which can be insensitive in early stages.

● In 1,165 participants, with a Homeostatic Model Assessment of Insulin Resistance (HOMA-IR) > 2.9 as ground truth, a model using only demographic variables and wearable data achieved an AUROC of 0.7. Adding fasting glucose increased performance to an AUROC of 0.78.

● Combining wearables + demographics + fasting glucose + lipid/metabolic panels achieved AUROC = 0.80, 76% sensitivity, and 84% specificity. Performance was best in obese + sedentary participants, with 93% sensitivity and 95% adjusted specificity (which minimizes misclassification of insulin-sensitive participants as resistant). Performance was similar in a validation set of 72 participants.

● When these insulin resistance predictions were integrated into an LLM coaching agent, endocrinologists consistently rated it superior to a base LLM in head-to-head comparisons for personalization, comprehensiveness, and trustworthiness.

Metwally, Prieto et al., arXiv, Apr. 2025
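The ground-truth label and the reported metrics above can be illustrated with a short sketch. It uses the standard HOMA-IR formula (fasting glucose in mg/dL times fasting insulin in µU/mL, divided by 405) together with the > 2.9 cutoff from the slide. The units are an assumption, since the slide does not state them, and the lab values and predictions are hypothetical.

```python
from typing import Sequence, Tuple


def homa_ir(fasting_glucose_mg_dl: float, fasting_insulin_uu_ml: float) -> float:
    """Standard HOMA-IR: glucose (mg/dL) x insulin (uU/mL) / 405."""
    return fasting_glucose_mg_dl * fasting_insulin_uu_ml / 405.0


def is_insulin_resistant(fasting_glucose_mg_dl: float,
                         fasting_insulin_uu_ml: float,
                         threshold: float = 2.9) -> bool:
    """Ground-truth label using the HOMA-IR > 2.9 cutoff quoted on the slide."""
    return homa_ir(fasting_glucose_mg_dl, fasting_insulin_uu_ml) > threshold


def sensitivity_specificity(y_true: Sequence[bool],
                            y_pred: Sequence[bool]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fn = sum(t and not p for t, p in zip(y_true, y_pred))
    tn = sum(not t and not p for t, p in zip(y_true, y_pred))
    fp = sum(not t and p for t, p in zip(y_true, y_pred))
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical labs: (fasting glucose mg/dL, fasting insulin uU/mL).
labs = [(100, 12), (90, 8), (110, 15), (85, 5), (105, 14)]
labels = [is_insulin_resistant(g, i) for g, i in labs]

# Hypothetical model predictions for the same five participants.
predictions = [True, False, False, True, True]
sens, spec = sensitivity_specificity(labels, predictions)
```

The label requires a fasting insulin measurement, which is not part of most routine panels, so a model that approximates it from wearables and common labs is what makes screening at scale plausible.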

Page 26

A Foundation Model for Wearable Behavioral Data with Individual-Level Diagnostic Prediction

Joint Embedding for Time Series (JETS) is a self-supervised joint-embedding model trained on ~3 million person-days of real-world wearable and behavioral data from 16,522 individuals. By learning robust latent representations from noisy, irregular time series, JETS improves downstream prediction of diagnoses and biomarkers compared with multiple baseline models.

● Many time series models rely on dense, regularly sampled, fixed-length inputs, which often do not match real-world data. JETS uses a joint-embedding predictive architecture (JEPA-style) with masking and learns to predict missing segments in latent space instead of reconstructing raw signals.

● Trained on 63 daily or low-resolution metrics (activity, sleep, HR, VO₂ max, respiration, self-reports), covering ~3M person-days across 16,522 users.

● Outperformed MAE, PrimeNet, and transformer baselines on many diagnoses (e.g., AUROC for ME/CFS 0.81, HTN 0.87) and led biomarker prediction despite sparse labels.

● JETS shows that a foundation model trained on massive wearable time-series data can learn generalizable health representations that outperform existing approaches on real clinical prediction tasks.

Xie, Ballinger et al., OpenReview, Dec. 2025

Page 27

AgentMD: Using LLM Agents to Run Clinical Risk Calculators for Risk Prediction at Scale

Clinical calculators are important medical tools but remain underutilized due to poor dissemination, workflow burden, and fragmented implementation. AgentMD is an AI agent that reads notes, determines which calculators apply, extracts inputs, and runs the applicable clinical calculators, enabling accurate and interpretable risk prediction.
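As a rough illustration of the select-then-compute part of this workflow, the sketch below keeps calculators in a small registry, picks those whose required inputs are all available, and runs them. It is not AgentMD's implementation: the registry structure and function names are hypothetical, the LLM steps that read notes and extract inputs are replaced by an already-extracted dictionary, and CHA₂DS₂-VASc is used only as a familiar example of a clinical calculator.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List


@dataclass(frozen=True)
class Calculator:
    """Hypothetical registry entry for one executable clinical calculator."""
    name: str
    required_inputs: FrozenSet[str]
    compute: Callable[[Dict[str, object]], int]


def cha2ds2_vasc(p: Dict[str, object]) -> int:
    """CHA2DS2-VASc stroke risk score for atrial fibrillation."""
    age = int(p["age"])
    score = 0
    score += 1 if p["chf"] else 0
    score += 1 if p["hypertension"] else 0
    score += 2 if age >= 75 else (1 if age >= 65 else 0)
    score += 1 if p["diabetes"] else 0
    score += 2 if p["stroke_or_tia"] else 0
    score += 1 if p["vascular_disease"] else 0
    score += 1 if p["female"] else 0
    return score


REGISTRY: List[Calculator] = [
    Calculator(
        name="CHA2DS2-VASc",
        required_inputs=frozenset({"age", "chf", "hypertension", "diabetes",
                                   "stroke_or_tia", "vascular_disease", "female"}),
        compute=cha2ds2_vasc,
    ),
]


def applicable(extracted: Dict[str, object],
               registry: List[Calculator]) -> List[Calculator]:
    """Calculators whose required inputs were all extracted from the note."""
    return [c for c in registry if c.required_inputs.issubset(extracted)]


def run_applicable(extracted: Dict[str, object],
                   registry: List[Calculator]) -> Dict[str, int]:
    """Run every applicable calculator and return its score by name."""
    return {c.name: c.compute(extracted) for c in applicable(extracted, registry)}


# Hypothetical values an extraction step might return for a 78-year-old woman.
extracted_inputs = {"age": 78, "chf": False, "hypertension": True,
                    "diabetes": True, "stroke_or_tia": False,
                    "vascular_disease": False, "female": True}
results = run_applicable(extracted_inputs, REGISTRY)
incomplete = run_applicable({"age": 78, "female": True}, REGISTRY)
```

Keeping each calculator as a plain function with declared inputs is what makes the result interpretable: every point in the score traces back to a named input rather than to free-text model output.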

● AgentMD automatically converted PubMed articles into 2,164 executable clinical calculators, achieving >85% accuracy on expert quality checks and >90% pass rates on unit testing.

● On a controlled benchmark (RiskQA, which requires selecting the correct calculator, computing, and interpretation), Agent
