State of Clinical AI Report 2026
ARISE-AI.ORG, January 2026
About The Authors
Peter Brodeur
Dr. Peter Brodeur is a rising cardiology fellow at Harvard Medical School’s Beth Israel Deaconess Medical Center. Dr. Brodeur is an affiliate of ARISE, reviewer for Nature Medicine & NEJM AI, and former life sciences strategy consultant. His research focuses on human-computer interaction and LLM clinical reasoning.
Ethan Goh
Dr. Ethan Goh is the Executive Director of ARISE. His research has been featured in The New York Times, The Washington Post, and CNN. He directs the Stanford Healthcare AI Leadership Program, and Harvard’s Agentic AI Executive Course. Dr. Goh is a Founding Editorial Board member and Associate Editor at BMJ Digital Health & AI.
Adam Rodman
Dr. Adam Rodman is an assistant professor at Harvard Medical School. He is the Director of AI Programs for the Carl J. Shapiro Center. Dr. Rodman is an Associate Editor at NEJM AI. He is also the host of the American College of Physicians podcast Bedside Rounds.
Jonathan H Chen
Dr. Jonathan H Chen is Stanford’s inaugural Director for Medical Education in AI in the Division of Computational Medicine. His expertise combining human with artificial intelligence to provide better healthcare than either alone is featured in the popular press with over 100 publications and awards.
Message From ARISE Leadership
“There are decades where nothing happens; and there are weeks when decades happen.”
Recent deployments by technology companies, health systems, and regulators have made clinical AI more visible and ever more consequential. At the same time, it has become harder to keep up with emerging research. In some areas the literature is fragmented; in others, it simply doesn’t exist yet for the way these tools are being used today.
So what actually holds up in practice?
The State of Clinical AI Report (2026) was created to look beyond model performance alone to other critical factors that determine real-world impact: how systems are evaluated, how clinicians and AI work together, and where patient risks start to appear.
Frontier AI systems are already powerful. What’s needed now is to safely and effectively translate these tools into real-world care.
Ethan Goh, Adam Rodman, Jonathan H Chen
Investigators, ARISE Network
Engagement and Education
Stanford Computational Medicine Colloquia
● Healthcare AI seminars with Stanford/industry leaders
● Thursday 12pm PT, free
Stanford Healthcare AI Leadership & Strategy Program
● Application required. CME and accredited certificate
● May 2026
Generative AI and Agentic AI Online Course
● Harvard/Stanford faculty, accredited certificate
● Summer 2026
The Current Landscape
Clinical AI Is Widely Deployed But Poorly Evaluated
● AI is now embedded across healthcare: 1,200+ FDA-cleared tools and 350,000+ consumer apps have generated a $70B market.¹ Only a minority underwent peer-reviewed evaluation.²
● Of 691 FDA-cleared AI/ML medical devices (1995–2023), >95% went through the 510(k) clearance pathway, which is predicated on equivalency to existing devices, many of which were approved on suboptimal evidence.²
● ~50% of FDA device summaries omitted study design, 53% lacked sample size, and <1% reported patient outcomes.²
● 95% of device summaries did not report demographic data, and 91% lacked bias assessments, raising concerns about safety and equity in real-world use.²
Bridging the gap between adoption and evidence requires supporting clinicians, health system leaders, policymakers, and the public in interpreting available research.
Top Takeaways
1. Model capability is accelerating, but evidence of real clinical impact remains limited. Many studies show what models can do in controlled settings; what’s increasingly needed are prospective studies that show measurable effects on patient outcomes and care delivery.
2. Frontier LLM models show very uneven performance. They perform extremely well on complex reasoning tasks, yet break down when uncertainty, missing information, or changing context is introduced.
3. Clinicians value automation where it reduces administrative and workflow burden, but these use cases remain understudied. Tasks clinicians most want support with are often underrepresented in current benchmarks and evaluations.
4. Patient-facing AI has significant potential to reshape engagement and access, but raises distinct safety concerns. Direct interaction with patients requires much stronger guardrails and scalable oversight systems that do not currently exist.
5. Multimodal clinical AI applications are approaching practical usability. Improvements in base models are enabling applications that integrate unstructured text, images, and other clinical data to support prediction and decision-making in real-world settings.
6. FDA clearance is increasing, but near-term clinical adoption will favor narrow, task-specific systems. AI tools that are tightly scoped to specific domains and contexts are more likely to demonstrate value and be adopted in practice.
Acknowledgements
Reviewers
Rebecca Handler
Kathleen Lacar
Jason Hom
Kameron Black
Eric Horvitz
Liam McCoy
Laura Zwaan
David Wu
Vishnu Ravi
Priyank Jain
Brian Han
Emily Tat
Kevin Schulman
Adrian Haimovich
Design & Accessibility
Emily Tat
The organization format of this report was inspired by Nathan Benaich’s State of AI Report.
How to Cite This Report
Peter G. Brodeur, Ethan Goh, Emily Tat, Liam McCoy, David Wu, Priyank Jain, Rebecca Handler, Jason Hom, Laura Zwaan, Vishnu Ravi, Brian Han, Kevin Schulman, Kathleen Lacar, Kameron Black, Adrian Haimovich, Eric Horvitz, Adam Rodman, Jonathan H. Chen. “State of Clinical AI 2026,” ARISE Network, January 2026.
Introduction
Executive Summary
Model Performance
● Frontier reasoning models (optimized for multi-step inference and chain of thought) showed marked improvement on challenging clinical reasoning tasks against human baselines, while prediction models crossed new thresholds in scalable prediction to enable actionable prevention.
● Dominant failure modes include model recognition of uncertainty, overconfidence, and pattern learning.
Benchmarks & Evaluation
● Multiple choice benchmarks are saturated, and evaluations still underrepresent real clinical work: administrative tasks, conversational dialogue, real patient data, and bias/fairness.
● New benchmark suites (e.g., conversational, simulated EHR environments) are forcing models into more realistic, dynamic scenarios.
Foundational Methods
● Novel techniques such as converting medical data to tokens used for prediction bring a new era of screening and risk stratification.
● Clinical AI is being advanced by multiagent systems, multimodal diagnostic support, and optimizing reasoning models.
AI in Clinical Workflows
● Across settings, AI can augment clinicians on reasoning and diagnostic interpretation tasks. However, collaboration isn’t yet optimized. How clinicians use AI is as important as what the model can do.
● Workflow tools like AI scribes feel transformative, yet objective gains are still modest. The addition of downstream workflow tasks will likely yield more productivity and efficiency impact.
Patient Facing AI
● Multi-turn conversational agents and AI-based coaching show promise, particularly as they are integrated with smart devices to support more personalized health assistance.
● In a space with competing vendor interests, overtrust and unsupervised use raise the bar for guardrails and for improving objective patient outcomes, not just engagement.
Applied AI & Demos
● The most immediate translatable progress can be seen at the individual task-specific level, with imaging remaining the dominant use case.
● We provide a sneak peek of the next wave of tools such as EHR chatbots, eConsults, and mental health chatbots.
Methods
Our Approach to a Targeted Review of Clinical AI
● Data sources & search strategy
○ Reviewed PubMed and preprint servers (e.g., medRxiv, arXiv) using a combination of terms such as “large language models in medicine,” “AI,” “diagnostic reasoning,” “management reasoning,” “diagnostic error,” “benchmarks,” and “patient-facing AI.”
○ Invited clinicians and AI researchers from academic institutions and issued an open call for submissions via social media to identify high-quality studies across the six themes.
● Study selection
○ All studies were reviewed by the authors and reviewers of this presentation.
○ Included empirical studies that (1) used an AI model/LLM in a clinical context, (2) reported quantitative or qualitative outcomes (e.g., diagnostic accuracy, bias, calibration, workflow, user performance), and (3) were determined to be of high impact.
○ Excluded purely technical model papers without clinician- or patient-facing evaluation, editorials, and non-clinical AI (e.g., drug discovery, biotech).
Table of Contents
Model Performance
How well models (trained AI systems) perform independently across prediction and reasoning tasks.
Foundational Methods
Novel techniques that optimize clinical AI performance above off-the-shelf models.
Benchmarks & Evaluations
The evolving metrics that define AI competence in medicine.
AI in Clinical Workflows
How clinicians and AI systems collaborate in real or simulated environments.
Applied AI & Demos
Demonstrating AI’s domain-specific applications and use cases.
Patient Facing AI
How AI engages directly with patients to inform, support, and personalize their healthcare.
Model Performance
In 2025, frontier models made major leaps in autonomous clinical reasoning and prediction.
● Slides 18–20: Reasoning frontier models show large gains in autonomous clinical reasoning versus humans, including on historically difficult cases.
● Slides 21–22: Key weaknesses persist: poor performance in uncertainty-heavy scenarios, overconfidence, and pattern-based shortcut behavior.
● Slides 23–27: Models continue to show promise for scalable prediction across a wide variety of use cases such as patient deterioration, screening for insulin resistance, and aging.
Overall, model-only evaluations reveal that LLMs have achieved superhuman capability in controlled tasks but still require stronger metacognition, calibration, and stress testing before autonomous deployment.
Model Performance
Complex Reasoning
● Approaching superhuman reasoning
● AI vs MD
○ LLM vs Primary Care Physician
○ LLM as an expert case discussant
● Gaps
○ “None of the other answers”
○ Brittle overconfidence and uncertainty
Prediction
● Inpatient deterioration
● Biological age
● Insulin resistance
● Wearable time series data for diagnosis prediction
● Clinical risk calculator
Page 18
o1-preview/o1: Reaching Superhuman Reasoning Performance
o1-preview and o1 consistently performed at or above the level of physicians across several reasoning evaluations, solving challenging NEJM cases at state-of-the-art levels, documenting superior reasoning quality, excelling in management tasks, and diagnosing real emergency room cases admitted to the hospital.
● On NEJM clinicopathological conference (CPC) cases, the model reached 78% diagnostic accuracy and selected the correct next test 87% of the time.
● o1-preview achieved a perfect score 99% of the time for clinical reasoning quality graded by physicians. This significantly outperformed GPT-4 (59%) and attending physicians (35%). Management reasoning for o1-preview (86%) was also superior compared to GPT-4 (42%) and physicians with GPT-4 (41%).
● In real ED cases, the model performed at or above the level of both attending physicians at three diagnostic touchpoints, with 66% exact/near-exact diagnoses vs. 48–54% for physicians at initial triage.
● Modern LLMs may now surpass physicians in general diagnostic and management reasoning in controlled environments, motivating the need for prospective clinical trials for real-world deployment.
Brodeur, Buckley, Manrai, Rodman et al., ArXiv, Jul. 2025
Page 19
Google’s AMIE Chatbot Matches PCPs at Multi-Visit Disease Management
Enhanced with a new management-reasoning agent, the Articulate Medical Intelligence Explorer (AMIE) was non-inferior to 21 primary care physicians across guideline-based decision-making, treatment planning, and longitudinal care. AMIE produced more precise, guideline-based plans, and outperformed physicians on medication-reasoning questions.
● AMIE (Gemini-based) was designed as a two-part system with access to an agent state (current patient summary, differential, etc.): a fast Dialogue Agent to capture relevant HPI and a slower Management Reasoning Agent using long-context reasoning grounded in clinical guidelines.
● The study compared AMIE to PCPs across 100 three-visit simulated scenarios spanning cardiology, pulmonology, neurology, OBGYN/urology, and GI, each grounded in NICE and BMJ Best Practice guidelines.
● Graded by subspecialists, AMIE’s recommendations for investigations and treatments were consistently more precise (Yes/No), especially for investigations in follow-up visits (visit 2: 99% vs. 84%, visit 3: 100% vs. 88%), and carried explicit citations to guideline sources. This points to the possibility of AI agents serving as a point of continuity in a fragmented system.
● On a novel medication reasoning (RxQA) benchmark, AMIE outperformed PCPs on harder questions (as determined by pharmacists) in both closed- and open-book conditions, demonstrating strong therapeutic reasoning.
Palepu, Schaekermann et al., ArXiv, Mar. 2025
Page 20
AI Outperforms Physicians as an Expert Case Discussant on Challenging Cases
Researchers developed Dr. CaBot, an AI discussant based on o3 that produces written and video CPC-style differentials. Dr. CaBot was evaluated on NEJM CPCs and NEJM Image Challenges, spanning ten tasks that test differential diagnosis, testing strategies, clinical reasoning, uncertainty handling, and multimodal interpretation. In blinded testing, physicians could not reliably distinguish Dr. CaBot from human experts, and consistently rated its reasoning higher.
Buckley et al., ArXiv, Sept. 2025
● Built from 7,102 NEJM CPCs (1923–2025) and 1,021 NEJM Image Challenges, CPC-Bench covers 10 reasoning tasks (DDx, testing plans, touchpoints, omission, VQA, literature search, etc.).
● Among eight frontier models, o3 achieved 60% top-1 and 84% top-10 accuracy on CPC differential diagnosis, outperforming a 20-physician baseline, with 98% accuracy selecting the next test.
● Dr. CaBot, based on o3, is a publicly available system that produces both written and video case presentations that outperform the originally presented expert case discussant.
● The study shows that AI is now capable of performing the entire CPC discussant role, with reasoning quality rated better than human experts.
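The top-1 and top-10 figures above are top-k accuracies: the share of cases whose final diagnosis appears among the first k entries of a ranked differential. The sketch below is illustrative only. It assumes exact, case-insensitive string matching, whereas the study's actual grading procedure is not described here, and the function name and toy cases are hypothetical.

```python
from typing import Sequence


def top_k_accuracy(ranked_differentials: Sequence[Sequence[str]],
                   final_diagnoses: Sequence[str],
                   k: int) -> float:
    """Fraction of cases whose final diagnosis is within the first k ranked entries."""
    if len(ranked_differentials) != len(final_diagnoses):
        raise ValueError("one ranked differential is required per case")
    if not final_diagnoses:
        raise ValueError("at least one case is required")
    hits = 0
    for ddx, truth in zip(ranked_differentials, final_diagnoses):
        # Assumption: exact match after normalizing case and whitespace.
        top_k = [d.strip().lower() for d in ddx[:k]]
        if truth.strip().lower() in top_k:
            hits += 1
    return hits / len(final_diagnoses)


# Toy data (made up for illustration, not from the study).
cases = [
    ["sarcoidosis", "lymphoma", "tuberculosis"],
    ["lymphoma", "sarcoidosis", "tuberculosis"],
    ["lupus", "vasculitis", "endocarditis"],
    ["tuberculosis", "histoplasmosis", "sarcoidosis"],
]
truth = ["sarcoidosis", "sarcoidosis", "amyloidosis", "sarcoidosis"]

top1 = top_k_accuracy(cases, truth, k=1)  # only the first case is a hit
top3 = top_k_accuracy(cases, truth, k=3)  # cases 1, 2, and 4 are hits
print(top1, top3)
```

A wider k can only keep or raise the score, which is why top-10 accuracy is always at least as high as top-1.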
Page 21
“None of the other answers”: An LLM Weakness
Researchers tested whether LLMs could truly reason through medical questions by replacing the correct answer in multiple choice questions with “None of the other answers” (NOTA). Frontier models showed significant drops in accuracy, revealing that strong multiple choice performance is, in part, due to pattern recognition.
● Researchers modified 100 MedQA questions so that NOTA became the correct answer, creating a 68-item clinician-validated test of genuine reasoning. The pattern of answers changed, but the underlying clinical reasoning did not.
● DeepSeek-R1, o3-mini, Claude 3.5 Sonnet, Gemini 2.0 Flash, GPT-4o, and Llama 3.3-70B all performed worse on NOTA-modified questions. Significant decreases in performance were exhibited, ranging from 9% to 38%.
Bedi, Shah et al., JAMA Network Open, Aug. 2025
● A system that falls, for example, from 81% to 43% accuracy when a pattern changes would be unsafe for autonomous clinical use; rigorous benchmarks must test reasoning, not memorized answer distributions.
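The NOTA substitution described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the `MCQ` structure, the function names, and the placeholder question are hypothetical, and the clinician validation step that reduced 100 items to 68 is not modeled.

```python
from dataclasses import dataclass, replace
from typing import Dict, Sequence

NOTA_TEXT = "None of the other answers"


@dataclass(frozen=True)
class MCQ:
    stem: str
    options: Dict[str, str]  # option key -> option text
    answer: str              # key of the correct option


def to_nota(q: MCQ) -> MCQ:
    """Replace the text of the correct option with NOTA; the answer key is unchanged."""
    if q.answer not in q.options:
        raise ValueError("answer key must be one of the option keys")
    new_options = dict(q.options)  # copy so the original question is untouched
    new_options[q.answer] = NOTA_TEXT
    return replace(q, options=new_options)


def accuracy(questions: Sequence[MCQ], predicted_keys: Sequence[str]) -> float:
    """Share of questions where the predicted key equals the answer key."""
    if len(questions) != len(predicted_keys) or not questions:
        raise ValueError("need one prediction per question")
    return sum(q.answer == p for q, p in zip(questions, predicted_keys)) / len(questions)


# Placeholder item (hypothetical, not a MedQA question).
original = MCQ(
    stem="Which option is correct?",
    options={"A": "option A text", "B": "option B text", "C": "option C text"},
    answer="B",
)
modified = to_nota(original)
print(modified.options)
```

Comparing `accuracy` on the original and modified sets for the same model gives the kind of accuracy drop the slide reports.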
Page 22
Script Concordance Testing Reveals Gaps in LLM Clinical Reasoning
A study compared 10 frontier models to 1,500+ clinicians on 750 Script Concordance Testing (SCT) questions, which measure the ability to revise clinical decisions when new information becomes available. Models matched medical students but underperformed relative to seasoned physicians, revealing consistent overconfidence and difficulty updating decisions under uncertainty.
● SCT measures the ability to revise diagnostic or management judgments when new information arrives, a core skill of clinical reasoning under uncertainty.
● This study established a benchmark assessing 750 SCT items from 10 datasets, including pediatrics, neurology, emergency medicine, internal medicine, and physiotherapy, most never previously published.
● OpenAI’s o3 (68%) led performance, followed by GPT-4o (64%), matching medical students but below residents and attending physicians. Many reasoning models performed surprisingly poorly (e.g., Gemini 2.5: 52%).
McCoy, Rodman et al., NEJM AI, Sept. 2025
● LLMs overused extreme ratings (+2/-2), rarely selected neutrality (0), and showed miscalibrated confidence patterns unlike human experts, suggesting that chain-of-thought–optimized models may overcommit in uncertainty-rich tasks.
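One simple way to see the overuse of extreme ratings is to compare how often a responder picks ±2, ±1, and 0 on the five-point SCT scale. The sketch below is a hypothetical illustration: the function name and both rating lists are made up, and it does not reproduce the study's scoring or calibration analysis.

```python
from collections import Counter
from typing import Dict, Iterable

LIKERT = (-2, -1, 0, 1, 2)  # SCT response scale referenced on the slide


def rating_profile(ratings: Iterable[int]) -> Dict[str, float]:
    """Share of extreme (+2/-2), moderate (+1/-1), and neutral (0) responses."""
    counts = Counter(ratings)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one rating is required")
    unknown = set(counts) - set(LIKERT)
    if unknown:
        raise ValueError(f"ratings outside the scale: {sorted(unknown)}")
    return {
        "extreme": (counts[-2] + counts[2]) / total,
        "moderate": (counts[-1] + counts[1]) / total,
        "neutral": counts[0] / total,
    }


# Made-up responses for illustration only.
model_ratings = [2, 2, -2, 2, -2, 1, 2, -2, 0, 2]
panel_ratings = [1, 0, -1, 0, 1, 2, 0, -1, 1, 0]

model_profile = rating_profile(model_ratings)
panel_profile = rating_profile(panel_ratings)
print(model_profile)
print(panel_profile)
```

In this toy example the model commits to an extreme rating on 8 of 10 items while the panel does so on 1 of 10, which is the shape of the pattern the slide describes.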
Page 23
Predicting Inpatient Deterioration Before It Happens
Researchers developed a deep-learning model using continuous wearable vital sign data from 888 hospitalized med-surg patients to predict clinical deterioration up to 8-24 hours before standard EHR alerts. The model generated more timely alerts than episodic vital checks and accurately predicted hard outcomes, including ICU transfer, cardiac arrest, and death.
● Outside of the ICU, inpatient vital signs are checked every 4-8 hours, which leaves time gaps of missed opportunity for detecting critical illness.
● Researchers trained a recurrent neural network with a 5-hour sequence of continuous vital sign inputs (e.g., HR, RR) collected from a wearable chest device, with demographics from 888 non-ICU patients to detect early deterioration.
● Predicted 9x more clinical alerts (Modified Early Warning Score (MEWS) >6 for >30 mins) 8-24 hours before EHR-based MEWS alerts, with AUROC 0.89 (retrospective) and AUROC 0.84-0.9 (prospective). Predicted 9 of 11 hard outcome events (cardiac arrests, death) up to 17 hours before MEWS.
● Enables faster recognition of physiologic decline and the potential to prevent avoidable deteriorations.
Scheid, Zanos et al., Nature Communications, Jul. 2025
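The alert definition quoted above (MEWS >6 for >30 mins) is a sustained-threshold rule, which can be sketched on a precomputed score series. This is a minimal sketch under assumptions: samples are `(minute, score)` pairs sorted by time, both inequalities are strict as written on the slide, and the function name is hypothetical. Computing MEWS itself and the study's neural network are out of scope.

```python
from typing import Iterable, Optional, Tuple


def first_sustained_alert(samples: Iterable[Tuple[float, float]],
                          threshold: float = 6.0,
                          min_duration_min: float = 30.0) -> Optional[float]:
    """Return the first timestamp (minutes) at which the score has stayed above
    `threshold` for longer than `min_duration_min`, or None if that never happens.

    Assumes `samples` is sorted by time.
    """
    run_start = None  # start time of the current above-threshold run
    for t, score in samples:
        if score > threshold:
            if run_start is None:
                run_start = t
            if t - run_start > min_duration_min:
                return t
        else:
            run_start = None  # any dip at or below the threshold resets the run
    return None


# Toy series sampled every 10 minutes (made up for illustration).
series = [(0, 5), (10, 7), (20, 7), (30, 5),
          (40, 7), (50, 8), (60, 7), (70, 7), (80, 7)]
alert_time = first_sustained_alert(series)
print(alert_time)
```

In the toy series the first run above 6 is interrupted at minute 30, so the alert fires only once the second run (starting at minute 40) has lasted more than 30 minutes.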
Page 24
Predicting Biological Aging at Population Scale Using Large Language Models
This study introduces an LLM prompt-based framework that predicts biological age from routine health records, enabling scalable aging assessment across populations. Applied to >10 million individuals from six cohorts (e.g., UK Biobank), the LLM-derived biological age outperformed traditional aging clocks in predicting mortality and multiple age-related diseases.
● Using LLMs in the Llama and Qwen families, the authors applied prompt learning without supervised learning on aging-related knowledge. After being fed health examination text reports, LLMs integrate individualized clinical data to infer biological age without predefined biomarkers or labels.
● LLM-based biological age achieved a concordance index of 0.76 for all-cause mortality. It also outperformed epigenetic clocks, telomere length, frailty index, and conventional ML models. The difference between LLM-predicted age and chronological age (“age gap”) was strongly associated with all-cause mortality (HR 1.05).
● LLM-derived organ-specific biological ages better predicted corresponding organ diseases and enabled potential discovery of 316 aging-related protein biomarkers.
● Potential for scalable and cost-effective personalized and population aging assessment with interpretability using chain-of-thought prompts.
Li, Di et al., Nature Medicine, Jul. 2025
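For readers unfamiliar with the concordance index reported above, the sketch below computes Harrell's C on toy survival data: among comparable pairs, it is the share in which the person who died earlier also had the higher risk score. Whether the study used exactly this estimator is an assumption, and the function name, the use of an age gap as the risk score, and all numbers are illustrative.

```python
from typing import Sequence


def concordance_index(times: Sequence[float],
                      events: Sequence[int],
                      risk: Sequence[float]) -> float:
    """Harrell's C-index. A pair (i, j) is comparable when i had the event
    strictly before j's follow-up time; ties in risk count as half-concordant."""
    n = len(times)
    if not (n == len(events) == len(risk)):
        raise ValueError("times, events, and risk must have the same length")
    concordant = 0.0
    comparable = 0
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy cohort (made up): follow-up years, death indicator, and age gap as risk.
follow_up = [2, 4, 6, 8, 10]
died = [1, 1, 0, 1, 0]
age_gap = [5.0, 1.0, 3.0, 2.0, -1.0]

c_index = concordance_index(follow_up, died, age_gap)
print(c_index)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for the 0.76 reported on the slide.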
Page 25
Predicting Insulin Resistance Using Wearables + Routine Labs at Scale
Researchers paired smartwatch-derived data (Fitbit/Pixel Watch) with demographics and routine blood biomarkers to predict insulin resistance using deep neural network models. The best-performing practical model (wearables + demographics + common labs) substantially outperformed single-source models and maintained similar performance in an independent validation cohort. Performance was strongest in high-risk groups (obesity + sedentary).
● Current methods for detecting early insulin resistance rely on snapshots in time (e.g., A1c), which can be insensitive in early stages.
● In 1,165 participants, with a Homeostatic Model Assessment of Insulin Resistance (HOMA-IR) >2.9 as ground truth, the model achieved an AUROC of 0.7 using only demographic variables and wearable data. Adding fasting glucose increased performance to AUROC 0.78.
● Combining wearables + demographics + fasting glucose + lipid/metabolic panels achieved AUROC = 0.80, 76% sensitivity, 84% specificity. Performance was best in obese + sedentary participants with 93% sensitivity and 95% adjusted specificity (minimizes misclassification of insulin sensitive as resistant). Similar performance in a validation set of 72 participants.
● When these insulin resistance predictions were integrated into an LLM coaching agent, endocrinologists consistently rated it superior to a base LLM in head-to-head comparisons for personalization, comprehensiveness, and trustworthiness.
Metwally, Prieto et al., ArXiv, Apr. 2025
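The ground-truth label and the reported sensitivity and specificity can be made concrete with a short sketch. It uses the standard HOMA-IR formula (fasting glucose in mg/dL times fasting insulin in µU/mL, divided by 405) and the >2.9 cutoff from the slide. The function names, lab values, and prediction lists are hypothetical, and the study's "adjusted specificity" is not reproduced here.

```python
from typing import Sequence, Tuple


def homa_ir(fasting_glucose_mg_dl: float, fasting_insulin_uu_ml: float) -> float:
    """HOMA-IR = glucose (mg/dL) * insulin (uU/mL) / 405."""
    return fasting_glucose_mg_dl * fasting_insulin_uu_ml / 405.0


def is_insulin_resistant(fasting_glucose_mg_dl: float,
                         fasting_insulin_uu_ml: float,
                         cutoff: float = 2.9) -> bool:
    """Ground-truth label using the HOMA-IR > 2.9 cutoff quoted on the slide."""
    return homa_ir(fasting_glucose_mg_dl, fasting_insulin_uu_ml) > cutoff


def sensitivity_specificity(y_true: Sequence[bool],
                            y_pred: Sequence[bool]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fn = sum(t and not p for t, p in zip(y_true, y_pred))
    tn = sum(not t and not p for t, p in zip(y_true, y_pred))
    fp = sum(not t and p for t, p in zip(y_true, y_pred))
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    return tp / (tp + fn), tn / (tn + fp)


# Made-up lab values for illustration.
print(homa_ir(90, 9))    # 2.0, below the cutoff
print(homa_ir(108, 15))  # 4.0, above the cutoff

# Made-up labels and model predictions for nine participants.
labels = [True, True, True, True, False, False, False, False, False]
preds = [True, True, True, False, False, False, False, False, True]
sens, spec = sensitivity_specificity(labels, preds)
print(sens, spec)
```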
Page 26
A Foundation Model for Wearable Behavioral Data with Individual-Level Diagnostic Prediction
Joint Embedding for Time Series (JETS) is a self-supervised joint-embedding model trained on ~3 million person-days of real-world wearable and behavioral data from 16,522 individuals. By learning robust latent representations from noisy, irregular time series, JETS improves downstream prediction of diagnoses and biomarkers compared with multiple baseline models.
● Many time series models rely on dense, regularly sampled, fixed-length inputs that are often not congruent with real-world data. JETS uses a joint-embedding predictive architecture (JEPA-style) with masking and learns to predict missing segments in latent space instead of reconstructing raw signals.
● Trained on 63 daily or low-resolution metrics (activity, sleep, HR, VO₂ max, respiration, self-reports), covering ~3M person-days across 16,522 users.
● Outperformed MAE, PrimeNet, and transformer baselines on many diagnoses (e.g., AUROC ME/CFS 0.81, HTN 0.87) and led biomarker prediction despite sparse labels.
● JETS shows that a foundation model trained on massive wearable time-series can learn generalizable health representations that outperform existing approaches on real clinical prediction tasks.
Xie, Ballinger et al., OpenReview, Dec. 2025
Page 27
AgentMD: Using LLM Agents to Run Clinical Risk Calculators for Risk Prediction at Scale
Clinical calculators are important medical tools but remain underutilized due to poor dissemination, workflow burden, and fragmented implementation. AgentMD is an AI agent that reads notes, determines which calculators apply, extracts inputs, and utilizes clinical calculators, enabling accurate and interpretable risk prediction.
● AgentMD automatically converted PubMed articles into 2,164 executable clinical calculators, achieving >85% accuracy on expert quality checks and >90% pass rates on unit testing.
● On a controlled benchmark (RiskQA - requires selecting the correct calculator, computing, and interpretation), Agent