Related papers: RiskAgent: Autonomous Medical AI Copilot for Generalist Risk Prediction

URL: http://arxiv.org/abs/2503.03802v1
Date: Wed, 05 Mar 2025 18:46:51 GMT
Title: RiskAgent: Autonomous Medical AI Copilot for Generalist Risk Prediction
Authors: Fenglin Liu, Jinge Wu, Hongjian Zhou, Xiao Gu, Soheila Molaei, Anshul Thakur, Lei Clifton, Honghan Wu, David A. Clifton
Abstract summary: We present the RiskAgent system to perform a broad range of medical risk predictions. RiskAgent covers over 387 risk scenarios across diverse complex diseases, e.g., cardiovascular disease and cancer. We have built the first benchmark MedRisk specialized for risk prediction, including 12,352 questions spanning 154 diseases, 86 symptoms, 50 specialties, and 24 organ systems.
Score: 27.520717720270415
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The application of Large Language Models (LLMs) to various clinical applications has attracted growing research attention. However, real-world clinical decision-making differs significantly from the standardized, exam-style scenarios commonly used in current efforts. In this paper, we present the RiskAgent system to perform a broad range of medical risk predictions, covering over 387 risk scenarios across diverse complex diseases, e.g., cardiovascular disease and cancer. RiskAgent is designed to collaborate with hundreds of clinical decision tools, i.e., risk calculators and scoring systems that are supported by evidence-based medicine. To evaluate our method, we have built the first benchmark MedRisk specialized for risk prediction, including 12,352 questions spanning 154 diseases, 86 symptoms, 50 specialties, and 24 organ systems. The results show that our RiskAgent, with 8 billion model parameters, achieves 76.33% accuracy, outperforming the most recent commercial LLMs, o1, o3-mini, and GPT-4.5, and doubling the 38.39% accuracy of GPT-4o. On rare diseases, e.g., Idiopathic Pulmonary Fibrosis (IPF), RiskAgent outperforms o1 and GPT-4.5 by 27.27% and 45.46% accuracy, respectively. Finally, we further conduct a generalization evaluation on an external evidence-based diagnosis benchmark and show that our RiskAgent achieves the best results. These encouraging results demonstrate the great potential of our solution for diverse diagnosis domains. To improve the adaptability of our model in different scenarios, we have built and open-sourced a family of models ranging from 1 billion to 70 billion parameters. Our code, data, and models are all available at https://github.com/AI-in-Health/RiskAgent.

Related papers

An Agentic System for Rare Disease Diagnosis with Traceable Reasoning [58.78045864541539]
We introduce DeepRare, the first rare disease diagnosis agentic system powered by a large language model (LLM). DeepRare generates ranked diagnostic hypotheses for rare diseases, each accompanied by a transparent chain of reasoning. The system demonstrates exceptional diagnostic performance among 2,919 diseases, achieving 100% accuracy for 1,013 diseases.
arXiv Detail & Related papers (2025-06-25T13:42:26Z)
Adaptable Cardiovascular Disease Risk Prediction from Heterogeneous Data using Large Language Models [70.64969663547703]
AdaCVD is an adaptable CVD risk prediction framework built on large language models extensively fine-tuned on over half a million participants from the UK Biobank. It addresses key clinical challenges across three dimensions: it flexibly incorporates comprehensive yet variable patient information; it seamlessly integrates both structured data and unstructured text; and it rapidly adapts to new patient populations using minimal additional data.
arXiv Detail & Related papers (2025-05-30T14:42:02Z)
ChestX-Reasoner: Advancing Radiology Foundation Models with Reasoning through Step-by-Step Verification [57.22053411719822]
ChestX-Reasoner is a radiology diagnosis MLLM designed to leverage process supervision mined directly from clinical reports. Our two-stage training framework combines supervised fine-tuning and reinforcement learning guided by process rewards to better align model reasoning with clinical standards.
arXiv Detail & Related papers (2025-04-29T16:48:23Z)
How Well Can Modern LLMs Act as Agent Cores in Radiology Environments? [54.36730060680139]
RadA-BenchPlat is an evaluation platform that benchmarks the performance of large language models (LLMs) in radiology environments. The platform also defines ten categories of tools for agent-driven task solving and evaluates seven leading LLMs.
arXiv Detail & Related papers (2024-12-12T18:20:16Z)
Detecting Bias and Enhancing Diagnostic Accuracy in Large Language Models for Healthcare [0.2302001830524133]
Biased AI-generated medical advice and misdiagnoses can jeopardize patient safety. This study introduces new resources designed to promote ethical and precise AI in healthcare.
arXiv Detail & Related papers (2024-10-09T06:00:05Z)
Using Pre-training and Interaction Modeling for ancestry-specific disease prediction in UK Biobank [69.90493129893112]
Recent genome-wide association studies (GWAS) have uncovered the genetic basis of complex traits, but show an under-representation of non-European descent individuals. Here, we assess whether we can improve disease prediction across diverse ancestries using multiomic data.
arXiv Detail & Related papers (2024-04-26T16:39:50Z)
Diagnosis Uncertain Models For Medical Risk Prediction [80.07192791931533]
We consider a patient risk model which has access to vital signs, lab values, and prior history but does not have access to a patient's diagnosis. We show that such 'all-cause' risk models have good generalization across diagnoses but have a predictable failure mode. We propose a fix for this problem by explicitly modeling the uncertainty in risk prediction coming from uncertainty in patient diagnoses.
arXiv Detail & Related papers (2023-06-29T23:36:04Z)
Generative models improve fairness of medical classifiers under distribution shifts [49.10233060774818]
We show that learning realistic augmentations automatically from data is possible in a label-efficient manner using generative models. We demonstrate that these learned augmentations can surpass heuristic ones by making models more robust and statistically fair in- and out-of-distribution.
arXiv Detail & Related papers (2023-04-18T18:15:38Z)
Federated Learning Enables Big Data for Rare Cancer Boundary Detection [98.5549882883963]
We present findings from the largest Federated ML study to date, involving data from 71 healthcare institutions across 6 continents. We generate an automatic tumor boundary detector for the rare disease of glioblastoma. We demonstrate a 33% improvement over a publicly trained model to delineate the surgically targetable tumor, and 23% improvement over the tumor's entire extent.
arXiv Detail & Related papers (2022-04-22T17:27:00Z)
Advancing COVID-19 Diagnosis with Privacy-Preserving Collaboration in Artificial Intelligence [79.038671794961]
We launch the Unified CT-COVID AI Diagnostic Initiative (UCADI), where the AI model can be distributedly trained and independently executed at each host institution. Our study is based on 9,573 chest computed tomography scans (CTs) from 3,336 patients collected from 23 hospitals located in China and the UK.
arXiv Detail & Related papers (2021-11-18T00:43:41Z)
Development of a dynamic type 2 diabetes risk prediction tool: a UK Biobank study [0.8620335948752806]
We developed a predictive 10-year type 2 diabetes risk score using 301 features from the UK Biobank dataset. A Cox proportional hazards model slightly outperformed a DeepSurv model trained using the same features. This tool can be used for clinical screening of individuals at risk of developing type 2 diabetes and to foster patient empowerment.
arXiv Detail & Related papers (2021-04-20T16:37:26Z)
A scalable approach for developing clinical risk prediction applications in different hospitals [2.3837093461599634]
Machine learning algorithms are now widely used in predicting acute events for clinical applications. We provide a scalable solution to extend the process of clinical risk prediction model development to multiple diseases.
arXiv Detail & Related papers (2021-01-21T21:22:32Z)
Clinical prediction system of complications among COVID-19 patients: a development and validation retrospective multicentre study [0.3569980414613667]
We used data collected from 3,352 COVID-19 patient encounters admitted to 18 facilities between April 1 and April 30, 2020 in Abu Dhabi (AD), UAE. Using data collected during the first 24 hours of admission, the machine learning-based prognostic system predicts the risk of developing any of seven complications during the hospital stay. The system achieves good accuracy across all complications and both regions.
arXiv Detail & Related papers (2020-11-28T18:16:23Z)
UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data [81.00385374948125]
We present the UNcertaInTy-based hEalth risk prediction (UNITE) model. UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data. We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD). UNITE achieves up to 0.841 in F1 score for AD detection, up to 0.609 in PR-AUC for NASH detection, and outperforms various state-of-the-art baselines by up to 19% over the best baseline.
arXiv Detail & Related papers (2020-10-22T02:28:11Z)
Deep Learning-based Computational Pathology Predicts Origins for Cancers of Unknown Primary [2.645435564532842]
Cancer of unknown primary (CUP) is an enigmatic group of diagnoses where the primary anatomical site of tumor origin cannot be determined. Recent work has focused on using genomics and transcriptomics for identification of tumor origins. We present a deep learning-based computational pathology algorithm that can provide a differential diagnosis for CUP.
arXiv Detail & Related papers (2020-06-24T17:59:36Z)

This list is automatically generated from the titles and abstracts of the papers on this site.