Timely Clinical Diagnosis through Active Test Selection
- URL: http://arxiv.org/abs/2510.18988v3
- Date: Fri, 24 Oct 2025 12:05:14 GMT
- Title: Timely Clinical Diagnosis through Active Test Selection
- Authors: Silas Ruhrberg Estévez, Nicolás Astorga, Mihaela van der Schaar,
- Abstract summary: We propose ACTMED (Adaptive Clinical Test selection via Model-based Experimental Design) to better emulate real-world diagnostic reasoning.<n>LLMs act as flexible simulators, generating plausible patient state distributions and supporting belief updates without requiring structured, task-specific training data.<n>We evaluate ACTMED on real-world datasets and show it can optimize test selection to improve diagnostic accuracy, interpretability, and resource use.
- Score: 49.091903570068155
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There is growing interest in using machine learning (ML) to support clinical diagnosis, but most approaches rely on static, fully observed datasets and fail to reflect the sequential, resource-aware reasoning clinicians use in practice. Diagnosis remains complex and error prone, especially in high-pressure or resource-limited settings, underscoring the need for frameworks that help clinicians make timely and cost-effective decisions. We propose ACTMED (Adaptive Clinical Test selection via Model-based Experimental Design), a diagnostic framework that integrates Bayesian Experimental Design (BED) with large language models (LLMs) to better emulate real-world diagnostic reasoning. At each step, ACTMED selects the test expected to yield the greatest reduction in diagnostic uncertainty for a given patient. LLMs act as flexible simulators, generating plausible patient state distributions and supporting belief updates without requiring structured, task-specific training data. Clinicians can remain in the loop; reviewing test suggestions, interpreting intermediate outputs, and applying clinical judgment throughout. We evaluate ACTMED on real-world datasets and show it can optimize test selection to improve diagnostic accuracy, interpretability, and resource use. This represents a step toward transparent, adaptive, and clinician-aligned diagnostic systems that generalize across settings with reduced reliance on domain-specific data.
Related papers
- ClinDEF: A Dynamic Evaluation Framework for Large Language Models in Clinical Reasoning [58.01333341218153]
We propose ClinDEF, a dynamic framework for assessing clinical reasoning in LLMs through simulated diagnostic dialogues.<n>Our method generates patient cases and facilitates multi-turn interactions between an LLM-based doctor and an automated patient agent.<n>Experiments show that ClinDEF effectively exposes critical clinical reasoning gaps in state-of-the-art LLMs.
arXiv Detail & Related papers (2025-12-29T12:58:58Z) - Evolving Diagnostic Agents in a Virtual Clinical Environment [75.59389103511559]
We present a framework for training large language models (LLMs) as diagnostic agents with reinforcement learning.<n>Our method acquires diagnostic strategies through interactive exploration and outcome-based feedback.<n>DiagAgent significantly outperforms 10 state-of-the-art LLMs, including DeepSeek-v3 and GPT-4o.
arXiv Detail & Related papers (2025-10-28T17:19:47Z) - Simulating Viva Voce Examinations to Evaluate Clinical Reasoning in Large Language Models [51.91760712805404]
We introduce VivaBench, a benchmark for evaluating sequential clinical reasoning in large language models (LLMs)<n>Our dataset consists of 1762 physician-curated clinical vignettes structured as interactive scenarios that simulate a (oral) examination in medical training.<n>Our analysis identified several failure modes that mirror common cognitive errors in clinical practice.
arXiv Detail & Related papers (2025-10-11T16:24:35Z) - DiaLLMs: EHR Enhanced Clinical Conversational System for Clinical Test Recommendation and Diagnosis Prediction [6.253071540087993]
We propose DiaLLM, the first medical LLM that integrates heterogeneous EHR data into clinically grounded dialogues.<n>To construct clinically grounded dialogues from EHR, we design a Clinical Test Reference (CTR) strategy that maps each clinical code to its corresponding description and classifies test results as "normal" or "abnormal"<n>Extensive experimental results demonstrate that DiaLLM outperforms baselines in clinical test recommendation and diagnosis prediction.
arXiv Detail & Related papers (2025-06-24T23:47:21Z) - Language Agents for Hypothesis-driven Clinical Decision Making with Reinforcement Learning [38.49879425944787]
We propose to model clinical decision-making for diagnosis with a hypothesis-driven uncertainty-aware language agent, LA-CDM.<n>We train LA-CDM with three objectives targeting critical aspects of clinical decision-making: accurate hypothesis generation, hypothesis uncertainty estimation, and efficient decision-making.<n>We evaluate our methodology on MIMIC-CDM, a real-world dataset covering four abdominal diseases.
arXiv Detail & Related papers (2025-06-16T13:32:01Z) - TrialMatchAI: An End-to-End AI-powered Clinical Trial Recommendation System to Streamline Patient-to-Trial Matching [0.0]
We present TrialMatchAI, an AI-powered recommendation system that automates patient-to-trial matching.<n>Built on fine-tuned, open-source large language models, TrialMatchAI ensures transparency and maintains a lightweight deployment footprint.<n>In real-world validation, 92 percent of oncology patients had at least one relevant trial retrieved within the top 20 recommendations.
arXiv Detail & Related papers (2025-05-13T12:39:06Z) - Systematic Literature Review on Clinical Trial Eligibility Matching [0.24554686192257422]
Review highlights how explainable AI and standardized ontology can bolster clinician trust and broaden adoption.<n>Further research into advanced semantic and temporal representations, expanded data integration, and rigorous prospective evaluations is necessary to fully realize the transformative potential of NLP in clinical trial recruitment.
arXiv Detail & Related papers (2025-03-02T11:45:50Z) - MSDiagnosis: A Benchmark for Evaluating Large Language Models in Multi-Step Clinical Diagnosis [8.641421154025211]
We propose a Chinese clinical diagnostic benchmark, called MSDiagnosis.<n>This benchmark consists of 2,225 cases from 12 departments, covering tasks such as primary diagnosis, differential diagnosis, and final diagnosis.
arXiv Detail & Related papers (2024-08-19T14:31:57Z) - Towards the Identifiability and Explainability for Personalized Learner
Modeling: An Inductive Paradigm [36.60917255464867]
We propose an identifiable cognitive diagnosis framework (ID-CDF) based on a novel response-proficiency-response paradigm inspired by encoder-decoder models.
We show that ID-CDF can effectively address the problems without loss of diagnosis preciseness.
arXiv Detail & Related papers (2023-09-01T07:18:02Z) - TREEMENT: Interpretable Patient-Trial Matching via Personalized Dynamic
Tree-Based Memory Network [54.332862955411656]
Clinical trials are critical for drug development but often suffer from expensive and inefficient patient recruitment.
In recent years, machine learning models have been proposed for speeding up patient recruitment via automatically matching patients with clinical trials.
We introduce a dynamic tree-based memory network model named TREEMENT to provide accurate and interpretable patient trial matching.
arXiv Detail & Related papers (2023-07-19T12:35:09Z) - Hemogram Data as a Tool for Decision-making in COVID-19 Management:
Applications to Resource Scarcity Scenarios [62.997667081978825]
COVID-19 pandemics has challenged emergency response systems worldwide, with widespread reports of essential services breakdown and collapse of health care structure.
This work describes a machine learning model derived from hemogram exam data performed in symptomatic patients.
Proposed models can predict COVID-19 qRT-PCR results in symptomatic individuals with high accuracy, sensitivity and specificity.
arXiv Detail & Related papers (2020-05-10T01:45:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.