Expertise Is What We Want
- URL: http://arxiv.org/abs/2502.20335v1
- Date: Thu, 27 Feb 2025 18:05:15 GMT
- Title: Expertise Is What We Want
- Authors: Alan Ashworth, Munir Al-Dajani, Keegan Duchicela, Kiril Kafadarov, Allison Kurian, Othman Laraki, Amina Lazrak, Divneet Mandair, Wendy McKennon, Rebecca Miksad, Jayodita Sanghvi, Travis Zack
- Abstract summary: We share an application architecture, the Large Language Expert (LLE), that combines the flexibility and power of Large Language Models (LLMs) with the interpretability, explainability, and reliability of Expert Systems. To highlight the power of the Large Language Expert (LLE) system, we built an LLE to assist with the workup of patients newly diagnosed with cancer.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Clinical decision-making depends on expert reasoning, which is guided by standardized, evidence-based guidelines. However, translating these guidelines into automated clinical decision support systems risks inaccuracy and, importantly, loss of nuance. We share an application architecture, the Large Language Expert (LLE), that combines the flexibility and power of Large Language Models (LLMs) with the interpretability, explainability, and reliability of Expert Systems. LLMs help address key challenges of Expert Systems, such as integrating and codifying knowledge, and data normalization. Conversely, an Expert System-like approach helps overcome challenges with LLMs, including hallucinations, atomic and inexpensive updates, and testability. To highlight the power of the Large Language Expert (LLE) system, we built an LLE to assist with the workup of patients newly diagnosed with cancer. Timely initiation of cancer treatment is critical for optimal patient outcomes. However, increasing complexity in diagnostic recommendations has made it difficult for primary care physicians to ensure their patients have completed the necessary workup before their first visit with an oncologist. As with many real-world clinical tasks, these workups require the analysis of unstructured health records and the application of nuanced clinical decision logic. In this study, we describe the design and evaluation of an LLE system built to rapidly identify and suggest the correct diagnostic workup. The system demonstrated a high degree of clinical-level accuracy (>95%) and effectively addressed gaps identified in real-world data from breast and colon cancer patients at a large academic center.
Related papers
- Performance of Large Language Models in Supporting Medical Diagnosis and Treatment [0.0]
AI-driven systems can analyze vast datasets, assisting clinicians in identifying diseases, recommending treatments, and predicting patient outcomes.
This study evaluates the performance of a range of contemporary LLMs, including both open-source and closed-source models, on the 2024 Portuguese National Exam for medical specialty access.
arXiv Detail & Related papers (2025-04-14T16:53:59Z)
- Quantifying the Reasoning Abilities of LLMs on Real-world Clinical Cases [48.87360916431396]
We introduce MedR-Bench, a benchmarking dataset of 1,453 structured patient cases, annotated with reasoning references.
We propose a framework encompassing three critical stages: examination recommendation, diagnostic decision-making, and treatment planning, simulating the entire patient care journey.
Using this benchmark, we evaluate five state-of-the-art reasoning LLMs, including DeepSeek-R1, OpenAI-o3-mini, and Gemini-2.0-Flash Thinking.
arXiv Detail & Related papers (2025-03-06T18:35:39Z)
- Structured Outputs Enable General-Purpose LLMs to be Medical Experts [50.02627258858336]
Large language models (LLMs) often struggle with open-ended medical questions.
We propose a novel approach utilizing structured medical reasoning.
Our approach achieves the highest Factuality Score of 85.8, surpassing fine-tuned models.
arXiv Detail & Related papers (2025-03-05T05:24:55Z)
- Integrating Generative Artificial Intelligence in ADRD: A Framework for Streamlining Diagnosis and Care in Neurodegenerative Diseases [0.0]
We propose that large language models (LLMs) offer more immediately practical applications by enhancing clinicians' capabilities.
We present a framework for responsible AI integration that leverages LLMs' ability to communicate effectively with both patients and providers.
This approach prioritizes standardized, high-quality data collection to enable a system that learns from every patient encounter.
arXiv Detail & Related papers (2025-02-06T19:09:11Z)
- Exploring Large Language Models for Specialist-level Oncology Care [17.34069859182619]
We probe the performance of AMIE, a research conversational diagnostic AI system, in the subspecialist domain of breast oncology care.
We curated a set of 50 synthetic breast cancer vignettes representing a range of treatment-naive and treatment-refractory cases.
We developed a detailed clinical rubric for evaluating management plans, including axes such as the quality of case summarization, safety of the proposed care plan, and recommendations for chemotherapy, radiotherapy, surgery and hormonal therapy.
arXiv Detail & Related papers (2024-11-05T18:30:13Z)
- Demystifying Large Language Models for Medicine: A Primer [50.83806796466396]
Large language models (LLMs) represent a transformative class of AI tools capable of revolutionizing various aspects of healthcare.
This tutorial aims to equip healthcare professionals with the tools necessary to effectively integrate LLMs into clinical practice.
arXiv Detail & Related papers (2024-10-24T15:41:56Z)
- RuleAlign: Making Large Language Models Better Physicians with Diagnostic Rule Alignment [54.91736546490813]
We introduce the RuleAlign framework, designed to align Large Language Models with specific diagnostic rules.
We develop a medical dialogue dataset comprising rule-based communications between patients and physicians.
Experimental results demonstrate the effectiveness of the proposed approach.
arXiv Detail & Related papers (2024-08-22T17:44:40Z)
- Large Language Models in the Clinic: A Comprehensive Benchmark [63.21278434331952]
We build a benchmark ClinicBench to better understand large language models (LLMs) in the clinic.
We first collect eleven existing datasets covering diverse clinical language generation, understanding, and reasoning tasks.
We then construct six novel datasets and clinical tasks that are complex but common in real-world practice.
We conduct an extensive evaluation of twenty-two LLMs under both zero-shot and few-shot settings.
arXiv Detail & Related papers (2024-04-25T15:51:06Z)
- Conversational Disease Diagnosis via External Planner-Controlled Large Language Models [18.93345199841588]
This study presents an LLM-based diagnostic system that enhances planning capabilities by emulating doctors.
By utilizing real patient electronic medical record data, we constructed simulated dialogues between virtual patients and doctors.
arXiv Detail & Related papers (2024-04-04T06:16:35Z)
- AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator [69.51568871044454]
We introduce AI Hospital, a framework simulating dynamic medical interactions between Doctor as player and NPCs.
This setup allows for realistic assessments of LLMs in clinical scenarios.
We develop the Multi-View Medical Evaluation benchmark, utilizing high-quality Chinese medical records and NPCs.
arXiv Detail & Related papers (2024-02-15T06:46:48Z)
- Beyond Direct Diagnosis: LLM-based Multi-Specialist Agent Consultation for Automatic Diagnosis [30.943705201552643]
We propose a framework to model the diagnosis process in the real world by adaptively fusing probability distributions of agents over potential diseases.
Our approach requires significantly less parameter updating and training time, enhancing efficiency and practical utility.
arXiv Detail & Related papers (2024-01-29T12:25:30Z)
- Towards Conversational Diagnostic AI [32.84876349808714]
We introduce AMIE (Articulate Medical Intelligence Explorer), a Large Language Model (LLM) based AI system optimized for diagnostic dialogue.
AMIE uses a self-play based simulated environment with automated feedback mechanisms for scaling learning across diverse disease conditions.
AMIE demonstrated greater diagnostic accuracy and superior performance on 28 of 32 axes according to specialist physicians and 24 of 26 axes according to patient actors.
arXiv Detail & Related papers (2024-01-11T04:25:06Z)
- Natural Language Programming in Medicine: Administering Evidence Based Clinical Workflows with Autonomous Agents Powered by Generative Large Language Models [29.05425041393475]
Generative Large Language Models (LLMs) hold significant promise in healthcare.
This study assessed the potential of LLMs to function as autonomous agents in a simulated tertiary care medical center.
arXiv Detail & Related papers (2024-01-05T15:09:57Z)
- RECAP-KG: Mining Knowledge Graphs from Raw GP Notes for Remote COVID-19 Assessment in Primary Care [45.43645878061283]
We present a framework that performs knowledge graph construction from raw GP medical notes written during or after patient consultations.
Our knowledge graphs include information about existing patient symptoms, their duration, and their severity.
We apply our framework to consultation notes of COVID-19 patients in the UK.
arXiv Detail & Related papers (2023-06-17T23:35:51Z)
- Clinical Outcome Prediction from Admission Notes using Self-Supervised Knowledge Integration [55.88616573143478]
Outcome prediction from clinical text can prevent doctors from overlooking possible risks.
Diagnoses at discharge, procedures performed, in-hospital mortality and length-of-stay prediction are four common outcome prediction targets.
We propose clinical outcome pre-training to integrate knowledge about patient outcomes from multiple public sources.
arXiv Detail & Related papers (2021-02-08T10:26:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.