Exploring Large Language Models for Specialist-level Oncology Care
- URL: http://arxiv.org/abs/2411.03395v1
- Date: Tue, 05 Nov 2024 18:30:13 GMT
- Title: Exploring Large Language Models for Specialist-level Oncology Care
- Authors: Anil Palepu, Vikram Dhillon, Polly Niravath, Wei-Hung Weng, Preethi Prasad, Khaled Saab, Ryutaro Tanno, Yong Cheng, Hanh Mai, Ethan Burns, Zainub Ajmal, Kavita Kulkarni, Philip Mansfield, Dale Webster, Joelle Barral, Juraj Gottweis, Mike Schaekermann, S. Sara Mahdavi, Vivek Natarajan, Alan Karthikesalingam, Tao Tu
- Abstract summary: We probe the performance of AMIE, a research conversational diagnostic AI system, in the subspecialist domain of breast oncology care.
We curated a set of 50 synthetic breast cancer vignettes representing a range of treatment-naive and treatment-refractory cases.
We developed a detailed clinical rubric for evaluating management plans, including axes such as the quality of case summarization, safety of the proposed care plan, and recommendations for chemotherapy, radiotherapy, surgery and hormonal therapy.
- Score: 17.34069859182619
- License:
- Abstract: Large language models (LLMs) have shown remarkable progress in encoding clinical knowledge and responding to complex medical queries with appropriate clinical reasoning. However, their applicability in subspecialist or complex medical settings remains underexplored. In this work, we probe the performance of AMIE, a research conversational diagnostic AI system, in the subspecialist domain of breast oncology care without specific fine-tuning to this challenging domain. To perform this evaluation, we curated a set of 50 synthetic breast cancer vignettes representing a range of treatment-naive and treatment-refractory cases and mirroring the key information available to a multidisciplinary tumor board for decision-making (openly released with this work). We developed a detailed clinical rubric for evaluating management plans, including axes such as the quality of case summarization, safety of the proposed care plan, and recommendations for chemotherapy, radiotherapy, surgery, and hormonal therapy. To improve performance, we enhanced AMIE with the inference-time ability to perform web search retrieval to gather relevant and up-to-date clinical knowledge, and to refine its responses with a multi-stage self-critique pipeline. We compared the response quality of AMIE with that of internal medicine trainees, oncology fellows, and general oncology attendings under both automated and specialist clinician evaluations. In our evaluations, AMIE outperformed trainees and fellows, demonstrating the potential of the system in this challenging and important domain. We further demonstrate, through qualitative examples, how systems such as AMIE might facilitate conversational interactions to assist clinicians in their decision-making. However, AMIE's performance was overall inferior to that of attending oncologists, suggesting that further research is needed before prospective uses can be considered.
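As a rough illustration of the inference-time augmentation described in the abstract (web search retrieval followed by a multi-stage self-critique pipeline), the Python sketch below shows one way such a loop could be wired together. It is a minimal sketch under assumed interfaces: the function names (llm_generate, web_search), the prompt wording, and the number of critique rounds are illustrative placeholders, not the authors' implementation.

```python
# Hypothetical sketch of an inference-time pipeline: retrieve up-to-date
# clinical context via web search, draft a management plan, then iteratively
# self-critique and revise. All interfaces here are assumptions for
# illustration only.
from typing import Callable, List


def generate_management_plan(
    vignette: str,
    llm_generate: Callable[[str], str],      # any LLM text-generation call
    web_search: Callable[[str], List[str]],  # returns snippets of retrieved guidance
    critique_stages: int = 3,
) -> str:
    # Stage 1: retrieval - form a search query from the case and gather context.
    query = llm_generate(
        f"Summarize the key oncologic features of this case as a search query:\n{vignette}"
    )
    evidence = "\n".join(web_search(query)[:5])

    # Stage 2: initial draft grounded in the retrieved evidence.
    plan = llm_generate(
        "You are assisting a multidisciplinary tumor board.\n"
        f"Case:\n{vignette}\n\nRetrieved guidance:\n{evidence}\n\n"
        "Propose a management plan covering chemotherapy, radiotherapy, "
        "surgery, and hormonal therapy."
    )

    # Stages 3+: multi-stage self-critique - critique the draft against the
    # evidence and rubric-like axes (summary quality, safety, treatment
    # recommendations), then revise.
    for _ in range(critique_stages):
        critique = llm_generate(
            f"Critique this plan for safety and guideline concordance:\n{plan}\n\n"
            f"Evidence:\n{evidence}"
        )
        plan = llm_generate(
            f"Revise the plan to address the critique.\nPlan:\n{plan}\nCritique:\n{critique}"
        )
    return plan
```

The number of critique rounds and the single retrieval step are design choices made here for brevity; a production system could interleave retrieval with each critique stage.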
Related papers
- Demystifying Large Language Models for Medicine: A Primer [50.83806796466396]
Large language models (LLMs) represent a transformative class of AI tools capable of revolutionizing various aspects of healthcare.
This tutorial aims to equip healthcare professionals with the tools necessary to effectively integrate LLMs into clinical practice.
arXiv Detail & Related papers (2024-10-24T15:41:56Z)
- Toward Large Language Models as a Therapeutic Tool: Comparing Prompting Techniques to Improve GPT-Delivered Problem-Solving Therapy [6.952909762512736]
We examine the effects of prompt engineering to guide Large Language Models (LLMs) in delivering parts of a Problem-Solving Therapy session via text.
We demonstrate that the models' capability to deliver protocolized therapy can be improved with the proper use of prompt engineering methods.
arXiv Detail & Related papers (2024-08-27T17:25:16Z)
- GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI [67.09501109871351]
Large Vision-Language Models (LVLMs) are capable of handling diverse data types such as imaging, text, and physiological signals.
GMAI-MMBench is the most comprehensive general medical AI benchmark to date, with a well-categorized data structure and multiple levels of perceptual granularity.
It is constructed from 284 datasets across 38 medical image modalities, 18 clinical-related tasks, 18 departments, and 4 perceptual granularities in a Visual Question Answering (VQA) format.
arXiv Detail & Related papers (2024-08-06T17:59:21Z)
- Dr-LLaVA: Visual Instruction Tuning with Symbolic Clinical Grounding [53.629132242389716]
Vision-Language Models (VLM) can support clinicians by analyzing medical images and engaging in natural language interactions.
VLMs often exhibit "hallucinogenic" behavior, generating textual outputs not grounded in contextual multimodal information.
We propose a new alignment algorithm that uses symbolic representations of clinical reasoning to ground VLMs in medical knowledge.
arXiv Detail & Related papers (2024-05-29T23:19:28Z)
- Optimizing Skin Lesion Classification via Multimodal Data and Auxiliary Task Integration [54.76511683427566]
This research introduces a novel multimodal method for classifying skin lesions, integrating smartphone-captured images with essential clinical and demographic information.
A distinctive aspect of this method is the integration of an auxiliary task focused on super-resolution image prediction.
The experimental evaluations have been conducted using the PAD-UFES20 dataset, applying various deep-learning architectures.
arXiv Detail & Related papers (2024-02-16T05:16:20Z)
- AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator [69.51568871044454]
We introduce AI Hospital, a framework simulating dynamic medical interactions between a Doctor (as the player) and NPCs.
This setup allows for realistic assessments of LLMs in clinical scenarios.
We develop the Multi-View Medical Evaluation benchmark, utilizing high-quality Chinese medical records and NPCs.
arXiv Detail & Related papers (2024-02-15T06:46:48Z)
- Beyond Direct Diagnosis: LLM-based Multi-Specialist Agent Consultation for Automatic Diagnosis [30.943705201552643]
We propose a framework to model the diagnosis process in the real world by adaptively fusing probability distributions of agents over potential diseases.
Our approach requires significantly less parameter updating and training time, enhancing efficiency and practical utility.
arXiv Detail & Related papers (2024-01-29T12:25:30Z)
- MedLM: Exploring Language Models for Medical Question Answering Systems [2.84801080855027]
Large Language Models (LLMs) with their advanced generative capabilities have shown promise in various NLP tasks.
This study aims to compare the performance of general and medical-specific distilled LMs for medical Q&A.
The findings will provide valuable insights into the suitability of different LMs for specific applications in the medical domain.
arXiv Detail & Related papers (2024-01-21T03:37:47Z)
- Towards Conversational Diagnostic AI [32.84876349808714]
We introduce AMIE (Articulate Medical Intelligence Explorer), a Large Language Model (LLM) based AI system optimized for diagnostic dialogue.
AMIE uses a self-play based simulated environment with automated feedback mechanisms for scaling learning across diverse disease conditions.
AMIE demonstrated greater diagnostic accuracy and superior performance on 28 of 32 axes according to specialist physicians and 24 of 26 axes according to patient actors.
arXiv Detail & Related papers (2024-01-11T04:25:06Z)
- Emulating Human Cognitive Processes for Expert-Level Medical Question-Answering with Large Language Models [0.23463422965432823]
BooksMed is a novel framework based on a Large Language Model (LLM) that emulates human cognitive processes to deliver evidence-based and reliable responses.
We present ExpertMedQA, a benchmark comprised of open-ended, expert-level clinical questions.
arXiv Detail & Related papers (2023-10-17T13:39:26Z)
- Validating polyp and instrument segmentation methods in colonoscopy through Medico 2020 and MedAI 2021 Challenges [58.32937972322058]
"Medico automatic polyp segmentation (Medico 2020)" and "MedAI: Transparency in Medical Image (MedAI 2021)" competitions.
We present a comprehensive summary and analyze each contribution, highlight the strength of the best-performing methods, and discuss the possibility of clinical translations of such methods into the clinic.
arXiv Detail & Related papers (2023-07-30T16:08:45Z)