Related papers: An Active Inference Strategy for Prompting Reliable Responses from Large Language Models in Medical Practice

An Active Inference Strategy for Prompting Reliable Responses from Large Language Models in Medical Practice

URL: http://arxiv.org/abs/2407.21051v1
Date: Tue, 23 Jul 2024 05:00:18 GMT
Title: An Active Inference Strategy for Prompting Reliable Responses from Large Language Models in Medical Practice
Authors: Roma Shusterman, Allison C. Waters, Shannon O`Neill, Phan Luu, Don M. Tucker,
Abstract summary: Large Language Models (LLMs) are non-deterministic, may provide incorrect or harmful responses, and cannot be regulated to assure quality control. Our proposed framework refines LLM responses by restricting their primary knowledge base to domain-specific datasets containing validated medical information. We conducted a validation study where expert cognitive behaviour therapy for insomnia therapists evaluated responses from the LLM in a blind format.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Continuing advances in Large Language Models (LLMs) in artificial intelligence offer important capacities in intuitively accessing and using medical knowledge in many contexts, including education and training as well as assessment and treatment. Most of the initial literature on LLMs in medicine has emphasized that LLMs are unsuitable for medical use because they are non-deterministic, may provide incorrect or harmful responses, and cannot be regulated to assure quality control. If these issues could be corrected, optimizing LLM technology could benefit patients and physicians by providing affordable, point-of-care medical knowledge. Our proposed framework refines LLM responses by restricting their primary knowledge base to domain-specific datasets containing validated medical information. Additionally, we introduce an actor-critic LLM prompting protocol based on active inference principles of human cognition, where a Therapist agent initially responds to patient queries, and a Supervisor agent evaluates and adjusts responses to ensure accuracy and reliability. We conducted a validation study where expert cognitive behaviour therapy for insomnia (CBT-I) therapists evaluated responses from the LLM in a blind format. Experienced human CBT-I therapists assessed responses to 100 patient queries, comparing LLM-generated responses with appropriate and inappropriate responses crafted by experienced CBT-I therapists. Results showed that LLM responses received high ratings from the CBT-I therapists, often exceeding those of therapist-generated appropriate responses. This structured approach aims to integrate advanced LLM technology into medical applications, meeting regulatory requirements for establishing the safe and effective use of special purpose validated LLMs in medicine.

Related papers

Med-CoDE: Medical Critique based Disagreement Evaluation Framework [72.42301910238861]
The reliability and accuracy of large language models (LLMs) in medical contexts remain critical concerns. Current evaluation methods often lack robustness and fail to provide a comprehensive assessment of LLM performance. We propose Med-CoDE, a specifically designed evaluation framework for medical LLMs to address these challenges.
arXiv Detail & Related papers (2025-04-21T16:51:11Z)
Addressing Overprescribing Challenges: Fine-Tuning Large Language Models for Medication Recommendation Tasks [46.95099594570405]
Medication recommendation systems have garnered attention within healthcare for their potential to deliver personalized and efficacious drug combinations based on patient's clinical data. Existing methodologies encounter challenges in adapting to diverse Electronic Health Records (EHR) systems. We propose Language-Assisted Medication Recommendation (LAMO), which employs a parameter-efficient fine-tuning approach to tailor open-source LLMs for optimal performance in medication recommendation scenarios.
arXiv Detail & Related papers (2025-03-05T17:28:16Z)
Structured Outputs Enable General-Purpose LLMs to be Medical Experts [50.02627258858336]
Large language models (LLMs) often struggle with open-ended medical questions. We propose a novel approach utilizing structured medical reasoning. Our approach achieves the highest Factuality Score of 85.8, surpassing fine-tuned models.
arXiv Detail & Related papers (2025-03-05T05:24:55Z)
Fact or Guesswork? Evaluating Large Language Model's Medical Knowledge with Structured One-Hop Judgment [108.55277188617035]
Large language models (LLMs) have been widely adopted in various downstream task domains, but their ability to directly recall and apply factual medical knowledge remains under-explored. Most existing medical QA benchmarks assess complex reasoning or multi-hop inference, making it difficult to isolate LLMs' inherent medical knowledge from their reasoning capabilities. We introduce the Medical Knowledge Judgment, a dataset specifically designed to measure LLMs' one-hop factual medical knowledge.
arXiv Detail & Related papers (2025-02-20T05:27:51Z)
Med-R$^2$: Crafting Trustworthy LLM Physicians via Retrieval and Reasoning of Evidence-Based Medicine [40.651632523697536]
Large Language Models (LLMs) have exhibited remarkable capabilities in clinical scenarios.<n>We introduce Med-R2, a novel framework that adheres to the Evidence-Based Medicine (EBM) process.<n>Our experiments indicate that Med-R2 achieves a 14.74% improvement over vanilla RAG methods and even a 3.32% enhancement compared to fine-tuning strategies.
arXiv Detail & Related papers (2025-01-21T04:40:43Z)
Comprehensive and Practical Evaluation of Retrieval-Augmented Generation Systems for Medical Question Answering [70.44269982045415]
Retrieval-augmented generation (RAG) has emerged as a promising approach to enhance the performance of large language models (LLMs) We introduce Medical Retrieval-Augmented Generation Benchmark (MedRGB) that provides various supplementary elements to four medical QA datasets. Our experimental results reveals current models' limited ability to handle noise and misinformation in the retrieved documents.
arXiv Detail & Related papers (2024-11-14T06:19:18Z)
Demystifying Large Language Models for Medicine: A Primer [50.83806796466396]
Large language models (LLMs) represent a transformative class of AI tools capable of revolutionizing various aspects of healthcare. This tutorial aims to equip healthcare professionals with the tools necessary to effectively integrate LLMs into clinical practice.
arXiv Detail & Related papers (2024-10-24T15:41:56Z)
CBT-Bench: Evaluating Large Language Models on Assisting Cognitive Behavior Therapy [67.23830698947637]
We propose a new benchmark, CBT-BENCH, for the systematic evaluation of cognitive behavioral therapy (CBT) assistance. We include three levels of tasks in CBT-BENCH: I: Basic CBT knowledge acquisition, with the task of multiple-choice questions; II: Cognitive model understanding, with the tasks of cognitive distortion classification, primary core belief classification, and fine-grained core belief classification; III: Therapeutic response generation, with the task of generating responses to patient speech in CBT therapy sessions. Experimental results indicate that while LLMs perform well in reciting CBT knowledge, they fall short in complex real-world scenarios
arXiv Detail & Related papers (2024-10-17T04:52:57Z)
PALLM: Evaluating and Enhancing PALLiative Care Conversations with Large Language Models [10.258261180305439]
Large language models (LLMs) offer a new approach to assessing complex communication metrics. LLMs offer the potential to advance the field through integration into passive sensing and just-in-time intervention systems. This study explores LLMs as evaluators of palliative care communication quality, leveraging their linguistic, in-context learning, and reasoning capabilities.
arXiv Detail & Related papers (2024-09-23T16:39:12Z)
RuleAlign: Making Large Language Models Better Physicians with Diagnostic Rule Alignment [54.91736546490813]
We introduce the RuleAlign framework, designed to align Large Language Models with specific diagnostic rules. We develop a medical dialogue dataset comprising rule-based communications between patients and physicians. Experimental results demonstrate the effectiveness of the proposed approach.
arXiv Detail & Related papers (2024-08-22T17:44:40Z)
A Survey on Large Language Models from General Purpose to Medical Applications: Datasets, Methodologies, and Evaluations [5.265452667976959]
This survey systematically summarizes how to train medical LLMs based on open-source general LLMs. It covers (a) how to acquire training corpus and construct customized medical training sets, (b) how to choose an appropriate training paradigm, and (d) existing challenges and promising research directions.
arXiv Detail & Related papers (2024-06-14T02:42:20Z)
Towards Automatic Evaluation for LLMs' Clinical Capabilities: Metric, Data, and Algorithm [15.627870862369784]
Large language models (LLMs) are gaining increasing interests to improve clinical efficiency for medical diagnosis. We propose an automatic evaluation paradigm tailored to assess the LLMs' capabilities in delivering clinical services.
arXiv Detail & Related papers (2024-03-25T06:17:54Z)
LLM on FHIR -- Demystifying Health Records [0.32985979395737786]
This study developed an app allowing users to interact with their health records using large language models (LLMs) The app effectively translated medical data into patient-friendly language and was able to adapt its responses to different patient profiles.
arXiv Detail & Related papers (2024-01-25T17:45:34Z)
A Computational Framework for Behavioral Assessment of LLM Therapists [7.665475687919995]
Large language models (LLMs) like ChatGPT have increased interest in their use as therapists to address mental health challenges. We propose BOLT, a proof-of-concept computational framework to systematically assess the conversational behavior of LLM therapists.
arXiv Detail & Related papers (2024-01-01T17:32:28Z)
ChiMed-GPT: A Chinese Medical Large Language Model with Full Training Regime and Better Alignment to Human Preferences [51.66185471742271]
We propose ChiMed-GPT, a benchmark LLM designed explicitly for Chinese medical domain. ChiMed-GPT undergoes a comprehensive training regime with pre-training, SFT, and RLHF. We analyze possible biases through prompting ChiMed-GPT to perform attitude scales regarding discrimination of patients.
arXiv Detail & Related papers (2023-11-10T12:25:32Z)
An Automatic Evaluation Framework for Multi-turn Medical Consultations Capabilities of Large Language Models [22.409334091186995]
Large language models (LLMs) often suffer from hallucinations, leading to overly confident but incorrect judgments. This paper introduces an automated evaluation framework that assesses the practical capabilities of LLMs as virtual doctors during multi-turn consultations.
arXiv Detail & Related papers (2023-09-05T09:24:48Z)
Medical Misinformation in AI-Assisted Self-Diagnosis: Development of a Method (EvalPrompt) for Analyzing Large Language Models [4.8775268199830935]
This study aims to assess the effectiveness of large language models (LLMs) as a self-diagnostic tool and their role in spreading healthcare misinformation. We use open-ended questions to mimic real-world self-diagnosis use cases, and perform sentence dropout to mimic realistic self-diagnosis with missing information. The results highlight the modest capabilities of LLMs, as their responses are often unclear and inaccurate.
arXiv Detail & Related papers (2023-07-10T21:28:26Z)
Self-Verification Improves Few-Shot Clinical Information Extraction [73.6905567014859]
Large language models (LLMs) have shown the potential to accelerate clinical curation via few-shot in-context learning. They still struggle with issues regarding accuracy and interpretability, especially in mission-critical domains such as health. Here, we explore a general mitigation framework using self-verification, which leverages the LLM to provide provenance for its own extraction and check its own outputs.
arXiv Detail & Related papers (2023-05-30T22:05:11Z)

This list is automatically generated from the titles and abstracts of the papers in this site.