Related papers: Healthcare Copilot: Eliciting the Power of General LLMs for Medical Consultation

Healthcare Copilot: Eliciting the Power of General LLMs for Medical Consultation

URL: http://arxiv.org/abs/2402.13408v1
Date: Tue, 20 Feb 2024 22:26:35 GMT
Title: Healthcare Copilot: Eliciting the Power of General LLMs for Medical Consultation
Authors: Zhiyao Ren, Yibing Zhan, Baosheng Yu, Liang Ding, Dacheng Tao
Abstract summary: We introduce the construction of a Healthcare Copilot designed for medical consultation. The proposed Healthcare Copilot comprises three main components: 1) the Dialogue component, responsible for effective and safe patient interactions; 2) the Memory component, storing both current conversation data and historical patient information; and 3) the Processing component, summarizing the entire dialogue and generating reports. To evaluate the proposed Healthcare Copilot, we implement an auto-evaluation scheme using ChatGPT for two roles: as a virtual patient engaging in dialogue with the copilot, and as an evaluator to assess the quality of the dialogue.
Score: 96.22329536480976
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The copilot framework, which aims to enhance and tailor large language models (LLMs) for specific complex tasks without requiring fine-tuning, is gaining increasing attention from the community. In this paper, we introduce the construction of a Healthcare Copilot designed for medical consultation. The proposed Healthcare Copilot comprises three main components: 1) the Dialogue component, responsible for effective and safe patient interactions; 2) the Memory component, storing both current conversation data and historical patient information; and 3) the Processing component, summarizing the entire dialogue and generating reports. To evaluate the proposed Healthcare Copilot, we implement an auto-evaluation scheme using ChatGPT for two roles: as a virtual patient engaging in dialogue with the copilot, and as an evaluator to assess the quality of the dialogue. Extensive results demonstrate that the proposed Healthcare Copilot significantly enhances the capabilities of general LLMs for medical consultations in terms of inquiry capability, conversational fluency, response accuracy, and safety. Furthermore, we conduct ablation studies to highlight the contribution of each individual module in the Healthcare Copilot. Code will be made publicly available on GitHub.

Related papers

Medical Red Teaming Protocol of Language Models: On the Importance of User Perspectives in Healthcare Settings [51.73411055162861]
We introduce a safety evaluation protocol tailored to the medical domain in both patient user and clinician user perspectives.<n>This is the first work to define safety evaluation criteria for medical LLMs through targeted red-teaming taking three different points of view.
arXiv Detail & Related papers (2025-07-09T19:38:58Z)
C-PATH: Conversational Patient Assistance and Triage in Healthcare System [3.8751966246546248]
C-PATH (Conversational Patient Assistance and Triage in Healthcare) is a novel conversational AI system powered by large language models (LLMs)<n>C-PATH is fine-tuned on medical knowledge, dialogue data, and clinical summaries using a multi-stage pipeline built on the LLaMA3 architecture.
arXiv Detail & Related papers (2025-06-07T09:48:47Z)
Conversation AI Dialog for Medicare powered by Finetuning and Retrieval Augmented Generation [0.0]
Large language models (LLMs) have shown impressive capabilities in natural language processing tasks, including dialogue generation. This research aims to conduct a novel comparative analysis of two prominent techniques, fine-tuning with LoRA and the Retrieval-Augmented Generation framework.
arXiv Detail & Related papers (2025-02-04T11:50:40Z)
LIMIS: Towards Language-based Interactive Medical Image Segmentation [58.553786162527686]
LIMIS is the first purely language-based interactive medical image segmentation model. We adapt Grounded SAM to the medical domain and design a language-based model interaction strategy. We evaluate LIMIS on three publicly available medical datasets in terms of performance and usability.
arXiv Detail & Related papers (2024-10-22T12:13:47Z)
Design and evaluation of AI copilots -- case studies of retail copilot templates [2.7274834772504954]
Building a successful AI copilot requires a systematic approach. This paper is divided into two sections, covering the design and evaluation of a copilot respectively.
arXiv Detail & Related papers (2024-06-17T17:31:33Z)
Benchmarking Large Language Models on Communicative Medical Coaching: a Novel System and Dataset [26.504409173684653]
We introduce "ChatCoach", a human-AI cooperative framework designed to assist medical learners in practicing their communication skills during patient consultations. ChatCoachdifferentiates itself from conventional dialogue systems by offering a simulated environment where medical learners can practice dialogues with a patient agent, while a coach agent provides immediate, structured feedback. We have developed a dataset specifically for evaluating Large Language Models (LLMs) within the ChatCoach framework on communicative medical coaching tasks.
arXiv Detail & Related papers (2024-02-08T10:32:06Z)
PlugMed: Improving Specificity in Patient-Centered Medical Dialogue Generation using In-Context Learning [20.437165038293426]
The patient-centered medical dialogue systems strive to offer diagnostic interpretation services to users who are less knowledgeable about medical knowledge. It is difficult for the large language models (LLMs) to guarantee the specificity of responses in spite of its promising performance. Inspired by in-context learning, we propose PlugMed, a Plug-and-Play Medical Dialogue System.
arXiv Detail & Related papers (2023-05-19T08:18:24Z)
Generating medically-accurate summaries of patient-provider dialogue: A multi-stage approach using large language models [6.252236971703546]
An effective summary is required to be coherent and accurately capture all the medically relevant information in the dialogue. This paper tackles the problem of medical conversation summarization by discretizing the task into several smaller dialogue-understanding tasks.
arXiv Detail & Related papers (2023-05-10T08:48:53Z)
PMC-LLaMA: Towards Building Open-source Language Models for Medicine [62.39105735933138]
Large Language Models (LLMs) have showcased remarkable capabilities in natural language understanding. LLMs struggle in domains that require precision, such as medical applications, due to their lack of domain-specific knowledge. We describe the procedure for building a powerful, open-source language model specifically designed for medicine applications, termed as PMC-LLaMA.
arXiv Detail & Related papers (2023-04-27T18:29:05Z)
FCC: Fusing Conversation History and Candidate Provenance for Contextual Response Ranking in Dialogue Systems [53.89014188309486]
We present a flexible neural framework that can integrate contextual information from multiple channels. We evaluate our model on the MSDialog dataset widely used for evaluating conversational response ranking tasks.
arXiv Detail & Related papers (2023-03-31T23:58:28Z)
A Benchmark for Automatic Medical Consultation System: Frameworks, Tasks and Datasets [70.32630628211803]
We propose two frameworks to support automatic medical consultation, namely doctor-patient dialogue understanding and task-oriented interaction. A new large medical dialogue dataset with multi-level fine-grained annotations is introduced. We report a set of benchmark results for each task, which shows the usability of the dataset and sets a baseline for future studies.
arXiv Detail & Related papers (2022-04-19T16:43:21Z)
MedDG: An Entity-Centric Medical Consultation Dataset for Entity-Aware Medical Dialogue Generation [86.38736781043109]
We build and release a large-scale high-quality Medical Dialogue dataset related to 12 types of common Gastrointestinal diseases named MedDG. We propose two kinds of medical dialogue tasks based on MedDG dataset. One is the next entity prediction and the other is the doctor response generation. Experimental results show that the pre-train language models and other baselines struggle on both tasks with poor performance in our dataset.
arXiv Detail & Related papers (2020-10-15T03:34:33Z)

This list is automatically generated from the titles and abstracts of the papers in this site.