Polaris: A Safety-focused LLM Constellation Architecture for Healthcare
- URL: http://arxiv.org/abs/2403.13313v1
- Date: Wed, 20 Mar 2024 05:34:03 GMT
- Title: Polaris: A Safety-focused LLM Constellation Architecture for Healthcare
- Authors: Subhabrata Mukherjee, Paul Gamble, Markel Sanz Ausin, Neel Kant, Kriti Aggarwal, Neha Manjunath, Debajyoti Datta, Zhengliang Liu, Jiayuan Ding, Sophia Busacca, Cezanne Bianco, Swapnil Sharma, Rae Lasko, Michelle Voisard, Sanchay Harneja, Darya Filippova, Gerry Meixiong, Kevin Cha, Amir Youssefi, Meyhaa Buvanesh, Howard Weingram, Sebastian Bierman-Lytle, Harpreet Singh Mangat, Kim Parikh, Saad Godil, Alex Miller,
- Abstract summary: Polaris is the first safety-focused LLM constellation for real-time patient-AI healthcare conversations.
We train our models on proprietary data, clinical care plans, healthcare regulatory documents, medical manuals, and other medical reasoning documents.
We recruit over 1100 U.S. licensed nurses and over 130 U.S. licensed physicians to perform end-to-end conversational evaluations of our system.
- Score: 17.074456639617996
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We develop Polaris, the first safety-focused LLM constellation for real-time patient-AI healthcare conversations. Unlike prior LLM works in healthcare focusing on tasks like question answering, our work specifically focuses on long multi-turn voice conversations. Our one-trillion parameter constellation system is composed of several multibillion parameter LLMs as co-operative agents: a stateful primary agent that focuses on driving an engaging conversation and several specialist support agents focused on healthcare tasks performed by nurses to increase safety and reduce hallucinations. We develop a sophisticated training protocol for iterative co-training of the agents that optimize for diverse objectives. We train our models on proprietary data, clinical care plans, healthcare regulatory documents, medical manuals, and other medical reasoning documents. We align our models to speak like medical professionals, using organic healthcare conversations and simulated ones between patient actors and experienced nurses. This allows our system to express unique capabilities such as rapport building, trust building, empathy and bedside manner. Finally, we present the first comprehensive clinician evaluation of an LLM system for healthcare. We recruited over 1100 U.S. licensed nurses and over 130 U.S. licensed physicians to perform end-to-end conversational evaluations of our system by posing as patients and rating the system on several measures. We demonstrate Polaris performs on par with human nurses on aggregate across dimensions such as medical safety, clinical readiness, conversational quality, and bedside manner. Additionally, we conduct a challenging task-based evaluation of the individual specialist support agents, where we demonstrate our LLM agents significantly outperform a much larger general-purpose LLM (GPT-4) as well as from its own medium-size class (LLaMA-2 70B).
Related papers
- Leveraging Large Language Models for Patient Engagement: The Power of Conversational AI in Digital Health [1.8772687384996551]
Large language models (LLMs) have opened up new opportunities for transforming patient engagement in healthcare through conversational AI.
We showcase the power of LLMs in handling unstructured conversational data through four case studies.
arXiv Detail & Related papers (2024-06-19T16:02:04Z) - Dr-LLaVA: Visual Instruction Tuning with Symbolic Clinical Grounding [53.629132242389716]
Vision-Language Models (VLM) can support clinicians by analyzing medical images and engaging in natural language interactions.
VLMs often exhibit "hallucinogenic" behavior, generating textual outputs not grounded in contextual multimodal information.
We propose a new alignment algorithm that uses symbolic representations of clinical reasoning to ground VLMs in medical knowledge.
arXiv Detail & Related papers (2024-05-29T23:19:28Z) - AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator [69.51568871044454]
We introduce textbfAI Hospital, a framework simulating dynamic medical interactions between emphDoctor as player and NPCs.
This setup allows for realistic assessments of LLMs in clinical scenarios.
We develop the Multi-View Medical Evaluation benchmark, utilizing high-quality Chinese medical records and NPCs.
arXiv Detail & Related papers (2024-02-15T06:46:48Z) - Large Language Model Distilling Medication Recommendation Model [61.89754499292561]
We harness the powerful semantic comprehension and input-agnostic characteristics of Large Language Models (LLMs)
Our research aims to transform existing medication recommendation methodologies using LLMs.
To mitigate this, we have developed a feature-level knowledge distillation technique, which transfers the LLM's proficiency to a more compact model.
arXiv Detail & Related papers (2024-02-05T08:25:22Z) - From Beginner to Expert: Modeling Medical Knowledge into General LLMs [22.475129648458136]
Large language model (LLM) based artificial intelligence (AI) systems have demonstrated remarkable capabilities in natural language understanding and generation.
These models face a significant challenge when it comes to sensitive applications, such as reasoning over medical knowledge and answering medical questions in a physician-like manner.
In this work, we start from a pre-trained general LLM model (AntGLM-10B) and fine-tune it from a medical beginner towards a medical expert (called AntGLM-Med-10B)
arXiv Detail & Related papers (2023-12-02T05:54:06Z) - ChiMed-GPT: A Chinese Medical Large Language Model with Full Training Regime and Better Alignment to Human Preferences [51.66185471742271]
We propose ChiMed-GPT, a benchmark LLM designed explicitly for Chinese medical domain.
ChiMed-GPT undergoes a comprehensive training regime with pre-training, SFT, and RLHF.
We analyze possible biases through prompting ChiMed-GPT to perform attitude scales regarding discrimination of patients.
arXiv Detail & Related papers (2023-11-10T12:25:32Z) - A Survey of Large Language Models in Medicine: Progress, Application, and Challenge [85.09998659355038]
Large language models (LLMs) have received substantial attention due to their capabilities for understanding and generating human language.
This review aims to provide a detailed overview of the development and deployment of LLMs in medicine.
arXiv Detail & Related papers (2023-11-09T02:55:58Z) - Large Language Models Illuminate a Progressive Pathway to Artificial
Healthcare Assistant: A Review [16.008511195589925]
Large language models (LLMs) have shown promising capabilities in mimicking human-level language comprehension and reasoning.
This paper provides a comprehensive review on the applications and implications of LLMs in medicine.
arXiv Detail & Related papers (2023-11-03T13:51:36Z) - Talk2Care: Facilitating Asynchronous Patient-Provider Communication with
Large-Language-Model [29.982507402325396]
We built an LLM-powered communication system, Talk2Care, for older adults and healthcare providers.
For older adults, we leveraged the convenience and accessibility of voice assistants (VAs) and built an LLM-powered VA interface for effective information collection.
The results showed that Talk2Care could facilitate the communication process, enrich the health information collected from older adults, and considerably save providers' efforts and time.
arXiv Detail & Related papers (2023-09-17T19:46:03Z) - An Automatic Evaluation Framework for Multi-turn Medical Consultations
Capabilities of Large Language Models [22.409334091186995]
Large language models (LLMs) often suffer from hallucinations, leading to overly confident but incorrect judgments.
This paper introduces an automated evaluation framework that assesses the practical capabilities of LLMs as virtual doctors during multi-turn consultations.
arXiv Detail & Related papers (2023-09-05T09:24:48Z) - MedAlign: A Clinician-Generated Dataset for Instruction Following with
Electronic Medical Records [60.35217378132709]
Large language models (LLMs) can follow natural language instructions with human-level fluency.
evaluating LLMs on realistic text generation tasks for healthcare remains challenging.
We introduce MedAlign, a benchmark dataset of 983 natural language instructions for EHR data.
arXiv Detail & Related papers (2023-08-27T12:24:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.