Almanac: Retrieval-Augmented Language Models for Clinical Medicine
- URL: http://arxiv.org/abs/2303.01229v2
- Date: Wed, 31 May 2023 21:17:13 GMT
- Title: Almanac: Retrieval-Augmented Language Models for Clinical Medicine
- Authors: Cyril Zakka, Akash Chaurasia, Rohan Shad, Alex R. Dalal, Jennifer L.
Kim, Michael Moor, Kevin Alexander, Euan Ashley, Jack Boyd, Kathleen Boyd,
Karen Hirsch, Curt Langlotz, Joanna Nelson, and William Hiesinger
- Abstract summary: We develop Almanac, a large language model framework augmented with retrieval capabilities for medical guideline and treatment recommendations.
Performance on a novel dataset of clinical scenarios evaluated by a panel of 5 board-certified and resident physicians demonstrates significant increases in factuality.
- Score: 1.5505279143287174
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large-language models have recently demonstrated impressive zero-shot
capabilities in a variety of natural language tasks such as summarization,
dialogue generation, and question-answering. Despite many promising
applications in clinical medicine, adoption of these models in real-world
settings has been largely limited by their tendency to generate incorrect and
sometimes even toxic statements. In this study, we develop Almanac, a large
language model framework augmented with retrieval capabilities for medical
guideline and treatment recommendations. Performance on a novel dataset of
clinical scenarios (n = 130) evaluated by a panel of 5 board-certified and
resident physicians demonstrates significant increases in factuality (mean of
18% at p-value < 0.05) across all specialties, with improvements in
completeness and safety. Our results demonstrate the potential for large
language models to be effective tools in the clinical decision-making process,
while also emphasizing the importance of careful testing and deployment to
mitigate their shortcomings.
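The abstract describes Almanac as a retrieval-augmented framework: guideline passages are retrieved for a clinical query and the language model is asked to answer only from that retrieved context. The sketch below is a minimal, illustrative version of that general retrieval-then-ground pattern, not the paper's actual implementation; `embed_text` and `complete` are hypothetical stand-ins for an embedding model and an LLM completion API.

```python
# Minimal sketch of retrieval-augmented answering over a guideline corpus.
# Assumptions (not from the paper): `embed_text` maps text to a vector and
# `complete` returns an LLM completion for a prompt; both are placeholders.
from typing import Callable, List, Tuple
import math

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query: str,
             corpus: List[Tuple[str, List[float]]],   # (passage, embedding) pairs
             embed_text: Callable[[str], List[float]],
             k: int = 3) -> List[str]:
    """Return the k guideline passages most similar to the query."""
    q_vec = embed_text(query)
    ranked = sorted(corpus, key=lambda doc: cosine(q_vec, doc[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def answer(query: str,
           corpus: List[Tuple[str, List[float]]],
           embed_text: Callable[[str], List[float]],
           complete: Callable[[str], str]) -> str:
    """Prompt the model to answer only from the retrieved guideline excerpts."""
    context = "\n\n".join(retrieve(query, corpus, embed_text))
    prompt = (
        "Answer the clinical question using ONLY the guideline excerpts below. "
        "If the excerpts do not contain the answer, say so.\n\n"
        f"Excerpts:\n{context}\n\nQuestion: {query}\nAnswer:"
    )
    return complete(prompt)
```

Constraining the model to retrieved guideline text is what the abstract credits for the gains in factuality and safety relative to an unaugmented model.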
Related papers
- Leveraging Large Language Models through Natural Language Processing to provide interpretable Machine Learning predictions of mental deterioration in real time [5.635300481123079]
Based on official estimates, 50 million people worldwide are affected by dementia, and this number increases by 10 million new patients every year.
To this end, Artificial Intelligence and computational linguistics can be exploited for natural language analysis, personalized assessment, monitoring, and treatment.
We contribute an affordable, flexible, non-invasive, personalized diagnostic system to this line of work.
arXiv Detail & Related papers (2024-09-05T09:27:05Z)
- SemioLLM: Assessing Large Language Models for Semiological Analysis in Epilepsy Research [45.2233252981348]
Large Language Models have shown promising results in their ability to encode general medical knowledge.
We test the ability of state-of-the-art LLMs to leverage their internal knowledge and reasoning for epilepsy diagnosis.
arXiv Detail & Related papers (2024-07-03T11:02:12Z)
- Evaluating Large Language Models for Radiology Natural Language Processing [68.98847776913381]
The rise of large language models (LLMs) has marked a pivotal shift in the field of natural language processing (NLP).
This study seeks to bridge this gap by critically evaluating thirty-two LLMs in interpreting radiology reports.
arXiv Detail & Related papers (2023-07-25T17:57:18Z)
- ClinicalGPT: Large Language Models Finetuned with Diverse Medical Data and Comprehensive Evaluation [5.690250818139763]
Large language models have exhibited exceptional performance on various Natural Language Processing (NLP) tasks.
Despite these advances, their effectiveness in medical applications is limited by challenges such as factual inaccuracies, weak reasoning abilities, and a lack of grounding in real-world experience.
We present ClinicalGPT, a language model explicitly designed and optimized for clinical scenarios.
arXiv Detail & Related papers (2023-06-16T16:56:32Z)
- Language Models are Few-shot Learners for Prognostic Prediction [0.4254099382808599]
We explore the use of transformers and language models in prognostic prediction for immunotherapy using real-world patients' clinical data and molecular profiles.
The study benchmarks the efficacy of baselines and language models on prognostic prediction across multiple cancer types and investigates the impact of different pretrained language models under few-shot regimes.
arXiv Detail & Related papers (2023-02-24T15:35:36Z)
- Do We Still Need Clinical Language Models? [15.023633270864675]
We show that relatively small specialized clinical models substantially outperform all in-context learning approaches.
We release the code and the models used under the PhysioNet Credentialed Health Data license and data use agreement.
arXiv Detail & Related papers (2023-02-16T05:08:34Z)
- Few-Shot Cross-lingual Transfer for Coarse-grained De-identification of Code-Mixed Clinical Texts [56.72488923420374]
Pre-trained language models (LMs) have shown great potential for cross-lingual transfer in low-resource settings.
We show the few-shot cross-lingual transfer property of LMs for named entity recognition (NER) and apply it to a low-resource, real-world challenge: de-identification of code-mixed (Spanish-Catalan) clinical notes in the stroke domain.
arXiv Detail & Related papers (2022-04-10T21:46:52Z)
- CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark [51.38557174322772]
We present the first Chinese Biomedical Language Understanding Evaluation benchmark.
It is a collection of natural language understanding tasks including named entity recognition, information extraction, clinical diagnosis normalization, single-sentence/sentence-pair classification.
We report empirical results for 11 current pre-trained Chinese models; the experiments show that state-of-the-art neural models still perform far worse than the human ceiling.
arXiv Detail & Related papers (2021-06-15T12:25:30Z)
- Benchmarking Automated Clinical Language Simplification: Dataset, Algorithm, and Evaluation [48.87254340298189]
We construct a new dataset named MedLane to support the development and evaluation of automated clinical language simplification approaches.
We propose a new model called DECLARE that follows the human annotation procedure and achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-12-04T06:09:02Z)
- Predicting Clinical Diagnosis from Patients Electronic Health Records Using BERT-based Neural Networks [62.9447303059342]
We show the importance of this problem in the medical community.
We present a modification of the Bidirectional Encoder Representations from Transformers (BERT) model for sequence classification.
We use a large-scale Russian EHR dataset consisting of about 4 million unique patient visits.
arXiv Detail & Related papers (2020-07-15T09:22:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.