ClinicalMamba: A Generative Clinical Language Model on Longitudinal
Clinical Notes
- URL: http://arxiv.org/abs/2403.05795v1
- Date: Sat, 9 Mar 2024 04:58:25 GMT
- Title: ClinicalMamba: A Generative Clinical Language Model on Longitudinal
Clinical Notes
- Authors: Zhichao Yang, Avijit Mitra, Sunjae Kwon, Hong Yu
- Abstract summary: We introduce ClinicalMamba, a specialized version of the Mamba language model, pretrained on a vast corpus of longitudinal clinical notes.
With 130 million and 2.8 billion parameters, ClinicalMamba demonstrates superior performance in modeling clinical language across extended text lengths.
- Score: 6.921652448124103
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The advancement of natural language processing (NLP) systems in healthcare
hinges on language model ability to interpret the intricate information
contained within clinical notes. This process often requires integrating
information from various time points in a patient's medical history. However,
most earlier clinical language models were pretrained with a context length
limited to roughly one clinical document. In this study, we introduce
ClinicalMamba, a specialized version of the Mamba language model, pretrained on
a vast corpus of longitudinal clinical notes to address the unique linguistic
characteristics and information processing needs of the medical domain.
ClinicalMamba, with 130 million and 2.8 billion parameters, demonstrates
superior performance in modeling clinical language across extended text lengths
compared to Mamba and clinical Llama. With few-shot learning, ClinicalMamba
achieves notable benchmarks in speed and accuracy, outperforming existing
clinical language models and general domain large models like GPT-4 in
longitudinal clinical notes information extraction tasks.
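The few-shot extraction setting the abstract describes can be illustrated by how such a prompt is assembled: demonstrations of (notes, answer) pairs followed by the query notes. The function, notes, and labels below are invented for illustration; a real pipeline would pass the resulting prompt to a long-context model such as ClinicalMamba rather than print it.

```python
# Sketch (hypothetical helper): building a few-shot prompt for
# longitudinal clinical-note information extraction. The example
# notes and answers are fabricated, not from the paper.

def build_fewshot_prompt(examples, query_notes, task):
    """Concatenate (notes, answer) demonstrations, then the query."""
    parts = [f"Task: {task}"]
    for notes, answer in examples:
        parts.append("Notes:\n" + "\n".join(notes))
        parts.append(f"Answer: {answer}")
    parts.append("Notes:\n" + "\n".join(query_notes))
    parts.append("Answer:")  # model completes from here
    return "\n\n".join(parts)

demos = [
    (["Day 1: started metformin 500 mg.", "Day 30: metformin tolerated."],
     "medication: metformin"),
]
prompt = build_fewshot_prompt(
    demos,
    ["Day 2: lisinopril 10 mg initiated.", "Day 14: dose unchanged."],
    task="Extract the medication initiated during the admission.",
)
print(prompt)
```

The prompt ends at "Answer:" so that a generative model's continuation is the extracted value.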
Related papers
- CliMedBench: A Large-Scale Chinese Benchmark for Evaluating Medical Large Language Models in Clinical Scenarios [50.032101237019205]
CliMedBench is a comprehensive benchmark with 14 expert-guided core clinical scenarios.
The reliability of this benchmark has been confirmed in several ways.
arXiv Detail & Related papers (2024-10-04T15:15:36Z)
- Harmonising the Clinical Melody: Tuning Large Language Models for Hospital Course Summarisation in Clinical Coding [5.279406017862076]
The challenge of summarising a hospital course remains an open area for further research and development.
We adapted three pre-trained LLMs (Llama 3, BioMistral, and Mistral Instruct v0.1) for the hospital course summarisation task.
The fine-tuned models were evaluated using BERTScore and ROUGE metrics to assess the effectiveness of clinical-domain fine-tuning.
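ROUGE, one of the metrics used to score generated summaries against references, can be sketched in a simplified form. Real evaluations use library implementations (e.g. the rouge-score package) with stemming and ROUGE-L; this illustration computes only unigram-overlap F1 (ROUGE-1) and uses invented example sentences.

```python
# Simplified ROUGE-1 F1: unigram overlap between a candidate summary
# and a reference, combined as the harmonic mean of precision/recall.
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # multiset intersection
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

score = rouge1_f1(
    "patient admitted with pneumonia treated with antibiotics",
    "patient admitted for pneumonia and treated with iv antibiotics",
)
# 6 overlapping unigrams over 7 candidate / 9 reference tokens -> 0.75
```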
arXiv Detail & Related papers (2024-09-23T00:35:23Z)
- Large Language Models in the Clinic: A Comprehensive Benchmark [63.21278434331952]
We build a benchmark ClinicBench to better understand large language models (LLMs) in the clinic.
We first collect eleven existing datasets covering diverse clinical language generation, understanding, and reasoning tasks.
We then construct six novel datasets and clinical tasks that are complex but common in real-world practice.
We conduct an extensive evaluation of twenty-two LLMs under both zero-shot and few-shot settings.
arXiv Detail & Related papers (2024-04-25T15:51:06Z)
- Dynamic Q&A of Clinical Documents with Large Language Models [3.021316686584699]
This work introduces a natural language interface using large language models (LLMs) for dynamic question-answering on clinical notes.
Experiments, utilizing various embedding models and advanced LLMs, show Wizard Vicuna's superior accuracy, albeit with high compute demands.
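The retrieve-then-answer pattern behind such a question-answering interface can be sketched minimally. Real systems embed note chunks with a neural embedding model and generate answers with an LLM such as Wizard Vicuna; here retrieval is approximated with bag-of-words cosine similarity, and the notes are invented.

```python
# Minimal sketch of retrieval for dynamic Q&A over clinical notes:
# pick the note chunk most similar to the question, then (in a real
# system) pass it to an LLM to generate the answer.
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, chunks: list) -> str:
    q = Counter(question.lower().split())
    return max(chunks, key=lambda c: cosine(q, Counter(c.lower().split())))

notes = [
    "Discharge summary: patient stable on lisinopril 10 mg daily.",
    "Radiology: chest x-ray shows no acute infiltrate.",
]
best = retrieve("what dose of lisinopril is the patient taking", notes)
```

Swapping the bag-of-words vectors for neural embeddings changes only the similarity computation, not the overall pattern.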
arXiv Detail & Related papers (2024-01-19T14:50:22Z)
- Making the Most Out of the Limited Context Length: Predictive Power Varies with Clinical Note Type and Note Section [70.37720062263176]
We propose a framework to analyze the sections with high predictive power.
Using MIMIC-III, we show that: 1) predictive power distribution is different between nursing notes and discharge notes and 2) combining different types of notes could improve performance when the context length is large.
arXiv Detail & Related papers (2023-07-13T20:04:05Z)
- ClinicalGPT: Large Language Models Finetuned with Diverse Medical Data and Comprehensive Evaluation [5.690250818139763]
Large language models have exhibited exceptional performance on various Natural Language Processing (NLP) tasks.
Despite these advances, their effectiveness in medical applications is limited by challenges such as factual inaccuracies, weak reasoning abilities, and a lack of grounding in real-world experience.
We present ClinicalGPT, a language model explicitly designed and optimized for clinical scenarios.
arXiv Detail & Related papers (2023-06-16T16:56:32Z)
- Do We Still Need Clinical Language Models? [15.023633270864675]
We show that relatively small specialized clinical models substantially outperform all in-context learning approaches.
We release the code and the models used under the PhysioNet Credentialed Health Data license and data use agreement.
arXiv Detail & Related papers (2023-02-16T05:08:34Z)
- Developing a general-purpose clinical language inference model from a large corpus of clinical notes [0.30586855806896046]
We trained a Bidirectional Encoder Representations from Transformers (BERT) model using a diverse corpus of 75 million deidentified clinical notes authored at the University of California, San Francisco (UCSF).
Our model performs on par with the best publicly available biomedical language models of comparable size on the public benchmark tasks, and is significantly better than these models in a within-system evaluation on the two tasks using UCSF data.
arXiv Detail & Related papers (2022-10-12T20:08:45Z)
- Cross-Lingual Knowledge Transfer for Clinical Phenotyping [55.92262310716537]
We investigate cross-lingual knowledge transfer strategies to execute this task for clinics that do not use the English language.
We evaluate these strategies for a Greek and a Spanish clinic leveraging clinical notes from different clinical domains.
Our results show that using multilingual data overall improves clinical phenotyping models and can compensate for data sparseness.
arXiv Detail & Related papers (2022-08-03T08:33:21Z)
- CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark [51.38557174322772]
We present the first Chinese Biomedical Language Understanding Evaluation benchmark.
It is a collection of natural language understanding tasks including named entity recognition, information extraction, clinical diagnosis normalization, and single-sentence/sentence-pair classification.
We report empirical results for 11 current pre-trained Chinese models, and the experiments show that state-of-the-art neural models still perform far worse than the human ceiling.
arXiv Detail & Related papers (2021-06-15T12:25:30Z)
- Benchmarking Automated Clinical Language Simplification: Dataset, Algorithm, and Evaluation [48.87254340298189]
We construct a new dataset named MedLane to support the development and evaluation of automated clinical language simplification approaches.
We propose a new model called DECLARE that follows the human annotation procedure and achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-12-04T06:09:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.