A Comparative Study of Pretrained Language Models for Long Clinical Text
- URL: http://arxiv.org/abs/2301.11847v1
- Date: Fri, 27 Jan 2023 16:50:29 GMT
- Title: A Comparative Study of Pretrained Language Models for Long Clinical Text
- Authors: Yikuan Li, Ramsey M. Wehbe, Faraz S. Ahmad, Hanyin Wang and Yuan Luo
- Abstract summary: We introduce two domain enriched language models, Clinical-Longformer and Clinical-BigBird, which are pre-trained on a large-scale clinical corpus.
We evaluate both language models using 10 baseline tasks including named entity recognition, question answering, natural language inference, and document classification tasks.
- Score: 4.196346055173027
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Objective: Clinical knowledge enriched transformer models (e.g.,
ClinicalBERT) have state-of-the-art results on clinical NLP (natural language
processing) tasks. One of the core limitations of these transformer models is
the substantial memory consumption due to their full self-attention mechanism,
which leads to performance degradation on long clinical texts. To overcome
this, we propose to leverage long-sequence transformer models (e.g., Longformer
and BigBird), which extend the maximum input sequence length from 512 to 4096,
to enhance the ability to model long-term dependencies in long clinical texts.
Materials and Methods: Inspired by the success of long sequence transformer
models and the fact that clinical notes are mostly long, we introduce two
domain enriched language models, Clinical-Longformer and Clinical-BigBird,
which are pre-trained on a large-scale clinical corpus. We evaluate both
language models using 10 baseline tasks including named entity recognition,
question answering, natural language inference, and document classification
tasks.
Results: The results demonstrate that Clinical-Longformer and
Clinical-BigBird consistently and significantly outperform ClinicalBERT and
other short-sequence transformers in all 10 downstream tasks and achieve new
state-of-the-art results.
Discussion: Our pre-trained language models provide the bedrock for clinical
NLP using long texts. We have made our source code available at
https://github.com/luoyuanlab/Clinical-Longformer, and the pre-trained models
available for public download at:
https://huggingface.co/yikuan8/Clinical-Longformer.
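For readers who want to try the released checkpoint, here is a minimal sketch (not from the paper) of loading it through the standard Hugging Face transformers API and encoding a long note; the repository id comes from the URL above, and the note text is a placeholder:

    # Minimal sketch: load the released Clinical-Longformer checkpoint and encode
    # a long clinical note. The note text below is a made-up placeholder.
    from transformers import AutoTokenizer, AutoModel

    tokenizer = AutoTokenizer.from_pretrained("yikuan8/Clinical-Longformer")
    model = AutoModel.from_pretrained("yikuan8/Clinical-Longformer")

    note = "Admission note: patient presents with progressive dyspnea ..."  # placeholder
    inputs = tokenizer(note, truncation=True, max_length=4096, return_tensors="pt")
    outputs = model(**inputs)
    print(outputs.last_hidden_state.shape)  # (batch, sequence_length, hidden_size)

Note the max_length of 4096 tokens, which is the extended input length of Longformer and BigBird, compared with 512 for BERT-style models.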
Conclusion: This study demonstrates that clinical knowledge enriched
long-sequence transformers are able to learn long-term dependencies in long
clinical text. Our methods can also inspire the development of other
domain-enriched long-sequence transformers.
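As an illustration of how such a checkpoint is typically adapted to one of the downstream tasks evaluated in the paper (document classification), the sketch below uses the generic transformers sequence-classification head; the label count and any training details are illustrative placeholders, not the authors' configuration:

    # Hedged sketch: attach a sequence-classification head to the checkpoint.
    # num_labels is an illustrative placeholder, not the paper's setting.
    from transformers import AutoModelForSequenceClassification

    clf = AutoModelForSequenceClassification.from_pretrained(
        "yikuan8/Clinical-Longformer",
        num_labels=2,  # e.g., a binary document label (illustrative)
    )
    # clf can then be fine-tuned with the standard transformers Trainer on notes
    # tokenized up to 4096 tokens, as in the paper's document classification tasks.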
Related papers
- Forgetting Curve: A Reliable Method for Evaluating Memorization Capability for Long-context Models [58.6172667880028]
We propose a new method called forgetting curve to measure the memorization capability of long-context models.
We show that forgetting curve has the advantage of being robust to the tested corpus and the experimental settings.
Our measurement provides empirical evidence for the effectiveness of transformer extension techniques while raising questions about the effective length of RNN/SSM-based models.
arXiv Detail & Related papers (2024-10-07T03:38:27Z)
- Adaptation of Biomedical and Clinical Pretrained Models to French Long Documents: A Comparative Study [4.042419725040222]
Pretrained language models based on BERT have been introduced for the French biomedical domain.
These models are constrained by a limited input sequence length of 512 tokens, which poses challenges when applied to clinical notes.
We present a comparative study of three adaptation strategies for long-sequence models, leveraging the Longformer architecture.
arXiv Detail & Related papers (2024-02-26T16:05:33Z)
- Making the Most Out of the Limited Context Length: Predictive Power Varies with Clinical Note Type and Note Section [70.37720062263176]
We propose a framework to analyze the sections with high predictive power.
Using MIMIC-III, we show that: 1) predictive power distribution is different between nursing notes and discharge notes and 2) combining different types of notes could improve performance when the context length is large.
arXiv Detail & Related papers (2023-07-13T20:04:05Z)
- Do We Still Need Clinical Language Models? [15.023633270864675]
We show that relatively small specialized clinical models substantially outperform all in-context learning approaches.
We release the code and the models used under the PhysioNet Credentialed Health Data license and data use agreement.
arXiv Detail & Related papers (2023-02-16T05:08:34Z)
- Lightweight Transformers for Clinical Natural Language Processing [9.532776962985828]
This study focuses on development of compact language models for processing clinical texts.
We developed a number of efficient lightweight clinical transformers using knowledge distillation and continual learning.
Our evaluation was done across several standard datasets and covered a wide range of clinical text-mining tasks.
arXiv Detail & Related papers (2023-02-09T16:07:31Z)
- LongFNT: Long-form Speech Recognition with Factorized Neural Transducer [64.75547712366784]
We propose the LongFNT-Text architecture, which fuses the sentence-level long-form features directly with the output of the vocabulary predictor.
The effectiveness of our LongFNT approach is validated on the LibriSpeech and GigaSpeech corpora with 19% and 12% relative word error rate (WER) reductions, respectively.
arXiv Detail & Related papers (2022-11-17T08:48:27Z)
- How Long Is Enough? Exploring the Optimal Intervals of Long-Range Clinical Note Language Modeling [37.247872987053654]
Large pre-trained language models (LMs) have been widely adopted in biomedical and clinical domains.
This work explores long-range adaptation from such LMs with Longformer, allowing the LMs to capture longer clinical notes context.
We conduct experiments on three n2c2 challenge datasets and a longitudinal clinical dataset from the Hong Kong Hospital Authority electronic health record system.
arXiv Detail & Related papers (2022-10-25T09:21:28Z)
- LifeLonger: A Benchmark for Continual Disease Classification [59.13735398630546]
We introduce LifeLonger, a benchmark for continual disease classification on the MedMNIST collection.
Task and class incremental learning of diseases address the issue of classifying new samples without re-training the models from scratch.
Cross-domain incremental learning addresses the issue of dealing with datasets originating from different institutions while retaining the previously obtained knowledge.
arXiv Detail & Related papers (2022-04-12T12:25:05Z)
- Clinical-Longformer and Clinical-BigBird: Transformers for long clinical sequences [4.196346055173027]
Transformers-based models, such as BERT, have dramatically improved the performance for various natural language processing tasks.
One of the core limitations of these transformers is the substantial memory consumption due to their full self-attention mechanism.
We introduce two domain-enriched language models, namely Clinical-Longformer and Clinical-BigBird, which are pre-trained on large-scale clinical corpora.
arXiv Detail & Related papers (2022-01-27T22:51:58Z)
- An Interpretable End-to-end Fine-tuning Approach for Long Clinical Text [72.62848911347466]
Unstructured clinical text in EHRs contains crucial information for applications including decision support, trial matching, and retrospective research.
Recent work has applied BERT-based models to clinical information extraction and text classification, given these models' state-of-the-art performance in other NLP domains.
In this work, we propose a novel fine-tuning approach called SnipBERT. Instead of using entire notes, SnipBERT identifies crucial snippets and feeds them into a truncated BERT-based model in a hierarchical manner.
arXiv Detail & Related papers (2020-11-12T17:14:32Z)
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, the Prototypical Network, which is a simple yet effective meta-learning approach for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)