Clinical-Longformer and Clinical-BigBird: Transformers for long clinical sequences
- URL: http://arxiv.org/abs/2201.11838v1
- Date: Thu, 27 Jan 2022 22:51:58 GMT
- Title: Clinical-Longformer and Clinical-BigBird: Transformers for long clinical sequences
- Authors: Yikuan Li, Ramsey M. Wehbe, Faraz S. Ahmad, Hanyin Wang and Yuan Luo
- Abstract summary: Transformer-based models, such as BERT, have dramatically improved the performance of various natural language processing tasks.
One of the core limitations of these transformers is the substantial memory consumption due to their full self-attention mechanism.
We introduce two domain enriched language models, namely Clinical-Longformer and Clinical-BigBird, which are pre-trained from large-scale clinical corpora.
- Score: 4.196346055173027
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformer-based models, such as BERT, have dramatically improved
performance on various natural language processing tasks. The clinical
knowledge enriched model, namely ClinicalBERT, also achieved state-of-the-art
results when applied to clinical named entity recognition and natural language
inference tasks. One of the core limitations of these transformers is their
substantial memory consumption, caused by the full self-attention mechanism.
To overcome this, long sequence transformer models, e.g. Longformer and
BigBird, were proposed with sparse attention mechanisms that reduce memory
usage from quadratic in the sequence length to linear. These models extended
the maximum input sequence length from 512 to 4096, which enhanced the ability
to model long-term dependencies and consequently achieved optimal results in a
variety of tasks. Inspired by the success of these long
sequence transformer models, we introduce two domain enriched language models,
namely Clinical-Longformer and Clinical-BigBird, which are pre-trained from
large-scale clinical corpora. We evaluate both pre-trained models using 10
baseline tasks including named entity recognition, question answering, and
document classification tasks. The results demonstrate that Clinical-Longformer
and Clinical-BigBird consistently and significantly outperform ClinicalBERT as
well as other short-sequence transformers in all downstream tasks. We have made
the pre-trained models available for public download at:
[https://huggingface.co/yikuan8/Clinical-Longformer].
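For readers who want to try the released checkpoints, the minimal sketch below loads Clinical-Longformer from the Hugging Face Hub and embeds a long note. It assumes the `transformers` and `torch` libraries are installed; the clinical note text is a placeholder, not an example from the paper's corpora.

```python
# Minimal usage sketch (not from the paper): load Clinical-Longformer from the
# Hugging Face Hub and embed a long clinical note. Requires `transformers` and
# `torch`; the note text below is a placeholder.
from transformers import AutoTokenizer, AutoModel

MODEL_NAME = "yikuan8/Clinical-Longformer"  # checkpoint linked in the abstract

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)

note = "CHIEF COMPLAINT: shortness of breath. HISTORY OF PRESENT ILLNESS: ..."
inputs = tokenizer(
    note,
    truncation=True,
    max_length=4096,  # long-sequence limit, vs. 512 for BERT-style models
    return_tensors="pt",
)

outputs = model(**inputs)
# Contextual token embeddings; pool them or add a task head for downstream use.
print(outputs.last_hidden_state.shape)  # (1, sequence_length, hidden_size)
```

The same pattern should apply to Clinical-BigBird by swapping in the corresponding checkpoint name.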
Related papers
- Adaptation of Biomedical and Clinical Pretrained Models to French Long Documents: A Comparative Study [4.042419725040222]
Pretrained language models based on BERT have been introduced for the French biomedical domain.
These models are constrained by a limited input sequence length of 512 tokens, which poses challenges when applied to clinical notes.
We present a comparative study of three adaptation strategies for long-sequence models, leveraging the Longformer architecture.
arXiv Detail & Related papers (2024-02-26T16:05:33Z)
- Learning to Grow Pretrained Models for Efficient Transformer Training [72.20676008625641]
We learn to grow pretrained transformers by learning a linear mapping from the parameters of a smaller model to initialize a larger one.
Experiments across both language and vision transformers demonstrate that our learned Linear Growth Operator (LiGO) can save up to 50% of the computational cost of training from scratch.
arXiv Detail & Related papers (2023-03-02T05:21:18Z)
- Lightweight Transformers for Clinical Natural Language Processing [9.532776962985828]
This study focuses on the development of compact language models for processing clinical texts.
We developed a number of efficient lightweight clinical transformers using knowledge distillation and continual learning.
Our evaluation was done across several standard datasets and covered a wide range of clinical text-mining tasks.
arXiv Detail & Related papers (2023-02-09T16:07:31Z)
- A Comparative Study of Pretrained Language Models for Long Clinical Text [4.196346055173027]
We introduce two domain enriched language models, Clinical-Longformer and Clinical-BigBird, which are pre-trained on a large-scale clinical corpus.
We evaluate both language models using 10 baseline tasks including named entity recognition, question answering, natural language inference, and document classification tasks.
arXiv Detail & Related papers (2023-01-27T16:50:29Z)
- How Long Is Enough? Exploring the Optimal Intervals of Long-Range Clinical Note Language Modeling [37.247872987053654]
Large pre-trained language models (LMs) have been widely adopted in biomedical and clinical domains.
This work explores long-range adaptation of such LMs with Longformer, allowing them to capture longer context from clinical notes.
We conduct experiments on three n2c2 challenge datasets and a longitudinal clinical dataset from the Hong Kong Hospital Authority electronic health record system.
arXiv Detail & Related papers (2022-10-25T09:21:28Z)
- A Comparative Evaluation Of Transformer Models For De-Identification Of Clinical Text Data [0.0]
The i2b2/UTHealth 2014 clinical text de-identification challenge corpus contains N=1304 clinical notes.
We fine-tune several transformer model architectures on the corpus, including: BERT-base, BERT-large, RoBERTa-base, RoBERTa-large, ALBERT-base and ALBERT-xxlarge.
We assess model performance in terms of accuracy, precision (positive predictive value), recall (sensitivity) and F1 score.
arXiv Detail & Related papers (2022-03-25T19:42:03Z)
- STAR: Sparse Transformer-based Action Recognition [61.490243467748314]
This work proposes a novel skeleton-based human action recognition model with sparse attention on the spatial dimension and segmented linear attention on the temporal dimension of data.
Experiments show that our model achieves comparable performance while using far fewer trainable parameters, and achieves high speed in both training and inference.
arXiv Detail & Related papers (2021-07-15T02:53:11Z)
- A multi-stage machine learning model on diagnosis of esophageal manometry [50.591267188664666]
The framework includes deep-learning models at the swallow-level stage and feature-based machine learning models at the study-level stage.
This is the first artificial-intelligence-style model to automatically predict the CC diagnosis of an HRM study from raw multi-swallow data.
arXiv Detail & Related papers (2021-06-25T20:09:23Z)
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta-learning techniques to develop a new model that can extract common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, the Prototypical Network, a simple yet effective meta-learning method for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
- Predicting Clinical Diagnosis from Patients Electronic Health Records Using BERT-based Neural Networks [62.9447303059342]
We show the importance of this problem in the medical community.
We present a modification of the Bidirectional Encoder Representations from Transformers (BERT) model for sequence classification.
We use a large-scale Russian EHR dataset consisting of about 4 million unique patient visits.
arXiv Detail & Related papers (2020-07-15T09:22:55Z)
- MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers [117.67424061746247]
We present a simple and effective approach to compress large Transformer-based pre-trained models.
We propose distilling the self-attention module of the last Transformer layer of the teacher, which is effective and flexible for the student; a simplified sketch of this idea follows this entry.
Experimental results demonstrate that our monolingual model outperforms state-of-the-art baselines across different student model sizes.
arXiv Detail & Related papers (2020-02-25T15:21:10Z)
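The sketch below illustrates the flavor of the last-layer self-attention distillation mentioned in the MiniLM entry above. It is a simplified illustration rather than the authors' implementation: it assumes teacher and student share the number of attention heads and the sequence length, and it transfers only the attention distributions (the full method also transfers value relations).

```python
# Simplified sketch of last-layer self-attention distillation (MiniLM-style).
# Assumptions: teacher and student share head count and sequence length; only
# attention distributions are distilled here, not value relations.
import torch
import torch.nn.functional as F


def attention_distillation_loss(teacher_scores: torch.Tensor,
                                student_scores: torch.Tensor) -> torch.Tensor:
    """KL divergence between teacher and student self-attention distributions.

    Both inputs are raw attention scores of shape (batch, heads, seq, seq)
    taken from the last Transformer layer of each model.
    """
    teacher_probs = F.softmax(teacher_scores, dim=-1)
    student_log_probs = F.log_softmax(student_scores, dim=-1)
    # "batchmean" sums KL over heads and positions and averages over the batch.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")


# Toy tensors just to show the call pattern; real scores come from the models.
teacher = torch.randn(2, 12, 128, 128)
student = torch.randn(2, 12, 128, 128, requires_grad=True)
loss = attention_distillation_loss(teacher, student)
loss.backward()  # gradients flow only into the student scores
```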