Clinical-Longformer and Clinical-BigBird: Transformers for long clinical sequences
- URL: http://arxiv.org/abs/2201.11838v1
- Date: Thu, 27 Jan 2022 22:51:58 GMT
- Title: Clinical-Longformer and Clinical-BigBird: Transformers for long clinical sequences
- Authors: Yikuan Li, Ramsey M. Wehbe, Faraz S. Ahmad, Hanyin Wang and Yuan Luo
- Abstract summary: Transformer-based models, such as BERT, have dramatically improved the performance of various natural language processing tasks.
One of the core limitations of these transformers is the substantial memory consumption due to their full self-attention mechanism.
We introduce two domain enriched language models, namely Clinical-Longformer and Clinical-BigBird, which are pre-trained from large-scale clinical corpora.
- Score: 4.196346055173027
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformer-based models, such as BERT, have dramatically improved
performance on various natural language processing tasks. The clinical
knowledge enriched model, namely ClinicalBERT, also achieved state-of-the-art
results when applied to clinical named entity recognition and natural language
inference tasks. One of the core limitations of these transformers is their
substantial memory consumption, caused by the full self-attention mechanism.
To overcome this, long sequence transformer models, e.g. Longformer and
BigBird, were proposed with sparse attention mechanisms that reduce memory
usage from quadratic in the sequence length to linear. These models extended
the maximum input sequence length from 512 to 4096, which enhanced the ability
to model long-term dependencies and consequently achieved optimal results in a
variety of tasks. Inspired by the success of these long
sequence transformer models, we introduce two domain enriched language models,
namely Clinical-Longformer and Clinical-BigBird, which are pre-trained from
large-scale clinical corpora. We evaluate both pre-trained models using 10
baseline tasks including named entity recognition, question answering, and
document classification tasks. The results demonstrate that Clinical-Longformer
and Clinical-BigBird consistently and significantly outperform ClinicalBERT as
well as other short-sequence transformers in all downstream tasks. We have made
the pre-trained models available for public download at:
[https://huggingface.co/yikuan8/Clinical-Longformer].
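For readers who want to try the released checkpoints, the minimal sketch below loads Clinical-Longformer from the Hugging Face Hub and embeds a long note. It assumes the `transformers` and `torch` libraries are installed; the clinical note text is a placeholder, not an example from the paper's corpora.

```python
# Minimal usage sketch (not from the paper): load Clinical-Longformer from the
# Hugging Face Hub and embed a long clinical note. Requires `transformers` and
# `torch`; the note text below is a placeholder.
from transformers import AutoTokenizer, AutoModel

MODEL_NAME = "yikuan8/Clinical-Longformer"  # checkpoint linked in the abstract

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)

note = "CHIEF COMPLAINT: shortness of breath. HISTORY OF PRESENT ILLNESS: ..."
inputs = tokenizer(
    note,
    truncation=True,
    max_length=4096,  # long-sequence limit, vs. 512 for BERT-style models
    return_tensors="pt",
)

outputs = model(**inputs)
# Contextual token embeddings; pool them or add a task head for downstream use.
print(outputs.last_hidden_state.shape)  # (1, sequence_length, hidden_size)
```

The same pattern should apply to Clinical-BigBird by swapping in the corresponding checkpoint name.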
Related papers
- Adaptation of Biomedical and Clinical Pretrained Models to French Long Documents: A Comparative Study [4.042419725040222]
Pretrained language models based on BERT have been introduced for the French biomedical domain.
These models are constrained by a limited input sequence length of 512 tokens, which poses challenges when applied to clinical notes.
We present a comparative study of three adaptation strategies for long-sequence models, leveraging the Longformer architecture.
arXiv Detail & Related papers (2024-02-26T16:05:33Z)
- Learning to Grow Pretrained Models for Efficient Transformer Training [72.20676008625641]
We learn to grow pretrained transformers by learning a linear mapping from the parameters of a smaller model to initialize a larger one.
Experiments across both language and vision transformers demonstrate that our learned Linear Growth Operator (LiGO) can save up to 50% of the computational cost of training from scratch.
arXiv Detail & Related papers (2023-03-02T05:21:18Z)
- Lightweight Transformers for Clinical Natural Language Processing [9.532776962985828]
This study focuses on the development of compact language models for processing clinical texts.
We developed a number of efficient lightweight clinical transformers using knowledge distillation and continual learning.
Our evaluation was done across several standard datasets and covered a wide range of clinical text-mining tasks.
arXiv Detail & Related papers (2023-02-09T16:07:31Z)
- A Comparative Study of Pretrained Language Models for Long Clinical Text [4.196346055173027]
We introduce two domain enriched language models, Clinical-Longformer and Clinical-BigBird, which are pre-trained on a large-scale clinical corpus.
We evaluate both language models using 10 baseline tasks including named entity recognition, question answering, natural language inference, and document classification tasks.
arXiv Detail & Related papers (2023-01-27T16:50:29Z)
- How Long Is Enough? Exploring the Optimal Intervals of Long-Range Clinical Note Language Modeling [37.247872987053654]
Large pre-trained language models (LMs) have been widely adopted in biomedical and clinical domains.
This work explores long-range adaptation of such LMs with Longformer, allowing them to capture longer context from clinical notes.
We conduct experiments on three n2c2 challenge datasets and a longitudinal clinical dataset from the Hong Kong Hospital Authority electronic health record system.
arXiv Detail & Related papers (2022-10-25T09:21:28Z)
- A Comparative Evaluation Of Transformer Models For De-Identification Of Clinical Text Data [0.0]
The i2b2/UTHealth 2014 clinical text de-identification challenge corpus contains N=1304 clinical notes.
We fine-tune several transformer model architectures on the corpus, including: BERT-base, BERT-large, RoBERTa-base, RoBERTa-large, ALBERT-base and ALBERT-xxlarge.
We assess model performance in terms of accuracy, precision (positive predictive value), recall (sensitivity) and F1 score.
arXiv Detail & Related papers (2022-03-25T19:42:03Z)
- STAR: Sparse Transformer-based Action Recognition [61.490243467748314]
This work proposes a novel skeleton-based human action recognition model with sparse attention on the spatial dimension and segmented linear attention on the temporal dimension of data.
Experiments show that our model achieves comparable performance while using far fewer trainable parameters, and achieves high speed in both training and inference.
arXiv Detail & Related papers (2021-07-15T02:53:11Z)
- A multi-stage machine learning model on diagnosis of esophageal manometry [50.591267188664666]
The framework includes deep-learning models at the swallow-level stage and feature-based machine learning models at the study-level stage.
This is the first artificial-intelligence-style model to automatically predict the CC diagnosis of an HRM study from raw multi-swallow data.
arXiv Detail & Related papers (2021-06-25T20:09:23Z)
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta-learning techniques to develop a new model that can extract common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, the Prototypical Network, a simple yet effective meta-learning method for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
- Predicting Clinical Diagnosis from Patients Electronic Health Records Using BERT-based Neural Networks [62.9447303059342]
We show the importance of this problem in the medical community.
We present a modification of the Bidirectional Encoder Representations from Transformers (BERT) model for sequence classification.
We use a large-scale Russian EHR dataset consisting of about 4 million unique patient visits.
arXiv Detail & Related papers (2020-07-15T09:22:55Z)
- MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers [117.67424061746247]
We present a simple and effective approach to compress large Transformer-based pre-trained models.
We propose distilling the self-attention module of the last Transformer layer of the teacher, which is effective and flexible for the student; a simplified sketch of this idea follows this entry.
Experimental results demonstrate that our monolingual model outperforms state-of-the-art baselines across different student model sizes.
arXiv Detail & Related papers (2020-02-25T15:21:10Z)
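The sketch below illustrates the flavor of the last-layer self-attention distillation mentioned in the MiniLM entry above. It is a simplified illustration rather than the authors' implementation: it assumes teacher and student share the number of attention heads and the sequence length, and it transfers only the attention distributions (the full method also transfers value relations).

```python
# Simplified sketch of last-layer self-attention distillation (MiniLM-style).
# Assumptions: teacher and student share head count and sequence length; only
# attention distributions are distilled here, not value relations.
import torch
import torch.nn.functional as F


def attention_distillation_loss(teacher_scores: torch.Tensor,
                                student_scores: torch.Tensor) -> torch.Tensor:
    """KL divergence between teacher and student self-attention distributions.

    Both inputs are raw attention scores of shape (batch, heads, seq, seq)
    taken from the last Transformer layer of each model.
    """
    teacher_probs = F.softmax(teacher_scores, dim=-1)
    student_log_probs = F.log_softmax(student_scores, dim=-1)
    # "batchmean" sums KL over heads and positions and averages over the batch.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")


# Toy tensors just to show the call pattern; real scores come from the models.
teacher = torch.randn(2, 12, 128, 128)
student = torch.randn(2, 12, 128, 128, requires_grad=True)
loss = attention_distillation_loss(teacher, student)
loss.backward()  # gradients flow only into the student scores
```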