Improving Transformer Performance for French Clinical Notes Classification Using Mixture of Experts on a Limited Dataset
- URL: http://arxiv.org/abs/2303.12892v2
- Date: Sat, 25 May 2024 14:54:20 GMT
- Title: Improving Transformer Performance for French Clinical Notes Classification Using Mixture of Experts on a Limited Dataset
- Authors: Thanh-Dung Le, Philippe Jouvet, Rita Noumeir
- Abstract summary: Transformer-based models have shown outstanding results in natural language processing but face challenges in applications like classifying small-scale clinical texts.
This study presents a customized Mixture-of-Experts (MoE) Transformer model for classifying small-scale French clinical texts at CHU Sainte-Justine Hospital.
- Score: 0.08192907805418582
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Transformer-based models have shown outstanding results in natural language processing but face challenges in applications like classifying small-scale clinical texts, especially with constrained computational resources. This study presents a customized Mixture-of-Experts (MoE) Transformer model for classifying small-scale French clinical texts at CHU Sainte-Justine Hospital. The MoE-Transformer addresses the dual challenges of effective training with limited data and low-resource computation suitable for in-house hospital use. Despite the success of biomedical pre-trained models such as CamemBERT-bio, DrBERT, and AliBERT, their high computational demands make them impractical for many clinical settings. Our MoE-Transformer model not only outperforms DistilBERT, CamemBERT, FlauBERT, and Transformer models on the same dataset but also achieves impressive results: an accuracy of 87%, precision of 87%, recall of 85%, and F1-score of 86%. While the MoE-Transformer does not surpass the performance of biomedical pre-trained BERT models, it can be trained at least 190 times faster, offering a viable alternative for settings with limited data and computational resources. Although the MoE-Transformer addresses the challenges of generalization gaps and sharp minima, it still shows some limitations for efficient and accurate clinical text classification; even so, the model represents a significant advancement in the field. It is particularly valuable for classifying small French clinical narratives within the privacy requirements and constraints of hospital-based computational resources.
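The paper's code is not reproduced on this page, but a minimal PyTorch sketch suggests what an MoE Transformer classifier of this kind might look like; the expert count, model dimensions, vocabulary size, and number of classes below are placeholder assumptions rather than the authors' settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    """Mixture-of-Experts feed-forward block with softmax gating (all sizes are assumptions)."""
    def __init__(self, d_model=256, d_ff=512, num_experts=4):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                          # x: (batch, seq, d_model)
        weights = F.softmax(self.gate(x), dim=-1)  # (batch, seq, num_experts)
        expert_out = torch.stack([e(x) for e in self.experts], dim=-1)  # (batch, seq, d_model, E)
        return torch.einsum("bsde,bse->bsd", expert_out, weights)

class MoETransformerClassifier(nn.Module):
    """Small Transformer encoder whose feed-forward sublayer is replaced by an MoE block."""
    def __init__(self, vocab_size=32000, d_model=256, n_heads=4, num_classes=2, max_len=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.moe_ffn = MoEFeedForward(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.classifier = nn.Linear(d_model, num_classes)

    def forward(self, input_ids):
        positions = torch.arange(input_ids.size(1), device=input_ids.device)
        h = self.embed(input_ids) + self.pos(positions)
        attn_out, _ = self.attn(h, h, h)
        h = self.norm1(h + attn_out)
        h = self.norm2(h + self.moe_ffn(h))
        return self.classifier(h.mean(dim=1))      # mean-pool tokens, then classify the note

logits = MoETransformerClassifier()(torch.randint(0, 32000, (2, 64)))
print(logits.shape)  # torch.Size([2, 2])
```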
Related papers
- The Impact of LoRA Adapters for LLMs on Clinical NLP Classification Under Data Limitations [4.72457683445805]
Fine-tuning Large Language Models (LLMs) for clinical Natural Language Processing (NLP) poses significant challenges due to the domain gap and limited data availability.
This study investigates the effectiveness of various adapter techniques equivalent to Low-Rank Adaptation (LoRA).
We fine-tuned biomedical pre-trained models, including CamemBERT-bio, AliBERT, and DrBERT, alongside two Transformer-based models.
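As a rough illustration of the adapter idea (not the exact configuration studied in the paper), a LoRA module freezes a pre-trained weight matrix and learns a small low-rank update; the rank and scaling factor below are assumptions.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen linear layer with a trainable low-rank update: W x + (alpha/r) * B A x."""
    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                # freeze the pre-trained weights
        self.lora_a = nn.Linear(base.in_features, r, bias=False)
        self.lora_b = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)         # start as a zero (identity) update
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * self.lora_b(self.lora_a(x))

# Example on a single projection; in practice such wrappers are applied to the attention
# projections of a model such as CamemBERT-bio before fine-tuning on clinical labels.
layer = LoRALinear(nn.Linear(768, 768))
print(layer(torch.randn(1, 768)).shape)  # torch.Size([1, 768])
```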
arXiv Detail & Related papers (2024-07-27T16:48:03Z)
- Multi-objective Representation for Numbers in Clinical Narratives: A CamemBERT-Bio-Based Alternative to Large-Scale LLMs [0.9208007322096533]
This paper investigates the limitations of Transformer models in understanding numerical values.
It aims to categorize numerical values extracted from medical documents into eight specific physiological categories using CamemBERT-bio.
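A minimal sketch of how such an eight-way classification head might be attached to CamemBERT-bio with the Hugging Face transformers library; the model identifier, example sentence, and label handling are assumptions rather than details taken from the paper.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# "almanach/camembert-bio-base" is the assumed Hugging Face identifier for CamemBERT-bio.
model_name = "almanach/camembert-bio-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=8)

# A hypothetical snippet containing a numerical value to categorize.
inputs = tokenizer("Fréquence cardiaque mesurée à 132 bpm.", return_tensors="pt")
logits = model(**inputs).logits          # shape (1, 8): one score per physiological category
print(logits.argmax(dim=-1))
```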
arXiv Detail & Related papers (2024-05-28T01:15:21Z)
- Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation [113.5002649181103]
This work trains open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology.
For training, we assemble a large dataset of over 697 thousand radiology image-text pairs.
For evaluation, we propose CheXprompt, a GPT-4-based metric for factuality evaluation, and demonstrate its parity with expert evaluation.
Inference with LLaVA-Rad is fast and can be performed on a single V100 GPU in private settings, offering a promising state-of-the-art tool for real-world clinical applications.
arXiv Detail & Related papers (2024-03-12T18:12:02Z)
- Lightweight Transformers for Clinical Natural Language Processing [9.532776962985828]
This study focuses on the development of compact language models for processing clinical texts.
We developed a number of efficient lightweight clinical transformers using knowledge distillation and continual learning.
Our evaluation was done across several standard datasets and covered a wide range of clinical text-mining tasks.
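A compact sketch of the soft-target distillation loss commonly used to train such compact students against a larger teacher; the temperature and loss weighting are assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend the soft-target KL term against the teacher with the usual hard-label loss."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy example with random logits for a 3-class task.
s, t = torch.randn(4, 3), torch.randn(4, 3)
print(distillation_loss(s, t, torch.tensor([0, 1, 2, 0])))
```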
arXiv Detail & Related papers (2023-02-09T16:07:31Z)
- Exploring the Value of Pre-trained Language Models for Clinical Named Entity Recognition [6.917786124918387]
We compare Transformer models that are trained from scratch to fine-tuned BERT-based LLMs.
We examine the impact of an additional CRF layer on such models to encourage contextual learning.
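A sketch of how a CRF layer is typically stacked on a Transformer's token-level emissions, here using the third-party pytorch-crf package; the tag count, hidden size, and tensors below are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchcrf import CRF  # pip install pytorch-crf

num_tags, hidden = 5, 768
emission_head = nn.Linear(hidden, num_tags)    # maps encoder states to per-tag scores
crf = CRF(num_tags, batch_first=True)

# Stand-ins for contextual embeddings from a Transformer encoder (batch=2, seq=10).
encoder_states = torch.randn(2, 10, hidden)
emissions = emission_head(encoder_states)
tags = torch.randint(0, num_tags, (2, 10))
mask = torch.ones(2, 10, dtype=torch.bool)

loss = -crf(emissions, tags, mask=mask)        # negative log-likelihood to minimize
best_paths = crf.decode(emissions, mask=mask)  # Viterbi decoding of the most likely tag sequence
print(float(loss), best_paths[0][:5])
```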
arXiv Detail & Related papers (2022-10-23T16:27:31Z)
- Learning structures of the French clinical language: development and validation of word embedding models using 21 million clinical reports from electronic health records [2.5709272341038027]
Methods based on transfer learning using pre-trained language models have achieved state-of-the-art results in most NLP applications.
We aimed to evaluate the impact of adapting a language model to French clinical reports on downstream medical NLP tasks.
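The usual recipe behind such adaptation is continued masked-language-model pre-training of a general French model on clinical text before fine-tuning; the sketch below uses camembert-base with toy data and hyperparameters as assumptions, not the study's actual corpus or setup.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("camembert-base")
model = AutoModelForMaskedLM.from_pretrained("camembert-base")
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)

# Stand-in for a corpus of clinical reports (the real study uses millions of documents).
reports = ["Patient admis pour détresse respiratoire aiguë.",
           "Antécédents de diabète de type 1, traitement par insuline."]
batch = collator([tokenizer(r, truncation=True, max_length=128) for r in reports])

model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
loss = model(**batch).loss        # masked-LM loss on the clinical snippets
loss.backward()
optimizer.step()
print(float(loss))
```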
arXiv Detail & Related papers (2022-07-26T14:46:34Z)
- MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation [68.30497162547768]
We propose MoEBERT, which uses a Mixture-of-Experts structure to increase model capacity and inference speed.
We validate the efficiency and effectiveness of MoEBERT on natural language understanding and question answering tasks.
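As the summary describes, the core move is to carve an existing feed-forward network into experts; the sketch below splits an FFN into equal experts with top-1 routing and omits MoEBERT's importance-guided neuron selection and distillation, so it is an assumption-level simplification.

```python
import torch
import torch.nn as nn

def split_ffn_into_experts(w1: nn.Linear, w2: nn.Linear, num_experts=4):
    """Partition the hidden neurons of an FFN (w2(relu(w1(x)))) into equal-sized experts.
    MoEBERT selects neurons by importance scores; here they are simply split in order."""
    chunk = w1.out_features // num_experts
    experts = []
    for i in range(num_experts):
        sl = slice(i * chunk, (i + 1) * chunk)
        e1 = nn.Linear(w1.in_features, chunk)
        e2 = nn.Linear(chunk, w2.out_features)
        e1.weight.data, e1.bias.data = w1.weight.data[sl].clone(), w1.bias.data[sl].clone()
        e2.weight.data = w2.weight.data[:, sl].clone()
        e2.bias.data = w2.bias.data.clone() / num_experts  # rough split of the shared bias
        experts.append(nn.Sequential(e1, nn.ReLU(), e2))
    return nn.ModuleList(experts)

w1, w2 = nn.Linear(768, 3072), nn.Linear(3072, 768)
experts = split_ffn_into_experts(w1, w2)
router = nn.Linear(768, len(experts))

x = torch.randn(1, 768)
expert_id = int(router(x).argmax(dim=-1))        # top-1 routing per token
print(expert_id, experts[expert_id](x).shape)
```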
arXiv Detail & Related papers (2022-04-15T23:19:37Z)
- A Comparative Evaluation Of Transformer Models For De-Identification Of Clinical Text Data [0.0]
The i2b2/UTHealth 2014 clinical text de-identification challenge corpus contains N=1304 clinical notes.
We fine-tune several transformer model architectures on the corpus, including BERT-base, BERT-large, RoBERTa-base, RoBERTa-large, ALBERT-base, and ALBERT-xxlarge.
We assess model performance in terms of accuracy, precision (positive predictive value), recall (sensitivity) and F1 score.
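These metrics can be computed with scikit-learn; the labels below are fabricated placeholders meant only to show the calls.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Hypothetical gold vs. predicted PHI labels (1 = identifier token, 0 = other).
y_true = [1, 0, 0, 1, 1, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]

precision, recall, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="binary")
print(f"accuracy={accuracy_score(y_true, y_pred):.2f} "
      f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```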
arXiv Detail & Related papers (2022-03-25T19:42:03Z)
- MoEfication: Conditional Computation of Transformer Models for Efficient Inference [66.56994436947441]
Transformer-based pre-trained language models can achieve superior performance on most NLP tasks due to large parameter capacity, but also lead to huge computation cost.
We explore accelerating large-model inference through conditional computation based on the sparse activation phenomenon.
We propose to transform a large model into its mixture-of-experts (MoE) version with equal model size, namely MoEfication.
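A toy sketch of the conditional-computation idea: group the FFN's hidden neurons into expert blocks and evaluate only a few blocks per input. The grouping, router, and masking below are simplifications and assumptions; a real MoEfication pipeline clusters neurons by co-activation and trains the router so the full FFN never has to be computed.

```python
import torch
import torch.nn as nn

# Assume the FFN's hidden neurons have been grouped into experts (here: in order, 8 groups).
d_model, d_ff, num_experts, top_k = 768, 3072, 8, 2
w1, w2 = nn.Linear(d_model, d_ff), nn.Linear(d_ff, d_model)
router = nn.Linear(d_model, num_experts)   # trained in MoEfication to predict which groups activate

def moefied_forward(x):
    hidden = torch.relu(w1(x)).view(x.size(0), num_experts, -1)   # (batch, experts, neurons per expert)
    keep = router(x).topk(top_k, dim=-1).indices                  # choose top-k expert groups per input
    mask = torch.zeros(x.size(0), num_experts, 1).scatter_(1, keep.unsqueeze(-1), 1.0)
    # Only the selected groups contribute; a real implementation skips the unselected
    # neurons entirely instead of masking them after the fact.
    return w2((hidden * mask).view(x.size(0), d_ff))

print(moefied_forward(torch.randn(4, d_model)).shape)  # torch.Size([4, 768])
```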
arXiv Detail & Related papers (2021-10-05T02:14:38Z)
- Efficient pre-training objectives for Transformers [84.64393460397471]
We study several efficient pre-training objectives for Transformers-based models.
We show that eliminating the MASK token and computing the loss over the whole output are essential choices for improving performance.
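A toy illustration of those two choices: corrupt tokens with random vocabulary items instead of a special MASK symbol, and compute the loss over every output position rather than only the corrupted ones. The corruption rate and the stand-in model output are assumptions.

```python
import torch
import torch.nn.functional as F

vocab_size, corruption_rate = 1000, 0.15
input_ids = torch.randint(0, vocab_size, (2, 16))           # toy token IDs

# Replace a random 15% of tokens with random vocabulary tokens (no [MASK] symbol).
corrupt = torch.rand_like(input_ids, dtype=torch.float) < corruption_rate
corrupted = torch.where(corrupt, torch.randint_like(input_ids, vocab_size), input_ids)

# Any token-level language model would consume `corrupted` here; random logits stand in
# for its per-position vocabulary predictions.
logits = torch.randn(2, 16, vocab_size, requires_grad=True)

# Loss over the whole output, not just the corrupted positions.
loss = F.cross_entropy(logits.view(-1, vocab_size), input_ids.view(-1))
loss.backward()
print(float(loss))
```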
arXiv Detail & Related papers (2021-04-20T00:09:37Z)
- Does BERT Pretrained on Clinical Notes Reveal Sensitive Data? [70.3631443249802]
We design a battery of approaches intended to recover Personal Health Information from a trained BERT.
Specifically, we attempt to recover patient names and conditions with which they are associated.
We find that simple probing methods are not able to meaningfully extract sensitive information from a BERT model trained on the MIMIC-III corpus of EHRs.
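One of the simple probes alluded to can be reproduced with a fill-mask query; the model name and sentence below are stand-ins, not the paper's actual setup.

```python
from transformers import pipeline

# Any masked language model trained on clinical notes could be probed this way;
# "bert-base-uncased" is only a stand-in.
fill = pipeline("fill-mask", model="bert-base-uncased")

# If the model had memorized a patient's record, their name might rank highly here.
for candidate in fill("patient [MASK] was admitted with acute respiratory failure.", top_k=5):
    print(candidate["token_str"], round(candidate["score"], 3))
```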
arXiv Detail & Related papers (2021-04-15T20:40:05Z)
- TernaryBERT: Distillation-aware Ultra-low Bit BERT [53.06741585060951]
We propose TernaryBERT, which ternarizes the weights in a fine-tuned BERT model.
Experiments on the GLUE benchmark and SQuAD show that our proposed TernaryBERT outperforms the other BERT quantization methods.
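The weight ternarization step can be sketched as a threshold-and-scale operation; the 0.7 threshold heuristic below follows common ternary-weight practice and is an assumption, not necessarily TernaryBERT's exact scheme.

```python
import torch

def ternarize(w: torch.Tensor):
    """Map each weight to {-alpha, 0, +alpha} using a magnitude threshold."""
    delta = 0.7 * w.abs().mean()                              # weights below this become 0
    mask = (w.abs() > delta).float()
    alpha = (w.abs() * mask).sum() / mask.sum().clamp(min=1)  # scale of the surviving weights
    return alpha * torch.sign(w) * mask

w = torch.randn(4, 4)
print(ternarize(w))
```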
arXiv Detail & Related papers (2020-09-27T10:17:28Z)