MEDVOC: Vocabulary Adaptation for Fine-tuning Pre-trained Language Models on Medical Text Summarization
- URL: http://arxiv.org/abs/2405.04163v2
- Date: Sat, 17 Aug 2024 12:43:13 GMT
- Title: MEDVOC: Vocabulary Adaptation for Fine-tuning Pre-trained Language Models on Medical Text Summarization
- Authors: Gunjan Balde, Soumyadeep Roy, Mainack Mondal, Niloy Ganguly
- Abstract summary: This work presents a dynamic vocabulary adaptation strategy, MEDVOC, for fine-tuning pre-trained language models (PLMs).
In contrast to existing domain adaptation approaches in summarization, MEDVOC treats vocabulary as an optimizable parameter.
Our human evaluation shows MEDVOC generates more faithful medical summaries.
- Score: 26.442558912559658
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work presents a dynamic vocabulary adaptation strategy, MEDVOC, for fine-tuning pre-trained language models (PLMs) like BertSumAbs, BART, and PEGASUS for improved medical text summarization. In contrast to existing domain adaptation approaches in summarization, MEDVOC treats vocabulary as an optimizable parameter and optimizes the PLM vocabulary based on fragment score conditioned only on the downstream task's reference summaries. Unlike previous works on vocabulary adaptation (limited only to classification tasks), optimizing vocabulary based on summarization tasks requires an extremely costly intermediate fine-tuning step on large summarization datasets. To that end, our novel fragment score-based hyperparameter search very significantly reduces this fine-tuning time -- from 450 days to less than 2 days on average. Furthermore, while previous works on vocabulary adaptation are often primarily tied to single PLMs, MEDVOC is designed to be deployable across multiple PLMs (with varying model vocabulary sizes, pre-training objectives, and model sizes) -- bridging the limited vocabulary overlap between the biomedical literature domain and PLMs. MEDVOC outperforms baselines by 15.74% in terms of Rouge-L in zero-shot setting and shows gains of 17.29% in high Out-Of-Vocabulary (OOV) concentrations. Our human evaluation shows MEDVOC generates more faithful medical summaries (88% compared to 59% in baselines). We make the codebase publicly available at https://github.com/gb-kgp/MEDVOC.
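The abstract leaves the fragment score undefined; the sketch below is a minimal illustration under the assumption that it measures how many subword pieces a candidate tokenizer needs per word of the downstream reference summaries (lower is better), and that the vocabulary search simply keeps the candidate with the best score. The `fragment_score` and `select_vocabulary` helpers and the candidate-tokenizer mapping are hypothetical stand-ins, not MEDVOC's released code.
```python
from typing import Callable, Dict, List

def fragment_score(tokenize: Callable[[str], List[str]],
                   reference_summaries: List[str]) -> float:
    """Assumed definition: average number of subword pieces per word over the
    downstream reference summaries (1.0 means every word is kept whole)."""
    pieces, words = 0, 0
    for summary in reference_summaries:
        for word in summary.split():
            pieces += len(tokenize(word))
            words += 1
    return pieces / max(words, 1)

def select_vocabulary(candidate_tokenizers: Dict[int, Callable[[str], List[str]]],
                      reference_summaries: List[str]) -> int:
    """Hypothetical search: return the candidate vocabulary size whose
    tokenizer fragments the reference summaries the least."""
    return min(candidate_tokenizers,
               key=lambda size: fragment_score(candidate_tokenizers[size],
                                               reference_summaries))
```
Under this assumption, the same score can act as the cheap proxy that replaces the costly intermediate fine-tuning step mentioned in the abstract.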
Related papers
- Adaptive BPE Tokenization for Enhanced Vocabulary Adaptation in Finetuning Pretrained Language Models [26.442558912559658]
We show a fundamental limitation in vocabulary adaptation approaches that use the Byte-Pair Encoding (BPE) tokenization scheme for fine-tuning pretrained language models (PLMs) to expert domains.
We propose AdaptBPE, where the BPE tokenization phase is modified to first perform longest string matching on the added (target) vocabulary before tokenizing at the character level, as sketched below.
arXiv Detail & Related papers (2024-10-04T09:24:55Z)
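A minimal sketch of the longest-match idea summarized above, assuming the added (target) vocabulary is a plain set of strings; `bpe_fallback` is a placeholder for the model's ordinary character-level BPE tokenizer, and the function illustrates the matching order rather than the authors' implementation.
```python
from typing import Callable, List, Set

def adaptbpe_tokenize(word: str,
                      added_vocab: Set[str],
                      bpe_fallback: Callable[[str], List[str]]) -> List[str]:
    """Greedy longest-string matching against the added vocabulary; whatever
    cannot be matched is handed to the ordinary BPE tokenizer (a placeholder
    here). Sketch only -- a simplification of the procedure in the paper."""
    pieces: List[str] = []
    i = 0
    while i < len(word):
        # Try the longest candidate substring starting at position i first.
        for j in range(len(word), i, -1):
            if word[i:j] in added_vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            # No match in the added vocabulary: fall back to plain BPE
            # (character-level here) for the remainder of the word.
            pieces.extend(bpe_fallback(word[i:]))
            break
    return pieces

# Example: with "cardiomyopathy" in the added vocabulary, the whole word is
# emitted as a single piece instead of several generic fragments.
print(adaptbpe_tokenize("cardiomyopathy", {"cardiomyopathy"}, list))
```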
- Enhancing Medical Specialty Assignment to Patients using NLP Techniques [0.0]
We propose an alternative approach that achieves superior performance while being computationally efficient.
Specifically, we utilize keywords to train a deep learning architecture that outperforms a language model pretrained on a large corpus of text.
Our results demonstrate that utilizing keywords for text classification significantly improves classification performance.
arXiv Detail & Related papers (2023-12-09T14:13:45Z)
- OFA: A Framework of Initializing Unseen Subword Embeddings for Efficient Large-scale Multilingual Continued Pretraining [49.213120730582354]
Instead of pretraining multilingual language models from scratch, a more efficient method is to adapt existing pretrained language models (PLMs) to new languages via vocabulary extension and continued pretraining.
We propose a novel framework: $\textbf{O}$ne $\textbf{F}$or $\textbf{A}$ll, which wisely initializes the embeddings of unseen subwords and thus can adapt a PLM to multiple languages efficiently and effectively.
arXiv Detail & Related papers (2023-11-15T10:40:45Z)
- SelfSeg: A Self-supervised Sub-word Segmentation Method for Neural Machine Translation [51.881877192924414]
Sub-word segmentation is an essential pre-processing step for Neural Machine Translation (NMT).
This paper introduces SelfSeg, a self-supervised neural sub-word segmentation method.
SelfSeg is much faster to train/decode and requires only monolingual dictionaries instead of parallel corpora.
arXiv Detail & Related papers (2023-07-31T04:38:47Z)
- Unsupervised Domain Adaptation for Sparse Retrieval by Filling Vocabulary and Word Frequency Gaps [12.573927420408365]
IR models using a pretrained language model significantly outperform lexical approaches like BM25.
This paper proposes an unsupervised domain adaptation method by filling vocabulary and word-frequency gaps.
We show that our method outperforms the present state-of-the-art domain adaptation method.
arXiv Detail & Related papers (2022-11-08T03:58:26Z)
- Better Language Model with Hypernym Class Prediction [101.8517004687825]
Class-based language models (LMs) have long been devised to address context sparsity in $n$-gram LMs.
In this study, we revisit this approach in the context of neural LMs.
arXiv Detail & Related papers (2022-03-21T01:16:44Z)
- AVocaDo: Strategy for Adapting Vocabulary to Downstream Domain [17.115865763783336]
We propose to consider the vocabulary as an optimizable parameter, allowing us to update it by expanding it with domain-specific terms.
We prevent the embeddings of the added words from overfitting to downstream data by utilizing knowledge learned from a pretrained language model via a regularization term (see the sketch after this entry).
arXiv Detail & Related papers (2021-10-26T06:26:01Z)
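The vocabulary-expansion step that AVocaDo (and MEDVOC above) build on can be sketched with the Hugging Face transformers API. The domain terms, the `facebook/bart-base` checkpoint, and the mean-of-subwords initialization are illustrative assumptions; AVocaDo's regularization against the pretrained model is not reproduced here.
```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base")

# Hypothetical domain-specific terms; in practice they would be mined from the
# target-domain corpus or the downstream reference summaries.
domain_terms = ["pneumothorax", "hyperlipidemia", "echocardiogram"]

# Record how each term is fragmented *before* it enters the vocabulary.
old_pieces = {t: tokenizer(t, add_special_tokens=False)["input_ids"]
              for t in domain_terms}

tokenizer.add_tokens(domain_terms)
model.resize_token_embeddings(len(tokenizer))

# Initialize each new embedding row from the mean of its former subword
# pieces, so the added word starts near what the model already encodes.
with torch.no_grad():
    emb = model.get_input_embeddings().weight
    for term in domain_terms:
        emb[tokenizer.convert_tokens_to_ids(term)] = emb[old_pieces[term]].mean(dim=0)
```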
- Span Fine-tuning for Pre-trained Language Models [43.352833140317486]
This paper presents a novel span fine-tuning method for PrLMs.
Any sentence processed by the PrLM will be segmented into multiple spans according to a pre-sampled dictionary.
Experiments on the GLUE benchmark show that the proposed span fine-tuning method significantly enhances the PrLM.
arXiv Detail & Related papers (2021-08-29T14:11:38Z)
- Rejuvenating Low-Frequency Words: Making the Most of Parallel Data in Non-Autoregressive Translation [98.11249019844281]
Knowledge distillation (KD) is commonly used to construct synthetic data for training non-autoregressive translation (NAT) models.
We propose reverse KD to rejuvenate more alignments for low-frequency target words.
Results demonstrate that the proposed approach can significantly and universally improve translation quality.
arXiv Detail & Related papers (2021-06-02T02:41:40Z)
- Few-shot learning through contextual data augmentation [74.20290390065475]
Machine translation models need to adapt to new data to maintain their performance over time.
We show that adaptation on the scale of one to five examples is possible.
Our model reports better accuracy scores than a reference system trained on an average of 313 parallel examples.
arXiv Detail & Related papers (2021-03-31T09:05:43Z)
- Grounded Compositional Outputs for Adaptive Language Modeling [59.02706635250856]
A language model's vocabulary, typically selected before training and permanently fixed later, affects its size.
We propose a fully compositional output embedding layer for language models.
To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary.
arXiv Detail & Related papers (2020-09-24T07:21:14Z)