AVocaDo: Strategy for Adapting Vocabulary to Downstream Domain
- URL: http://arxiv.org/abs/2110.13434v1
- Date: Tue, 26 Oct 2021 06:26:01 GMT
- Title: AVocaDo: Strategy for Adapting Vocabulary to Downstream Domain
- Authors: Jimin Hong, Taehee Kim, Hyesu Lim and Jaegul Choo
- Abstract summary: We propose to consider the vocabulary as an optimizable parameter, allowing us to update the vocabulary by expanding it with domain-specific vocabulary.
We prevent the embeddings of the added words from overfitting to downstream data by utilizing knowledge learned from a pretrained language model with a regularization term.
- Score: 17.115865763783336
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: During the fine-tuning phase of transfer learning, the pretrained vocabulary
remains unchanged, while model parameters are updated. The vocabulary generated
based on the pretrained data is suboptimal for downstream data when domain
discrepancy exists. We propose to consider the vocabulary as an optimizable
parameter, allowing us to update the vocabulary by expanding it with
domain-specific vocabulary based on a tokenization statistic. Furthermore, we
prevent the embeddings of the added words from overfitting to downstream data
by utilizing knowledge learned from a pretrained language model with a
regularization term. Our method achieved consistent performance improvements on
diverse domains (i.e., biomedical, computer science, news, and reviews).
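A minimal sketch of the described strategy, using the Hugging Face transformers API, is given below. The backbone (`bert-base-uncased`), the candidate word list, the fragment-count filter standing in for the tokenization statistic, and the L2 anchor penalty standing in for the paper's regularization term are all illustrative assumptions, not the authors' implementation.

```python
# Rough sketch (not the authors' code): expand a pretrained tokenizer with
# domain-specific words, then keep the new embeddings close to anchors
# built from pretrained subword knowledge.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # placeholder backbone
model = AutoModel.from_pretrained("bert-base-uncased")

# Hypothetical domain-specific candidates; a simple fragment-count filter
# stands in for the paper's tokenization statistic.
candidates = ["immunohistochemistry", "transfection", "cytokine"]
new_words = [w for w in candidates
             if w not in tokenizer.get_vocab() and len(tokenizer.tokenize(w)) >= 3]

# Record each word's original subword decomposition *before* expanding the
# vocabulary, so anchors can be derived from pretrained embeddings.
subword_ids = {w: tokenizer(w, add_special_tokens=False)["input_ids"] for w in new_words}

tokenizer.add_tokens(new_words)
model.resize_token_embeddings(len(tokenizer))

embeddings = model.get_input_embeddings().weight
anchors = {}
with torch.no_grad():
    for w in new_words:
        anchor = embeddings[subword_ids[w]].mean(dim=0)   # mean of pretrained pieces
        new_id = tokenizer.convert_tokens_to_ids(w)
        embeddings[new_id] = anchor                       # initialize from pretrained knowledge
        anchors[new_id] = anchor.clone()                  # frozen regularization target


def vocab_regularizer(model, anchors, weight=0.1):
    """L2 penalty pulling added embeddings toward their anchors; a simplified
    stand-in for the paper's regularization term."""
    emb = model.get_input_embeddings().weight
    return weight * sum(torch.sum((emb[i] - a) ** 2) for i, a in anchors.items())

# During fine-tuning: total_loss = task_loss + vocab_regularizer(model, anchors)
```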
Related papers
- Contextual Biasing to Improve Domain-specific Custom Vocabulary Audio Transcription without Explicit Fine-Tuning of Whisper Model [0.0]
OpenAI's Whisper Automated Speech Recognition model excels in generalizing across diverse datasets and domains.
We propose a method to enhance transcription accuracy without explicit fine-tuning or altering model parameters.
arXiv Detail & Related papers (2024-10-24T01:58:11Z)
- An Analysis of BPE Vocabulary Trimming in Neural Machine Translation [56.383793805299234]
Vocabulary trimming is a postprocessing step that replaces rare subwords with their component subwords.
We show that vocabulary trimming fails to improve performance and can even incur heavy degradation; a simplified sketch of the trimming step follows this entry.
arXiv Detail & Related papers (2024-03-30T15:29:49Z)
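Below is a simplified sketch of the trimming operation described in the entry above. The toy corpus, frequency threshold, and greedy longest-match fallback are assumptions made for illustration; actual BPE trimming re-segments a removed subword via its merge operations.

```python
# Simplified illustration of subword-vocabulary trimming: tokens rarer than a
# threshold are dropped and re-segmented into smaller in-vocabulary pieces.
# Greedy longest-match stands in for BPE's merge-based decomposition.
from collections import Counter

def trim_vocabulary(tokenized_corpus, min_freq=2):
    """tokenized_corpus: list of sentences, each a list of subword strings."""
    freq = Counter(tok for sent in tokenized_corpus for tok in sent)
    kept = {tok for tok, count in freq.items() if count >= min_freq}

    def resegment(token):
        # Split a trimmed token into kept pieces, falling back to single characters.
        pieces, i = [], 0
        while i < len(token):
            for j in range(len(token), i, -1):
                if token[i:j] in kept or j == i + 1:
                    pieces.append(token[i:j])
                    i = j
                    break
        return pieces

    return [[piece for tok in sent
             for piece in ([tok] if tok in kept else resegment(tok))]
            for sent in tokenized_corpus]

# Toy example: "tokeniz" and "ation" are rare, so they are broken apart.
corpus = [["the", "token", "izer", "works"],
          ["the", "token", "izer", "works"],
          ["tokeniz", "ation"]]
print(trim_vocabulary(corpus))
```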
- Unsupervised Domain Adaptation for Sparse Retrieval by Filling Vocabulary and Word Frequency Gaps [12.573927420408365]
IR models using a pretrained language model significantly outperform lexical approaches like BM25.
This paper proposes an unsupervised domain adaptation method by filling vocabulary and word-frequency gaps.
We show that our method outperforms the present state-of-the-art domain adaptation method.
arXiv Detail & Related papers (2022-11-08T03:58:26Z)
- Suffix Retrieval-Augmented Language Modeling [1.8710230264817358]
Causal language modeling (LM) uses word history to predict the next word.
BERT, on the other hand, makes use of bi-directional word information in a sentence to predict words at masked positions.
We propose a novel model that simulates a bi-directional contextual effect in an autoregressive manner.
arXiv Detail & Related papers (2022-11-06T07:53:19Z)
- TransDrift: Modeling Word-Embedding Drift using Transformer [8.707217592903735]
We propose TransDrift, a transformer-based prediction model for word embeddings.
Our model accurately learns the dynamics of the embedding drift and predicts the future embedding.
Our embeddings lead to superior performance compared to the previous methods.
arXiv Detail & Related papers (2022-06-16T10:48:26Z)
- Better Language Model with Hypernym Class Prediction [101.8517004687825]
Class-based language models (LMs) have been long devised to address context sparsity in $n$-gram LMs.
In this study, we revisit this approach in the context of neural LMs.
arXiv Detail & Related papers (2022-03-21T01:16:44Z)
- Meta-Learning with Variational Semantic Memory for Word Sense Disambiguation [56.830395467247016]
We propose a model of semantic memory for WSD in a meta-learning setting.
Our model is based on hierarchical variational inference and incorporates an adaptive memory update rule via a hypernetwork.
We show that our model advances the state of the art in few-shot WSD and supports effective learning in extremely data-scarce scenarios.
arXiv Detail & Related papers (2021-06-05T20:40:01Z)
- Few-shot learning through contextual data augmentation [74.20290390065475]
Machine translation models need to adapt to new data to maintain their performance over time.
We show that adaptation on the scale of one to five examples is possible.
Our model achieves better accuracy scores than a reference system trained on an average of 313 parallel examples.
arXiv Detail & Related papers (2021-03-31T09:05:43Z)
- Fake it Till You Make it: Self-Supervised Semantic Shifts for Monolingual Word Embedding Tasks [58.87961226278285]
We propose a self-supervised approach to model lexical semantic change.
We show that our method can be used for the detection of semantic change with any alignment method.
We illustrate the utility of our techniques using experimental results on three different datasets.
arXiv Detail & Related papers (2021-01-30T18:59:43Z)
- Grounded Compositional Outputs for Adaptive Language Modeling [59.02706635250856]
A language model's vocabulary, typically selected before training and permanently fixed later, affects its size.
We propose a fully compositional output embedding layer for language models.
To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary.
arXiv Detail & Related papers (2020-09-24T07:21:14Z)
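One way to obtain the vocabulary-size-independent output layer described in the entry above is to compose each candidate word's output embedding from character embeddings. The GRU-over-characters design below is an illustrative assumption, not the paper's exact architecture.

```python
# Sketch of a compositional output layer: each candidate word's output
# embedding is composed from character embeddings, so no parameter matrix
# grows with the vocabulary size.
import torch
import torch.nn as nn

class CompositionalOutput(nn.Module):
    def __init__(self, hidden_dim=256, char_dim=64, n_chars=128):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)   # ASCII codepoints as characters
        self.compose = nn.GRU(char_dim, hidden_dim, batch_first=True)

    def word_embedding(self, word):
        char_ids = torch.tensor([[min(ord(c), 127) for c in word]])
        _, h = self.compose(self.char_emb(char_ids))
        return h[-1, 0]                                   # [hidden_dim]

    def forward(self, hidden_state, candidate_words):
        # Score an arbitrary candidate set against the LM's hidden state.
        cand = torch.stack([self.word_embedding(w) for w in candidate_words])
        return hidden_state @ cand.T                      # [batch, n_candidates]

out_layer = CompositionalOutput()
h = torch.randn(1, 256)                                   # e.g. an LM's final hidden state
logits = out_layer(h, ["vocabulary", "domain", "avocado"])
print(logits.shape)                                       # torch.Size([1, 3])
```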
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.