Neural Machine Translation Models Can Learn to be Few-shot Learners
- URL: http://arxiv.org/abs/2309.08590v1
- Date: Fri, 15 Sep 2023 17:44:21 GMT
- Title: Neural Machine Translation Models Can Learn to be Few-shot Learners
- Authors: Raphael Reinauer and Patrick Simianer and Kaden Uhlig and Johannes E. M. Mosig and Joern Wuebker
- Abstract summary: We show that a much smaller model can be trained to perform in-context learning (ICL).
With this capacity for ICL, the model can take advantage of relevant few-shot examples to adapt its output towards the domain.
Our approach allows efficient batch inference on a mix of domains and outperforms state-of-the-art baselines in terms of both translation quality and immediate adaptation rate.
- Score: 2.2999148299770042
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models have an emergent ability to use a small number of
examples to learn to perform in novel domains and tasks, also called in-context
learning (ICL). In this work, we show that a much smaller model can be trained
to perform ICL by fine-tuning towards a specialized training objective,
exemplified on the task of domain adaptation for neural machine translation.
With this capacity for ICL, the model can take advantage of relevant few-shot
examples to adapt its output towards the domain. We compare the quality of this
domain adaptation to traditional supervised techniques and ICL with a
40B-parameter Large Language Model. Our approach allows efficient batch
inference on a mix of domains and outperforms state-of-the-art baselines in
terms of both translation quality and immediate adaptation rate, i.e. the
ability to reproduce a specific term after being shown a single example.
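As a concrete illustration of the adaptation mechanism described in the abstract, the sketch below retrieves a few similar source/target pairs from a domain translation memory and concatenates them with the new source segment. The retrieval heuristic, separator token, and `=>` delimiter are illustrative assumptions; the paper's actual input format and specialized training objective are not given in the abstract.

```python
# Minimal sketch: assembling an in-context input for few-shot NMT domain
# adaptation. Separator and retrieval scheme are assumptions, not the paper's.
from difflib import SequenceMatcher


def retrieve_examples(source, translation_memory, k=2):
    """Pick the k source/target pairs most similar to the new source segment."""
    ranked = sorted(
        translation_memory,
        key=lambda pair: SequenceMatcher(None, source, pair[0]).ratio(),
        reverse=True,
    )
    return ranked[:k]


def build_icl_input(source, translation_memory, k=2, sep=" [SEP] "):
    """Concatenate retrieved few-shot pairs with the segment to translate."""
    shots = retrieve_examples(source, translation_memory, k)
    context = sep.join(f"{src} => {tgt}" for src, tgt in shots)
    return f"{context}{sep}{source} =>"


if __name__ == "__main__":
    tm = [
        ("Press the reset button.", "Drücken Sie die Reset-Taste."),
        ("The warranty is void.", "Die Garantie erlischt."),
    ]
    print(build_icl_input("Press the power button.", tm))
```

Keeping the demonstrations inside the model input itself is what would allow a single inference batch to mix segments from different domains, as the abstract highlights.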
Related papers
- Prompting Encoder Models for Zero-Shot Classification: A Cross-Domain Study in Italian [75.94354349994576]
This paper explores the feasibility of employing smaller, domain-specific encoder LMs alongside prompting techniques to enhance performance in specialized contexts.
Our study concentrates on the Italian bureaucratic and legal language, experimenting with both general-purpose and further pre-trained encoder-only models.
The results indicate that while further pre-trained models may show diminished robustness in general knowledge, they exhibit superior adaptability for domain-specific tasks, even in a zero-shot setting.
arXiv Detail & Related papers (2024-07-30T08:50:16Z)
- Mitigating Catastrophic Forgetting in Language Transfer via Model Merging [16.845734486667226]
Branch-and-Merge (BaM) is a new adaptation method based on iteratively merging multiple models.
BaM is based on the insight that iterative merging yields lower-magnitude but higher-quality weight changes.
We demonstrate in an empirical study on Bulgarian and German that BaM can significantly reduce forgetting while matching or even improving target domain performance.
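A minimal sketch of the merging step, assuming one branch-and-merge round with uniform weight averaging over branches fine-tuned from a shared checkpoint; BaM's actual iteration schedule, data splits, and merge weights are described in the paper, not here.

```python
# Sketch: uniformly average branch checkpoints (floating-point parameters assumed).
import copy

import torch
import torch.nn as nn


def merge_state_dicts(state_dicts):
    """Average compatible state dicts key by key."""
    merged = {}
    for key in state_dicts[0]:
        merged[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
    return merged


base = nn.Linear(4, 2)  # stand-in for the pretrained model
branches = [copy.deepcopy(base) for _ in range(3)]
# ... each branch would be fine-tuned on its own shard of target-language data ...
base.load_state_dict(merge_state_dicts([b.state_dict() for b in branches]))
# The merged model becomes the starting point of the next round.
```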
arXiv Detail & Related papers (2024-07-11T17:32:40Z)
- DETAIL: Task DEmonsTration Attribution for Interpretable In-context Learning [75.68193159293425]
In-context learning (ICL) allows transformer-based language models to learn a specific task with a few "task demonstrations" without updating their parameters.
We propose an influence function-based attribution technique, DETAIL, that addresses the specific characteristics of ICL.
We experimentally prove the wide applicability of DETAIL by showing our attribution scores obtained on white-box models are transferable to black-box models in improving model performance.
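DETAIL's influence-function estimator is not reproduced here; the brute-force leave-one-out loop below is only meant to convey the underlying idea of attributing a prediction metric to individual in-context demonstrations, assuming a caller-supplied `eval_fn` that scores a prompt built from a given demonstration list.

```python
# Sketch: leave-one-out attribution of in-context demonstrations.
# eval_fn(demos) -> scalar quality metric (e.g. dev-set accuracy with that prompt).
def loo_attribution(demos, eval_fn):
    full_score = eval_fn(demos)
    contributions = []
    for i in range(len(demos)):
        reduced = demos[:i] + demos[i + 1:]
        contributions.append(full_score - eval_fn(reduced))  # higher = more helpful
    return contributions
```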
arXiv Detail & Related papers (2024-05-22T15:52:52Z)
- Adaptive Cross-lingual Text Classification through In-Context One-Shot Demonstrations [47.89819316477715]
We exploit In-Context Tuning (ICT) for one-shot cross-lingual transfer in classification tasks by introducing In-Context Cross-lingual Transfer (IC-XLT).
The novel concept involves training a model to learn from context examples and subsequently adapting it during inference to a target language by prepending a One-Shot context demonstration in that language.
Our results show that IC-XLT successfully leverages target-language examples to improve the cross-lingual capabilities of the evaluated mT5 model, outperforming prompt-based models in the Zero and Few-shot scenarios adapted through fine-tuning.
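A minimal sketch of the inference-time step described above: a single target-language demonstration is prepended to the test input before classification. The template and the `</s>` separator are assumptions; the exact mT5 prompt format used for IC-XLT is not specified in the abstract.

```python
# Sketch: build an IC-XLT style input with one target-language demonstration.
def build_ic_xlt_input(demo_text, demo_label, test_text, sep=" </s> "):
    demonstration = f"text: {demo_text} label: {demo_label}"
    return f"{demonstration}{sep}text: {test_text} label:"


# Example: adapting an English-tuned classifier to German with one demonstration.
print(build_ic_xlt_input("Das Essen war ausgezeichnet.", "positive",
                         "Der Service war leider sehr langsam."))
```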
arXiv Detail & Related papers (2024-04-03T04:40:57Z)
- SINC: Self-Supervised In-Context Learning for Vision-Language Tasks [64.44336003123102]
We propose a framework to enable in-context learning in large language models.
A meta-model can learn on self-supervised prompts consisting of tailored demonstrations.
Experiments show that SINC outperforms gradient-based methods in various vision-language tasks.
arXiv Detail & Related papers (2023-07-15T08:33:08Z)
- Learning to Generalize to More: Continuous Semantic Augmentation for Neural Machine Translation [50.54059385277964]
We present a novel data augmentation paradigm termed Continuous Semantic Augmentation (CsaNMT).
CsaNMT augments each training instance with an adjacency region that could cover adequate variants of literal expression under the same meaning.
arXiv Detail & Related papers (2022-04-14T08:16:28Z)
- CLIN-X: pre-trained language models and a study on cross-task transfer for concept extraction in the clinical domain [22.846469609263416]
We introduce the pre-trained CLIN-X (Clinical XLM-R) language models and show how CLIN-X outperforms other pre-trained transformer models.
Our studies reveal stable model performance despite a lack of annotated data, with improvements of up to 47 F1 points when only 250 labeled sentences are available.
Our results highlight the importance of specialized language models such as CLIN-X for concept extraction in non-standard domains.
arXiv Detail & Related papers (2021-12-16T10:07:39Z)
- Adapt-and-Distill: Developing Small, Fast and Effective Pretrained Language Models for Domains [45.07506437436464]
We present a general approach to developing small, fast and effective pre-trained models for specific domains.
This is achieved by adapting the off-the-shelf general pre-trained models and performing task-agnostic knowledge distillation in target domains.
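A minimal sketch of a task-agnostic distillation objective of the kind this summary describes, assuming the student mimics the domain-adapted teacher's softened output distribution on unlabeled in-domain text; the paper's full recipe is not reproduced here.

```python
# Sketch: temperature-scaled KL distillation loss between teacher and student logits.
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t * t)


# Example with random logits standing in for real model outputs on in-domain text.
loss = distillation_loss(torch.randn(8, 32000), torch.randn(8, 32000))
```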
arXiv Detail & Related papers (2021-06-25T07:37:05Z)
- Unsupervised Domain Adaptation of a Pretrained Cross-Lingual Language Model [58.27176041092891]
Recent research indicates that pretraining cross-lingual language models on large-scale unlabeled texts yields significant performance improvements.
We propose a novel unsupervised feature decomposition method that can automatically extract domain-specific features from the entangled pretrained cross-lingual representations.
Our proposed model leverages mutual information estimation to decompose the representations computed by a cross-lingual model into domain-invariant and domain-specific parts.
arXiv Detail & Related papers (2020-11-23T16:00:42Z)
- CALM: Continuous Adaptive Learning for Language Modeling [18.72860206714457]
Training large language representation models has become a standard in the natural language processing community.
We demonstrate that in practice these pre-trained models present performance deterioration in the form of catastrophic forgetting.
We propose CALM, Continuous Adaptive Learning for Language Modeling: techniques to produce models that retain knowledge across multiple domains.
arXiv Detail & Related papers (2020-04-08T03:51:17Z)