Leveraging Auxiliary Domain Parallel Data in Intermediate Task
Fine-tuning for Low-resource Translation
- URL: http://arxiv.org/abs/2306.01382v2
- Date: Sun, 24 Sep 2023 01:33:13 GMT
- Title: Leveraging Auxiliary Domain Parallel Data in Intermediate Task
Fine-tuning for Low-resource Translation
- Authors: Shravan Nayak, Surangika Ranathunga, Sarubi Thillainathan, Rikki Hung,
Anthony Rinaldi, Yining Wang, Jonah Mackey, Andrew Ho, En-Shiun Annie Lee
- Abstract summary: Intermediate-task fine-tuning (ITFT) of PMSS models is extremely beneficial for domain-specific NMT.
We quantify the variation in domain-specific results using a domain-divergence test, and show that ITFT can mitigate the impact of domain divergence to some extent.
- Score: 6.583246002638354
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: NMT systems trained on Pre-trained Multilingual Sequence-to-Sequence
(PMSS) models flounder when sufficient parallel data is not available for
fine-tuning. This holds especially for languages that are missing or
under-represented in these models. The problem is aggravated when the data comes from different
domains. In this paper, we show that intermediate-task fine-tuning (ITFT) of
PMSS models is extremely beneficial for domain-specific NMT, especially when
target domain data is limited/unavailable and the considered languages are
missing or under-represented in the PMSS model. We quantify the variation in
domain-specific results using a domain-divergence test and show that ITFT can
mitigate the impact of domain divergence to some extent.
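
To make the two steps concrete, the sketch below pairs a rough domain-divergence check with two-stage fine-tuning: first on auxiliary-domain parallel data (the intermediate task), then on whatever target-domain data exists. It is a minimal illustration under stated assumptions, not the paper's exact recipe: the mBART-50 checkpoint, the en_XX→si_LK language pair, the toy sentence pairs, and the unigram Jensen-Shannon divergence used as the divergence proxy are all placeholders chosen for the sketch.

```python
# Minimal sketch: (1) a unigram Jensen-Shannon divergence as a rough
# domain-divergence proxy, (2) intermediate-task fine-tuning (ITFT) of a
# pre-trained multilingual seq2seq model on auxiliary-domain parallel data,
# followed by fine-tuning on the (small) target-domain data.
import math
from collections import Counter

from datasets import Dataset
from transformers import (
    DataCollatorForSeq2Seq,
    MBart50TokenizerFast,
    MBartForConditionalGeneration,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)


def js_divergence(corpus_a, corpus_b):
    """Jensen-Shannon divergence (in bits) between unigram distributions."""
    ca = Counter(w for s in corpus_a for w in s.split())
    cb = Counter(w for s in corpus_b for w in s.split())
    vocab = set(ca) | set(cb)
    pa = [ca[w] / sum(ca.values()) for w in vocab]
    pb = [cb[w] / sum(cb.values()) for w in vocab]
    m = [(x + y) / 2 for x, y in zip(pa, pb)]
    kl = lambda p, q: sum(x * math.log2(x / y) for x, y in zip(p, q) if x > 0)
    return 0.5 * kl(pa, m) + 0.5 * kl(pb, m)


# Placeholder corpora: in practice these hold thousands of sentence pairs.
aux_pairs = [("The committee approved the annual budget.", "<aux-domain target sentence>")]
target_pairs = [("Wash your hands frequently.", "<in-domain target sentence>")]

# Divergence between the source sides of the two domains (a simple proxy).
print("unigram JSD:", js_divergence([s for s, _ in aux_pairs],
                                     [s for s, _ in target_pairs]))

model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50")
# Language codes are illustrative (mBART-50 tags for English and Sinhala).
tokenizer = MBart50TokenizerFast.from_pretrained(
    "facebook/mbart-large-50", src_lang="en_XX", tgt_lang="si_LK")


def to_dataset(pairs):
    """Tokenize (source, target) pairs into a datasets.Dataset."""
    src, tgt = zip(*pairs)
    enc = tokenizer(list(src), text_target=list(tgt), truncation=True, max_length=128)
    return Dataset.from_dict(dict(enc))


def finetune(model, dataset, output_dir, epochs):
    """One fine-tuning stage with dynamic padding via DataCollatorForSeq2Seq."""
    args = Seq2SeqTrainingArguments(
        output_dir=output_dir,
        num_train_epochs=epochs,
        per_device_train_batch_size=16,
        learning_rate=3e-5,
        save_strategy="no",
    )
    trainer = Seq2SeqTrainer(
        model=model,
        args=args,
        train_dataset=dataset,
        data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    )
    trainer.train()
    return trainer.model


# Stage 1 (intermediate task): fine-tune on auxiliary-domain parallel data.
model = finetune(model, to_dataset(aux_pairs), "itft-stage1", epochs=3)
# Stage 2: continue fine-tuning on whatever target-domain data is available.
model = finetune(model, to_dataset(target_pairs), "itft-stage2", epochs=1)
```

In practice the intermediate stage would use the full auxiliary-domain corpus, and stage 2 may be skipped entirely when no in-domain parallel data is available.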
Related papers
- Exploiting Domain-Specific Parallel Data on Multilingual Language Models for Low-resource Language Translation [0.6467856992131628]
We present an evaluation of the effectiveness of parallel data from auxiliary domains in building domain-specific NMT models.
We explore the impact of domain divergence on NMT model performance.
We recommend several strategies for utilizing auxiliary parallel data in building domain-specific NMT models.
arXiv Detail & Related papers (2024-12-27T08:25:52Z) - Large Language Model for Multi-Domain Translation: Benchmarking and Domain CoT Fine-tuning [55.107329995417786]
Large language models (LLMs) have demonstrated impressive general understanding and generation abilities.
We establish a benchmark for multi-domain translation, featuring 25 German⇔English and 22 Chinese⇔English test sets.
We propose a domain Chain of Thought (CoT) fine-tuning technique that utilizes the intrinsic multi-domain intelligence of LLMs to improve translation performance.
arXiv Detail & Related papers (2024-10-03T16:15:04Z) - Language Modelling Approaches to Adaptive Machine Translation [0.0]
Consistency is a key requirement of high-quality translation.
In-domain data scarcity is common in translation settings.
Can we employ language models to improve the quality of adaptive MT at inference time?
arXiv Detail & Related papers (2024-01-25T23:02:54Z) - SALUDA: Surface-based Automotive Lidar Unsupervised Domain Adaptation [62.889835139583965]
We introduce an unsupervised auxiliary task of learning an implicit underlying surface representation simultaneously on source and target data.
As both domains share the same latent representation, the model is forced to accommodate discrepancies between the two sources of data.
Our experiments demonstrate that our method achieves a better performance than the current state of the art, both in real-to-real and synthetic-to-real scenarios.
arXiv Detail & Related papers (2023-04-06T17:36:23Z) - Robust Domain Adaptation for Pre-trained Multilingual Neural Machine
Translation Models [0.0]
We propose a fine-tuning procedure for the generic mNMT that combines embeddings freezing and adversarial loss.
Experiments demonstrate that the procedure improves performance on specialized data with minimal loss of initial performance on the generic domain for all language pairs.
arXiv Detail & Related papers (2022-10-26T18:47:45Z) - Translation Transformers Rediscover Inherent Data Domains [0.0]
We analyze the sentence representations learned by NMT Transformers and show that these explicitly include the information on text domains.
We show that this internal information is enough to cluster sentences by their underlying domains without supervision.
We show that NMT models produce clusters better aligned with the actual domains than pre-trained language models (LMs) do.
arXiv Detail & Related papers (2021-09-16T10:58:13Z) - Learning Domain Invariant Representations by Joint Wasserstein Distance
Minimization [3.382067152367334]
Domain shifts in the training data are common in practical applications of machine learning.
Ideally, a ML model should work well independently of these shifts, for example, by learning a domain-invariant representation.
Common ML losses do not give strong guarantees on how consistently the ML model performs for different domains.
arXiv Detail & Related papers (2021-06-09T09:08:51Z) - FDMT: A Benchmark Dataset for Fine-grained Domain Adaptation in Machine
Translation [53.87731008029645]
We present a real-world fine-grained domain adaptation task in machine translation (FDMT).
The FDMT dataset consists of four sub-domains of information technology: autonomous vehicles, AI education, real-time networks, and smart phones.
We make quantitative experiments and deep analyses in this new setting, which benchmarks the fine-grained domain adaptation task.
arXiv Detail & Related papers (2020-12-31T17:15:09Z) - Iterative Domain-Repaired Back-Translation [50.32925322697343]
In this paper, we focus on the domain-specific translation with low resources, where in-domain parallel corpora are scarce or nonexistent.
We propose a novel iterative domain-repaired back-translation framework, which introduces the Domain-Repair model to refine translations in synthetic bilingual data.
Experiments on adapting NMT models between specific domains and from the general domain to specific domains demonstrate the effectiveness of our proposed approach.
arXiv Detail & Related papers (2020-10-06T04:38:09Z) - Unsupervised Domain Clusters in Pretrained Language Models [61.832234606157286]
We show that massive pre-trained language models implicitly learn sentence representations that cluster by domains without supervision.
We propose domain data selection methods based on such models; a minimal sketch of this idea appears after this list.
We evaluate our data selection methods for neural machine translation across five diverse domains.
arXiv Detail & Related papers (2020-04-05T06:22:16Z) - A Simple Baseline to Semi-Supervised Domain Adaptation for Machine
Translation [73.3550140511458]
State-of-the-art neural machine translation (NMT) systems are data-hungry and perform poorly on new domains with no supervised data.
We propose a simple but effective approach to the semi-supervised domain adaptation scenario of NMT.
This approach iteratively trains a Transformer-based NMT model via three training objectives: language modeling, back-translation, and supervised translation.
arXiv Detail & Related papers (2020-01-22T16:42:06Z)
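
Several of the entries above (the Transformer domain-rediscovery and unsupervised domain-cluster papers in particular) rest on the same mechanism: sentence embeddings from a pretrained model separate into domain clusters without supervision, and those clusters can drive data selection. The sketch below illustrates that mechanism under assumptions of its own, a multilingual BERT encoder with mean pooling and k-means, rather than the encoders and clustering methods those papers actually use.

```python
# Minimal sketch of unsupervised domain clustering for data selection:
# embed sentences with a pretrained encoder, cluster them, and keep the
# pool sentences landing in the same cluster(s) as a small in-domain seed set.
import torch
from sklearn.cluster import KMeans
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
enc = AutoModel.from_pretrained("bert-base-multilingual-cased")
enc.eval()


@torch.no_grad()
def embed(sentences):
    """Mean-pooled encoder states as sentence embeddings (numpy array)."""
    batch = tok(sentences, padding=True, truncation=True, max_length=64,
                return_tensors="pt")
    hidden = enc(**batch).last_hidden_state               # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1).float()  # (B, T, 1)
    return ((hidden * mask).sum(1) / mask.sum(1)).numpy()


# Placeholder data: a mixed-domain candidate pool and a few in-domain seeds.
pool = ["The committee approved the budget.",
        "Patients should be monitored for fever.",
        "Parliament passed the amendment.",
        "The vaccine requires two doses."]
seeds = ["Wash your hands and wear a mask."]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(embed(pool))
seed_clusters = set(kmeans.predict(embed(seeds)))
selected = [s for s, c in zip(pool, kmeans.labels_) if c in seed_clusters]
print(selected)  # pool sentences assigned to the seed sentences' cluster(s)
```

The selected sentences would then serve as additional, pseudo-in-domain training data for a domain-specific NMT model.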