Leveraging Synthetic Targets for Machine Translation
- URL: http://arxiv.org/abs/2305.06155v1
- Date: Sun, 7 May 2023 07:42:22 GMT
- Title: Leveraging Synthetic Targets for Machine Translation
- Authors: Sarthak Mittal, Oleksii Hrinchuk, Oleksii Kuchaiev
- Abstract summary: We show that training models on synthetic targets outperforms training on the actual ground-truth data.
We provide a preliminary analysis of whether this boost in performance is linked to ease of optimization or to the more deterministic nature of the predictions.
- Score: 5.302421715411791
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we provide a recipe for training machine translation models in
a limited resource setting by leveraging synthetic target data generated using
a large pre-trained model. We show that, consistently across different
benchmarks in bilingual, multilingual, and speech translation setups, training
models on synthetic targets outperforms training on the actual ground-truth
data. This performance gap widens as the available resources shrink, both in
the size of the dataset and in the number of parameters in the model. We also
provide a preliminary analysis of whether this boost in performance is linked
to ease of optimization or to the more deterministic nature of the predictions,
and whether this paradigm leads to better out-of-distribution performance
across different testing domains.
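
The recipe amounts to sequence-level knowledge distillation: decode the source side of the parallel corpus with a large pre-trained teacher, then train the smaller model on the machine-generated targets in place of the references. Below is a minimal sketch of that loop; the Hugging Face API and the NLLB checkpoint are illustrative assumptions, not the models used in the paper.

```python
# Generate synthetic targets with a large pretrained teacher, then train the
# student on (source, synthetic target) pairs instead of the references.
# Assumption: "facebook/nllb-200-distilled-600M" stands in for the teacher.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

teacher_name = "facebook/nllb-200-distilled-600M"  # hypothetical stand-in teacher
tokenizer = AutoTokenizer.from_pretrained(teacher_name, src_lang="eng_Latn")
teacher = AutoModelForSeq2SeqLM.from_pretrained(teacher_name)

def make_synthetic_targets(sources, batch_size=16):
    """Replace ground-truth references with the teacher's beam-search outputs."""
    synthetic = []
    for i in range(0, len(sources), batch_size):
        batch = tokenizer(sources[i:i + batch_size], return_tensors="pt",
                          padding=True, truncation=True)
        out = teacher.generate(
            **batch,
            forced_bos_token_id=tokenizer.convert_tokens_to_ids("deu_Latn"),
            num_beams=5, max_new_tokens=128,
        )
        synthetic.extend(tokenizer.batch_decode(out, skip_special_tokens=True))
    return synthetic

# The student is then trained on these pairs exactly as it would be on
# (source, reference) pairs; only the target side of the data changes.
sources = ["The weather is nice today.", "Machine translation has improved rapidly."]
pairs = list(zip(sources, make_synthetic_targets(sources)))
```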
Related papers
- Rephrasing natural text data with different languages and quality levels for Large Language Model pre-training [12.29061850090405]
We build upon previous work by replicating existing results on C4 and extending them with our optimized rephrasing pipeline.
Our pipeline leads to increased performance on standard evaluation benchmarks in both monolingual and multilingual setups.
arXiv Detail & Related papers (2024-10-28T07:30:05Z)
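
A hypothetical sketch of the rephrasing idea in the entry above: raw documents are passed through an instruction-tuned LLM and the paraphrases are used for pre-training. The prompt wording and the generation backend are assumptions, not the authors' exact pipeline.

```python
# Rephrase raw web text with an LLM before pretraining. llm_generate is any
# callable str -> str backed by a language model (assumed, not specified here).
PROMPT = ("Paraphrase the following text in clear, high-quality {lang} prose, "
          "preserving all information:\n\n{doc}")

def rephrase_corpus(docs, llm_generate, lang="English"):
    """Return one LLM paraphrase per input document."""
    return [llm_generate(PROMPT.format(lang=lang, doc=d)) for d in docs]

# Pretraining then proceeds on rephrase_corpus(raw_c4_docs, llm_generate),
# optionally mixed with the original documents.
```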
"Context is Key" (CiK) is a time series forecasting benchmark that pairs numerical data with diverse types of carefully crafted textual context.
We evaluate a range of approaches, including statistical models, time series foundation models, and LLM-based forecasters.
Our experiments highlight the importance of incorporating contextual information, demonstrate surprising performance when using LLM-based forecasting models, and also reveal some of their critical shortcomings.
arXiv Detail & Related papers (2024-10-24T17:56:08Z)
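
To make the CiK setup concrete, here is one way an LLM-based forecaster might serialize a numerical history together with its textual context; the prompt format is an illustrative assumption, not the benchmark's actual protocol.

```python
# Pair a numerical history with its essential textual context in one prompt.
def build_forecast_prompt(context: str, history: list[float], horizon: int) -> str:
    series = ", ".join(f"{x:.2f}" for x in history)
    return (f"Context: {context}\n"
            f"Observed values: {series}\n"
            f"Predict the next {horizon} values, comma-separated:")

prompt = build_forecast_prompt(
    context="Daily electricity load; a public holiday falls on the next day.",
    history=[41.2, 43.9, 44.1, 39.8], horizon=2,
)
```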
- Scaling Parameter-Constrained Language Models with Quality Data [32.35610029333478]
Scaling laws in language modeling traditionally quantify training loss as a function of dataset size and model parameters.
We extend the conventional understanding of scaling laws by offering a microscopic view of data quality within the original formulation.
arXiv Detail & Related papers (2024-10-04T02:07:17Z)
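
The conventional formulation referenced above is a parametric law such as the Chinchilla fit, L(N, D) = E + A/N^alpha + B/D^beta. The sketch below adds a hypothetical data-quality factor q that scales the effective number of tokens; the q term is our illustration of the "microscopic view," not the authors' exact formulation.

```python
# Chinchilla-style scaling law with a hypothetical effective-data term.
# Constants are the published Chinchilla (Hoffmann et al., 2022) fits.
def predicted_loss(N, D, q=1.0, E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
    """N: parameters, D: training tokens, q: assumed data-quality factor
    scaling the effective dataset size."""
    return E + A / N**alpha + B / (q * D)**beta

# Higher-quality data (q > 1) behaves like having more tokens at fixed D.
print(predicted_loss(N=1e9, D=2e10, q=1.0))
print(predicted_loss(N=1e9, D=2e10, q=2.0))
```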
- Unified Model Learning for Various Neural Machine Translation [63.320005222549646]
Existing neural machine translation (NMT) studies mainly focus on developing dataset-specific models.
We propose a "versatile" model, i.e., the Unified Model Learning for NMT (UMLNMT), that works with data from different tasks.
UMLNMT achieves substantial improvements over dataset-specific models with significantly reduced model deployment costs.
arXiv Detail & Related papers (2023-05-04T12:21:52Z)
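
A common way to realize one unified model over heterogeneous translation datasets is to prepend a task or domain tag to each source sentence, so a single set of parameters serves every dataset. The tag vocabulary below is an assumption for illustration; it is not necessarily UMLNMT's exact mechanism.

```python
# Tag each example with its task/domain so one seq2seq model covers all tasks.
TASKS = {"news": "<news>", "speech": "<speech>", "literary": "<literary>"}

def tag_example(task: str, src: str, tgt: str) -> tuple[str, str]:
    """Prepend the task tag to the source side; targets are unchanged."""
    return f"{TASKS[task]} {src}", tgt

mixed_batch = [
    tag_example("news", "Der Markt erholte sich.", "The market recovered."),
    tag_example("speech", "ähm, ich denke schon", "um, I think so"),
]
# A standard seq2seq model trained on mixed_batch replaces several
# dataset-specific models at deployment time.
```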
- A Multi-dimensional Evaluation of Tokenizer-free Multilingual Pretrained Models [87.7086269902562]
We show that subword-based models might still be the most practical choice in many settings.
We encourage future work in tokenizer-free methods to consider these factors when designing and evaluating new models.
arXiv Detail & Related papers (2022-10-13T15:47:09Z)
- Distributionally Robust Multilingual Machine Translation [94.51866646879337]
We propose a new learning objective for multilingual neural machine translation (MNMT) based on distributionally robust optimization.
We show how to practically optimize this objective for large translation corpora using an iterated best response scheme.
Our method consistently outperforms strong baseline methods in terms of average and per-language performance under both many-to-one and one-to-many translation settings.
arXiv Detail & Related papers (2021-09-09T03:48:35Z)
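
A sketch of the min-max idea in the entry above: the adversary reweights language pairs toward the worst current losses, and the model minimizes the reweighted loss, alternating as in an iterated best-response scheme. The exponential-weights update and the callables below are stand-in assumptions, not the paper's exact method.

```python
# Distributionally robust training over language pairs via iterated best response.
import numpy as np

def best_response_weights(losses, temperature=1.0):
    """Adversary's step: up-weight language pairs with higher dev loss."""
    w = np.exp(np.asarray(losses) / temperature)
    return w / w.sum()

def dro_training(num_pairs, train_epoch, eval_losses, rounds=5):
    """train_epoch(weights): update the model on pair-weighted data (assumed).
    eval_losses(): return current per-pair dev losses (assumed)."""
    weights = np.full(num_pairs, 1.0 / num_pairs)
    for _ in range(rounds):
        train_epoch(weights)                            # model's best response
        weights = best_response_weights(eval_losses())  # adversary's best response
    return weights
```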
- How much pretraining data do language models need to learn syntax? [12.668478784932878]
Transformer-based pretrained language models achieve outstanding results in many well-known NLU benchmarks.
We study the impact of pretraining data size on the knowledge of the models using RoBERTa.
arXiv Detail & Related papers (2021-09-07T15:51:39Z)
- Mixed-Lingual Pre-training for Cross-lingual Summarization [54.4823498438831]
Cross-lingual summarization aims to produce a summary in the target language for an article in the source language.
We propose a solution based on mixed-lingual pre-training that leverages both cross-lingual tasks, such as translation, and monolingual tasks, such as masked language modeling.
Our model achieves improvements of 2.82 (English to Chinese) and 1.15 (Chinese to English) ROUGE-1 points over state-of-the-art results.
arXiv Detail & Related papers (2020-10-18T00:21:53Z)
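
A minimal sketch of how mixed-lingual pre-training batches might be assembled, alternating a cross-lingual task (translation) and a monolingual task (masked-language-model-style denoising) on one shared model; the 50/50 mixing ratio and the tag/mask conventions are illustrative assumptions.

```python
# Mix cross-lingual (translation) and monolingual (MLM-style) training examples.
import random

def make_pretraining_example(parallel, monolingual, mask_rate=0.15):
    """parallel: list of (src, tgt) pairs; monolingual: list of raw sentences."""
    if random.random() < 0.5:                 # cross-lingual task: translation
        src, tgt = random.choice(parallel)
        return f"<translate> {src}", tgt
    text = random.choice(monolingual)         # monolingual task: denoising
    tokens = text.split()
    masked = [t if random.random() > mask_rate else "<mask>" for t in tokens]
    return "<denoise> " + " ".join(masked), text
```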
- Dynamic Data Selection and Weighting for Iterative Back-Translation [116.14378571769045]
We propose a curriculum learning strategy for iterative back-translation models.
We evaluate our models on domain adaptation, low-resource, and high-resource MT settings.
Experimental results demonstrate that our methods achieve improvements of up to 1.8 BLEU points over competitive baselines.
arXiv Detail & Related papers (2020-04-07T19:49:58Z)
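
To illustrate the flavor of dynamic selection in iterative back-translation: each round back-translates target-side monolingual text, scores the resulting synthetic pairs, and keeps a growing fraction of them, so the forward model sees progressively harder data. The scorer and the keep-schedule below are assumptions, not the paper's exact criteria.

```python
# One round of iterative back-translation with curriculum-style data selection.
def back_translation_round(mono_tgt, translate_tgt2src, score_pair, keep_frac):
    """translate_tgt2src and score_pair are assumed callables: the reverse
    model and a quality scorer (e.g., round-trip likelihood)."""
    synthetic = [(translate_tgt2src(t), t) for t in mono_tgt]
    ranked = sorted(synthetic, key=lambda p: score_pair(*p), reverse=True)
    return ranked[: int(keep_frac * len(ranked))]

# Rounds proceed with keep_frac increasing, e.g. 0.2 -> 0.4 -> 0.8, so the
# forward model is trained on progressively harder synthetic data.
```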
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.