A Single Model Ensemble Framework for Neural Machine Translation using Pivot Translation
- URL: http://arxiv.org/abs/2502.01182v1
- Date: Mon, 03 Feb 2025 09:17:45 GMT
- Title: A Single Model Ensemble Framework for Neural Machine Translation using Pivot Translation
- Authors: Seokjin Oh, Keonwoong Noh, Woohwan Jung
- Abstract summary: We present a pivot-based single model ensemble for low-resource language pairs.
In the first step, we generate candidates through pivot translation.
Next, in the aggregation step, we select k high-quality candidates from the generated candidates and merge them to generate a final translation.
- Score: 1.3791394805787949
- Abstract: Despite the significant advances in neural machine translation, performance remains subpar for low-resource language pairs. Ensembling multiple systems is a widely adopted technique to enhance performance, often accomplished by combining probability distributions. However, the previous approaches face the challenge of high computational costs for training multiple models. Furthermore, for black-box models, averaging token-level probabilities at each decoding step is not feasible. To address the problems of multi-model ensemble methods, we present a pivot-based single model ensemble. The proposed strategy consists of two steps: pivot-based candidate generation and post-hoc aggregation. In the first step, we generate candidates through pivot translation. This can be achieved with only a single model and facilitates knowledge transfer from high-resource pivot languages, resulting in candidates that are not only diverse but also more accurate. Next, in the aggregation step, we select k high-quality candidates from the generated candidates and merge them to generate a final translation that outperforms the existing candidates. Our experimental results show that our method produces translations of superior quality by leveraging candidates from pivot translation to capture the subtle nuances of the source sentence.
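As a rough illustration of the two-step strategy described in the abstract, here is a minimal sketch; the `translate`, `score`, and `merge` callables are assumed interfaces (a single multilingual NMT model, a reference-free quality metric, and a candidate-fusion step), not the authors' implementation.
```python
# Minimal sketch of the two-step pivot-based single-model ensemble.
# `translate(text, src_lang, tgt_lang)`, `score(candidate, source)` and
# `merge(source, candidates)` are assumed interfaces, not the paper's code.

def generate_candidates(source, src_lang, tgt_lang, pivots, translate):
    """Step 1: pivot-based candidate generation with a single multilingual model."""
    candidates = [translate(source, src_lang, tgt_lang)]             # direct translation
    for pivot in pivots:                                             # e.g. high-resource languages
        intermediate = translate(source, src_lang, pivot)            # source -> pivot
        candidates.append(translate(intermediate, pivot, tgt_lang))  # pivot -> target
    return candidates


def aggregate(candidates, source, score, k=3, merge=None):
    """Step 2: post-hoc aggregation -- keep the k best candidates and merge them."""
    ranked = sorted(candidates, key=lambda c: score(c, source), reverse=True)
    selected = ranked[:k]
    if merge is not None:      # optional fusion step producing a new translation
        return merge(source, selected)
    return selected[0]         # fallback: return the single best candidate
```
Because every candidate comes from the same model, no extra training is required; diversity comes from routing the source through different pivot languages.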
Related papers
- Plug, Play, and Fuse: Zero-Shot Joint Decoding via Word-Level Re-ranking Across Diverse Vocabularies [12.843274390224853]
Real-world tasks, like multimodal translation, often require combining the strengths of different specialized models, such as handling both translation and image processing.
We propose a novel zero-shot ensembling strategy that allows for the integration of different models during the decoding phase without the need for additional training.
Our approach re-ranks beams during decoding by combining scores at the word level, using multimodal models to predict when a word is completed.
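Roughly, word-level re-ranking across models can be pictured as below; `score_word` is a hypothetical per-word scoring interface, the interpolation weights are arbitrary, and the word-boundary detection the paper relies on is omitted, so this is only a sketch of the general idea.
```python
# Hypothetical sketch of re-ranking beams by combining word-level scores from
# several models. `score_word(model, prefix_words, word)` is an assumed
# interface returning a log-probability for a completed word given the prefix.

def combined_score(words, models, weights, score_word):
    total, prefix = 0.0, []
    for word in words:
        total += sum(w * score_word(m, list(prefix), word)
                     for m, w in zip(models, weights))
        prefix.append(word)
    return total


def rerank(beams, models, weights, score_word):
    # `beams` is a list of candidate word sequences from one model's decoder.
    return max(beams, key=lambda b: combined_score(b, models, weights, score_word))
```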
arXiv Detail & Related papers (2024-08-21T04:20:55Z) - Investigating Multi-Pivot Ensembling with Massively Multilingual Machine Translation Models [47.91306228406407]
We revisit ways of pivoting through multiple languages.
We propose MaxEns, a novel combination strategy that makes the output biased towards the most confident predictions.
On average, multi-pivot strategies still lag behind using English as a single pivot language.
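One plausible reading of such a confidence-biased combination is sketched below, under the assumption that MaxEns takes the per-token maximum over the pivot paths' next-token distributions and renormalizes; the exact formulation in the paper may differ.
```python
import numpy as np

def max_confidence_combine(distributions):
    """Combine next-token distributions from several pivot paths by taking the
    per-token maximum and renormalizing, biasing toward confident predictions.
    `distributions` has shape (num_pivots, vocab_size). This is an assumed
    reading of MaxEns based on the one-line summary, not a verified reproduction."""
    combined = np.max(np.asarray(distributions, dtype=float), axis=0)
    return combined / combined.sum()
```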
arXiv Detail & Related papers (2023-11-13T16:15:20Z) - CompoundPiece: Evaluating and Improving Decompounding Performance of Language Models [77.45934004406283]
We systematically study decompounding, the task of splitting compound words into their constituents.
We introduce a dataset of 255k compound and non-compound words across 56 diverse languages obtained from Wiktionary.
We introduce a novel methodology to train dedicated models for decompounding.
arXiv Detail & Related papers (2023-05-23T16:32:27Z) - Probabilistic Bilevel Coreset Selection [24.874967723659022]
We propose a continuous probabilistic bilevel formulation of coreset selection by learning a probabilistic weight for each training sample.
We develop an efficient solver for the bilevel optimization problem via unbiased policy gradient without the trouble of implicit differentiation.
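In generic form, such a probabilistic bilevel objective can be written as follows; the Bernoulli parameterization and the score-function (policy-gradient) estimator are assumptions inferred from the summary, not necessarily the paper's exact formulation.
```latex
% Coreset selection with probabilistic per-sample weights s_i ~ Bernoulli(p_i).
% The inclusion probabilities p are optimized against a validation loss; the
% gradient w.r.t. p can be estimated with a score-function (policy-gradient)
% estimator, avoiding implicit differentiation through the inner problem.
\min_{p \in [0,1]^n} \ \mathbb{E}_{s \sim \mathrm{Bern}(p)}
  \Bigl[ \mathcal{L}_{\mathrm{val}}\bigl(\theta^{*}(s)\bigr) \Bigr]
\quad \text{s.t.} \quad
\theta^{*}(s) = \arg\min_{\theta} \sum_{i=1}^{n} s_i \, \ell_i(\theta)
```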
arXiv Detail & Related papers (2023-01-24T09:37:00Z) - CROP: Zero-shot Cross-lingual Named Entity Recognition with Multilingual
Labeled Sequence Translation [113.99145386490639]
Cross-lingual NER can transfer knowledge between languages via aligned cross-lingual representations or machine translation results.
We propose a Cross-lingual Entity Projection framework (CROP) to enable zero-shot cross-lingual NER.
We adopt a multilingual labeled sequence translation model to project the tagged sequence back to the target language and label the target raw sentence.
arXiv Detail & Related papers (2022-10-13T13:32:36Z) - A Template-based Method for Constrained Neural Machine Translation [100.02590022551718]
We propose a template-based method that can yield results with high translation quality and match accuracy while maintaining decoding speed.
The generation and derivation of the template can be learned through one sequence-to-sequence training framework.
Experimental results show that the proposed template-based methods can outperform several representative baselines in lexically and structurally constrained translation tasks.
arXiv Detail & Related papers (2022-05-23T12:24:34Z) - Distributionally Robust Multilingual Machine Translation [94.51866646879337]
We propose a new learning objective for multilingual neural machine translation (MNMT) based on distributionally robust optimization.
We show how to practically optimize this objective for large translation corpora using an iterated best response scheme.
Our method consistently outperforms strong baseline methods in terms of average and per-language performance under both many-to-one and one-to-many translation settings.
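Schematically, a distributionally robust multilingual objective takes a min-max form such as the one below, where the weights over language pairs are chosen adversarially from an uncertainty set; the specific uncertainty set and the iterated best response solver used in the paper are not spelled out here, so this is only a generic sketch.
```latex
% Generic min-max form of a distributionally robust multilingual MT objective.
% \lambda weights the per-language-pair losses and is chosen adversarially
% within an uncertainty set \mathcal{U} (left unspecified in this sketch).
\min_{\theta} \; \max_{\lambda \in \mathcal{U}} \; \sum_{i=1}^{N} \lambda_i \, \mathcal{L}_i(\theta),
\qquad
\mathcal{L}_i(\theta) = \mathbb{E}_{(x,y) \sim D_i}\!\left[ -\log p_\theta(y \mid x) \right]
```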
arXiv Detail & Related papers (2021-09-09T03:48:35Z) - Modelling Latent Translations for Cross-Lingual Transfer [47.61502999819699]
We propose a new technique that integrates both steps of the traditional pipeline (translation and classification) into a single model.
We evaluate our novel latent translation-based model on a series of multilingual NLU tasks.
We report gains for both zero-shot and few-shot learning setups, up to 2.7 accuracy points on average.
arXiv Detail & Related papers (2021-07-23T17:11:27Z) - Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.