End-to-end Training and Decoding for Pivot-based Cascaded Translation Model
- URL: http://arxiv.org/abs/2305.02261v1
- Date: Wed, 3 May 2023 16:48:43 GMT
- Title: End-to-end Training and Decoding for Pivot-based Cascaded Translation Model
- Authors: Hao Cheng, Meng Zhang, Liangyou Li, Qun Liu and Zhihua Zhang
- Abstract summary: We propose an end-to-end training method for the cascaded translation model.
We also mitigate the inconsistency between tokens and probability distributions that arises when using beam search in pivot decoding.
- Score: 40.41344631506705
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Utilizing a pivot language effectively can significantly improve low-resource
machine translation. Usually, the two translation models, source-pivot and
pivot-target, are trained individually and do not utilize the limited (source,
target) parallel data. This work proposes an end-to-end training method for the
cascaded translation model and introduces an improved decoding algorithm. The
input of the pivot-target model is replaced with weighted pivot embeddings
computed from the probability distributions output by the source-pivot model.
This keeps the cascade differentiable, so it can be trained end-to-end. In
addition, we mitigate the inconsistency between tokens and probability
distributions that arises when using beam search in pivot decoding. Experiments
demonstrate that our method improves translation quality.
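The core mechanism described in the abstract is the replacement of discrete pivot tokens with
probability-weighted embeddings, which makes the whole cascade differentiable. Below is a
minimal PyTorch-style sketch of that idea, not the authors' implementation; the names
(soft_pivot_embeddings, pivot_embedding, pivot_logits) are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def soft_pivot_embeddings(pivot_logits: torch.Tensor,
                          pivot_embedding: torch.nn.Embedding) -> torch.Tensor:
    """Replace discrete pivot tokens with probability-weighted embeddings.

    pivot_logits: (batch, pivot_len, pivot_vocab), output of the source-pivot model.
    Returns:      (batch, pivot_len, emb_dim), fed to the pivot-target encoder.
    """
    probs = F.softmax(pivot_logits, dim=-1)      # distribution over the pivot vocabulary
    # Expected embedding under the source-pivot distribution; this matmul is
    # differentiable, so the pivot-target loss can backpropagate into the
    # source-pivot model, enabling end-to-end training of the cascade.
    return probs @ pivot_embedding.weight

# Toy usage with random tensors standing in for real model outputs.
vocab_size, emb_dim = 100, 16
pivot_embedding = torch.nn.Embedding(vocab_size, emb_dim)
pivot_logits = torch.randn(2, 7, vocab_size, requires_grad=True)
soft_inputs = soft_pivot_embeddings(pivot_logits, pivot_embedding)
soft_inputs.sum().backward()                     # gradients reach the source-pivot side
print(soft_inputs.shape, pivot_logits.grad is not None)
```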
Related papers
- Online Speculative Decoding [34.987825705622555]
We introduce online speculative decoding to accelerate the inference of large language models.
The main idea is to continuously update the (multiple) draft model(s) on observed user query data.
We develop a prototype of online speculative decoding based on knowledge distillation and evaluate it using both synthetic and real query data.
arXiv Detail & Related papers (2023-10-11T04:03:42Z)
- Tailoring Language Generation Models under Total Variation Distance [55.89964205594829]
The standard paradigm of neural language generation adopts maximum likelihood estimation (MLE) as the optimization method.
We develop practical bounds to apply total variation distance (TVD) to language generation.
We introduce the TaiLr objective, which balances the tradeoff in estimating TVD.
arXiv Detail & Related papers (2023-02-26T16:32:52Z)
- Summarize and Generate to Back-translate: Unsupervised Translation of Programming Languages [86.08359401867577]
Back-translation is widely known for its effectiveness in neural machine translation when little or no parallel data is available.
We propose performing back-translation via code summarization and generation.
We show that our proposed approach performs competitively with state-of-the-art methods.
arXiv Detail & Related papers (2022-05-23T08:20:41Z)
- Triangular Transfer: Freezing the Pivot for Triangular Machine Translation [30.655004159965923]
Triangular machine translation is a case where the language pair of interest has limited parallel data.
Key to triangular machine translation is the successful exploitation of such auxiliary data.
We propose a transfer-learning-based approach that utilizes all types of auxiliary data.
arXiv Detail & Related papers (2022-03-17T02:00:40Z)
- Towards Reinforcement Learning for Pivot-based Neural Machine Translation with Non-autoregressive Transformer [49.897891031932545]
Pivot-based neural machine translation (NMT) is commonly used in low-resource setups.
We present an end-to-end pivot-based integrated model, enabling training on source-target data.
arXiv Detail & Related papers (2021-09-27T14:49:35Z)
- Rethinking Zero-shot Neural Machine Translation: From a Perspective of Latent Variables [28.101782382170306]
We introduce a denoising autoencoder objective based on the pivot language into the traditional training objective to improve translation accuracy in zero-shot directions.
We demonstrate that the proposed method effectively eliminates the spurious correlations and significantly outperforms state-of-the-art methods.
arXiv Detail & Related papers (2021-09-10T07:18:53Z)
- Distributionally Robust Multilingual Machine Translation [94.51866646879337]
We propose a new learning objective for multilingual neural machine translation (MNMT) based on distributionally robust optimization.
We show how to practically optimize this objective for large translation corpora using an iterated best response scheme.
Our method consistently outperforms strong baseline methods in terms of average and per-language performance under both many-to-one and one-to-many translation settings.
arXiv Detail & Related papers (2021-09-09T03:48:35Z)
- Parameter Space Factorization for Zero-Shot Learning across Tasks and Languages [112.65994041398481]
We propose a Bayesian generative model for the space of neural parameters.
We infer the posteriors over such latent variables based on data from seen task-language combinations.
Our model yields results comparable to or better than state-of-the-art zero-shot cross-lingual transfer methods.
arXiv Detail & Related papers (2020-01-30T16:58:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.