Refining the state-of-the-art in Machine Translation, optimizing NMT for
the JA <-> EN language pair by leveraging personal domain expertise
- URL: http://arxiv.org/abs/2202.11669v1
- Date: Wed, 23 Feb 2022 18:20:14 GMT
- Title: Refining the state-of-the-art in Machine Translation, optimizing NMT for
the JA <-> EN language pair by leveraging personal domain expertise
- Authors: Matthew Bieda
- Abstract summary: The paper documents the construction of an NMT (Neural Machine Translation) system for En/Ja based on the Transformer architecture, leveraging the OpenNMT framework.
The system is evaluated using standard automatic evaluation metrics such as BLEU, together with the author's subjective judgment as a Japanese linguist.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper documents the construction of an NMT (Neural Machine Translation)
system for En/Ja based on the Transformer architecture, leveraging the OpenNMT
framework. A systematic exploration of corpus pre-processing, hyperparameter
tuning and model architecture is carried out to obtain optimal performance. The
system is evaluated using standard automatic evaluation metrics such as BLEU,
together with my subjective judgment as a Japanese linguist.
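Since BLEU is the headline metric here, a minimal corpus-level BLEU evaluation sketch using the sacrebleu library is shown below; the file names are hypothetical placeholders, not artifacts of the paper.

```python
# Minimal BLEU evaluation sketch using sacrebleu (pip install sacrebleu).
# File names are hypothetical placeholders, not from the paper.
import sacrebleu

# One system output and one reference per line, aligned by index.
with open("hypotheses.en") as f:
    hypotheses = [line.strip() for line in f]
with open("references.en") as f:
    references = [line.strip() for line in f]

# sacrebleu expects a list of reference streams (here, a single stream).
# For En->Ja output, sacrebleu's tokenize="ja-mecab" option is typically
# used instead of the default (requires MeCab to be installed).
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU = {bleu.score:.2f}")
```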
Related papers
- Cross-lingual Human-Preference Alignment for Neural Machine Translation with Direct Quality Optimization [4.993565079216378]
We show that applying task-alignment to neural machine translation (NMT) addresses an existing task-data mismatch in NMT.
We introduce Direct Quality Optimization (DQO), a variant of DPO leveraging a pre-trained translation quality estimation model as a proxy for human preferences.
arXiv Detail & Related papers (2024-09-26T09:32:12Z)
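As a rough illustration of the DPO-style objective that DQO builds on, here is a minimal PyTorch sketch of the pairwise preference loss; under DQO, the "preferred" and "dispreferred" translations would be ranked by a quality-estimation model rather than by human annotators. All tensor names and values are hypothetical.

```python
# Sketch of a DPO-style pairwise preference loss (generic form, not the
# paper's exact implementation). In DQO, preference pairs would come from
# a QE model's scores rather than human labels.
import torch
import torch.nn.functional as F

def dpo_loss(policy_logp_w, policy_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Loss over sequence log-probs of preferred (w) and dispreferred (l)
    translations under the trained policy and a frozen reference model."""
    ratio_w = policy_logp_w - ref_logp_w  # log pi(y_w|x) - log pi_ref(y_w|x)
    ratio_l = policy_logp_l - ref_logp_l  # log pi(y_l|x) - log pi_ref(y_l|x)
    return -F.logsigmoid(beta * (ratio_w - ratio_l)).mean()

# Toy usage with made-up log-probabilities for a batch of two pairs.
lw = torch.tensor([-10.2, -8.7]); ll = torch.tensor([-11.5, -9.9])
rw = torch.tensor([-10.0, -8.9]); rl = torch.tensor([-11.0, -9.5])
print(dpo_loss(lw, ll, rw, rl).item())
```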
- Human Evaluation of English-Irish Transformer-Based NMT [2.648836772989769]
The best-performing Transformer system significantly reduces both accuracy and fluency errors when compared with an RNN-based model.
When benchmarked against Google Translate, our translation engines demonstrated significant improvements.
arXiv Detail & Related papers (2024-03-04T11:45:46Z)
- IMTLab: An Open-Source Platform for Building, Evaluating, and Diagnosing Interactive Machine Translation Systems [94.39110258587887]
We present IMTLab, an open-source end-to-end interactive machine translation (IMT) system platform.
IMTLab treats the whole interactive translation process as a task-oriented dialogue with a human-in-the-loop setting.
arXiv Detail & Related papers (2023-10-17T11:29:04Z)
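A toy sketch of the human-in-the-loop idea behind such platforms (this is not IMTLab's actual API): the system proposes a translation, the user validates a prefix, and decoding resumes after that prefix. The `translate_with_prefix` stub is hypothetical.

```python
# Toy human-in-the-loop translation session (hypothetical sketch; not the
# IMTLab API). The system drafts a translation, the user validates a prefix,
# and the system re-decodes the remainder constrained to that prefix.
def translate_with_prefix(source: str, prefix: list) -> list:
    # Stub decoder: a real system would run prefix-constrained beam search.
    draft = ["this", "is", "a", "pen"]
    return prefix + draft[len(prefix):]

source = "これはペンです"
prefix = []
while True:
    hypothesis = translate_with_prefix(source, prefix)
    print("system :", " ".join(hypothesis))
    accepted = input("validated prefix (blank = accept all): ").split()
    if not accepted or accepted == hypothesis:
        break
    prefix = accepted  # the next draft must keep this prefix verbatim
print("final  :", " ".join(hypothesis))
```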
- Statistical Machine Translation for Indic Languages [1.8899300124593648]
This paper describes the development of bilingual Statistical Machine Translation models.
The system is built using the MOSES open-source SMT toolkit.
In our experiment, the quality of the translation is evaluated using standard metrics such as BLEU, METEOR, and RIBES.
arXiv Detail & Related papers (2023-01-02T06:23:12Z)
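A small sketch of scoring a toy hypothesis with the three metrics named above, using NLTK (sentences are pre-tokenized; METEOR requires the WordNet data to be downloaded once beforehand):

```python
# Sketch: scoring one toy hypothesis with BLEU, METEOR and RIBES via NLTK.
# Run nltk.download("wordnet") once before using meteor_score.
from nltk.translate.bleu_score import sentence_bleu
from nltk.translate.meteor_score import meteor_score
from nltk.translate.ribes_score import sentence_ribes

reference = ["the", "cat", "sat", "on", "the", "mat"]
hypothesis = ["the", "cat", "is", "on", "the", "mat"]

print("BLEU  :", sentence_bleu([reference], hypothesis))
print("METEOR:", meteor_score([reference], hypothesis))
print("RIBES :", sentence_ribes([reference], hypothesis))
```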
- Domain Adaptation in Neural Machine Translation using a Qualia-Enriched FrameNet [64.0476282000118]
We present Scylla, a methodology for domain adaptation of Neural Machine Translation (NMT) systems.
Two versions of Scylla are presented: one using the source sentence as input, and another one using the target sentence.
We evaluate Scylla in comparison to a state-of-the-art commercial NMT system in an experiment in which 50 sentences from the Sports domain are translated from Brazilian Portuguese to English.
arXiv Detail & Related papers (2022-02-21T15:05:23Z)
- End-to-End Training for Back-Translation with Categorical Reparameterization Trick [0.0]
Back-translation is an effective semi-supervised learning framework in neural machine translation (NMT).
A pre-trained NMT model translates monolingual sentences and makes synthetic bilingual sentence pairs for the training of the other NMT model.
The discrete nature of the translated sentences prevents gradient information from flowing between the two NMT models.
arXiv Detail & Related papers (2022-02-17T06:31:03Z)
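The "categorical reparameterization trick" in the title is commonly implemented as straight-through Gumbel-softmax; below is a minimal PyTorch sketch of passing gradients through a discrete word choice (a generic formulation, not necessarily the paper's exact one):

```python
# Sketch: straight-through Gumbel-softmax over a vocabulary, letting gradients
# flow through an otherwise discrete word choice (generic, assumed formulation).
import torch
import torch.nn.functional as F

vocab_size, batch = 8, 2
logits = torch.randn(batch, vocab_size, requires_grad=True)  # decoder logits

# hard=True returns one-hot samples in the forward pass but uses the soft
# distribution in the backward pass (straight-through estimator).
one_hot_words = F.gumbel_softmax(logits, tau=0.5, hard=True)

# A downstream model consumes the (differentiable) one-hot word vectors.
embedding = torch.nn.Linear(vocab_size, 16, bias=False)
loss = embedding(one_hot_words).pow(2).mean()
loss.backward()
print(logits.grad.shape)  # gradients reach the generating model's logits
```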
- Machine Translation Customization via Automatic Training Data Selection from the Web [97.98885151955467]
We describe an approach for customizing machine translation systems to specific domains.
We select data similar to the target customer data to train neural translation models.
Finally, we train MT models on our automatically selected data, obtaining a system specialized to the target domain.
arXiv Detail & Related papers (2021-02-20T03:29:41Z)
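A toy sketch of the data-selection idea, using TF-IDF cosine similarity as an assumed stand-in for the paper's actual selection model: rank sentences from a generic pool by similarity to a small customer sample and keep the closest ones.

```python
# Sketch: select pseudo in-domain training sentences by TF-IDF cosine
# similarity to a customer sample (a simple stand-in; the paper's actual
# selection model may differ).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

customer_sample = ["the patient was given 5 mg of the drug"]
generic_pool = [
    "the stock market fell sharply today",
    "dosage was increased to 10 mg per day",
    "the football match ended in a draw",
]

vec = TfidfVectorizer().fit(customer_sample + generic_pool)
sims = cosine_similarity(vec.transform(generic_pool),
                         vec.transform(customer_sample)).ravel()

# Keep only the pool sentences most similar to the customer data.
selected = [s for s, score in zip(generic_pool, sims) if score > 0.1]
print(selected)
```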
- Machine Translation of Novels in the Age of Transformer [1.6453685972661827]
We build a machine translation system tailored to the literary domain, specifically to novels, based on the Transformer, the state-of-the-art architecture in neural MT (NMT), for the English-to-Catalan translation direction.
We compare this MT system against three other systems (two domain-specific systems under the recurrent and phrase-based paradigms and a popular generic on-line system) on three evaluations.
As expected, the domain-specific Transformer-based system outperformed the other three systems in all three evaluations, in each case by a large margin.
arXiv Detail & Related papers (2020-11-30T16:51:08Z)
- Document-level Neural Machine Translation with Document Embeddings [82.4684444847092]
This work focuses on exploiting detailed document-level context in terms of multiple forms of document embeddings.
The proposed document-aware NMT is implemented to enhance the Transformer baseline by introducing both global and local document-level clues on the source end.
arXiv Detail & Related papers (2020-09-16T19:43:29Z)
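One simple way to inject a global document-level clue on the source side is to add a learned document embedding to every source token embedding; the sketch below is a hedged, generic illustration, not the paper's exact mechanism.

```python
# Sketch: adding a global document embedding to source token embeddings
# (a generic illustration, not the paper's exact architecture).
import torch
import torch.nn as nn

vocab_size, num_docs, d_model = 1000, 50, 512
tok_emb = nn.Embedding(vocab_size, d_model)
doc_emb = nn.Embedding(num_docs, d_model)

src = torch.randint(0, vocab_size, (4, 20))   # (batch, src_len) token ids
doc_ids = torch.randint(0, num_docs, (4,))    # one document id per sentence

# Broadcast the document vector over every source position.
x = tok_emb(src) + doc_emb(doc_ids).unsqueeze(1)  # (batch, src_len, d_model)
print(x.shape)  # this representation feeds into the Transformer encoder
```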
- Explicit Reordering for Neural Machine Translation [50.70683739103066]
In Transformer-based neural machine translation (NMT), the positional encoding mechanism helps the self-attention networks to learn the source representation with order dependency.
We propose a novel reordering method to explicitly model this reordering information for the Transformer-based NMT.
The empirical results on the WMT14 English-to-German, WAT ASPEC Japanese-to-English, and WMT17 Chinese-to-English translation tasks show the effectiveness of the proposed approach.
arXiv Detail & Related papers (2020-04-08T05:28:46Z)
- Learning Contextualized Sentence Representations for Document-Level Neural Machine Translation [59.191079800436114]
Document-level machine translation incorporates inter-sentential dependencies into the translation of a source sentence.
We propose a new framework to model cross-sentence dependencies by training neural machine translation (NMT) to predict both the target translation and surrounding sentences of a source sentence.
arXiv Detail & Related papers (2020-03-30T03:38:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.