Code-Switching with Word Senses for Pretraining in Neural Machine
Translation
- URL: http://arxiv.org/abs/2310.14050v1
- Date: Sat, 21 Oct 2023 16:13:01 GMT
- Title: Code-Switching with Word Senses for Pretraining in Neural Machine
Translation
- Authors: Vivek Iyer, Edoardo Barba, Alexandra Birch, Jeff Z. Pan, Roberto
Navigli
- Abstract summary: We introduce Word Sense Pretraining for Neural Machine Translation (WSP-NMT).
WSP-NMT is an end-to-end approach for pretraining multilingual NMT models leveraging word sense-specific information from Knowledge Bases.
Our experiments show significant improvements in overall translation quality.
- Score: 107.23743153715799
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Lexical ambiguity is a significant and pervasive challenge in Neural Machine
Translation (NMT), with many state-of-the-art (SOTA) NMT systems struggling to
handle polysemous words (Campolungo et al., 2022). The same holds for the NMT
pretraining paradigm of denoising synthetic "code-switched" text (Pan et al.,
2021; Iyer et al., 2023), where word senses are ignored in the noising stage --
leading to harmful sense biases in the pretraining data that are subsequently
inherited by the resulting models. In this work, we introduce Word Sense
Pretraining for Neural Machine Translation (WSP-NMT) - an end-to-end approach
for pretraining multilingual NMT models leveraging word sense-specific
information from Knowledge Bases. Our experiments show significant improvements
in overall translation quality. Then, we show the robustness of our approach to
scale to various challenging data and resource-scarce scenarios and, finally,
report fine-grained accuracy improvements on the DiBiMT disambiguation
benchmark. Our studies yield interesting and novel insights into the merits and
challenges of integrating word sense information and structured knowledge in
multilingual pretraining for NMT.
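To make the noising stage concrete, the following is a minimal, illustrative sketch of sense-aware code-switching, assuming a hypothetical toy sense lexicon (SENSE_LEXICON) and a stand-in disambiguation function (disambiguate). It is not the WSP-NMT implementation, which obtains sense-specific translations from Knowledge Bases and a dedicated word sense disambiguation system.

```python
# Minimal sketch only: the toy lexicon and disambiguate() are invented for
# illustration and are not part of the WSP-NMT implementation.
import random

# Hypothetical sense inventory: (word, sense id) -> per-language translations.
SENSE_LEXICON = {
    ("bank", "bank.financial"): {"it": "banca", "es": "banco"},
    ("bank", "bank.river"): {"it": "riva", "es": "orilla"},
}

def disambiguate(word, context):
    """Stand-in for a word sense disambiguation model."""
    if word != "bank":
        return None
    return "bank.river" if "river" in context else "bank.financial"

def code_switch(tokens, tgt_lang, p=0.3):
    """Noising step: replace some words with sense-specific translations."""
    noised = []
    for tok in tokens:
        sense = disambiguate(tok, tokens)
        entry = SENSE_LEXICON.get((tok, sense)) if sense else None
        if entry and tgt_lang in entry and random.random() < p:
            noised.append(entry[tgt_lang])  # sense-aware substitution
        else:
            noised.append(tok)              # keep the original word
    return noised

# The pretraining pair is (code-switched sentence -> original sentence).
src = "she sat on the bank of the river".split()
print(code_switch(src, "it", p=1.0))
```

A sense-agnostic code-switcher would substitute the financial-sense translation into the river sentence as readily as the correct one; this is the kind of sense bias in the pretraining data that the paper aims to remove.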
Related papers
- Improving Neural Machine Translation by Multi-Knowledge Integration with Prompting [36.24578487904221]
We focus on how to integrate multi-knowledge, i.e., multiple types of knowledge, into NMT models to enhance performance with prompting.
We propose a unified framework that can effectively integrate multiple types of knowledge, including sentences, terminologies/phrases, and translation templates, into NMT models.
arXiv Detail & Related papers (2023-12-08T02:55:00Z)
- Learning to Generalize to More: Continuous Semantic Augmentation for Neural Machine Translation [50.54059385277964]
We present a novel data augmentation paradigm termed Continuous Semantic Augmentation (CsaNMT).
CsaNMT augments each training instance with an adjacency region that could cover adequate variants of literal expression under the same meaning.
arXiv Detail & Related papers (2022-04-14T08:16:28Z)
- DEEP: DEnoising Entity Pre-training for Neural Machine Translation [123.6686940355937]
It has been shown that machine translation models usually generate poor translations for named entities that are infrequent in the training corpus.
We propose DEEP, a DEnoising Entity Pre-training method that leverages large amounts of monolingual data and a knowledge base to improve named entity translation accuracy within sentences.
arXiv Detail & Related papers (2021-11-14T17:28:09Z)
- Language Modeling, Lexical Translation, Reordering: The Training Process of NMT through the Lens of Classical SMT [64.1841519527504]
Neural machine translation uses a single neural network to model the entire translation process.
Despite neural machine translation being the de facto standard, it is still not clear how NMT models acquire different competences over the course of training.
arXiv Detail & Related papers (2021-09-03T09:38:50Z)
- Exploring Unsupervised Pretraining Objectives for Machine Translation [99.5441395624651]
Unsupervised cross-lingual pretraining has achieved strong results in neural machine translation (NMT).
Most approaches adapt masked-language modeling (MLM) to sequence-to-sequence architectures, by masking parts of the input and reconstructing them in the decoder.
We compare masking with alternative objectives that produce inputs resembling real (full) sentences, by reordering and replacing words based on their context (a minimal sketch of the masking objective appears after this list).
arXiv Detail & Related papers (2021-06-10T10:18:23Z)
- Better Neural Machine Translation by Extracting Linguistic Information from BERT [4.353029347463806]
Adding linguistic information to neural machine translation (NMT) has mostly focused on using point estimates from pre-trained models.
We augment NMT by extracting dense fine-tuned vector-based linguistic information from BERT instead of using point estimates.
arXiv Detail & Related papers (2021-04-07T00:03:51Z)
- PheMT: A Phenomenon-wise Dataset for Machine Translation Robustness on User-Generated Contents [40.25277134147149]
We present a new dataset, PheMT, for evaluating the robustness of MT systems against specific linguistic phenomena in Japanese-English translation.
Our experiments with the created dataset revealed that not only our in-house models but even widely used off-the-shelf systems are greatly disturbed by the presence of certain phenomena.
arXiv Detail & Related papers (2020-11-04T04:44:47Z)
- Neural Machine Translation: Challenges, Progress and Future [62.75523637241876]
Machine translation (MT) is a technique that leverages computers to translate human languages automatically.
Neural machine translation (NMT) models the direct mapping between source and target languages with deep neural networks.
This article reviews the NMT framework, discusses the challenges in NMT, and introduces some exciting recent progress.
arXiv Detail & Related papers (2020-04-13T07:53:57Z)
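As referenced in the "Exploring Unsupervised Pretraining Objectives for Machine Translation" entry above, masked-language modeling is commonly adapted to sequence-to-sequence pretraining by masking parts of the input and reconstructing them in the decoder. The sketch below illustrates that masking step only; the span length, mask rate, and the helper name mask_spans are illustrative choices, not settings from any of the papers listed.

```python
# Minimal sketch only: span length and mask rate are illustrative, not taken
# from any cited paper.
import random

MASK = "<mask>"

def mask_spans(tokens, mask_ratio=0.35, max_span=3):
    """Corrupt the encoder input by masking contiguous spans; the decoder is
    trained to reconstruct the original sentence."""
    corrupted, i = [], 0
    while i < len(tokens):
        if random.random() < mask_ratio:
            corrupted.append(MASK)                 # one mask token per span
            i += random.randint(1, max_span)
        else:
            corrupted.append(tokens[i])
            i += 1
    return corrupted

sentence = "the quick brown fox jumps over the lazy dog".split()
encoder_input = mask_spans(sentence)
decoder_target = sentence  # training pair: (corrupted input -> original sentence)
print(encoder_input)
```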
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.