On Long-Tailed Phenomena in Neural Machine Translation
- URL: http://arxiv.org/abs/2010.04924v1
- Date: Sat, 10 Oct 2020 07:00:57 GMT
- Title: On Long-Tailed Phenomena in Neural Machine Translation
- Authors: Vikas Raunak, Siddharth Dalmia, Vivek Gupta and Florian Metze
- Abstract summary: State-of-the-art Neural Machine Translation (NMT) models struggle with generating low-frequency tokens.
We propose a new loss function, the Anti-Focal loss, to better adapt model training to the structural dependencies of conditional text generation.
We show the efficacy of the proposed technique on a number of Machine Translation (MT) datasets, demonstrating that it leads to significant gains over cross-entropy.
- Score: 50.65273145888896
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: State-of-the-art Neural Machine Translation (NMT) models struggle with
generating low-frequency tokens, tackling which remains a major challenge. The
analysis of long-tailed phenomena in the context of structured prediction tasks
is further hindered by the added complexities of search during inference. In
this work, we quantitatively characterize such long-tailed phenomena at two
levels of abstraction, namely, token classification and sequence generation. We
propose a new loss function, the Anti-Focal loss, to better adapt model
training to the structural dependencies of conditional text generation by
incorporating the inductive biases of beam search in the training process. We
show the efficacy of the proposed technique on a number of Machine Translation
(MT) datasets, demonstrating that it leads to significant gains over
cross-entropy across different language pairs, especially on the generation of
low-frequency words. We have released the code to reproduce our results.
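To make the idea above concrete, the following is a minimal PyTorch sketch of what an anti-focal-style token loss could look like. The (1 + p_t)^gamma modulating factor, the default gamma value, and the function and argument names are illustrative assumptions rather than the paper's exact formulation, which should be taken from the authors' released code.

```python
# Minimal sketch of an anti-focal-style loss (assumed (1 + p_t)^gamma factor;
# focal loss would instead use (1 - p_t)^gamma, which down-weights confident tokens).
import torch
import torch.nn.functional as F

def anti_focal_loss(logits, targets, gamma=1.0, ignore_index=-100):
    # logits: (num_tokens, vocab_size); targets: (num_tokens,) gold token ids
    log_probs = F.log_softmax(logits, dim=-1)
    # per-token negative log-likelihood of the gold token
    nll = F.nll_loss(log_probs, targets, ignore_index=ignore_index, reduction="none")
    p_t = torch.exp(-nll)  # model probability assigned to the gold token
    # assumed anti-focal modulation: confident predictions get relatively more weight
    loss = ((1.0 + p_t) ** gamma) * nll
    mask = (targets != ignore_index).float()  # zero out padding positions
    return (loss * mask).sum() / mask.sum().clamp(min=1.0)

# toy usage: 4 target tokens over a vocabulary of 10
logits = torch.randn(4, 10)
targets = torch.tensor([1, 3, 0, 7])
print(anti_focal_loss(logits, targets, gamma=0.5))
```

With gamma = 0 this reduces to standard cross-entropy; a positive gamma shifts relatively more training signal toward tokens the model is already confident about, which is the beam-search-friendly bias the abstract describes.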
Related papers
- A Percolation Model of Emergence: Analyzing Transformers Trained on a Formal Language [15.929767234646631]
An increase in data, size, or compute can lead to the sudden learning of specific capabilities by a neural network, a phenomenon often called "emergence".
arXiv Detail & Related papers (2024-08-22T17:44:22Z) - Understanding and Addressing the Under-Translation Problem from the Perspective of Decoding Objective [72.83966378613238]
Under-translation and over-translation remain two challenging problems in state-of-the-art Neural Machine Translation (NMT) systems.
We conduct an in-depth analysis on the underlying cause of under-translation in NMT, providing an explanation from the perspective of decoding objective.
We propose employing the confidence of predicting End Of Sentence (EOS) as a detector for under-translation, and strengthening the confidence-based penalty to penalize candidates with a high risk of under-translation.
arXiv Detail & Related papers (2024-05-29T09:25:49Z) - Repetition In Repetition Out: Towards Understanding Neural Text
Degeneration from the Data Perspective [91.14291142262262]
This work presents a straightforward and fundamental explanation from the data perspective.
Our preliminary investigation reveals a strong correlation between the degeneration issue and the presence of repetitions in training data.
Our experiments reveal that penalizing the repetitions in training data remains critical even when considering larger model sizes and instruction tuning.
arXiv Detail & Related papers (2023-10-16T09:35:42Z) - Learning to Generalize to More: Continuous Semantic Augmentation for
Neural Machine Translation [50.54059385277964]
We present a novel data augmentation paradigm termed Continuous Semantic Augmentation (CsaNMT).
CsaNMT augments each training instance with an adjacency region that could cover adequate variants of literal expression under the same meaning.
arXiv Detail & Related papers (2022-04-14T08:16:28Z) - Noisy UGC Translation at the Character Level: Revisiting Open-Vocabulary
Capabilities and Robustness of Char-Based Models [6.123324869194193]
This work explores the capacities of character-based Neural Machine Translation to translate noisy User-Generated Content (UGC).
We first study the detrimental impact of various user-generated content phenomena on translation performance, using a small annotated dataset.
We show that such models are indeed incapable of handling unknown letters, which leads to catastrophic translation failure once such characters are encountered.
arXiv Detail & Related papers (2021-10-24T23:25:54Z) - Comparative Error Analysis in Neural and Finite-state Models for
Unsupervised Character-level Transduction [34.1177259741046]
We compare the two model classes side by side and find that they tend to make different types of errors even when achieving comparable performance.
We investigate how combining finite-state and sequence-to-sequence models at decoding time affects the output quantitatively and qualitatively.
arXiv Detail & Related papers (2021-06-24T00:09:24Z) - Beyond Noise: Mitigating the Impact of Fine-grained Semantic Divergences
on Neural Machine Translation [14.645468999921961]
We analyze the impact of different types of fine-grained semantic divergences on Transformer models.
We introduce a divergent-aware NMT framework that uses factors to help NMT recover from the degradation caused by naturally occurring divergences.
arXiv Detail & Related papers (2021-05-31T16:15:35Z) - Enriching Non-Autoregressive Transformer with Syntactic and
Semantic Structures for Neural Machine Translation [54.864148836486166]
We propose to incorporate the explicit syntactic and semantic structures of languages into a non-autoregressive Transformer.
Our model achieves significantly faster decoding while maintaining translation quality compared with several state-of-the-art non-autoregressive models.
arXiv Detail & Related papers (2021-01-22T04:12:17Z) - Sentence Boundary Augmentation For Neural Machine Translation Robustness [11.290581889247983]
We show that sentence boundary segmentation has the largest impact on quality, and we develop a simple data augmentation strategy to improve segmentation robustness.
arXiv Detail & Related papers (2020-10-21T16:44:48Z) - Mechanisms for Handling Nested Dependencies in Neural-Network Language
Models and Humans [75.15855405318855]
We studied whether a modern artificial neural network trained with "deep learning" methods mimics a central aspect of human sentence processing.
Although the network was solely trained to predict the next word in a large corpus, analysis showed the emergence of specialized units that successfully handled local and long-distance syntactic agreement.
We tested the model's predictions in a behavioral experiment where humans detected violations in number agreement in sentences with systematic variations in the singular/plural status of multiple nouns.
arXiv Detail & Related papers (2020-06-19T12:00:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.