The Devil is in the Details: On the Pitfalls of Vocabulary Selection in Neural Machine Translation
- URL: http://arxiv.org/abs/2205.06618v1
- Date: Fri, 13 May 2022 13:13:03 GMT
- Title: The Devil is in the Details: On the Pitfalls of Vocabulary Selection in Neural Machine Translation
- Authors: Tobias Domhan, Eva Hasler, Ke Tran, Sony Trenous, Bill Byrne, Felix Hieber
- Abstract summary: We propose a model of vocabulary selection, integrated into the neural translation model, that predicts the set of allowed output words from contextualized encoder representations.
This restores the translation quality of an unconstrained system, as measured by human evaluations on WMT newstest2020 and idiomatic expressions.
- Score: 12.207265136294678
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vocabulary selection, or lexical shortlisting, is a well-known technique to
improve latency of Neural Machine Translation models by constraining the set of
allowed output words during inference. The chosen set is typically determined
by separately trained alignment model parameters, independent of the
source-sentence context at inference time. While vocabulary selection appears
competitive with respect to automatic quality metrics in prior work, we show
that it can fail to select the right set of output words, particularly for
semantically non-compositional linguistic phenomena such as idiomatic
expressions, leading to reduced translation quality as perceived by humans.
Trading off latency for quality by increasing the size of the allowed set is
often not an option in real-world scenarios. We propose a model of vocabulary
selection, integrated into the neural translation model, that predicts the set
of allowed output words from contextualized encoder representations. This
restores translation quality of an unconstrained system, as measured by human
evaluations on WMT newstest2020 and idiomatic expressions, at an inference
latency competitive with alignment-based selection using aggressive thresholds,
thereby removing the dependency on separately trained alignment models.
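To make the proposed mechanism concrete, here is a minimal sketch of how such a selection head could restrict decoder logits: a per-sentence set of allowed target words is predicted from pooled contextualized encoder states and turned into a logit mask. All module names, the pooling choice, and the dimensions are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class VocabSelector(nn.Module):
    """Sketch of contextual vocabulary selection: predicts, per source
    sentence, which target-vocabulary entries may appear in the output.
    Hypothetical architecture and sizes, not the paper's exact design."""

    def __init__(self, d_model: int, vocab_size: int):
        super().__init__()
        self.proj = nn.Linear(d_model, vocab_size)

    def forward(self, enc: torch.Tensor, src_mask: torch.Tensor) -> torch.Tensor:
        # enc: (batch, src_len, d_model); src_mask: (batch, src_len), 1 = real token.
        # Max-pool the contextualized encoder states over source positions.
        masked = enc.masked_fill(src_mask.unsqueeze(-1) == 0, float("-inf"))
        pooled = masked.max(dim=1).values           # (batch, d_model)
        return torch.sigmoid(self.proj(pooled))     # (batch, vocab) membership probs

def shortlist_logits(logits: torch.Tensor, probs: torch.Tensor, threshold: float = 0.5):
    # Disallow words whose predicted membership probability is below the threshold.
    allowed = probs >= threshold
    return logits.masked_fill(~allowed, float("-inf"))

# Usage: mask the decoder's logits at every step with the per-sentence allowed set.
batch, src_len, d_model, vocab = 2, 7, 512, 32000
selector = VocabSelector(d_model, vocab)
enc = torch.randn(batch, src_len, d_model)
src_mask = torch.ones(batch, src_len)
probs = selector(enc, src_mask)
step_logits = torch.randn(batch, vocab)
constrained = shortlist_logits(step_logits, probs)
```

In a deployed system the latency gain would come from computing the output projection only over the selected vocabulary rows rather than masking a full-vocabulary logit vector; masking is shown here only because it is the simplest way to express the constraint.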
Related papers
- HanoiT: Enhancing Context-aware Translation via Selective Context [95.93730812799798]
Context-aware neural machine translation aims to use the document-level context to improve translation quality.
Irrelevant or trivial words may introduce noise and distract the model from learning the relationship between the current sentence and the auxiliary context.
We propose a novel end-to-end encoder-decoder model with a layer-wise selection mechanism to sift and refine the long document context.
arXiv Detail & Related papers (2023-01-17T12:07:13Z)
- Direct Speech-to-speech Translation without Textual Annotation using Bottleneck Features [13.44542301438426]
We propose a direct speech-to-speech translation model which can be trained without any textual annotation or content information.
Experiments on Mandarin-Cantonese speech translation demonstrate the feasibility of the proposed approach.
arXiv Detail & Related papers (2022-12-12T10:03:10Z)
- Self-Normalized Importance Sampling for Neural Language Modeling [97.96857871187052]
In this work, we propose self-normalized importance sampling. Compared to our previous work, the criteria considered here are self-normalized, so no further correction step is needed.
We show that our proposed self-normalized importance sampling is competitive in both research-oriented and production-oriented automatic speech recognition tasks.
arXiv Detail & Related papers (2021-11-11T16:57:53Z)
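As an illustration of the entry above: the generic self-normalized importance sampling estimator fits in a few lines. The paper applies the idea to the output softmax of a language model; this toy example shows only the estimator itself, with an assumed target and proposal distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

def self_normalized_is(f, p_tilde, q_sample, q_pdf, n=10000):
    """Estimate E_p[f(x)] where p is known only up to a constant (p_tilde),
    using samples from a proposal q."""
    xs = q_sample(n)                 # draw from the proposal q
    w = p_tilde(xs) / q_pdf(xs)      # unnormalized importance weights
    w /= w.sum()                     # self-normalization: weights sum to 1,
                                     # so no separate correction step is needed
    return np.sum(w * f(xs))

# Example: p is a standard normal known only up to its normalizing constant.
p_tilde = lambda x: np.exp(-0.5 * x**2)           # unnormalized N(0, 1)
q_sample = lambda n: rng.uniform(-6, 6, size=n)   # broad uniform proposal
q_pdf = lambda x: np.full_like(x, 1.0 / 12.0)
print(self_normalized_is(lambda x: x**2, p_tilde, q_sample, q_pdf))  # ~1.0
```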
- A Correspondence Variational Autoencoder for Unsupervised Acoustic Word Embeddings [50.524054820564395]
We propose a new unsupervised model for mapping a variable-duration speech segment to a fixed-dimensional representation.
The resulting acoustic word embeddings can form the basis of search, discovery, and indexing systems for low- and zero-resource languages.
arXiv Detail & Related papers (2020-12-03T19:24:42Z)
- Cross-lingual Spoken Language Understanding with Regularized Representation Alignment [71.53159402053392]
We propose a regularization approach to align word-level and sentence-level representations across languages without any external resource.
Experiments on the cross-lingual spoken language understanding task show that our model outperforms current state-of-the-art methods in both few-shot and zero-shot scenarios.
arXiv Detail & Related papers (2020-09-30T08:56:53Z)
- Grounded Compositional Outputs for Adaptive Language Modeling [59.02706635250856]
A language model's vocabulary, typically selected before training and permanently fixed later, affects its size.
We propose a fully compositional output embedding layer for language models.
To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary.
arXiv Detail & Related papers (2020-09-24T07:21:14Z)
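A compositional output layer of the kind described above can be sketched as follows. Composing word embeddings with a character-level GRU is an assumption for illustration, not necessarily the paper's composition function, but it shows why the parameter count no longer depends on the training vocabulary.

```python
import torch
import torch.nn as nn

class CompositionalOutput(nn.Module):
    """Sketch of a compositional output embedding layer: each word's output
    embedding is composed from its character sequence, so the layer stores no
    (vocab x dim) table. The character-GRU composition is an assumption."""

    def __init__(self, n_chars: int, d_model: int):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, d_model)
        self.rnn = nn.GRU(d_model, d_model, batch_first=True)

    def word_embeddings(self, char_ids: torch.Tensor) -> torch.Tensor:
        # char_ids: (n_words, max_chars) -> one embedding per spelled-out word
        _, h = self.rnn(self.char_emb(char_ids))
        return h.squeeze(0)                                 # (n_words, d_model)

    def logits(self, hidden: torch.Tensor, char_ids: torch.Tensor) -> torch.Tensor:
        # Score candidate words by dot product with the decoder/LM state.
        return hidden @ self.word_embeddings(char_ids).t()  # (batch, n_words)

# Any word, even one unseen in training, gets a score once it is spelled out.
layer = CompositionalOutput(n_chars=128, d_model=64)
hidden = torch.randn(2, 64)
candidates = torch.randint(0, 128, (1000, 12))              # 1000 words, 12 chars each
print(layer.logits(hidden, candidates).shape)               # torch.Size([2, 1000])
```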
- Robust Neural Machine Translation: Modeling Orthographic and Interpunctual Variation [3.3194866396158]
We propose a simple generative noise model to generate adversarial examples of ten different types.
We show that, when tested on noisy data, systems trained using adversarial examples perform almost as well as when translating clean data.
arXiv Detail & Related papers (2020-09-11T14:12:54Z)
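A generative noise model like the one above can be approximated with simple string perturbations. The paper defines ten noise types; the three below (character swap, character deletion, punctuation removal) are simplified stand-ins, not the paper's exact operations.

```python
import random

random.seed(0)

def swap_chars(word: str) -> str:
    # Transpose two adjacent characters, mimicking a typing error.
    if len(word) < 2:
        return word
    i = random.randrange(len(word) - 1)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]

def drop_char(word: str) -> str:
    # Delete a single character, mimicking an omission.
    if len(word) < 2:
        return word
    i = random.randrange(len(word))
    return word[:i] + word[i + 1:]

def drop_punct(sentence: str) -> str:
    # Remove punctuation, mimicking interpunctual variation.
    return "".join(c for c in sentence if c not in ".,;:!?")

def noisify(sentence: str, p: float = 0.2) -> str:
    words = []
    for w in sentence.split():
        if random.random() < p:
            w = random.choice([swap_chars, drop_char])(w)
        words.append(w)
    return drop_punct(" ".join(words))

print(noisify("Robust systems should handle noisy, misspelled input."))
```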
- Neural Simultaneous Speech Translation Using Alignment-Based Chunking [4.224809458327515]
In simultaneous machine translation, the objective is to determine when to produce a partial translation given a continuous stream of source words.
We propose a neural machine translation (NMT) model that dynamically decides whether to continue consuming input or to generate output words.
Our results on the IWSLT 2020 English-to-German task outperform a wait-k baseline by 2.6 to 3.7% BLEU absolute.
arXiv Detail & Related papers (2020-05-29T10:20:48Z)
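The wait-k baseline mentioned above follows a fixed schedule: read the first k source tokens, then alternate between writing one target token and reading one more source token. A minimal scheduler sketch (the paper's own model replaces this fixed rule with learned, alignment-based decisions):

```python
def wait_k_schedule(source_len: int, target_len: int, k: int):
    # Returns the READ/WRITE action sequence of the standard wait-k policy.
    actions, read, written = [], 0, 0
    while written < target_len:
        if read < min(k + written, source_len):
            actions.append("READ")    # still within the k-token lag
            read += 1
        else:
            actions.append("WRITE")   # emit one target token
            written += 1
    return actions

# With k=3: read 3 tokens, then alternate WRITE/READ until the source runs out.
print(wait_k_schedule(source_len=6, target_len=5, k=3))
```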
- Neural Syntactic Preordering for Controlled Paraphrase Generation [57.5316011554622]
Our work uses syntactic transformations to softly "reorder" the source sentence and guide our neural paraphrasing model.
First, given an input sentence, we derive a set of feasible syntactic rearrangements using an encoder-decoder model.
Next, we use each proposed rearrangement to produce a sequence of position embeddings, which encourages our final encoder-decoder paraphrase model to attend to the source words in a particular order.
arXiv Detail & Related papers (2020-05-05T09:02:25Z)
- Fast and Robust Unsupervised Contextual Biasing for Speech Recognition [16.557586847398778]
We propose an alternative approach that does not require an explicit contextual language model.
We derive the bias score for every word in the system vocabulary from the training corpus.
We show significant improvement in recognition accuracy when the relevant context is available.
arXiv Detail & Related papers (2020-05-04T17:29:59Z)
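One plausible reading of "deriving a bias score for every word from the training corpus" is a log-ratio of in-context to general-corpus frequencies, added to the recognizer's word scores when the relevant context is active. The smoothed formula below is an assumption for illustration, not the paper's exact derivation.

```python
import math
from collections import Counter

def bias_scores(context_corpus, general_corpus):
    # Score each word by how much more frequent it is in the in-context
    # corpus than in general text (add-one smoothing; assumed formula).
    ctx, gen = Counter(context_corpus), Counter(general_corpus)
    n_ctx, n_gen = sum(ctx.values()), sum(gen.values())
    return {
        w: math.log((ctx[w] + 1) / (n_ctx + 1)) - math.log((gen[w] + 1) / (n_gen + 1))
        for w in set(ctx) | set(gen)
    }

def rescore(hypothesis_scores, scores, weight=1.0):
    # hypothesis_scores: {word: recognizer score}; boost contextually likely words.
    return {w: s + weight * scores.get(w, 0.0) for w, s in hypothesis_scores.items()}

scores = bias_scores(["playlist", "shuffle", "playlist"], ["the", "a", "playlist", "the"])
print(rescore({"playlist": -1.2, "the": -0.5}, scores))
```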
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.