Encouraging Neural Machine Translation to Satisfy Terminology
Constraints
- URL: http://arxiv.org/abs/2106.03730v1
- Date: Mon, 7 Jun 2021 15:46:07 GMT
- Title: Encouraging Neural Machine Translation to Satisfy Terminology
Constraints
- Authors: Melissa Ailem, Jingshu Liu, Raheel Qader
- Abstract summary: We present a new approach to encourage neural machine translation to satisfy lexical constraints.
Our method acts at the training step, thereby avoiding any extra computational overhead at the inference step.
- Score: 3.3108924994485096
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a new approach to encourage neural machine translation to satisfy
lexical constraints. Our method acts at the training step, thereby avoiding
any extra computational overhead at the inference step. The proposed method
combines three main ingredients. The first consists in augmenting the training
data to specify the constraints; intuitively, this encourages the model to
learn a copy behavior when it encounters constraint terms. Compared to
previous work, we use a simplified augmentation strategy without source
factors. The second ingredient is constraint token masking, which makes it
even easier for the model to learn the copy behavior and generalize better.
The third is a modification of the standard cross-entropy loss that biases the
model towards assigning high probabilities to constraint words. Empirical
results show that our method improves upon related baselines in terms of both
BLEU score and the percentage of generated constraint terms.
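The three ingredients lend themselves to a compact illustration. Below is a minimal PyTorch-style sketch, not the authors' implementation: the `<sep>` and `<mask>` tokens, the function names, and the `constraint_weight` factor are assumptions made for illustration; the paper's actual augmentation format, masking scheme, and loss weighting may differ.

```python
import torch
import torch.nn.functional as F

def augment_source(src_tokens, constraint_pairs, sep="<sep>"):
    """Ingredient 1 (sketch): append the target-side constraint terms to the
    source sentence so the model can learn to copy them into its output.
    No source factors are used, only extra tokens."""
    augmented = list(src_tokens)
    for _src_term, tgt_term in constraint_pairs:
        augmented += [sep] + list(tgt_term)
    return augmented

def mask_constraints(src_tokens, constraint_pairs, mask="<mask>"):
    """Ingredient 2 (sketch): replace source-side occurrences of constraint
    terms with a mask token, pushing the model to rely on the appended
    target-side constraint (i.e. to copy it)."""
    constraint_src = {tok for src_term, _ in constraint_pairs for tok in src_term}
    return [mask if tok in constraint_src else tok for tok in src_tokens]

def constraint_biased_loss(logits, targets, constraint_mask, constraint_weight=2.0):
    """Ingredient 3 (sketch): cross entropy with up-weighted positions whose
    reference token belongs to a constraint term, biasing the model towards
    assigning those words high probability.

    logits:          (batch, seq_len, vocab)
    targets:         (batch, seq_len) reference token ids
    constraint_mask: (batch, seq_len) 1 where the reference token is a constraint word
    """
    per_token = F.cross_entropy(
        logits.view(-1, logits.size(-1)), targets.view(-1), reduction="none"
    )
    weights = torch.where(
        constraint_mask.view(-1).bool(),
        torch.full_like(per_token, constraint_weight),
        torch.ones_like(per_token),
    )
    return (weights * per_token).mean()
```

Since all three steps operate on the training data and the training objective, nothing changes in the decoder at test time, which is why the approach adds no inference-time overhead.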
Related papers
- Understanding the Double Descent Phenomenon in Deep Learning [49.1574468325115]
This tutorial sets out the classical statistical learning framework and introduces the double descent phenomenon.
By looking at a number of examples, Section 2 introduces inductive biases that appear to play a key role in double descent.
Section 3 explores double descent with two linear models and gives other points of view from recent related works.
arXiv Detail & Related papers (2024-03-15T16:51:24Z) - A Pseudo-Semantic Loss for Autoregressive Models with Logical
Constraints [87.08677547257733]
Neuro-symbolic AI bridges the gap between purely symbolic and neural approaches to learning.
We show how to maximize the likelihood of a symbolic constraint w.r.t. the neural network's output distribution.
We also evaluate our approach on Sudoku and shortest-path prediction cast as autoregressive generation.
arXiv Detail & Related papers (2023-12-06T20:58:07Z) - Negative Lexical Constraints in Neural Machine Translation [1.3124513975412255]
Negative lexical constraining is used to prohibit certain words or expressions in the translation produced by the neural translation model.
We compare various methods based on modifying either the decoding process or the training data.
We demonstrate that our method improves the constraining, although the problem still persists in many cases.
arXiv Detail & Related papers (2023-08-07T14:04:15Z) - Confident Adaptive Language Modeling [95.45272377648773]
CALM is a framework for dynamically allocating different amounts of compute per input and generation timestep.
We demonstrate the efficacy of our framework in reducing compute -- a potential speedup of up to $\times 3$ -- while provably maintaining high performance.
arXiv Detail & Related papers (2022-07-14T17:00:19Z) - Integrated Training for Sequence-to-Sequence Models Using
Non-Autoregressive Transformer [49.897891031932545]
We propose a cascaded model based on the non-autoregressive Transformer that enables end-to-end training without the need for an explicit intermediate representation.
We conduct an evaluation on two pivot-based machine translation tasks, namely French-German and German-Czech.
arXiv Detail & Related papers (2021-09-27T11:04:09Z) - On the Reproducibility of Neural Network Predictions [52.47827424679645]
We study the problem of churn, identify factors that cause it, and propose two simple means of mitigating it.
We first demonstrate that churn is indeed an issue, even for standard image classification tasks.
We propose using minimum entropy regularizers to increase prediction confidences.
We present empirical results showing the effectiveness of both techniques in reducing churn while improving the accuracy of the underlying model.
arXiv Detail & Related papers (2021-02-05T18:51:01Z) - A Simple but Tough-to-Beat Data Augmentation Approach for Natural
Language Understanding and Generation [53.8171136907856]
We introduce a set of simple yet effective data augmentation strategies dubbed cutoff.
Cutoff relies on sampling consistency and thus adds little computational overhead.
Cutoff consistently outperforms adversarial training and achieves state-of-the-art results on the IWSLT2014 German-English dataset.
arXiv Detail & Related papers (2020-09-29T07:08:35Z) - Lexically Constrained Neural Machine Translation with Levenshtein
Transformer [8.831954614241234]
This paper proposes a simple and effective algorithm for incorporating lexical constraints in neural machine translation.
Our method injects terminology constraints at inference time without any impact on decoding speed.
arXiv Detail & Related papers (2020-04-27T09:59:27Z)