Improving Non-autoregressive Machine Translation with Error Exposure and
Consistency Regularization
- URL: http://arxiv.org/abs/2402.09725v1
- Date: Thu, 15 Feb 2024 05:35:04 GMT
- Title: Improving Non-autoregressive Machine Translation with Error Exposure and
Consistency Regularization
- Authors: Xinran Chen, Sufeng Duan, Gongshen Liu
- Abstract summary: Conditional Masked Language Model (CMLM) adopts the mask-predict paradigm to re-predict the masked low-confidence tokens.
CMLM suffers from the data distribution discrepancy between training and inference.
We construct mixed sequences based on model prediction during training, and propose to optimize over the masked tokens under imperfect observation conditions.
- Score: 13.38986769508059
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Being one of the IR-NAT (Iterative-refinement-based NAT) frameworks, the
Conditional Masked Language Model (CMLM) adopts the mask-predict paradigm to
re-predict the masked low-confidence tokens. However, CMLM suffers from the
data distribution discrepancy between training and inference, where the
observed tokens are generated differently in the two cases. In this paper, we
address this problem with the training approaches of error exposure and
consistency regularization (EECR). We construct mixed sequences based on
model predictions during training, and propose to optimize over the masked
tokens under imperfect observation conditions. We also design a consistency
learning method to constrain the data distribution for the masked tokens under
different observing situations to narrow down the gap between training and
inference. Experiments on five translation benchmarks obtain average
improvements of 0.68 and 0.40 BLEU over the respective base models, and our
CMLMC-EECR achieves the best performance, with translation quality comparable
to the Transformer. The experimental results demonstrate the effectiveness of
our method.
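The two ideas in the abstract can be pictured with a short sketch. Below is a minimal, hypothetical PyTorch rendering, not the authors' released implementation: it builds a mixed observation in which some gold tokens are replaced by the model's own predictions (error exposure), and adds a symmetric-KL term that pulls the masked-token distributions under the gold and mixed observations together (consistency regularization). The toy decoder, masking ratio, exposure ratio, and loss weight are illustrative assumptions.

```python
# Hedged sketch of CMLM training with error exposure and consistency
# regularization (EECR).  All hyperparameters and module names are
# illustrative assumptions, not the paper's code.
import torch
import torch.nn.functional as F
from torch import nn

MASK_ID, PAD_ID, VOCAB = 3, 0, 1000


class ToyCMLMDecoder(nn.Module):
    """Stand-in for a Transformer CMLM decoder: maps (src, tgt_obs) to
    per-position vocabulary logits.  A real system conditions on src via
    encoder-decoder attention; this toy ignores src for brevity."""

    def __init__(self, d=64):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, d)
        self.ff = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, VOCAB))

    def forward(self, src, tgt_obs):
        return self.ff(self.emb(tgt_obs))          # (B, T, VOCAB)


def eecr_step(model, src, tgt, mask_prob=0.5, expose_prob=0.3, lam=1.0):
    """One training step combining error exposure and consistency loss."""
    B, T = tgt.shape
    masked = torch.rand(B, T) < mask_prob          # positions to re-predict
    gold_obs = tgt.masked_fill(masked, MASK_ID)    # gold observed context

    # Error exposure: build a "mixed" context in which some observed tokens
    # are the model's own predictions rather than gold tokens, mimicking the
    # imperfect context seen at inference time.
    with torch.no_grad():
        pred = model(src, gold_obs).argmax(-1)
    expose = (~masked) & (torch.rand(B, T) < expose_prob)
    mixed_obs = torch.where(expose, pred, gold_obs)

    # Cross-entropy on the masked tokens under both observation conditions.
    logits_gold = model(src, gold_obs)
    logits_mix = model(src, mixed_obs)
    loss_ce = (F.cross_entropy(logits_gold[masked], tgt[masked])
               + F.cross_entropy(logits_mix[masked], tgt[masked]))

    # Consistency regularization: pull the masked-token distributions under
    # the two observation conditions toward each other (symmetric KL).
    p = F.log_softmax(logits_gold[masked], -1)
    q = F.log_softmax(logits_mix[masked], -1)
    consist = 0.5 * (F.kl_div(q, p, log_target=True, reduction="batchmean")
                     + F.kl_div(p, q, log_target=True, reduction="batchmean"))
    return loss_ce + lam * consist


model = ToyCMLMDecoder()
src = torch.randint(4, VOCAB, (2, 7))
tgt = torch.randint(4, VOCAB, (2, 9))
print(eecr_step(model, src, tgt))
```

In a real CMLM the decoder would be a Transformer conditioned on the source through encoder-decoder attention, and the exposure ratio would more plausibly be scheduled over training rather than fixed.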
Related papers
- PseudoNeg-MAE: Self-Supervised Point Cloud Learning using Conditional Pseudo-Negative Embeddings [55.55445978692678]
PseudoNeg-MAE is a self-supervised learning framework that enhances the global feature representation of point cloud masked autoencoders.
We show that PseudoNeg-MAE achieves state-of-the-art performance on the ModelNet40 and ScanObjectNN datasets.
arXiv Detail & Related papers (2024-09-24T07:57:21Z)
- Pretraining Data Detection for Large Language Models: A Divergence-based Calibration Method [108.56493934296687]
We introduce a divergence-based calibration method, inspired by the divergence-from-randomness concept, to calibrate token probabilities for pretraining data detection.
We have developed a Chinese-language benchmark, PatentMIA, to assess the performance of detection approaches for LLMs on Chinese text.
arXiv Detail & Related papers (2024-09-23T07:55:35Z)
- Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]
We propose uncertainty-aware learning (UAL) to improve model alignment across different task scenarios.
We implement UAL in a simple fashion -- adaptively setting the label smoothing value of training according to the uncertainty of individual samples.
Experiments on widely used benchmarks demonstrate that our UAL significantly and consistently outperforms standard supervised fine-tuning.
arXiv Detail & Related papers (2024-06-07T11:37:45Z)
- Masked Language Modeling Becomes Conditional Density Estimation for Tabular Data Synthesis [0.74454067778951]
We introduce MaCoDE by redefining the consecutive multi-class classification task of Masked Language Modeling (MLM) as histogram-based conditional density estimation.
Our approach enables the estimation of conditional densities across arbitrary combinations of target and conditional variables.
To validate our proposed model, we evaluate its performance in synthetic data generation across 10 real-world datasets.
arXiv Detail & Related papers (2024-05-31T03:26:42Z)
- Towards Robust and Interpretable EMG-based Hand Gesture Recognition using Deep Metric Meta Learning [37.21211404608413]
We propose a shift to deep metric-based meta-learning in EMG PR to supervise the creation of meaningful and interpretable representations.
We derive a robust class proximity-based confidence estimator that leads to a better rejection of incorrect decisions.
arXiv Detail & Related papers (2024-04-17T23:37:50Z)
- Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious to collect in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z)
- Pre-training Language Model as a Multi-perspective Course Learner [103.17674402415582]
This study proposes a multi-perspective course learning (MCL) method for sample-efficient pre-training.
In this study, three self-supervision courses are designed to alleviate inherent flaws of "tug-of-war" dynamics.
Our method significantly improves ELECTRA's average performance by 2.8% and 3.2% absolute points on the GLUE and SQuAD 2.0 benchmarks, respectively.
arXiv Detail & Related papers (2023-05-06T09:02:10Z)
- Enhancing Text Generation with Cooperative Training [23.971227375706327]
Most prevailing methods train generative and discriminative models in isolation, leaving them unable to adapt to changes in each other.
We introduce a self-consistent learning framework for text that trains a discriminator and a generator cooperatively in a closed-loop manner.
Our framework is able to mitigate training instabilities such as mode collapse and non-convergence.
arXiv Detail & Related papers (2023-03-16T04:21:19Z)
- Mitigating Catastrophic Forgetting in Scheduled Sampling with Elastic Weight Consolidation in Neural Machine Translation [15.581515781839656]
Autoregressive models trained with maximum likelihood estimation suffer from exposure bias.
We propose using Elastic Weight Consolidation as a trade-off between mitigating exposure bias and retaining output quality.
Experiments on two IWSLT'14 translation tasks demonstrate that our approach alleviates catastrophic forgetting and significantly improves BLEU.
arXiv Detail & Related papers (2021-09-13T20:37:58Z)
- MvSR-NAT: Multi-view Subset Regularization for Non-Autoregressive Machine Translation [0.5586191108738562]
Conditional masked language models (CMLM) have shown impressive progress in non-autoregressive machine translation (NAT).
We introduce Multi-view Subset Regularization (MvSR), a novel regularization method to improve the performance of the NAT model.
We achieve remarkable performance on three public benchmarks with 0.36-1.14 BLEU gains over previous NAT models.
arXiv Detail & Related papers (2021-08-19T02:30:38Z)
- On the Inference Calibration of Neural Machine Translation [54.48932804996506]
We study the correlation between calibration and translation performance, as well as the linguistic properties of miscalibration.
We propose a new graduated label smoothing method that can improve both inference calibration and translation performance; a rough sketch of the idea follows this list.
arXiv Detail & Related papers (2020-05-03T02:03:56Z)
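For that last entry, one plausible reading of graduated label smoothing is to let the smoothing weight grow with the model's confidence in a token, so over-confident predictions receive a stronger smoothing penalty. The thresholds and weights below are illustrative assumptions, not the paper's exact recipe.

```python
# Hedged sketch of confidence-dependent ("graduated") label smoothing.
# Assumption: higher prediction confidence -> larger smoothing weight.
import torch
import torch.nn.functional as F


def graduated_label_smoothing(logits, target, base_eps=0.1):
    """Cross-entropy whose smoothing weight depends on prediction confidence.

    logits: (N, V) unnormalized scores; target: (N,) gold token ids.
    Assumed scheme: confidence > 0.7 -> 2*base_eps, 0.3-0.7 -> base_eps,
    otherwise no smoothing (thresholds are illustrative).
    """
    conf = logits.softmax(-1).max(-1).values             # per-token confidence
    eps = torch.zeros_like(conf)
    eps = torch.where(conf > 0.3, torch.full_like(conf, base_eps), eps)
    eps = torch.where(conf > 0.7, torch.full_like(conf, 2 * base_eps), eps)

    logp = F.log_softmax(logits, -1)
    nll = -logp.gather(-1, target.unsqueeze(-1)).squeeze(-1)  # standard CE term
    smooth = -logp.mean(-1)                               # push mass toward uniform
    return ((1 - eps) * nll + eps * smooth).mean()


logits = torch.randn(5, 100)
target = torch.randint(0, 100, (5,))
print(graduated_label_smoothing(logits, target))
```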