Focus on the Target's Vocabulary: Masked Label Smoothing for Machine
Translation
- URL: http://arxiv.org/abs/2203.02889v1
- Date: Sun, 6 Mar 2022 07:01:39 GMT
- Title: Focus on the Target's Vocabulary: Masked Label Smoothing for Machine
Translation
- Authors: Liang Chen, Runxin Xu, Baobao Chang
- Abstract summary: Masked Label Smoothing (MLS) is a new mechanism that masks the soft label probability of source-side words to zero.
Our experiments show that MLS consistently yields improvement over original label smoothing on different datasets.
- Score: 25.781293857729864
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Label smoothing and vocabulary sharing are two widely used techniques in
neural machine translation models. However, we argue that naively applying both
techniques together can be conflicting and may even lead to sub-optimal performance.
When allocating the smoothed probability, original label smoothing treats
source-side words that would never appear in the target language the same as
real target-side words, which could bias the translation model. To address this
issue, we propose Masked Label Smoothing (MLS), a new mechanism that masks the
soft label probability of source-side words to zero. Simple yet effective, MLS
manages to better integrate label smoothing with vocabulary sharing. Our
extensive experiments show that MLS consistently yields improvements over
original label smoothing on different datasets, including bilingual and
multilingual translation, in terms of both translation quality and model calibration.
Our code is released at https://github.com/PKUnlp-icler/MLS
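Based only on the mechanism described in the abstract, a minimal PyTorch sketch of such a loss might look as follows. The function name, tensor shapes, and the way the smoothing mass is normalized are assumptions made for illustration; the authors' released implementation in the repository above is authoritative.

```python
# Minimal sketch of Masked Label Smoothing (MLS): the smoothing mass is spread
# only over tokens that can appear on the target side, so source-only entries in
# a shared vocabulary receive zero soft-label probability. Illustrative only;
# see https://github.com/PKUnlp-icler/MLS for the authors' implementation.
import torch
import torch.nn.functional as F


def masked_label_smoothing_loss(logits, targets, target_vocab_mask, eps=0.1):
    """logits: (N, V) decoder outputs; targets: (N,) gold token ids;
    target_vocab_mask: (V,) bool, True for tokens that may occur in the target language."""
    log_probs = F.log_softmax(logits, dim=-1)                      # (N, V)
    nll = -log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)    # usual negative log-likelihood term

    # Smoothing term: uniform only over the target-side vocabulary.
    masked_log_probs = log_probs.masked_fill(~target_vocab_mask, 0.0)
    smooth = -masked_log_probs.sum(dim=-1) / target_vocab_mask.sum()

    return ((1.0 - eps) * nll + eps * smooth).mean()
```

In this sketch, `target_vocab_mask` would be built offline, e.g. by marking every subword (plus special tokens) that occurs on the target-language side of the training data; with an all-True mask the loss reduces to ordinary label smoothing.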
Related papers
- LexMatcher: Dictionary-centric Data Collection for LLM-based Machine Translation [67.24113079928668]
We present LexMatcher, a method for data curation driven by the coverage of senses found in bilingual dictionaries.
Our approach outperforms the established baselines on the WMT2022 test sets.
arXiv Detail & Related papers (2024-06-03T15:30:36Z)
- Improving Multi-lingual Alignment Through Soft Contrastive Learning [9.454626745893798]
We propose a novel method to align multi-lingual embeddings based on the similarity of sentences measured by a pre-trained mono-lingual embedding model.
Given translation sentence pairs, we train a multi-lingual model so that the similarity between cross-lingual embeddings follows the similarity of the sentences as measured by the mono-lingual teacher model.
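A rough sketch of the objective this summary describes is given below: the student's cross-lingual similarities are trained to follow soft targets derived from a frozen mono-lingual teacher. The function name, temperature, and the exact loss form are assumptions for illustration, not the paper's implementation.

```python
# Sketch of soft contrastive alignment: soft targets come from the mono-lingual
# teacher's sentence-to-sentence similarities; the multilingual student's
# cross-lingual similarities are pulled toward them. Illustrative only.
import torch
import torch.nn.functional as F


def soft_alignment_loss(student_src, student_tgt, teacher_src, tau=0.05):
    """student_src/student_tgt: (B, d) multilingual embeddings of a batch of translation pairs;
    teacher_src: (B, d) mono-lingual teacher embeddings of the source-side sentences."""
    with torch.no_grad():
        t_sim = F.normalize(teacher_src, dim=-1) @ F.normalize(teacher_src, dim=-1).T
        targets = F.softmax(t_sim / tau, dim=-1)          # soft labels instead of hard positives
    s_sim = F.normalize(student_src, dim=-1) @ F.normalize(student_tgt, dim=-1).T
    log_preds = F.log_softmax(s_sim / tau, dim=-1)
    # Soft contrastive objective: KL between student and teacher similarity distributions.
    return F.kl_div(log_preds, targets, reduction="batchmean")
```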
arXiv Detail & Related papers (2024-05-25T09:46:07Z)
- Constrained Decoding for Cross-lingual Label Projection [27.567195418950966]
Cross-lingual transfer using multilingual LLMs has become a popular learning paradigm for low-resource languages with no labeled training data.
However, for NLP tasks that involve fine-grained predictions on words and phrases, the performance of zero-shot cross-lingual transfer learning lags far behind supervised fine-tuning methods.
arXiv Detail & Related papers (2024-02-05T15:57:32Z)
- Contextual Label Projection for Cross-Lingual Structured Prediction [103.55999471155104]
CLaP translates text to the target language and performs contextual translation on the labels using the translated text as the context.
We benchmark CLaP with other label projection techniques on zero-shot cross-lingual transfer across 39 languages.
arXiv Detail & Related papers (2023-09-16T10:27:28Z)
- CROP: Zero-shot Cross-lingual Named Entity Recognition with Multilingual Labeled Sequence Translation [113.99145386490639]
Cross-lingual NER can transfer knowledge between languages via aligned cross-lingual representations or machine translation results.
We propose a Cross-lingual Entity Projection framework (CROP) to enable zero-shot cross-lingual NER.
We adopt a multilingual labeled sequence translation model to project the tagged sequence back to the target language and label the target raw sentence.
arXiv Detail & Related papers (2022-10-13T13:32:36Z)
- Jam or Cream First? Modeling Ambiguity in Neural Machine Translation with SCONES [10.785577504399077]
We propose to replace the softmax activation with a multi-label classification layer that can model ambiguity more effectively.
We show that the multi-label output layer can still be trained on single reference training data using the SCONES loss function.
We demonstrate that SCONES can be used to train NMT models that assign the highest probability to adequate translations.
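The general recipe this summary names, replacing the softmax with an independent per-token multi-label output trained against the single reference, can be sketched roughly as follows. The function and argument names are hypothetical, and the actual SCONES objective may weight or shape the terms differently.

```python
# Rough sketch of a multi-label output layer for NMT: every vocabulary entry gets
# an independent sigmoid score, trained with binary cross-entropy where the single
# reference token is the only positive. Several tokens can then receive high
# probability at inference time, which is how ambiguity is modeled. Illustration
# of the general idea only, not the exact SCONES loss.
import torch
import torch.nn.functional as F


def multilabel_token_loss(logits, targets):
    """logits: (N, V) unnormalized scores; targets: (N,) reference token ids."""
    positives = F.one_hot(targets, num_classes=logits.size(-1)).float()
    # In practice the positive and negative terms may be weighted differently.
    return F.binary_cross_entropy_with_logits(logits, positives)
```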
arXiv Detail & Related papers (2022-05-02T07:51:37Z)
- Exposing Cross-Lingual Lexical Knowledge from Multilingual Sentence Encoders [85.80950708769923]
We probe multilingual language models for the amount of cross-lingual lexical knowledge stored in their parameters, and compare them against the original multilingual LMs.
We also devise a novel method to expose this knowledge by additionally fine-tuning multilingual models.
We report substantial gains on standard benchmarks.
arXiv Detail & Related papers (2022-04-30T13:23:16Z)
- MELM: Data Augmentation with Masked Entity Language Modeling for Cross-lingual NER [73.91145686634133]
We propose a data augmentation framework with Masked-Entity Language Modeling (MELM).
MELM linearizes NER labels into the sentence context, so the fine-tuned MELM is able to predict masked tokens by explicitly conditioning on their labels.
When unlabeled target data is available, MELM can be further applied to augment pseudo-labeled target data, and the performance gain reaches 5.7%.
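A toy illustration of the label linearization step described above is shown below; the exact marker format used by the paper may differ, and this sketch is only meant to show the idea of placing labels into the sentence so a masked language model can condition on them.

```python
# Toy sketch of linearizing NER labels into the sentence context, in the spirit
# of the MELM summary above. The marker format (<B-PER> ... </B-PER>) is assumed
# for illustration.
def linearize_ner_labels(tokens, labels):
    """tokens: ["Obama", "visited", "Berlin"]; labels: ["B-PER", "O", "B-LOC"]."""
    out = []
    for tok, lab in zip(tokens, labels):
        if lab == "O":
            out.append(tok)
        else:
            # Surround each entity token with its label so a masked LM sees the label in context.
            out.append(f"<{lab}> {tok} </{lab}>")
    return " ".join(out)


print(linearize_ner_labels(["Obama", "visited", "Berlin"], ["B-PER", "O", "B-LOC"]))
# -> "<B-PER> Obama </B-PER> visited <B-LOC> Berlin </B-LOC>"
```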
arXiv Detail & Related papers (2021-08-31T07:37:43Z)
- Label Mask for Multi-Label Text Classification [6.742627397194543]
We propose a Label Mask multi-label text classification model (LM-MTC), inspired by the cloze-style questions used in language models.
On this basis, we assign a different token to each potential label and randomly mask these tokens with a certain probability to build a label-based Masked Language Model (MLM).
arXiv Detail & Related papers (2021-06-18T11:54:33Z)
- FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding [85.29270319872597]
We propose an enhanced fusion method that takes cross-lingual data as input for XLM finetuning.
During inference, the model makes predictions based on the text input in the target language and its translation in the source language.
We further propose an additional KL-divergence self-teaching loss for model training, based on auto-generated soft pseudo-labels for the translated text in the target language.
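The self-teaching term described above can be sketched as a standard soft-label distillation loss; the names below are hypothetical and details such as temperature or loss weighting follow common practice rather than the paper.

```python
# Sketch of a KL-divergence self-teaching loss: predictions on the translated
# target-language text are pulled toward auto-generated soft pseudo-labels
# (e.g. the model's own predictions on the corresponding source-language text).
# Illustrative only.
import torch
import torch.nn.functional as F


def kl_self_teaching_loss(student_logits, pseudo_label_probs):
    """student_logits: (N, C) predictions on target-language text;
    pseudo_label_probs: (N, C) auto-generated soft pseudo-labels."""
    log_q = F.log_softmax(student_logits, dim=-1)
    return F.kl_div(log_q, pseudo_label_probs.detach(), reduction="batchmean")
```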
arXiv Detail & Related papers (2020-09-10T22:42:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.