Self-Induced Curriculum Learning in Self-Supervised Neural Machine
Translation
- URL: http://arxiv.org/abs/2004.03151v2
- Date: Tue, 6 Oct 2020 14:13:45 GMT
- Title: Self-Induced Curriculum Learning in Self-Supervised Neural Machine
Translation
- Authors: Dana Ruiter, Josef van Genabith and Cristina España-Bonet
- Abstract summary: SSNMT learns to identify and select suitable training data from comparable (rather than parallel) corpora.
In this study, we provide an in-depth analysis of the sampling choices the SSNMT model makes during training.
We show that in terms of the Gunning-Fog Readability index, SSNMT starts extracting and learning from Wikipedia data suitable for high school students.
- Score: 20.718093208111547
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised neural machine translation (SSNMT) jointly learns to identify
and select suitable training data from comparable (rather than parallel)
corpora and to translate, such that the two tasks support each other in a
virtuous circle. In this study, we provide an in-depth analysis of the sampling
choices the SSNMT model makes during training. We show how, without being told
to do so, the model self-selects samples of increasing (i) complexity and (ii)
task relevance, while (iii) following a denoising curriculum. We observe that
the dynamics of the mutual-supervision signals from both of the system's
internal representation types are vital for extraction and translation
performance. We show that, in terms of the Gunning-Fog Readability index, SSNMT
starts by extracting and learning from Wikipedia data suitable for high school
students and quickly moves towards content suitable for first-year
undergraduate students.
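To make the readability claim concrete: the Gunning-Fog index estimates the years of formal schooling needed to understand a text on first reading, as 0.4 * (words per sentence + 100 * fraction of complex words), where complex words have three or more syllables. Below is a minimal Python sketch of that formula; the regex tokenizer and vowel-group syllable counter are crude assumptions (the full definition also excludes proper nouns, familiar jargon and common suffixes), so it only approximates the scores discussed in the paper.

```python
import re

def count_syllables(word: str) -> int:
    # Crude heuristic: one syllable per group of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def gunning_fog(text: str) -> float:
    """Gunning-Fog index: 0.4 * (words/sentences + 100 * complex_words/words).
    'Complex' words are approximated as those with >= 3 syllables; the standard
    exclusions (proper nouns, familiar jargon, common suffixes) are ignored."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    complex_words = [w for w in words if count_syllables(w) >= 3]
    return 0.4 * (len(words) / len(sentences)
                  + 100.0 * len(complex_words) / len(words))

# On the standard scale, ~12 corresponds to a high-school senior and 13-16 to
# undergraduate-level text, the range the abstract refers to.
sample = ("Self-supervised training extracts parallel sentences from comparable "
          "corpora and learns to translate them. Both tasks support each other.")
print(f"Gunning-Fog index: {gunning_fog(sample):.1f}")
```

A worked example of the formula: a 100-word passage with 5 sentences and 12 complex words scores 0.4 * (20 + 12) = 12.8, i.e. roughly late high-school reading level.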
Related papers
- The mechanistic basis of data dependence and abrupt learning in an
in-context classification task [0.3626013617212666]
We show that specific distributional properties inherent in language control the trade-off or simultaneous appearance of two forms of learning.
In-context learning is driven by the abrupt emergence of an induction head, which subsequently competes with in-weights learning.
We propose that the sharp transitions in attention-based networks arise due to a specific chain of multi-layer operations necessary to achieve ICL.
arXiv Detail & Related papers (2023-12-03T20:53:41Z)
- Learning to Generalize to More: Continuous Semantic Augmentation for
Neural Machine Translation [50.54059385277964]
We present a novel data augmentation paradigm termed Continuous Semantic Augmentation (CsaNMT).
CsaNMT augments each training instance with an adjacency region that could cover adequate variants of literal expression under the same meaning.
arXiv Detail & Related papers (2022-04-14T08:16:28Z)
- Data Selection Curriculum for Neural Machine Translation [31.55953464971441]
We introduce a two-stage curriculum training framework for NMT models.
We fine-tune a base NMT model on subsets of data, selected by both deterministic scoring using pre-trained methods and online scoring.
Our curriculum strategies consistently yield better quality (up to +2.2 BLEU improvement) and faster convergence; a minimal sketch of this scoring-based subset selection idea appears after this list.
arXiv Detail & Related papers (2022-03-25T19:08:30Z)
- Bridging the Data Gap between Training and Inference for Unsupervised
Neural Machine Translation [49.916963624249355]
A UNMT model is trained on pseudo parallel data with a translated source, yet it translates natural source sentences at inference time.
This source discrepancy between training and inference hinders the translation performance of UNMT models.
We propose an online self-training approach, which simultaneously uses the pseudo parallel data {natural source, translated target} to mimic the inference scenario.
arXiv Detail & Related papers (2022-03-16T04:50:27Z)
- Language Modeling, Lexical Translation, Reordering: The Training Process
of NMT through the Lens of Classical SMT [64.1841519527504]
Neural machine translation uses a single neural network to model the entire translation process.
Despite neural machine translation being the de facto standard, it is still not clear how NMT models acquire different competences over the course of training.
arXiv Detail & Related papers (2021-09-03T09:38:50Z)
- Alternated Training with Synthetic and Authentic Data for Neural Machine
Translation [49.35605028467887]
We propose alternated training with synthetic and authentic data for neural machine translation (NMT).
Compared with previous work, we introduce authentic data as guidance to prevent the training of NMT models from being disturbed by noisy synthetic data.
Experiments on Chinese-English and German-English translation tasks show that our approach improves the performance over several strong baselines.
arXiv Detail & Related papers (2021-06-16T07:13:16Z)
- A journey in ESN and LSTM visualisations on a language task [77.34726150561087]
We trained ESNs and LSTMs on a Cross-Situational Learning (CSL) task.
The results are of three kinds: performance comparison, internal dynamics analyses and visualization of latent space.
arXiv Detail & Related papers (2020-12-03T08:32:01Z)
- Self-Paced Learning for Neural Machine Translation [55.41314278859938]
We propose self-paced learning for neural machine translation (NMT) training.
We show that the proposed model yields better performance than strong baselines.
arXiv Detail & Related papers (2020-10-09T11:33:16Z)
- Reinforced Curriculum Learning on Pre-trained Neural Machine Translation
Models [20.976165305749777]
We learn a curriculum for improving a pre-trained NMT model by re-selecting influential data samples from the original training set.
We propose a data selection framework based on Deterministic Actor-Critic, in which a critic network predicts the expected change of model performance.
arXiv Detail & Related papers (2020-04-13T03:40:44Z)
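Several of the curriculum papers above (the data selection curriculum, self-paced learning, and reinforced curriculum learning entries) share a common core: score every training example with some model-derived signal and train on progressively larger, highest-scoring-first subsets. The sketch below is a generic, hypothetical illustration of that pattern, not the method of any single paper; toy_score stands in for a real scorer such as a pre-trained model's confidence or a critic's predicted performance gain.

```python
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, str]  # (source sentence, target sentence)

def curriculum_stages(
    data: Sequence[Example],
    score: Callable[[Example], float],
    fractions: Sequence[float] = (0.25, 0.5, 1.0),
) -> List[List[Example]]:
    """Rank examples by score and return growing top-fraction subsets,
    one per curriculum stage (most useful examples first)."""
    ranked = sorted(data, key=score, reverse=True)
    return [list(ranked[: max(1, int(len(ranked) * f))]) for f in fractions]

# Hypothetical scorer: prefers sentence pairs with similar lengths. A real
# system would use a pre-trained model's score or an online training signal.
def toy_score(pair: Example) -> float:
    src, tgt = pair
    return -abs(1.0 - len(src.split()) / max(1, len(tgt.split())))

corpus = [
    ("ein kleiner Test", "a small test"),
    ("Hallo Welt", "hello world with several extra words"),
    ("das ist gut", "that is good"),
]

for stage, subset in enumerate(curriculum_stages(corpus, toy_score), start=1):
    print(f"stage {stage}: training on {len(subset)} of {len(corpus)} examples")
```

In practice each stage would be used to fine-tune the model before widening the pool, and the scorer could be refreshed online as the model improves.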