Exploring Benefits of Transfer Learning in Neural Machine Translation
- URL: http://arxiv.org/abs/2001.01622v1
- Date: Mon, 6 Jan 2020 15:11:59 GMT
- Title: Exploring Benefits of Transfer Learning in Neural Machine Translation
- Authors: Tom Kocmi
- Abstract summary: We propose several transfer learning approaches to reuse a model pretrained on a high-resource language pair.
We show how our techniques address specific problems of low-resource languages and are suitable even in high-resource transfer learning.
- Score: 3.7612918175471393
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural machine translation is known to require large numbers of parallel
training sentences, which generally prevents it from excelling on low-resource
language pairs. This thesis explores the use of cross-lingual transfer learning
on neural networks as a way of addressing this lack of resources.
We propose several transfer learning approaches to reuse a model pretrained on
a high-resource language pair. We pay particular attention to the simplicity of
the techniques. We study two scenarios: (a) when we reuse the high-resource
model without any prior modifications to its training process and (b) when we
can prepare the first-stage high-resource model for transfer learning in
advance. For the former scenario, we present a proof-of-concept method by
reusing a model trained by other researchers. In the latter scenario, we
present a method which reaches even larger improvements in translation
performance. Apart from proposed techniques, we focus on an in-depth analysis
of transfer learning techniques and try to shed some light on transfer learning
improvements. We show how our techniques address specific problems of
low-resource languages and are suitable even in high-resource transfer
learning. We evaluate the potential drawbacks and behavior by studying transfer
learning in various situations, for example, under artificially damaged
training corpora or with various parts of the model kept fixed.
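As a rough illustration of the parent-child transfer idea described above, the sketch below initializes a child NMT model from a parent checkpoint trained on a high-resource pair and simply continues training on the low-resource pair. It is a minimal sketch, assuming a subword vocabulary shared between the two pairs; the TinyNMT model, checkpoint path, and data iterator are hypothetical placeholders rather than the thesis's actual setup.

```python
# A minimal sketch of parent-child transfer for NMT, assuming the parent
# (high-resource) and child (low-resource) pairs share a subword vocabulary.
# TinyNMT, the checkpoint path, and the data iterator are placeholders.
import torch
import torch.nn as nn

VOCAB_SIZE = 32000  # shared subword vocabulary size (assumption)
PAD_ID = 0

class TinyNMT(nn.Module):
    """A deliberately small encoder-decoder stand-in for a real NMT model."""
    def __init__(self, vocab_size=VOCAB_SIZE, d_model=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model, padding_idx=PAD_ID)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=4,
            num_encoder_layers=2, num_decoder_layers=2,
            batch_first=True,
        )
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, src, tgt):
        # Causal mask so the decoder cannot look ahead during teacher forcing.
        tgt_mask = nn.Transformer.generate_square_subsequent_mask(tgt.size(1))
        h = self.transformer(self.embed(src), self.embed(tgt), tgt_mask=tgt_mask)
        return self.out(h)

def transfer_to_child(parent_ckpt_path, child_batches, steps=1000):
    """Initialize from a parent checkpoint, then keep training on child data."""
    model = TinyNMT()
    # Reuse the high-resource parent weights as initialization ...
    model.load_state_dict(torch.load(parent_ckpt_path))
    # ... and continue training with the same objective on the low-resource
    # pair; no architectural change is needed when the vocabulary is shared.
    optim = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss_fn = nn.CrossEntropyLoss(ignore_index=PAD_ID)
    for _, (src, tgt) in zip(range(steps), child_batches):
        logits = model(src, tgt[:, :-1])  # teacher forcing
        loss = loss_fn(logits.reshape(-1, VOCAB_SIZE), tgt[:, 1:].reshape(-1))
        optim.zero_grad()
        loss.backward()
        optim.step()
    return model
```

In this framing, scenario (a) corresponds to reusing an existing parent checkpoint as-is, while scenario (b) additionally prepares the parent for transfer in advance, for instance by choosing a vocabulary that already covers the child languages.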
Related papers
- On the cross-lingual transferability of multilingual prototypical models across NLU tasks [2.44288434255221]
Supervised deep learning-based approaches have been applied to task-oriented dialog and have proven to be effective for limited domain and language applications.
In practice, these approaches suffer from the drawbacks of domain-driven design and under-resourced languages.
This article investigates cross-lingual transferability by synergistically combining few-shot learning with prototypical neural networks and multilingual Transformer-based models.
arXiv Detail & Related papers (2022-07-19T09:55:04Z)
- A Neural Network Based Method with Transfer Learning for Genetic Data Analysis [3.8599966694228667]
We combine transfer learning with a neural-network-based method (expectile neural networks).
We leverage previously learned representations rather than training from scratch to improve model performance.
With transfer learning, expectile neural networks outperform the same networks trained without it.
arXiv Detail & Related papers (2022-06-20T16:16:05Z)
- Language Modeling, Lexical Translation, Reordering: The Training Process of NMT through the Lens of Classical SMT [64.1841519527504]
Neural machine translation uses a single neural network to model the entire translation process.
Although neural machine translation is the de facto standard, it is still not clear how NMT models acquire different competences over the course of training.
arXiv Detail & Related papers (2021-09-03T09:38:50Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- Unsupervised Transfer Learning for Spatiotemporal Predictive Networks [90.67309545798224]
We study how to transfer knowledge from a zoo of unsupervisedly learned models towards another network.
Our motivation is that models are expected to understand complex dynamics from different sources.
Our approach yields significant improvements on three benchmarks for spatiotemporal prediction and benefits the target task even from less relevant sources.
arXiv Detail & Related papers (2020-09-24T15:40:55Z)
- Knowledge Efficient Deep Learning for Natural Language Processing [2.2701338128113124]
This thesis focuses on adapting classical methods to modern deep learning models and algorithms.
First, we propose a knowledge-rich deep learning model (KRDL) as a unifying framework for incorporating prior knowledge into deep models.
Second, we apply a KRDL model to help machine reading models find the correct evidence sentences that support their decisions.
arXiv Detail & Related papers (2020-08-28T23:32:33Z)
- What is being transferred in transfer learning? [51.6991244438545]
We show that when training from pre-trained weights, the model stays in the same basin in the loss landscape, and different instances of such a model are similar in feature space and close in parameter space.
arXiv Detail & Related papers (2020-08-26T17:23:40Z)
- Minimax Lower Bounds for Transfer Learning with Linear and One-hidden Layer Neural Networks [27.44348371795822]
We develop a statistical minimax framework to characterize the limits of transfer learning.
We derive a lower-bound for the target generalization error achievable by any algorithm as a function of the number of labeled source and target data.
arXiv Detail & Related papers (2020-06-16T22:49:26Z)
- Translation Artifacts in Cross-lingual Transfer Learning [51.66536640084888]
We show that machine translation can introduce subtle artifacts that have a notable impact in existing cross-lingual models.
In natural language inference, translating the premise and the hypothesis independently can reduce the lexical overlap between them.
We also improve the state-of-the-art in XNLI for the translate-test and zero-shot approaches by 4.3 and 2.8 points, respectively.
arXiv Detail & Related papers (2020-04-09T17:54:30Z)
- Inter- and Intra-domain Knowledge Transfer for Related Tasks in Deep Character Recognition [2.320417845168326]
Pre-training a deep neural network on the ImageNet dataset is a common practice for training deep learning models.
The technique of pre-training on one task and then retraining on a new one is called transfer learning.
In this paper we analyse the effectiveness of using deep transfer learning for character recognition tasks.
arXiv Detail & Related papers (2020-01-02T14:18:25Z)
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer [64.22926988297685]
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP).
In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format.
arXiv Detail & Related papers (2019-10-23T17:37:36Z)
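The last entry above frames every NLP problem as text in, text out. As a quick, hedged illustration (it relies on the Hugging Face transformers library and the public t5-small checkpoint, neither of which is part of this listing), translation is triggered purely by a textual task prefix:

```python
# Minimal text-to-text sketch: assumes the Hugging Face `transformers`
# package and the public "t5-small" checkpoint are available.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# The task is specified as plain text; summarization or classification
# would use a different prefix but the same text-to-text interface.
prompt = "translate English to German: The house is wonderful."
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
output_ids = model.generate(input_ids, max_new_tokens=40)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```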
This list is automatically generated from the titles and abstracts of the papers in this site.