Learning to Multi-Task Learn for Better Neural Machine Translation
- URL: http://arxiv.org/abs/2001.03294v1
- Date: Fri, 10 Jan 2020 03:12:28 GMT
- Title: Learning to Multi-Task Learn for Better Neural Machine Translation
- Authors: Poorya Zaremoodi, Gholamreza Haffari
- Abstract summary: Multi-task learning is an elegant approach to inject linguistic-related biases into neural machine translation models.
We propose a novel framework for learning the training schedule, i.e., learning to multi-task learn, for the biased-MTL setting of interest.
Experiments show the resulting automatically learned training schedulers are competitive with the best heuristics, and lead to up to +1.1 BLEU score improvements.
- Score: 53.06405021125476
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scarcity of parallel sentence pairs is a major challenge for training
high-quality neural machine translation (NMT) models in bilingually low-resource
scenarios, as NMT is data-hungry. Multi-task learning is an elegant approach to
inject linguistic-related inductive biases into NMT, using auxiliary syntactic
and semantic tasks, to improve generalisation. The challenge, however, is to
devise effective training schedules, prescribing when to make use of the
auxiliary tasks during the training process to fill the knowledge gaps of the
main translation task, a setting referred to as biased-MTL. Current approaches
for the training schedule are based on hand-engineered heuristics, whose
effectiveness varies across MTL settings. We propose a novel framework for
learning the training schedule, i.e., learning to multi-task learn, for the MTL
setting of interest. We formulate the training schedule as a Markov decision
process which paves the way to employ policy learning methods to learn the
scheduling policy. We effectively and efficiently learn the training schedule
policy within the imitation learning framework using an oracle policy algorithm
that dynamically sets the importance weights of auxiliary tasks based on their
contributions to the generalisability of the main NMT task. Experiments on
low-resource NMT settings show the resulting automatically learned training
schedulers are competitive with the best heuristics, and lead to up to +1.1
BLEU score improvements.
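To make the formulation concrete, the following is a minimal, self-contained sketch and not the paper's released implementation: an oracle scores each task by how much a probe update on that task reduces a held-out loss for the main translation task, a small scheduler network is trained to imitate the oracle's importance weights, and the shared model then takes one weighted multi-task step. The linear layers, toy regression batches, temperature, and learning rates are illustrative stand-ins for a real NMT model, the auxiliary syntactic and semantic tasks, and a proper development set.

```python
# Minimal sketch of oracle-guided task weighting plus an imitated scheduler.
# All components are toy stand-ins, not the paper's implementation.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

DIM, N_TASKS = 16, 3                      # task 0 = main NMT task; tasks 1..2 = auxiliary tasks
model = nn.Linear(DIM, DIM)               # stand-in for the shared translation model
scheduler = nn.Linear(N_TASKS, N_TASKS)   # stand-in scheduling policy: state -> task logits
opt_model = torch.optim.SGD(model.parameters(), lr=0.1)
opt_sched = torch.optim.Adam(scheduler.parameters(), lr=0.01)

def get_batch(task):
    """Toy regression batch per task; replace with real NMT / syntactic / semantic data."""
    x = torch.randn(32, DIM)
    return x, x * 0.1 * (task + 1)

def task_loss(net, task):
    x, y = get_batch(task)
    return F.mse_loss(net(x), y)

for step in range(200):
    # 1) Oracle: probe how one update on each task changes a dev-loss proxy for the main task.
    dev_before = task_loss(model, 0).item()
    gains = []
    for t in range(N_TASKS):
        probe = copy.deepcopy(model)
        task_loss(probe, t).backward()
        with torch.no_grad():
            for p in probe.parameters():
                p -= 0.1 * p.grad
        gains.append(dev_before - task_loss(probe, 0).item())
    oracle_w = torch.softmax(torch.tensor(gains) / 0.05, dim=0)   # oracle importance weights

    # 2) Imitation: train the scheduler to reproduce the oracle's weights from a cheap state.
    state = torch.tensor([task_loss(model, t).item() for t in range(N_TASKS)])
    pred_w = torch.softmax(scheduler(state), dim=0)
    imitation_loss = F.kl_div(pred_w.log(), oracle_w, reduction="sum")
    opt_sched.zero_grad(); imitation_loss.backward(); opt_sched.step()

    # 3) One multi-task update of the shared model using the learned weights.
    mtl_loss = sum(pred_w[t].detach() * task_loss(model, t) for t in range(N_TASKS))
    opt_model.zero_grad(); mtl_loss.backward(); opt_model.step()

    if step % 50 == 0:
        print(f"step {step:3d}  main-task loss {task_loss(model, 0).item():.4f}")
```

The paper learns the scheduling policy within an imitation learning framework; the KL-matching step above is just one simple way to express that imitation, and the probe-based oracle is a proxy for each task's contribution to the generalisability of the main NMT task.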
Related papers
- MT2ST: Adaptive Multi-Task to Single-Task Learning [7.307436175842646]
Multi-Task to Single-Task (MT2ST) is a novel approach that significantly enhances the efficiency and accuracy of word embedding training.
Our empirical studies demonstrate that MT2ST can reduce training time by 67% compared with single-task learning approaches.
arXiv Detail & Related papers (2024-06-26T03:12:07Z)
- From Instance Training to Instruction Learning: Task Adapters Generation from Instructions [29.452006810725184]
This paper focuses on simulating human learning to address the shortcomings of instance training.
We introduce Task Adapters Generation from Instructions (TAGI), which automatically constructs the task-specific model.
We evaluate TAGI on the Super-Natural Instructions and P3 datasets.
arXiv Detail & Related papers (2024-06-18T08:14:28Z)
- Data-CUBE: Data Curriculum for Instruction-based Sentence Representation Learning [85.66907881270785]
We propose a data curriculum method, namely Data-CUBE, that arranges the orders of all the multi-task data for training.
At the task level, we aim to find the optimal task order to minimize the total cross-task interference risk.
At the instance level, we measure the difficulty of all instances per task, then divide them into easy-to-difficult mini-batches for training.
arXiv Detail & Related papers (2024-01-07T18:12:20Z)
- Order Matters in the Presence of Dataset Imbalance for Multilingual Learning [53.74649778447903]
We present a simple yet effective method of pre-training on high-resource tasks, followed by fine-tuning on a mixture of high/low-resource tasks.
We show its improvements in neural machine translation (NMT) and multilingual language modeling.
arXiv Detail & Related papers (2023-12-11T05:46:57Z)
- Mastering Robot Manipulation with Multimodal Prompts through Pretraining and Multi-task Fine-tuning [49.92517970237088]
We tackle the problem of training a robot to understand multimodal prompts.
This type of task poses a major challenge to robots' capability to understand the interconnection and complementarity between vision and language signals.
We introduce an effective framework that learns a policy to perform robot manipulation with multimodal prompts.
arXiv Detail & Related papers (2023-10-14T22:24:58Z)
- Preference-grounded Token-level Guidance for Language Model Fine-tuning [105.88789610320426]
Aligning language models with preferences is an important problem in natural language generation.
For LM training, we present two *minimalist* learning objectives that utilize the learned guidance, depending on the amount of supervised data available.
In experiments, our method performs competitively on two distinct representative LM tasks.
arXiv Detail & Related papers (2023-06-01T07:00:07Z)
- Improving Multi-task Learning via Seeking Task-based Flat Regions [38.28600737969538]
Multi-Task Learning (MTL) is a powerful learning paradigm for training deep neural networks that allows learning more than one objective with a single backbone.
There is an emerging line of work in MTL that focuses on manipulating the task gradient to derive an ultimate gradient descent direction.
We propose to leverage a recently introduced training method, named Sharpness-Aware Minimization, which can enhance model generalization in single-task learning.
arXiv Detail & Related papers (2022-11-24T17:19:30Z)
- A Survey of Multi-task Learning in Natural Language Processing: Regarding Task Relatedness and Training Methods [17.094426577723507]
Multi-task learning (MTL) has become increasingly popular in natural language processing (NLP).
It improves the performance of related tasks by exploiting their commonalities and differences.
It is still not well understood how multi-task learning can be implemented based on the relatedness of training tasks.
arXiv Detail & Related papers (2022-04-07T15:22:19Z)
- Learning Domain Specific Language Models for Automatic Speech Recognition through Machine Translation [0.0]
We use Neural Machine Translation as an intermediate step to first obtain translations of task-specific text data.
We develop a procedure to derive word confusion networks from NMT beam search graphs.
We demonstrate that NMT confusion networks can help to reduce the perplexity of both n-gram and recurrent neural network LMs.
arXiv Detail & Related papers (2021-09-21T10:29:20Z)
- Self-Paced Learning for Neural Machine Translation [55.41314278859938]
We propose self-paced learning for neural machine translation (NMT) training.
We show that the proposed model yields better performance than strong baselines.
arXiv Detail & Related papers (2020-10-09T11:33:16Z)