Sequential Reptile: Inter-Task Gradient Alignment for Multilingual Learning
- URL: http://arxiv.org/abs/2110.02600v1
- Date: Wed, 6 Oct 2021 09:10:10 GMT
- Title: Sequential Reptile: Inter-Task Gradient Alignment for Multilingual Learning
- Authors: Seanie Lee, Hae Beom Lee, Juho Lee, Sung Ju Hwang
- Abstract summary: We show that aligning gradients between tasks is crucial for maximizing knowledge transfer.
We propose a simple yet effective method that can efficiently align gradients between tasks.
We extensively validate our method on various multi-task learning and zero-shot cross-lingual transfer tasks.
- Score: 61.29879000628815
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multilingual models jointly pretrained on multiple languages have achieved
remarkable performance on various multilingual downstream tasks. Moreover,
models finetuned on a single monolingual downstream task have been shown to
generalize to unseen languages. In this paper, we first show that aligning
gradients across those tasks is crucial for maximizing knowledge transfer
while minimizing negative transfer. Despite its importance, the
existing methods for gradient alignment either have a completely different
purpose, ignore inter-task alignment, or aim to solve continual learning
problems in rather inefficient ways. As a result of the misaligned gradients
between tasks, the model suffers from severe negative transfer in the form of
catastrophic forgetting of the knowledge acquired from the pretraining. To
overcome the limitations, we propose a simple yet effective method that can
efficiently align gradients between tasks. Specifically, we perform each
inner-optimization by sequentially sampling batches from all the tasks,
followed by a Reptile outer update. Thanks to the gradients aligned between
tasks by our method, the model becomes less vulnerable to negative transfer and
catastrophic forgetting. We extensively validate our method on various
multi-task learning and zero-shot cross-lingual transfer tasks, where our
method largely outperforms all the relevant baselines we consider.
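A minimal sketch of one Sequential Reptile meta-step, written in PyTorch, may help make the abstract concrete: an inner trajectory of SGD steps over batches drawn sequentially from all tasks, followed by a Reptile-style outer update that moves the parameters toward the end point of that trajectory. The names (tasks, sample_batch, loss_fn) and all hyperparameters are illustrative assumptions rather than the authors' reference implementation, and the exact task-sampling order used in the paper may differ from the uniform random choice shown here.

```python
import random
import torch

def sequential_reptile_step(model, tasks, loss_fn,
                            inner_steps=8, inner_lr=1e-3, outer_lr=0.1):
    # Snapshot the current ("outer") parameters before the inner trajectory.
    outer = {name: p.detach().clone() for name, p in model.named_parameters()}

    # Inner optimization: plain SGD on batches drawn sequentially from all
    # tasks, so a single trajectory interleaves every task's gradients.
    opt = torch.optim.SGD(model.parameters(), lr=inner_lr)
    for _ in range(inner_steps):
        task = random.choice(tasks)   # assumed sampling scheme (uniform over tasks)
        x, y = task.sample_batch()    # assumed helper: one batch from that task
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

    # Reptile outer update: theta <- theta + outer_lr * (phi - theta),
    # where phi are the parameters reached at the end of the inner trajectory.
    with torch.no_grad():
        for name, p in model.named_parameters():
            p.copy_(outer[name] + outer_lr * (p - outer[name]))
```

Repeating this step forms the outer loop; because every inner trajectory mixes batches from all tasks, the Reptile update implicitly rewards directions on which the tasks' gradients agree, without requiring explicit pairwise gradient comparisons.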
Related papers
- No Train but Gain: Language Arithmetic for training-free Language Adapters enhancement [59.37775534633868]
We introduce a novel method called language arithmetic, which enables training-free post-processing.
The effectiveness of the proposed solution is demonstrated on three downstream tasks in a MAD-X-based set of cross-lingual schemes.
arXiv Detail & Related papers (2024-04-24T08:52:40Z)
- ForkMerge: Mitigating Negative Transfer in Auxiliary-Task Learning [59.08197876733052]
Auxiliary-Task Learning (ATL) aims to improve the performance of the target task by leveraging the knowledge obtained from related tasks.
Sometimes, learning multiple tasks simultaneously results in lower accuracy than learning only the target task, a phenomenon known as negative transfer.
ForkMerge is a novel approach that periodically forks the model into multiple branches and automatically searches over varying task weights.
arXiv Detail & Related papers (2023-01-30T02:27:02Z)
- TaskMix: Data Augmentation for Meta-Learning of Spoken Intent Understanding [0.0]
We show that a state-of-the-art data augmentation method worsens the problem of overfitting when task diversity is low.
We propose a simple method, TaskMix, which synthesizes new tasks by linearly interpolating existing tasks.
We show that TaskMix outperforms baselines, alleviates overfitting when task diversity is low, and does not degrade performance even when it is high.
arXiv Detail & Related papers (2022-09-26T00:37:40Z)
- Grad2Task: Improved Few-shot Text Classification Using Gradients for Task Representation [24.488427641442694]
We propose a novel conditional neural process-based approach for few-shot text classification.
Our key idea is to represent each task using gradient information from a base model.
Our approach outperforms traditional fine-tuning, sequential transfer learning, and state-of-the-art meta learning approaches.
arXiv Detail & Related papers (2022-01-27T15:29:30Z)
- Unsupervised Cross-lingual Adaptation for Sequence Tagging and Beyond [58.80417796087894]
Cross-lingual adaptation with multilingual pre-trained language models (mPTLMs) mainly consists of two lines of work: the zero-shot approach and the translation-based approach.
We propose a novel framework to consolidate the zero-shot approach and the translation-based approach for better adaptation performance.
arXiv Detail & Related papers (2020-10-23T13:47:01Z)
- Gradient Vaccine: Investigating and Improving Multi-task Optimization in Massively Multilingual Models [63.92643612630657]
This paper attempts to peek into the black-box of multilingual optimization through the lens of loss function geometry.
We find that gradient similarity measured along the optimization trajectory is an important signal that correlates well with language proximity (see the sketch after this list).
We derive a simple and scalable optimization procedure, named Gradient Vaccine, which encourages more geometrically aligned parameter updates for close tasks.
arXiv Detail & Related papers (2020-10-12T17:26:34Z)
- Exploring Fine-tuning Techniques for Pre-trained Cross-lingual Models via Continual Learning [74.25168207651376]
Fine-tuning pre-trained language models on downstream cross-lingual tasks has shown promising results.
We leverage continual learning to preserve the cross-lingual ability of the pre-trained model when fine-tuning it on downstream tasks.
Our methods achieve better performance than other fine-tuning baselines on the zero-shot cross-lingual part-of-speech tagging and named entity recognition tasks.
arXiv Detail & Related papers (2020-04-29T14:07:18Z)
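Since inter-task gradient alignment is the common thread between the main paper and the Gradient Vaccine entry above, here is a minimal sketch of how that signal can be measured: the cosine similarity between the flattened gradients that two tasks induce on the shared parameters. The helper names (model, loss_fn, batch_a, batch_b) are illustrative assumptions; Gradient Vaccine's actual procedure, which further adjusts gradients toward a target similarity, is not reproduced here.

```python
import torch

def task_gradient(model, loss_fn, batch):
    """Flattened gradient of one task's loss w.r.t. all model parameters."""
    x, y = batch
    model.zero_grad()
    loss_fn(model(x), y).backward()
    return torch.cat([p.grad.flatten() for p in model.parameters()
                      if p.grad is not None])

def gradient_alignment(model, loss_fn, batch_a, batch_b):
    """Cosine similarity between two tasks' gradients: values near 1 indicate
    aligned (mutually helpful) updates, while negative values indicate
    conflicting gradients, the regime associated with negative transfer."""
    g_a = task_gradient(model, loss_fn, batch_a)
    g_b = task_gradient(model, loss_fn, batch_b)
    return torch.nn.functional.cosine_similarity(g_a, g_b, dim=0)
```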
This list is automatically generated from the titles and abstracts of the papers on this site.