Auxiliary Task Update Decomposition: The Good, The Bad and The Neutral
- URL: http://arxiv.org/abs/2108.11346v1
- Date: Wed, 25 Aug 2021 17:09:48 GMT
- Title: Auxiliary Task Update Decomposition: The Good, The Bad and The Neutral
- Authors: Lucio M. Dery, Yann Dauphin and David Grangier
- Abstract summary: We formulate a model-agnostic framework that performs fine-grained manipulation of the auxiliary task gradients.
We propose to decompose auxiliary updates into directions which help, damage or leave the primary task loss unchanged.
Our approach consistently outperforms strong and widely used baselines when leveraging out-of-distribution data for text and image classification tasks.
- Score: 18.387162887917164
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While deep learning has been very beneficial in data-rich settings, tasks
with smaller training sets often resort to pre-training or multitask learning to
leverage data from other tasks. In this case, careful consideration is needed
to select tasks and model parameterizations such that updates from the
auxiliary tasks actually help the primary task. We seek to alleviate this
burden by formulating a model-agnostic framework that performs fine-grained
manipulation of the auxiliary task gradients. We propose to decompose auxiliary
updates into directions which help, damage or leave the primary task loss
unchanged. This allows weighting the update directions differently depending on
their impact on the problem of interest. We present a novel and efficient
algorithm for that purpose and show its advantage in practice. Our method
leverages efficient automatic differentiation procedures and randomized
singular value decomposition for scalability. We show that our framework is
generic and encompasses some prior work as particular cases. Our approach
consistently outperforms strong and widely used baselines when leveraging
out-of-distribution data for text and image classification tasks.
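A concrete way to picture the decomposition described above is a first-order, single-vector sketch: project the auxiliary gradient onto the primary-task gradient, treat the component whose descent direction also lowers the primary loss as "good", the component whose descent direction raises it as "bad", and the orthogonal remainder as "neutral", then reweight each part. The snippet below is only a minimal NumPy illustration under these assumptions; the function name and the alpha_* weights are hypothetical, and it is not the paper's actual algorithm, which operates on per-example gradients and uses randomized SVD for scalability.

```python
import numpy as np

def decompose_auxiliary_update(g_aux, g_primary,
                               alpha_good=1.0, alpha_bad=0.0, alpha_neutral=0.5):
    """First-order sketch: split an auxiliary gradient into parts that help,
    hurt, or leave the primary loss unchanged, then reweight them."""
    g_hat = g_primary / (np.linalg.norm(g_primary) + 1e-12)  # unit primary direction
    coeff = float(np.dot(g_aux, g_hat))   # signed projection onto the primary gradient
    parallel = coeff * g_hat              # projection of g_aux onto that direction
    neutral = g_aux - parallel            # orthogonal part: leaves the primary loss
                                          # unchanged to first order
    zero = np.zeros_like(g_aux)
    if coeff > 0:
        good, bad = parallel, zero        # descending along -parallel also lowers the primary loss
    else:
        good, bad = zero, parallel        # descending along -parallel raises the primary loss
    return alpha_good * good + alpha_bad * bad + alpha_neutral * neutral

# Toy usage: fold the reweighted auxiliary update into one gradient-descent step.
rng = np.random.default_rng(0)
theta = rng.normal(size=10)
g_primary, g_aux = rng.normal(size=10), rng.normal(size=10)
lr, aux_scale = 0.1, 0.5
theta -= lr * (g_primary + aux_scale * decompose_auxiliary_update(g_aux, g_primary))
```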
Related papers
- Data-CUBE: Data Curriculum for Instruction-based Sentence Representation Learning [85.66907881270785]
We propose a data curriculum method, namely Data-CUBE, that arranges the order of all the multi-task data for training.
At the task level, we aim to find the optimal task order to minimize the total cross-task interference risk.
At the instance level, we measure the difficulty of all instances per task and then divide them into easy-to-difficult mini-batches for training.
arXiv Detail & Related papers (2024-01-07T18:12:20Z)
- Transfer Learning for Structured Pruning under Limited Task Data [15.946734013984184]
We propose a framework which combines structured pruning with transfer learning to reduce the need for task-specific data.
We demonstrate that our framework results in pruned models with improved generalization over strong baselines.
arXiv Detail & Related papers (2023-11-10T20:23:35Z)
- Active Instruction Tuning: Improving Cross-Task Generalization by Training on Prompt Sensitive Tasks [101.40633115037983]
Instruction tuning (IT) achieves impressive zero-shot generalization results by training large language models (LLMs) on a massive amount of diverse tasks with instructions.
How to select new tasks to improve the performance and generalizability of IT models remains an open question.
We propose active instruction tuning based on prompt uncertainty, a novel framework to identify informative tasks, and then actively tune the models on the selected tasks.
arXiv Detail & Related papers (2023-11-01T04:40:05Z)
- Clustering-based Domain-Incremental Learning [4.835091081509403]
A key challenge in continual learning is the so-called "catastrophic forgetting" problem.
We propose an online clustering-based approach on a dynamically updated finite pool of samples or gradients.
We demonstrate the effectiveness of the proposed strategy and its promising performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-09-21T13:49:05Z)
- ForkMerge: Mitigating Negative Transfer in Auxiliary-Task Learning [59.08197876733052]
Auxiliary-Task Learning (ATL) aims to improve the performance of the target task by leveraging the knowledge obtained from related tasks.
Sometimes, learning multiple tasks simultaneously results in lower accuracy than learning only the target task, a phenomenon known as negative transfer.
ForkMerge is a novel approach that periodically forks the model into multiple branches and automatically searches over varying task weights.
arXiv Detail & Related papers (2023-01-30T02:27:02Z)
- Generalization with Lossy Affordances: Leveraging Broad Offline Data for Learning Visuomotor Tasks [65.23947618404046]
We introduce a framework that acquires goal-conditioned policies for unseen temporally extended tasks via offline reinforcement learning on broad data.
When faced with a novel task goal, the framework uses an affordance model to plan a sequence of lossy representations as subgoals that decomposes the original task into easier problems.
We show that our framework can be pre-trained on large-scale datasets of robot experiences from prior work and efficiently fine-tuned for novel tasks, entirely from visual inputs without any manual reward engineering.
arXiv Detail & Related papers (2022-10-12T21:46:38Z)
- Adaptive Transfer Learning on Graph Neural Networks [4.233435459239147]
Graph neural networks (GNNs) are widely used to learn a powerful representation of graph-structured data.
Recent work demonstrates that transferring knowledge from self-supervised tasks to downstream tasks could further improve graph representation.
We propose a new transfer learning paradigm on GNNs which could effectively leverage self-supervised tasks as auxiliary tasks to help the target task.
arXiv Detail & Related papers (2021-07-19T11:46:28Z)
- TAG: Task-based Accumulated Gradients for Lifelong learning [21.779858050277475]
We propose a task-aware system that adapts the learning rate based on the relatedness among tasks.
We empirically show that our proposed adaptive learning rate not only accounts for catastrophic forgetting but also allows positive backward transfer.
arXiv Detail & Related papers (2021-05-11T16:10:32Z)
- Multi-task Supervised Learning via Cross-learning [102.64082402388192]
We consider a problem known as multi-task learning, consisting of fitting a set of regression functions intended for solving different tasks.
In our novel formulation, we couple the parameters of these functions, so that they learn in their task-specific domains while staying close to each other.
This facilitates cross-fertilization, in which data collected across different domains help improve the learning performance on each task (a minimal sketch of such parameter coupling appears after this list).
arXiv Detail & Related papers (2020-10-24T21:35:57Z)
- Auxiliary Task Reweighting for Minimum-data Learning [118.69683270159108]
Supervised learning requires a large amount of training data, limiting its application where labeled data is scarce.
To compensate for data scarcity, one possible method is to utilize auxiliary tasks to provide additional supervision for the main task.
We propose a method to automatically reweight auxiliary tasks in order to reduce the data requirement on the main task.
arXiv Detail & Related papers (2020-10-16T08:45:37Z)
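As referenced in the Multi-task Supervised Learning via Cross-learning entry above, one simple way to couple per-task parameters so that they "stay close to each other" is to penalize each task's distance to the mean of all task parameters. The sketch below is a hypothetical NumPy illustration of that kind of coupling on toy linear-regression tasks (the function name, lam, and lr are illustrative choices), not the authors' exact cross-learning formulation.

```python
import numpy as np

def cross_learning_step(W, X_list, y_list, lam=0.1, lr=0.01):
    """One gradient step on T coupled linear-regression tasks.

    W: (T, d) array with one weight vector per task.
    lam: strength of the coupling penalty ||w_t - mean(W)||^2 that keeps
         task-specific parameters close to each other.
    """
    W_bar = W.mean(axis=0)
    W_new = W.copy()
    for t, (X, y) in enumerate(zip(X_list, y_list)):
        residual = X @ W[t] - y
        grad_fit = X.T @ residual / len(y)        # task-specific squared-error gradient
        grad_couple = 2.0 * lam * (W[t] - W_bar)  # pulls w_t toward the mean of all tasks
        W_new[t] = W[t] - lr * (grad_fit + grad_couple)
    return W_new

# Toy usage: three related tasks drawn around a shared ground-truth vector.
rng = np.random.default_rng(0)
d, n, T = 5, 50, 3
w_true = rng.normal(size=d)
X_list = [rng.normal(size=(n, d)) for _ in range(T)]
y_list = [X @ (w_true + 0.1 * rng.normal(size=d)) for X in X_list]
W = np.zeros((T, d))
for _ in range(200):
    W = cross_learning_step(W, X_list, y_list)
```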