Learning to Branch for Multi-Task Learning
- URL: http://arxiv.org/abs/2006.01895v2
- Date: Tue, 9 Jun 2020 05:18:55 GMT
- Title: Learning to Branch for Multi-Task Learning
- Authors: Pengsheng Guo, Chen-Yu Lee, Daniel Ulbricht
- Abstract summary: We present an automated multi-task learning algorithm that learns where to share or branch within a network.
We propose a novel tree-structured design space that casts a tree branching operation as a Gumbel-Softmax sampling procedure.
- Score: 12.49373126819798
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training multiple tasks jointly in one deep network yields reduced latency
during inference and better performance over the single-task counterpart by
sharing certain layers of a network. However, over-sharing a network could
erroneously enforce over-generalization, causing negative knowledge transfer
across tasks. Prior works rely on human intuition or pre-computed task
relatedness scores for ad hoc branching structures, which yield sub-optimal
results and often require substantial trial-and-error effort. In
this work, we present an automated multi-task learning algorithm that learns
where to share or branch within a network, designing an effective network
topology that is directly optimized for multiple objectives across tasks.
Specifically, we propose a novel tree-structured design space that casts a tree
branching operation as a Gumbel-Softmax sampling procedure. This enables
differentiable network splitting that is end-to-end trainable. We validate the
proposed method on controlled synthetic data, CelebA, and Taskonomy.
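The core mechanism, relaxing the discrete choice of which parent block a child block attaches to with Gumbel-Softmax so the resulting tree topology can be trained end to end alongside the task losses, can be sketched roughly as below. This is a minimal illustrative sketch under assumptions, not the authors' implementation: the class name GumbelBranchLayer, the linear child blocks, and the two-parent/two-child layout are made up for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GumbelBranchLayer(nn.Module):
    """One layer of a branching network: each child block samples a parent
    block via Gumbel-Softmax, keeping the branching decision differentiable."""
    def __init__(self, num_parents, num_children, dim):
        super().__init__()
        # One categorical distribution (logits over candidate parents) per child.
        self.branch_logits = nn.Parameter(torch.zeros(num_children, num_parents))
        self.child_blocks = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(num_children)]
        )

    def forward(self, parent_outputs, tau=1.0, hard=False):
        # parent_outputs: list of num_parents tensors, each of shape [batch, dim]
        parents = torch.stack(parent_outputs, dim=1)              # [batch, num_parents, dim]
        # Soft (or straight-through) one-hot parent assignment per child.
        gates = F.gumbel_softmax(self.branch_logits, tau=tau, hard=hard)  # [num_children, num_parents]
        child_inputs = torch.einsum('cp,bpd->bcd', gates, parents)        # [batch, num_children, dim]
        return [block(child_inputs[:, c]) for c, block in enumerate(self.child_blocks)]

# Usage: two child blocks choosing between two shared parent blocks.
layer = GumbelBranchLayer(num_parents=2, num_children=2, dim=64)
p0, p1 = torch.randn(8, 64), torch.randn(8, 64)
outs = layer([p0, p1], tau=1.0, hard=False)  # soft, differentiable branching during training
# At test time, hard=True (or an argmax over branch_logits) yields a discrete tree.
```

In this sketch the branching logits receive gradients through the Gumbel-Softmax gates, so the network topology and the block weights are optimized jointly by the same multi-task objective.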
Related papers
- ULTRA-DP: Unifying Graph Pre-training with Multi-task Graph Dual Prompt [67.8934749027315]
We propose a unified framework for graph hybrid pre-training which injects the task identification and position identification into GNNs.
We also propose a novel pre-training paradigm based on a group of $k$-nearest neighbors.
arXiv Detail & Related papers (2023-10-23T12:11:13Z)
- Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks [69.38572074372392]
We present the first results proving that feature learning occurs during training with a nonlinear model on multiple tasks.
Our key insight is that multi-task pretraining induces a pseudo-contrastive loss that favors representations that align points that typically have the same label across tasks.
arXiv Detail & Related papers (2023-07-13T16:39:08Z)
- Dynamic Neural Network for Multi-Task Learning Searching across Diverse Network Topologies [14.574399133024594]
We present a new MTL framework that searches for optimized structures for multiple tasks with diverse graph topologies.
We design a restricted DAG-based central network with read-in/read-out layers to build topologically diverse task-adaptive structures.
arXiv Detail & Related papers (2023-03-13T05:01:50Z)
- The Multiple Subnetwork Hypothesis: Enabling Multidomain Learning by Isolating Task-Specific Subnetworks in Feedforward Neural Networks [0.0]
We identify a methodology and a network representational structure that allow a pruned network to employ previously unused weights to learn subsequent tasks.
We show that networks trained using our approaches are able to learn multiple tasks, which may be related or unrelated, in parallel or in sequence without sacrificing performance on any task or exhibiting catastrophic forgetting.
arXiv Detail & Related papers (2022-07-18T15:07:13Z)
- DiSparse: Disentangled Sparsification for Multitask Model Compression [92.84435347164435]
DiSparse is a simple, effective, and first-of-its-kind multitask pruning and sparse training scheme.
Our experimental results demonstrate superior performance on various configurations and settings.
arXiv Detail & Related papers (2022-06-09T17:57:46Z)
- Fast Inference and Transfer of Compositional Task Structures for Few-shot Task Generalization [101.72755769194677]
We formulate few-shot task generalization as a few-shot reinforcement learning problem in which a task is characterized by a subtask graph.
Our multi-task subtask graph inferencer (MTSGI) first infers the common high-level task structure in terms of the subtask graph from the training tasks.
Our experiment results on 2D grid-world and complex web navigation domains show that the proposed method can learn and leverage the common underlying structure of the tasks for faster adaptation to the unseen tasks.
arXiv Detail & Related papers (2022-05-25T10:44:25Z)
- PaRT: Parallel Learning Towards Robust and Transparent AI [4.160969852186451]
This paper takes a parallel learning approach for robust and transparent AI.
A deep neural network is trained in parallel on multiple tasks, where each task is trained only on a subset of the network resources.
We show that the network does indeed use learned knowledge from some tasks in other tasks, through shared representations.
arXiv Detail & Related papers (2022-01-24T09:03:28Z)
- Multi-Task Learning with Sequence-Conditioned Transporter Networks [67.57293592529517]
We aim to solve multi-task learning through the lens of sequence-conditioning and weighted sampling.
We propose a new benchmark suite aimed at compositional tasks, MultiRavens, which allows defining custom task combinations.
Second, we propose a vision-based end-to-end system architecture, Sequence-Conditioned Transporter Networks, which augments Goal-Conditioned Transporter Networks with sequence-conditioning and weighted sampling.
arXiv Detail & Related papers (2021-09-15T21:19:11Z)
- Beneficial Perturbation Network for designing general adaptive artificial intelligence systems [14.226973149346886]
We propose a new type of deep neural network with extra, out-of-network, task-dependent biasing units to accommodate dynamic situations.
Our approach is memory-efficient and parameter-efficient, can accommodate many tasks, and achieves state-of-the-art performance across different tasks and domains.
arXiv Detail & Related papers (2020-09-27T01:28:10Z)
- Routing Networks with Co-training for Continual Learning [5.957609459173546]
We propose the use of sparse routing networks for continual learning.
For each input, these network architectures activate a different path through a network of experts.
In practice, we find it is necessary to develop a new training method for routing networks, which we call co-training.
arXiv Detail & Related papers (2020-09-09T15:58:51Z)
- Reparameterizing Convolutions for Incremental Multi-Task Learning without Task Interference [75.95287293847697]
Two common challenges in developing multi-task models are often overlooked in the literature.
First, enabling the model to be inherently incremental, continuously incorporating information from new tasks without forgetting the previously learned ones (incremental learning).
Second, eliminating adverse interactions amongst tasks, which have been shown to significantly degrade single-task performance in a multi-task setup (task interference).
arXiv Detail & Related papers (2020-07-24T14:44:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.