Learned Weight Sharing for Deep Multi-Task Learning by Natural Evolution
Strategy and Stochastic Gradient Descent
- URL: http://arxiv.org/abs/2003.10159v1
- Date: Mon, 23 Mar 2020 10:21:44 GMT
- Title: Learned Weight Sharing for Deep Multi-Task Learning by Natural Evolution
Strategy and Stochastic Gradient Descent
- Authors: Jonas Prellberg, Oliver Kramer
- Abstract summary: We propose an algorithm to learn the assignment between a shared set of weights and task-specific layers.
Learning takes place via a combination of natural evolution strategy and gradient descent.
The end result are task-specific networks that share weights but allow independent inference.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In deep multi-task learning, weights of task-specific networks are shared
between tasks to improve performance on each single one. Since the question,
which weights to share between layers, is difficult to answer, human-designed
architectures often share everything but a last task-specific layer. In many
cases, this simplistic approach severely limits performance. Instead, we
propose an algorithm to learn the assignment between a shared set of weights
and task-specific layers. To optimize the non-differentiable assignment and at
the same time train the differentiable weights, learning takes place via a
combination of natural evolution strategy and stochastic gradient descent. The
end result are task-specific networks that share weights but allow independent
inference. They achieve lower test errors than baselines and methods from
literature on three multi-task learning datasets.
Related papers
- Multi-Objective Optimization for Sparse Deep Multi-Task Learning [0.0]
We present a Multi-Objective Optimization algorithm using a modified Weighted Chebyshev scalarization for training Deep Neural Networks (DNNs)
Our work aims to address the (economical and also ecological) sustainability issue of DNN models, with particular focus on Deep Multi-Task models.
arXiv Detail & Related papers (2023-08-23T16:42:27Z) - DiSparse: Disentangled Sparsification for Multitask Model Compression [92.84435347164435]
DiSparse is a simple, effective, and first-of-its-kind multitask pruning and sparse training scheme.
Our experimental results demonstrate superior performance on various configurations and settings.
arXiv Detail & Related papers (2022-06-09T17:57:46Z) - Sparsely Activated Mixture-of-Experts are Robust Multi-Task Learners [67.5865966762559]
We study whether sparsely activated Mixture-of-Experts (MoE) improve multi-task learning.
We devise task-aware gating functions to route examples from different tasks to specialized experts.
This results in a sparsely activated multi-task model with a large number of parameters, but with the same computational cost as that of a dense model.
arXiv Detail & Related papers (2022-04-16T00:56:12Z) - Task Adaptive Parameter Sharing for Multi-Task Learning [114.80350786535952]
Adaptive Task Adapting Sharing (TAPS) is a method for tuning a base model to a new task by adaptively modifying a small, task-specific subset of layers.
Compared to other methods, TAPS retains high accuracy on downstream tasks while introducing few task-specific parameters.
We evaluate our method on a suite of fine-tuning tasks and architectures (ResNet, DenseNet, ViT) and show that it achieves state-of-the-art performance while being simple to implement.
arXiv Detail & Related papers (2022-03-30T23:16:07Z) - On Steering Multi-Annotations per Sample for Multi-Task Learning [79.98259057711044]
The study of multi-task learning has drawn great attention from the community.
Despite the remarkable progress, the challenge of optimally learning different tasks simultaneously remains to be explored.
Previous works attempt to modify the gradients from different tasks. Yet these methods give a subjective assumption of the relationship between tasks, and the modified gradient may be less accurate.
In this paper, we introduce Task Allocation(STA), a mechanism that addresses this issue by a task allocation approach, in which each sample is randomly allocated a subset of tasks.
For further progress, we propose Interleaved Task Allocation(ISTA) to iteratively allocate all
arXiv Detail & Related papers (2022-03-06T11:57:18Z) - Instance-Level Task Parameters: A Robust Multi-task Weighting Framework [17.639472693362926]
Recent works have shown that deep neural networks benefit from multi-task learning by learning a shared representation across several related tasks.
We let the training process dictate the optimal weighting of tasks for every instance in the dataset.
We conduct extensive experiments on SURREAL and CityScapes datasets, for human shape and pose estimation, depth estimation and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-11T02:35:42Z) - Learning to Branch for Multi-Task Learning [12.49373126819798]
We present an automated multi-task learning algorithm that learns where to share or branch within a network.
We propose a novel tree-structured design space that casts a tree branching operation as a gumbel-softmax sampling procedure.
arXiv Detail & Related papers (2020-06-02T19:23:21Z) - Multitask Learning with Single Gradient Step Update for Task Balancing [4.330814031477772]
We propose an algorithm to balance between tasks at the gradient level by applying gradient-based meta-learning to multitask learning.
We apply the proposed method to various multitask computer vision problems and achieve state-of-the-art performance.
arXiv Detail & Related papers (2020-05-20T08:34:20Z) - Gradient Surgery for Multi-Task Learning [119.675492088251]
Multi-task learning has emerged as a promising approach for sharing structure across multiple tasks.
The reasons why multi-task learning is so challenging compared to single-task learning are not fully understood.
We propose a form of gradient surgery that projects a task's gradient onto the normal plane of the gradient of any other task that has a conflicting gradient.
arXiv Detail & Related papers (2020-01-19T06:33:47Z) - Hierarchical Reinforcement Learning as a Model of Human Task
Interleaving [60.95424607008241]
We develop a hierarchical model of supervisory control driven by reinforcement learning.
The model reproduces known empirical effects of task interleaving.
The results support hierarchical RL as a plausible model of task interleaving.
arXiv Detail & Related papers (2020-01-04T17:53:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.