New Tight Relaxations of Rank Minimization for Multi-Task Learning
- URL: http://arxiv.org/abs/2112.04734v1
- Date: Thu, 9 Dec 2021 07:29:57 GMT
- Title: New Tight Relaxations of Rank Minimization for Multi-Task Learning
- Authors: Wei Chang, Feiping Nie, Rong Wang, Xuelong Li
- Abstract summary: We propose two novel multi-task learning formulations based on two regularization terms.
We show that our methods can correctly recover the low-rank structure shared across tasks, and outperform related multi-task learning methods.
- Score: 161.23314844751556
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-task learning has been studied by many researchers; it supposes
that different tasks can share a common yet latent low-rank subspace, so that
learning multiple tasks jointly is better than learning them independently. In
this paper, we propose two novel multi-task learning formulations based on two
regularization terms, which learn the optimal shared latent subspace by
minimizing exactly the $k$ smallest singular values. The proposed regularization
terms are tighter approximations of rank minimization than the trace norm.
However, the exact rank minimization problem is NP-hard to solve. We therefore
design a novel re-weighted iterative strategy to solve our models, which
tactically handles the exact rank minimization problem by setting a large
penalty parameter. Experimental results on benchmark datasets demonstrate that
our methods correctly recover the low-rank structure shared across tasks and
outperform related multi-task learning methods.
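To make the proposed regularizer concrete, the sketch below computes the quantity the abstract describes: the sum of the $k$ smallest singular values of the task-weight matrix. This is a minimal NumPy illustration; the helper name and toy dimensions are ours, not the authors' code.

```python
import numpy as np

def k_smallest_singular_sum(W, k):
    """Sum of the k smallest singular values of W.

    The trace norm sums *all* singular values, so it also shrinks the
    large ones that carry the shared subspace; penalizing only the k
    smallest is a tighter surrogate for enforcing a low rank.
    """
    s = np.linalg.svd(W, compute_uv=False)  # singular values, descending
    return s[-k:].sum()

rng = np.random.default_rng(0)
d, t, r = 20, 10, 3                      # features, tasks, true shared rank
W = rng.standard_normal((d, r)) @ rng.standard_normal((r, t))
print(k_smallest_singular_sum(W, k=t - r))  # ~0: rank(W) is already 3
print(k_smallest_singular_sum(W, k=t))      # sums everything: the trace norm
```

Driving this sum to zero with a large penalty parameter, which is what the re-weighted iterative strategy aims for, forces the $k$ smallest singular values to vanish, i.e. it enforces rank at most $\min(d, t) - k$, the shared low-rank structure the paper aims to recover.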
Related papers
- Data-CUBE: Data Curriculum for Instruction-based Sentence Representation Learning [85.66907881270785]
We propose a data curriculum method, namely Data-CUBE, that arranges the orders of all the multi-task data for training.
At the task level, we aim to find the optimal task order that minimizes the total cross-task interference risk.
At the instance level, we measure the difficulty of all instances per task, then divide them into easy-to-difficult mini-batches for training.
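As a concrete picture of the instance-level step, the hypothetical sketch below sorts one task's instances by a difficulty score and slices them into easy-to-difficult mini-batches; the actual difficulty measure is defined in the paper and is stubbed here as a plain array.

```python
import numpy as np

def easy_to_difficult_batches(instances, difficulty, batch_size):
    """Order instances by ascending difficulty, then cut the sorted list
    into mini-batches so training sees the easiest batch first."""
    order = np.argsort(difficulty)                 # easy -> difficult
    ordered = [instances[i] for i in order]
    return [ordered[i:i + batch_size]
            for i in range(0, len(ordered), batch_size)]

batches = easy_to_difficult_batches(
    list("abcdef"), np.array([3.0, 1.0, 2.0, 6.0, 5.0, 4.0]), batch_size=2)
print(batches)  # [['b', 'c'], ['a', 'f'], ['e', 'd']]
```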
arXiv Detail & Related papers (2024-01-07T18:12:20Z)
- Task Difficulty Aware Parameter Allocation & Regularization for Lifelong Learning [20.177260510548535]
We propose Parameter Allocation & Regularization (PAR), which adaptively selects an appropriate strategy for each task, parameter allocation or regularization, based on its learning difficulty.
Our method is scalable and significantly reduces the model's redundancy while improving the model's performance.
arXiv Detail & Related papers (2023-04-11T15:38:21Z)
- Task Adaptive Parameter Sharing for Multi-Task Learning [114.80350786535952]
Task Adaptive Parameter Sharing (TAPS) is a method for tuning a base model to a new task by adaptively modifying a small, task-specific subset of layers.
Compared to other methods, TAPS retains high accuracy on downstream tasks while introducing few task-specific parameters.
We evaluate our method on a suite of fine-tuning tasks and architectures (ResNet, DenseNet, ViT) and show that it achieves state-of-the-art performance while being simple to implement.
arXiv Detail & Related papers (2022-03-30T23:16:07Z)
- In Defense of the Unitary Scalarization for Deep Multi-Task Learning [121.76421174107463]
We present a theoretical analysis suggesting that many specialized multi-task optimizers can be interpreted as forms of regularization.
We show that, when coupled with standard regularization and stabilization techniques, unitary scalarization matches or improves upon the performance of complex multi-task optimizers.
arXiv Detail & Related papers (2022-01-11T18:44:17Z)
- Conflict-Averse Gradient Descent for Multi-task Learning [56.379937772617]
A major challenge in optimizing a multi-task model is the conflicting gradients.
We introduce Conflict-Averse Gradient descent (CAGrad), which minimizes the average loss function while leveraging the worst local improvement across individual tasks to regularize the update trajectory.
CAGrad balances the objectives automatically and still provably converges to a minimum over the average loss.
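As a sketch of how such an update can be computed (a paraphrase of the CAGrad formulation, not the authors' reference code): find simplex weights over the task gradients that minimize a conflict-penalized objective, then bias the average gradient toward that weighted combination.

```python
import numpy as np
from scipy.optimize import minimize

def cagrad_direction(G, c=0.5):
    """Conflict-averse update direction from a stack of task gradients.

    G holds one task gradient per row. Choose simplex weights w that
    minimize  (w @ G) . g0 + c * ||g0|| * ||w @ G||, then step along
    d = g0 + (c * ||g0|| / ||w @ G||) * (w @ G).
    """
    g0 = G.mean(axis=0)                            # plain average gradient
    phi = c * np.linalg.norm(g0)
    obj = lambda w: (w @ G) @ g0 + phi * np.linalg.norm(w @ G)
    n = G.shape[0]
    res = minimize(obj, np.full(n, 1.0 / n), bounds=[(0.0, 1.0)] * n,
                   constraints={"type": "eq", "fun": lambda w: w.sum() - 1.0})
    gw = res.x @ G
    return g0 + phi / (np.linalg.norm(gw) + 1e-8) * gw

# Two conflicting task gradients (their inner product is negative):
print(cagrad_direction(np.array([[1.0, 0.0], [-0.5, 1.0]])))
```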
arXiv Detail & Related papers (2021-10-26T22:03:51Z)
- Rethinking Hard-Parameter Sharing in Multi-Task Learning [20.792654758645302]
Hard parameter sharing in multi-task learning (MTL) allows tasks to share some of the model parameters, reducing storage cost and improving prediction accuracy.
The common sharing practice is to share bottom layers of a deep neural network among tasks while using separate top layers for each task.
Using separate bottom-layer parameters could achieve significantly better performance than the common practice.
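For reference, the common practice the entry describes looks like this in a minimal PyTorch sketch (module name and layer sizes are illustrative, not from the paper):

```python
import torch.nn as nn

class HardSharedMTL(nn.Module):
    """Hard parameter sharing: one bottom trunk shared by all tasks,
    one separate small top head per task."""

    def __init__(self, in_dim, hidden, task_out_dims):
        super().__init__()
        self.trunk = nn.Sequential(                # shared bottom layers
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.heads = nn.ModuleList(                # one top layer per task
            nn.Linear(hidden, d) for d in task_out_dims
        )

    def forward(self, x):
        h = self.trunk(x)             # every task sees the same features
        return [head(h) for head in self.heads]
```

The paper's finding is that the opposite layout, separate bottom-layer parameters per task, can perform significantly better.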
arXiv Detail & Related papers (2021-07-23T17:26:40Z)
- Sample Efficient Linear Meta-Learning by Alternating Minimization [74.40553081646995]
We study a simple alternating minimization method (MLLAM), which alternately learns the low-dimensional subspace and the regressors.
We show that, for a constant subspace dimension, MLLAM obtains nearly-optimal estimation error despite requiring only $\Omega(\log d)$ samples per task.
We propose a novel task subset selection scheme that ensures the same strong statistical guarantee as MLLAM.
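The alternating scheme itself is easy to sketch for the linear model $y_t = X_t U a_t + \text{noise}$, with $U$ the shared $d \times r$ subspace and $a_t$ the task-$t$ regressor; the toy NumPy illustration below shows the two alternating least-squares steps, not the paper's exact algorithm or its guarantees.

```python
import numpy as np

def alternating_min(Xs, ys, r, iters=50, seed=0):
    """Alternately fit the shared subspace U (d x r) and per-task
    regressors a_t by least squares (toy version of the MLLAM idea)."""
    d = Xs[0].shape[1]
    rng = np.random.default_rng(seed)
    U = np.linalg.qr(rng.standard_normal((d, r)))[0]  # random orthonormal init
    for _ in range(iters):
        # (1) Fix U: each a_t is an ordinary least-squares fit in r dims.
        As = [np.linalg.lstsq(X @ U, y, rcond=None)[0] for X, y in zip(Xs, ys)]
        # (2) Fix the a_t: since X_t U a_t = (a_t^T kron X_t) vec(U),
        #     U solves one joint least-squares problem over all tasks.
        M = np.vstack([np.kron(a[None, :], X) for X, a in zip(Xs, As)])
        vec_u = np.linalg.lstsq(M, np.concatenate(ys), rcond=None)[0]
        U = np.linalg.qr(vec_u.reshape((d, r), order="F"))[0]
    return U

# Toy check: 8 tasks sharing a rank-2 subspace in 10 dimensions.
rng = np.random.default_rng(1)
U_true = np.linalg.qr(rng.standard_normal((10, 2)))[0]
Xs = [rng.standard_normal((30, 10)) for _ in range(8)]
ys = [X @ U_true @ rng.standard_normal(2) for X in Xs]
U_hat = alternating_min(Xs, ys, r=2)
```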
arXiv Detail & Related papers (2021-05-18T06:46:48Z)
- The Sample Complexity of Meta Sparse Regression [38.092179552223364]
This paper addresses the meta-learning problem in sparse linear regression with infinitely many tasks.
We show that $T \in O((k \log p)/l)$ tasks are sufficient to recover the common support of all tasks.
arXiv Detail & Related papers (2020-02-22T00:59:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.