Gradient Coordination for Quantifying and Maximizing Knowledge
Transference in Multi-Task Learning
- URL: http://arxiv.org/abs/2303.05847v1
- Date: Fri, 10 Mar 2023 10:42:21 GMT
- Title: Gradient Coordination for Quantifying and Maximizing Knowledge
Transference in Multi-Task Learning
- Authors: Xuanhua Yang, Jianxin Zhao, Shaoguo Liu, Liang Wang and Bo Zheng
- Abstract summary: Multi-task learning (MTL) has been widely applied in online advertising and recommender systems.
We propose a transference-driven approach CoGrad that adaptively maximizes knowledge transference.
- Score: 11.998475119120531
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-task learning (MTL) has been widely applied in online advertising and
recommender systems. To address the negative transfer issue, recent studies
have proposed optimization methods that focus on aligning gradient directions
or magnitudes. However, since prior work has shown that general and
task-specific knowledge must coexist within the limited shared capacity,
overemphasizing gradient alignment may crowd out task-specific knowledge, and
vice versa. In this paper, we propose a transference-driven approach, CoGrad,
that adaptively maximizes knowledge transference via Coordinated Gradient
modification. We explicitly quantify the transference as loss reduction from
one task to another, and then derive an auxiliary gradient from optimizing it.
We perform the optimization by incorporating this gradient into original task
gradients, making the model automatically maximize inter-task transfer and
minimize individual losses. Thus, CoGrad can harmonize between general and
specific knowledge to boost overall performance. Besides, we introduce an
efficient approximation of the Hessian matrix, making CoGrad computationally
efficient and simple to implement. Both offline and online experiments verify
that CoGrad significantly outperforms previous methods.
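As a rough illustration of the idea (quantify transference as the loss reduction one task's update induces on another, then fold its gradient into the task gradients), here is a minimal sketch on two quadratic toy tasks; `cograd_step`, `eta`, and `alpha` are assumed names and values for this sketch, not the paper's implementation:

```python
import numpy as np

# Toy two-task setup with quadratic losses (illustrative only -- the losses,
# step size eta, and mixing weight alpha are assumptions, not the paper's).
A = [np.diag([2.0, 1.0]), np.diag([1.0, 3.0])]    # per-task Hessians
c = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]  # per-task optima

def loss(k, theta):
    d = theta - c[k]
    return 0.5 * d @ A[k] @ d

def grad(k, theta):
    return A[k] @ (theta - c[k])

def transference(i, j, theta, eta):
    """Loss reduction on task j caused by a gradient step on task i."""
    return loss(j, theta) - loss(j, theta - eta * grad(i, theta))

def cograd_step(theta, eta=0.1, alpha=0.5):
    g0, g1 = grad(0, theta), grad(1, theta)
    # First-order estimate: T(i->j) ~= eta * g_i . g_j, so its gradient in
    # theta is eta * (H_i @ g_j + H_j @ g_i); for quadratic losses the
    # Hessians are exact here, whereas the paper approximates them.
    aux = eta * (A[0] @ g1 + A[1] @ g0)
    # Subtracting aux from the descent direction ascends the transference
    # while still descending the individual task losses.
    combined = g0 + g1 - alpha * aux
    return theta - eta * combined
```

On these toy losses the Hessians are available in closed form; the paper instead introduces an efficient Hessian approximation to keep this step cheap.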
Related papers
- Fair Resource Allocation in Multi-Task Learning [12.776767874217663]
Multi-task learning (MTL) can leverage the shared knowledge across tasks, resulting in improved data efficiency and generalization performance.
A major challenge in MTL lies in the presence of conflicting gradients, which can hinder the fair optimization of some tasks.
Inspired by fair resource allocation in communication networks, we propose FairGrad, a novel MTL optimization method.
arXiv Detail & Related papers (2024-02-23T22:46:14Z)
- Adapting Step-size: A Unified Perspective to Analyze and Improve Gradient-based Methods for Adversarial Attacks [21.16546620434816]
We provide a unified theoretical interpretation of gradient-based adversarial learning methods.
We show that each of these algorithms is in fact a specific reformulation of their original gradient methods.
We present a broad class of adaptive gradient-based algorithms based on the regular gradient methods.
arXiv Detail & Related papers (2023-01-27T06:17:51Z)
- Continuous-Time Meta-Learning with Forward Mode Differentiation [65.26189016950343]
We introduce Continuous Meta-Learning (COMLN), a meta-learning algorithm where adaptation follows the dynamics of a gradient vector field.
Treating the learning process as an ODE offers the notable advantage that the length of the trajectory is now continuous.
We show empirically its efficiency in terms of runtime and memory usage, and we illustrate its effectiveness on a range of few-shot image classification problems.
arXiv Detail & Related papers (2022-03-02T22:35:58Z)
- Efficient Differentiable Simulation of Articulated Bodies [89.64118042429287]
We present a method for efficient differentiable simulation of articulated bodies.
This enables integration of articulated body dynamics into deep learning frameworks.
We show that reinforcement learning with articulated systems can be accelerated using gradients provided by our method.
arXiv Detail & Related papers (2021-09-16T04:48:13Z)
- Cogradient Descent for Dependable Learning [64.02052988844301]
We propose a dependable learning method based on the Cogradient Descent (CoGD) algorithm to address the bilinear optimization problem.
CoGD is introduced to solve bilinear problems when one variable is subject to a sparsity constraint.
It can also be used to decompose the association of features and weights, which further generalizes our method to better train convolutional neural networks (CNNs).
arXiv Detail & Related papers (2021-06-20T04:28:20Z)
- Layerwise Optimization by Gradient Decomposition for Continual Learning [78.58714373218118]
Deep neural networks achieve state-of-the-art and sometimes super-human performance across various domains.
When learning tasks sequentially, the networks easily forget the knowledge of previous tasks, known as "catastrophic forgetting".
arXiv Detail & Related papers (2021-05-17T01:15:57Z)
- Towards Accurate Knowledge Transfer via Target-awareness Representation Disentanglement [56.40587594647692]
We propose a novel transfer learning algorithm, introducing the idea of Target-awareness REpresentation Disentanglement (TRED)
TRED disentangles the knowledge relevant to the target task from the original source model and uses it as a regularizer during fine-tuning of the target model.
Experiments on various real-world datasets show that our method stably improves the standard fine-tuning by more than 2% on average.
arXiv Detail & Related papers (2020-10-16T17:45:08Z)
- Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems with one variable under a sparsity constraint.
arXiv Detail & Related papers (2020-06-16T13:41:54Z)
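As a rough picture of the bilinear setting the two CoGD entries above address (not the paper's CoGD itself, which additionally exploits the coupling through the sparsity structure of one variable), here is a minimal sketch of synchronous gradient descent on a coupled bilinear objective; the objective and toy values are assumptions:

```python
import numpy as np

# Reconstruct y from the elementwise product w * x (toy values are assumptions).
y = np.array([1.0, 2.0, 0.5])
w = np.array([1.0, 1.0, 1.0])
x = np.array([0.5, 0.5, 0.5])

def loss(w, x):
    # Bilinear objective: the two factors only appear as a product, so the
    # gradient of each variable depends on the current value of the other.
    return 0.5 * np.sum((w * x - y) ** 2)

eta = 0.05
for _ in range(500):
    r = w * x - y                      # shared residual couples both updates
    gw, gx = r * x, r * w              # each gradient uses the *other* variable
    w, x = w - eta * gw, x - eta * gx  # synchronous update of the coupled pair
```

Updating both factors against the shared residual in the same step is what makes the descent "co-gradient": neither variable is optimized in isolation.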
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.