TBGC: Task-level Backbone-Oriented Gradient Clip for Multi-Task
Foundation Model Learning
- URL: http://arxiv.org/abs/2307.03465v1
- Date: Fri, 7 Jul 2023 08:57:57 GMT
- Title: TBGC: Task-level Backbone-Oriented Gradient Clip for Multi-Task
Foundation Model Learning
- Authors: Zelun Zhang, Xue Pan
- Abstract summary: We propose the task-level backbone-oriented gradient clip paradigm as a refinement of the vanilla gradient clip method.
Based on the experimental results, we argue that the task-level backbone-oriented gradient clip paradigm can relieve the gradient bias problem to some extent.
Our approach proves effective, achieving 1st place in Leaderboard A and 2nd place in Leaderboard B of the CVPR2023 Foundation Model Challenge.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The AllInOne training paradigm squeezes a wide range of tasks into a unified
model in a multi-task learning manner. However, optimization in multi-task
learning is more challenging than in single-task learning, as the gradient norms
of different tasks may vary greatly, making the backbone overly biased towards one
specific task. To address this issue, we propose the task-level
backbone-oriented gradient clip paradigm. Compared with the vanilla gradient
clip method, it has two points of emphasis: 1) gradient clip is performed
independently for each task; 2) backbone gradients generated from each task are
rescaled to the same norm scale. Based on the experimental results, we argue
that the task-level backbone-oriented gradient clip paradigm can relieve the
gradient bias problem to some extent. We also propose a novel multi-branch data
augmentation strategy in which conflicting augmentations are placed in different
branches. Our approach proves effective, achieving 1st place in Leaderboard A
and 2nd place in Leaderboard B of the CVPR2023 Foundation Model Challenge. It is
worth noting that while Leaderboard A evaluates all three tasks (detection,
segmentation, and fine-grained classification), Leaderboard B does not evaluate
the segmentation task, in which our team has a huge advantage.
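For illustration, the clipping paradigm can be sketched in a few lines of PyTorch. This is a minimal reconstruction from the abstract alone, not the authors' released code: the function name, the shared `target_norm` hyperparameter, and the choice to rescale every task's backbone gradient exactly to that norm are assumptions.

```python
import torch

def tbgc_backbone_grads(backbone_params, task_losses, target_norm=1.0):
    """Per-task, backbone-oriented gradient clip (hypothetical sketch):
    each task's backbone gradient is computed separately and rescaled to
    a common norm, so no single task dominates the shared backbone."""
    summed = [torch.zeros_like(p) for p in backbone_params]
    for loss in task_losses:
        # Backbone gradient from this task alone; task-head gradients are
        # assumed to be handled by an ordinary backward pass elsewhere.
        grads = torch.autograd.grad(loss, backbone_params,
                                    retain_graph=True, allow_unused=True)
        grads = [torch.zeros_like(p) if g is None else g
                 for g, p in zip(grads, backbone_params)]
        # Clip/rescale this task's backbone gradient independently so that
        # every task contributes at the same norm scale.
        norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = float(target_norm / (norm + 1e-12))
        for s, g in zip(summed, grads):
            s.add_(g, alpha=scale)
    # Write the averaged, norm-equalized gradient back to the backbone.
    for p, s in zip(backbone_params, summed):
        p.grad = s / len(task_losses)
```

In use, one would compute one scalar loss per task (e.g. detection, segmentation, fine-grained classification), call the helper on `list(model.backbone.parameters())`, and then take an optimizer step. A vanilla gradient clip would instead clip the already-summed multi-task gradient once, which is exactly what lets a large-norm task dominate the backbone.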
Related papers
- Two-Stage Multi-task Self-Supervised Learning for Medical Image
Segmentation [1.5863809575305416]
Medical image segmentation has been significantly advanced by deep learning (DL) techniques.
The data scarcity inherent in medical applications poses a great challenge to DL-based segmentation methods.
arXiv Detail & Related papers (2024-02-11T07:49:35Z)
- FULLER: Unified Multi-modality Multi-task 3D Perception via Multi-level Gradient Calibration [89.4165092674947]
Multi-modality fusion and multi-task learning are becoming trendy in the 3D autonomous driving scenario.
Previous works manually coordinate the learning framework with empirical knowledge, which may lead to sub-optimal results.
We propose a novel yet simple multi-level gradient calibration learning framework across tasks and modalities during optimization.
arXiv Detail & Related papers (2023-07-31T12:50:15Z)
- Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks [69.38572074372392]
We present the first results proving that feature learning occurs during training with a nonlinear model on multiple tasks.
Our key insight is that multi-task pretraining induces a pseudo-contrastive loss that favors representations that align points that typically have the same label across tasks.
arXiv Detail & Related papers (2023-07-13T16:39:08Z)
- Task Adaptive Parameter Sharing for Multi-Task Learning [114.80350786535952]
Task Adaptive Parameter Sharing (TAPS) is a method for tuning a base model to a new task by adaptively modifying a small, task-specific subset of layers.
Compared to other methods, TAPS retains high accuracy on downstream tasks while introducing few task-specific parameters.
We evaluate our method on a suite of fine-tuning tasks and architectures (ResNet, DenseNet, ViT) and show that it achieves state-of-the-art performance while being simple to implement.
arXiv Detail & Related papers (2022-03-30T23:16:07Z)
- On Steering Multi-Annotations per Sample for Multi-Task Learning [79.98259057711044]
The study of multi-task learning has drawn great attention from the community.
Despite the remarkable progress, the challenge of optimally learning different tasks simultaneously remains to be explored.
Previous works attempt to modify the gradients from different tasks, yet these methods rely on subjective assumptions about the relationships between tasks, and the modified gradients may be less accurate.
In this paper, we introduce Stochastic Task Allocation (STA), a mechanism that addresses this issue by randomly allocating each sample a subset of tasks.
For further progress, we propose Interleaved Stochastic Task Allocation (ISTA) to iteratively allocate all tasks to each sample.
arXiv Detail & Related papers (2022-03-06T11:57:18Z)
- Conflict-Averse Gradient Descent for Multi-task Learning [56.379937772617]
A major challenge in optimizing a multi-task model is conflicting gradients.
We introduce Conflict-Averse Gradient descent (CAGrad), which reduces conflict among task gradients while minimizing the average loss function.
CAGrad balances the objectives automatically and still provably converges to a minimum over the average loss.
arXiv Detail & Related papers (2021-10-26T22:03:51Z)
- Adaptive Task Sampling for Meta-Learning [79.61146834134459]
The key idea of meta-learning for few-shot classification is to mimic the few-shot situations faced at test time.
We propose an adaptive task sampling method to improve the generalization performance.
arXiv Detail & Related papers (2020-07-17T03:15:53Z)
- Multitask Learning with Single Gradient Step Update for Task Balancing [4.330814031477772]
We propose an algorithm to balance between tasks at the gradient level by applying gradient-based meta-learning to multitask learning.
We apply the proposed method to various multitask computer vision problems and achieve state-of-the-art performance.
arXiv Detail & Related papers (2020-05-20T08:34:20Z)
- GradMix: Multi-source Transfer across Domains and Tasks [33.98368732653684]
GradMix is a model-agnostic method applicable to any model trained with a gradient-based learning rule.
We conduct MS-DTT experiments on two tasks: digit recognition and action recognition.
arXiv Detail & Related papers (2020-02-09T02:10:22Z)