Elastic Multi-Gradient Descent for Parallel Continual Learning
- URL: http://arxiv.org/abs/2401.01054v1
- Date: Tue, 2 Jan 2024 06:26:25 GMT
- Title: Elastic Multi-Gradient Descent for Parallel Continual Learning
- Authors: Fan Lyu, Wei Feng, Yuepan Li, Qing Sun, Fanhua Shang, Liang Wan, Liang Wang
- Abstract summary: We study the novel paradigm of Parallel Continual Learning (PCL) in dynamic multi-task scenarios.
PCL presents challenges due to the training of an unspecified number of tasks with varying learning progress.
We propose a memory editing mechanism guided by the gradient computed using EMGD to balance the training between old and new tasks.
- Score: 28.749215705746135
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The goal of Continual Learning (CL) is to continuously learn from new data
streams and accomplish the corresponding tasks. Previously studied CL assumes
that data are given in sequence nose-to-tail for different tasks, thus indeed
belonging to Serial Continual Learning (SCL). This paper studies the novel
paradigm of Parallel Continual Learning (PCL) in dynamic multi-task scenarios,
where a diverse set of tasks is encountered at different time points. PCL
presents challenges due to the training of an unspecified number of tasks with
varying learning progress, leading to the difficulty of guaranteeing effective
model updates for all encountered tasks. In our previous conference work, we
focused on measuring and reducing the discrepancy among gradients in a
multi-objective optimization problem, which, however, may still contain
negative transfers in every model update. To address this issue, in the dynamic
multi-objective optimization problem, we introduce task-specific elastic
factors to adjust the descent direction towards the Pareto front. The proposed
method, called Elastic Multi-Gradient Descent (EMGD), ensures that each update
follows an appropriate Pareto descent direction, minimizing any negative impact
on previously learned tasks. To balance the training between old and new tasks,
we also propose a memory editing mechanism guided by the gradient computed
using EMGD. This editing process updates the stored data points, reducing
interference in the Pareto descent direction from previous tasks. Experiments
on public datasets validate the effectiveness of our EMGD in the PCL setting.
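
The abstract does not spell out how the elastic factors enter the update, so the following is a minimal sketch assuming EMGD combines per-task gradients with an MGDA-style min-norm solver after rescaling each gradient by a task-specific elastic factor; the names (min_norm_weights, emgd_direction, elastic_factors) and the exact role of the factors are illustrative assumptions, not the paper's formulation.

import numpy as np

def min_norm_weights(grads, iters=50):
    """Frank-Wolfe solver for min_{w in simplex} ||sum_k w_k g_k||^2 (MGDA-style)."""
    k = grads.shape[0]
    w = np.full(k, 1.0 / k)
    gram = grads @ grads.T                      # pairwise inner products of task gradients
    for _ in range(iters):
        i = int(np.argmin(gram @ w))            # task least aligned with the current combination
        d = np.eye(k)[i] - w
        denom = d @ gram @ d
        if denom <= 1e-12:
            break
        step = np.clip(-(w @ gram @ d) / denom, 0.0, 1.0)
        w = w + step * d
    return w

def emgd_direction(task_grads, elastic_factors):
    """Approximate Pareto descent direction over the tasks currently being trained.

    task_grads: one flattened gradient per active task.
    elastic_factors: per-task scalars rescaling each gradient before the min-norm
    combination, e.g. so slowly progressing tasks are not dominated.
    """
    grads = np.stack([e * g for e, g in zip(elastic_factors, task_grads)])
    w = min_norm_weights(grads)
    return w @ grads

# Toy usage: two conflicting task gradients.
g_old = np.array([1.0, 0.2, -0.5])
g_new = np.array([-0.8, 1.0, 0.3])
d = emgd_direction([g_old, g_new], elastic_factors=[1.0, 0.7])
print(g_old @ d, g_new @ d)   # both inner products come out non-negative, i.e. no negative transfer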
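The memory editing step is likewise only described at a high level. Below is a hedged sketch, assuming the stored inputs are nudged by a few gradient steps so that their old-task gradients align better with the EMGD direction; the editing objective (negative cosine alignment) and the helper edit_memory_sample are assumptions for illustration, not the paper's exact procedure.

import torch
import torch.nn.functional as F

def edit_memory_sample(model, loss_fn, x_mem, y_mem, emgd_dir, lr_edit=0.01, steps=1):
    """Nudge a stored example so its old-task gradient interferes less with emgd_dir.

    x_mem, y_mem: a stored (input, target) pair from the episodic memory.
    emgd_dir: flattened EMGD update direction (same size as the concatenated parameters).
    """
    x = x_mem.clone().requires_grad_(True)
    for _ in range(steps):
        loss = loss_fn(model(x), y_mem)
        # Gradient of the old-task loss w.r.t. the parameters, kept differentiable in x.
        g = torch.cat([p.reshape(-1) for p in
                       torch.autograd.grad(loss, model.parameters(), create_graph=True)])
        # Interference proxy: negative cosine alignment with the shared EMGD direction.
        interference = -F.cosine_similarity(g, emgd_dir, dim=0)
        grad_x, = torch.autograd.grad(interference, x)
        x = (x - lr_edit * grad_x).detach().requires_grad_(True)
    return x.detach()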
Related papers
- LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging [80.17238673443127]
LiNeS is a post-training editing technique designed to preserve pre-trained generalization while enhancing fine-tuned task performance.
LiNeS demonstrates significant improvements in both single-task and multi-task settings across various benchmarks in vision and natural language processing.
arXiv Detail & Related papers (2024-10-22T16:26:05Z)
- Task Addition in Multi-Task Learning by Geometrical Alignment [4.220885199861056]
We propose a task addition approach for GATE to improve performance on target tasks with limited data.
It is achieved through supervised multi-task pre-training on a large dataset, followed by the addition and training of task-specific modules for each target task.
Our experiments demonstrate the superior performance of the task addition strategy for GATE over conventional multi-task methods, with comparable computational costs.
arXiv Detail & Related papers (2024-09-25T05:56:00Z)
- Offline Reinforcement Learning from Datasets with Structured Non-Stationarity [50.35634234137108]
Current Reinforcement Learning (RL) is often limited by the large amount of data needed to learn a successful policy.
We address a novel Offline RL problem setting in which, while collecting the dataset, the transition and reward functions gradually change between episodes but stay constant within each episode.
We propose a method based on Contrastive Predictive Coding that identifies this non-stationarity in the offline dataset, accounts for it when training a policy, and predicts it during evaluation.
arXiv Detail & Related papers (2024-05-23T02:41:36Z)
- Task-Distributionally Robust Data-Free Meta-Learning [99.56612787882334]
Data-Free Meta-Learning (DFML) aims to efficiently learn new tasks by leveraging multiple pre-trained models without requiring their original training data.
For the first time, we reveal two major challenges hindering their practical deployment: Task-Distribution Shift (TDS) and Task-Distribution Corruption (TDC).
arXiv Detail & Related papers (2023-11-23T15:46:54Z)
- Scalable Weight Reparametrization for Efficient Transfer Learning [10.265713480189486]
Efficient transfer learning involves utilizing a pre-trained model trained on a larger dataset and repurposing it for downstream tasks.
Previous works have led to an increase in updated parameters and task-specific modules, resulting in more computations, especially for tiny models.
We suggest learning a policy network that can decide where to reparametrize the pre-trained model, while adhering to a given constraint for the number of updated parameters.
arXiv Detail & Related papers (2023-02-26T23:19:11Z)
- ForkMerge: Mitigating Negative Transfer in Auxiliary-Task Learning [59.08197876733052]
Auxiliary-Task Learning (ATL) aims to improve the performance of the target task by leveraging the knowledge obtained from related tasks.
Sometimes, learning multiple tasks simultaneously results in lower accuracy than learning only the target task, known as negative transfer.
ForkMerge is a novel approach that periodically forks the model into multiple branches and automatically searches for the varying task weights.
arXiv Detail & Related papers (2023-01-30T02:27:02Z)
- Effective Adaptation in Multi-Task Co-Training for Unified Autonomous Driving [103.745551954983]
In this paper, we investigate the transfer performance of various types of self-supervised methods, including MoCo and SimCLR, on three downstream tasks.
We find that their performances are sub-optimal or even lag far behind the single-task baseline.
We propose a simple yet effective pretrain-adapt-finetune paradigm for general multi-task training.
arXiv Detail & Related papers (2022-09-19T12:15:31Z)
- Online Continual Learning via the Meta-learning Update with Multi-scale Knowledge Distillation and Data Augmentation [4.109784267309124]
Continual learning aims to rapidly and continually learn the current task from a sequence of tasks.
One common limitation of this method is the data imbalance between the previous and current tasks.
We propose a novel framework called Meta-learning update via Multi-scale Knowledge Distillation and Data Augmentation.
arXiv Detail & Related papers (2022-09-12T10:03:53Z)
- Task-Agnostic Online Reinforcement Learning with an Infinite Mixture of Gaussian Processes [25.513074215377696]
This paper proposes a continual online model-based reinforcement learning approach.
It does not require pre-training to solve task-agnostic problems with unknown task boundaries.
In experiments, our approach outperforms alternative methods in non-stationary tasks.
arXiv Detail & Related papers (2020-06-19T23:52:45Z)
- Task-Feature Collaborative Learning with Application to Personalized Attribute Prediction [166.87111665908333]
We propose a novel multi-task learning method called Task-Feature Collaborative Learning (TFCL).
Specifically, we first propose a base model with a heterogeneous block-diagonal structure regularizer to leverage the collaborative grouping of features and tasks.
As a practical extension, we extend the base model by allowing overlapping features and differentiating the hard tasks.
arXiv Detail & Related papers (2020-04-29T02:32:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.