Restricted Orthogonal Gradient Projection for Continual Learning
- URL: http://arxiv.org/abs/2301.12131v1
- Date: Sat, 28 Jan 2023 08:50:48 GMT
- Title: Restricted Orthogonal Gradient Projection for Continual Learning
- Authors: Zeyuan Yang, Zonghan Yang, Peng Li, Yang Liu
- Abstract summary: Existing gradient projection approaches impose hard constraints on the optimization space for new tasks to minimize interference.
Recent methods reuse frozen parameters with a growing network, resulting in high computational costs.
We propose the Restricted Orthogonal Gradient prOjection (ROGO) framework.
- Score: 17.89324741805405
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continual learning aims to avoid catastrophic forgetting and effectively
leverage learned experiences to master new knowledge. Existing gradient
projection approaches impose hard constraints on the optimization space for new
tasks to minimize interference, which simultaneously hinders forward knowledge
transfer. To address this issue, recent methods reuse frozen parameters with a
growing network, resulting in high computational costs. Thus, it remains a
challenge whether we can improve forward knowledge transfer for gradient
projection approaches using a fixed network architecture. In this work, we
propose the Restricted Orthogonal Gradient prOjection (ROGO) framework. The
basic idea is to adopt a restricted orthogonal constraint that allows
parameters to be optimized in directions oblique to the whole frozen space,
facilitating forward knowledge transfer while consolidating previous
knowledge. Our
framework requires neither data buffers nor extra parameters. Extensive
experiments have demonstrated the superiority of our framework over several
strong baselines. We also provide theoretical guarantees for our relaxing
strategy.
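To make the idea concrete, the following minimal NumPy sketch contrasts a standard hard orthogonal projection with a relaxed, ROGO-style update. It is not the authors' implementation: the orthonormal basis `frozen_basis` of the frozen space and the scalar relaxation coefficient `relax` are illustrative assumptions standing in for the paper's restricted constraint.

```python
import numpy as np

def hard_orthogonal_update(grad, frozen_basis):
    """Classic gradient projection: remove the entire component of `grad`
    lying in the frozen subspace spanned by the orthonormal columns of
    `frozen_basis`, so previous-task knowledge is left untouched."""
    return grad - frozen_basis @ (frozen_basis.T @ grad)

def restricted_orthogonal_update(grad, frozen_basis, relax=0.1):
    """Relaxed variant in the spirit of ROGO (illustrative, not the paper's
    exact rule): keep a small fraction `relax` of the in-frozen-space
    component, so the update can move obliquely to the frozen space and
    reuse old knowledge while still limiting interference."""
    in_frozen = frozen_basis @ (frozen_basis.T @ grad)
    return grad - (1.0 - relax) * in_frozen

# Toy usage with a 2-dimensional frozen subspace of a 10-dimensional space.
rng = np.random.default_rng(0)
frozen_basis, _ = np.linalg.qr(rng.normal(size=(10, 2)))  # orthonormal basis
grad = rng.normal(size=10)
print(np.linalg.norm(frozen_basis.T @ hard_orthogonal_update(grad, frozen_basis)))        # ~0
print(np.linalg.norm(frozen_basis.T @ restricted_orthogonal_update(grad, frozen_basis)))  # small, nonzero
```

With `relax = 0` the relaxed version reduces to the hard projection; increasing it lets more of the update leak into the frozen space, which is the stability-versus-forward-transfer trade-off the framework regulates.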
Related papers
- Gradient-free Continual Learning [0.0]
Continual learning (CL) presents a fundamental challenge in training neural networks on sequential tasks without experiencing catastrophic forgetting.
Traditionally, the dominant approach in CL has been gradient-based optimization, where updates to the network parameters are performed using stochastic gradient descent (SGD) or its variants.
Because past data are no longer accessible, no gradient information is available for them, leading to uncontrolled parameter changes and consequently severe forgetting of previously learned tasks.
We explore the hypothesis that gradient-free optimization methods can provide a robust alternative to conventional gradient-based continual learning approaches.
arXiv Detail & Related papers (2025-04-01T22:18:59Z)
- CODE-CL: COnceptor-Based Gradient Projection for DEep Continual Learning [7.573297026523597]
We introduce COnceptor-based gradient projection for DEep Continual Learning (CODE-CL)
CODE-CL encodes directional importance within the input space of past tasks, allowing new knowledge integration in directions modulated by $1-S$.
We analyze task overlap using conceptor-based representations to identify highly correlated tasks.
arXiv Detail & Related papers (2024-11-21T22:31:06Z)
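As a rough illustration of the conceptor-modulated update summarized in the entry above, the sketch below builds a conceptor-like matrix S from past-task inputs and damps the gradient by (I - S); the function names and aperture value are assumptions for illustration, not CODE-CL's actual code.

```python
import numpy as np

def conceptor_matrix(past_inputs, aperture=10.0):
    """Conceptor-style matrix S = R (R + aperture^-2 I)^-1 built from the
    correlation matrix R of past-task inputs; its eigenvalues lie in [0, 1)
    and encode how strongly each input direction was used by past tasks."""
    R = past_inputs.T @ past_inputs / len(past_inputs)
    d = R.shape[0]
    return R @ np.linalg.inv(R + aperture ** -2 * np.eye(d))

def soft_projected_grad(grad, S):
    """Soft projection: scale the gradient by (I - S), so directions that
    mattered to past tasks are damped rather than removed outright."""
    return (np.eye(S.shape[0]) - S) @ grad
```

Unlike a hard null-space projection, the soft modulation by $1-S$ only attenuates directions shared with past tasks, which is what allows highly correlated tasks to keep reusing them.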
- Fine-Grained Gradient Restriction: A Simple Approach for Mitigating Catastrophic Forgetting [41.891312602770746]
Gradient Episodic Memory (GEM) achieves balance by utilizing a subset of past training samples to restrict the update direction of the model parameters.
We show that memory strength is effective mainly because it improves GEM's generalization ability and therefore leads to a more favorable trade-off.
arXiv Detail & Related papers (2024-10-01T17:03:56Z)
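The single-constraint toy sketch below conveys the GEM-style restriction mentioned in the entry above: if a proposed update would increase the loss on stored memory samples, it is projected back onto the non-conflicting half-space. The full method solves a small quadratic program over all memory gradients, and `mem_grad` here is an illustrative stand-in.

```python
import numpy as np

def gem_project(grad, mem_grad):
    """If the gradient conflicts with the memory gradient (negative inner
    product, i.e. the step would hurt the stored task), project it onto the
    closest non-conflicting direction: g' = g - (g.m / m.m) * m."""
    dot = grad @ mem_grad
    if dot >= 0.0:                     # no interference with past tasks
        return grad
    return grad - (dot / (mem_grad @ mem_grad)) * mem_grad
```

The "memory strength" studied in that paper biases this constraint further toward the memory gradients; as the summary above notes, its benefit is attributed mainly to improved generalization rather than tighter constraint satisfaction.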
- SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios.
In the early route, intermediate outputs are consolidated via an anti-redundancy operation.
In the late route, a minimal number of late pre-trained layers is used to reduce the peak memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z)
- Visual Prompt Tuning in Null Space for Continual Learning [51.96411454304625]
Existing prompt-tuning methods have demonstrated impressive performance in continual learning (CL).
This paper aims to learn each task by tuning the prompts in the direction orthogonal to the subspace spanned by previous tasks' features.
In practice, an effective null-space-based approximation is proposed to implement the prompt gradient projection.
arXiv Detail & Related papers (2024-06-09T05:57:40Z)
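A minimal sketch of the null-space idea in the entry above, assuming previous tasks' features are stacked row-wise in `feats`; the SVD rank threshold is an illustrative choice rather than the paper's specific approximation.

```python
import numpy as np

def null_space_basis(feats, tol=1e-3):
    """Columns spanning the (approximate) null space of previous tasks'
    features: right-singular vectors with near-zero singular values."""
    _, s, vt = np.linalg.svd(feats, full_matrices=True)
    rank = int(np.sum(s > tol * s.max()))
    return vt[rank:].T

def project_prompt_grad(grad, feats, tol=1e-3):
    """Keep only the gradient component orthogonal to the subspace spanned
    by previous tasks' features, so tuning the prompts does not disturb
    responses on old tasks."""
    basis = null_space_basis(feats, tol)
    return basis @ (basis.T @ grad)
```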
- GI-NAS: Boosting Gradient Inversion Attacks through Adaptive Neural Architecture Search [45.57494859267399]
Gradient Inversion Attacks invert the transmitted gradients in Federated Learning (FL) systems to reconstruct the sensitive data of local clients.
A majority of gradient inversion methods rely heavily on explicit prior knowledge, which is often unavailable in realistic scenarios.
We propose GI-NAS, which adaptively searches the network architecture and captures the implicit priors behind neural architectures.
arXiv Detail & Related papers (2024-05-31T09:29:43Z)
- Gradient-free neural topology optimization [0.0]
Gradient-free algorithms require many more iterations to converge than gradient-based algorithms.
This has made them unviable for topology optimization due to the high computational cost per iteration and high dimensionality of these problems.
We propose a pre-trained neural reparameterization strategy that leads to at least one order of magnitude decrease in iteration count when optimizing the designs in latent space.
arXiv Detail & Related papers (2024-03-07T23:00:49Z)
- Fine-Grained Knowledge Selection and Restoration for Non-Exemplar Class Incremental Learning [64.14254712331116]
Non-exemplar class incremental learning aims to learn both the new and old tasks without accessing any training data from the past.
We propose a novel framework of fine-grained knowledge selection and restoration.
arXiv Detail & Related papers (2023-12-20T02:34:11Z)
- GIFD: A Generative Gradient Inversion Method with Feature Domain Optimization [52.55628139825667]
Federated Learning (FL) has emerged as a promising distributed machine learning framework to preserve clients' privacy.
Recent studies find that an attacker can invert the shared gradients and recover sensitive data against an FL system by leveraging pre-trained generative adversarial networks (GAN) as prior knowledge.
We propose Gradient Inversion over Feature Domains (GIFD), which disassembles the GAN model and searches the feature domains of the intermediate layers.
arXiv Detail & Related papers (2023-08-09T04:34:21Z)
- Learning Bayesian Sparse Networks with Full Experience Replay for Continual Learning [54.7584721943286]
Continual Learning (CL) methods aim to enable machine learning models to learn new tasks without catastrophic forgetting of those that have been previously mastered.
Existing CL approaches often keep a buffer of previously-seen samples, perform knowledge distillation, or use regularization techniques towards this goal.
We propose to activate and select only sparse sets of neurons for learning the current and past tasks at any stage.
arXiv Detail & Related papers (2022-02-21T13:25:03Z)
- Natural continual learning: success is a journey, not (just) a destination [9.462808515258464]
Natural Continual Learning (NCL) is a new method that unifies weight regularization and projected gradient descent.
Our method outperforms both standard weight regularization techniques and projection-based approaches when applied to continual learning problems in RNNs.
The trained networks evolve task-specific dynamics that are strongly preserved as new tasks are learned, similar to experimental findings in biological circuits.
arXiv Detail & Related papers (2021-06-15T12:24:53Z)
- Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems with one variable under a sparsity constraint.
arXiv Detail & Related papers (2020-06-16T13:41:54Z)