One Step Learning, One Step Review
- URL: http://arxiv.org/abs/2401.10962v1
- Date: Fri, 19 Jan 2024 11:45:31 GMT
- Title: One Step Learning, One Step Review
- Authors: Xiaolong Huang, Qiankun Li, Xueran Li, Xuesong Gao
- Abstract summary: We propose a novel weight rollback-based fine-tuning method called OLOR (One step Learning, One step Review), which combines fine-tuning with optimizers by incorporating a weight rollback term into the weight update at each step.
- Score: 6.540346282603399
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual fine-tuning has garnered significant attention with the rise of
pre-trained vision models. The current prevailing method, full fine-tuning,
suffers from the issue of knowledge forgetting as it focuses solely on fitting
the downstream training set. In this paper, we propose a novel weight
rollback-based fine-tuning method called OLOR (One step Learning, One step
Review). OLOR combines fine-tuning with optimizers, incorporating a weight
rollback term into the weight update term at each step. This ensures
consistency in the weight range of upstream and downstream models, effectively
mitigating knowledge forgetting and enhancing fine-tuning performance. In
addition, a layer-wise penalty is presented to employ penalty decay and the
diversified decay rate to adjust the weight rollback levels of layers for
adapting varying downstream tasks. Through extensive experiments on various
tasks such as image classification, object detection, semantic segmentation,
and instance segmentation, we demonstrate the general applicability and
state-of-the-art performance of our proposed OLOR. Code is available at
https://github.com/rainbow-xiao/OLOR-AAAI-2024.
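To make the update rule described above concrete, the following is a minimal PyTorch-style sketch: a standard gradient step ("one step learning") followed by a partial rollback of each weight toward its pre-trained value ("one step review"), with a layer-wise rollback level. This is an illustrative sketch, not the authors' implementation; the helper names (`layer_penalty`, `sgd_step_with_rollback`, `rollback_levels`) and the particular decay schedule are assumptions, and details such as integration with Adam and the diversified decay rate are omitted. See the linked repository for the official code.

```python
# Illustrative sketch only -- not the official OLOR implementation
# (see https://github.com/rainbow-xiao/OLOR-AAAI-2024 for the authors' code).
import torch


def layer_penalty(layer_idx: int, num_layers: int, max_level: float, power: float = 1.0) -> float:
    """Assumed layer-wise schedule (one plausible reading of "penalty decay"):
    earlier layers are rolled back more strongly, later layers are left freer
    to fit the downstream task."""
    depth = layer_idx / max(num_layers - 1, 1)
    return max_level * (1.0 - depth) ** power


@torch.no_grad()
def sgd_step_with_rollback(params, pretrained, lr, rollback_levels):
    """One learning step (plain SGD here) plus one review step that pulls each
    weight part of the way back toward its pre-trained (upstream) value."""
    for p, p0, level in zip(params, pretrained, rollback_levels):
        p -= lr * p.grad          # one step learning: ordinary gradient update
        p += level * (p0 - p)     # one step review: partial weight rollback


# Toy usage on a hypothetical two-layer head.
model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.Linear(8, 2))
params = list(model.parameters())
pretrained = [p.detach().clone() for p in params]   # frozen copy of upstream weights
levels = [layer_penalty(i, len(params), max_level=1e-2) for i in range(len(params))]

x, y = torch.randn(4, 8), torch.randint(0, 2, (4,))
torch.nn.functional.cross_entropy(model(x), y).backward()
sgd_step_with_rollback(params, pretrained, lr=1e-3, rollback_levels=levels)
```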
Related papers
- TrAct: Making First-layer Pre-Activations Trainable [65.40281259525578]
We consider the training of the first layer of vision models and notice the clear relationship between pixel values and update magnitudes.
An image with low contrast has a smaller impact on learning than an image with higher contrast.
A very bright or very dark image has a stronger impact on the weights than an image with moderate brightness.
arXiv Detail & Related papers (2024-10-31T14:25:55Z)
- AdaRankGrad: Adaptive Gradient-Rank and Moments for Memory-Efficient LLMs Training and Fine-Tuning [9.51289606759621]
Training and fine-tuning large language models (LLMs) come with challenges related to memory and computational requirements.
Various techniques have been developed to tackle these challenges, such as low-rank adaptation (LoRA)
We introduce a new method inspired by a phenomenon we formally prove: as training progresses, the rank of the estimated gradient gradually decreases.
arXiv Detail & Related papers (2024-10-23T13:53:26Z)
- Gradient Projection For Continual Parameter-Efficient Tuning [42.800411328615894]
We reformulate Adapter, LoRA, Prefix-tuning, and Prompt-tuning from the perspective of gradient projection.
We show that the resulting condition on the gradient can effectively resist forgetting even for large-scale models.
We extensively evaluate our method with different backbones, including ViT and CLIP, on diverse datasets.
arXiv Detail & Related papers (2024-05-22T06:33:48Z)
- CLASSP: a Biologically-Inspired Approach to Continual Learning through Adjustment Suppression and Sparsity Promotion [0.0]
This paper introduces a new training method named Continual Learning through Adjustment Suppression and Sparsity Promotion (CLASSP)
CLASSP is based on two main principles observed in neuroscience, particularly in the context of synaptic transmission and Long-Term Potentiation.
When compared with Elastic Weight Consolidation (EWC), CLASSP demonstrates superior performance in terms of accuracy and memory footprint.
arXiv Detail & Related papers (2024-04-29T13:31:00Z)
- PRILoRA: Pruned and Rank-Increasing Low-Rank Adaptation [65.268245109828]
We introduce PRILoRA, which linearly allocates a different rank for each layer, in an increasing manner, and performs pruning throughout the training process.
We validate the effectiveness of PRILoRA through extensive experiments on eight GLUE benchmarks, setting a new state of the art.
arXiv Detail & Related papers (2024-01-20T20:25:17Z)
- Sparse is Enough in Fine-tuning Pre-trained Large Language Models [98.46493578509039]
We propose a gradient-based sparse fine-tuning algorithm, named Sparse Increment Fine-Tuning (SIFT)
We validate its effectiveness on a range of tasks including the GLUE Benchmark and Instruction-tuning.
arXiv Detail & Related papers (2023-12-19T06:06:30Z)
- DR-Tune: Improving Fine-tuning of Pretrained Visual Models by Distribution Regularization with Semantic Calibration [38.4461170690033]
We propose a novel fine-tuning framework, namely distribution regularization with semantic calibration (DR-Tune)
DR-Tune employs distribution regularization by enforcing the downstream task head to decrease its classification error on the pretrained feature distribution.
To alleviate the interference by semantic drift, we develop the semantic calibration (SC) module.
arXiv Detail & Related papers (2023-08-23T10:59:20Z)
- Understanding and Mitigating Overfitting in Prompt Tuning for Vision-Language Models [108.13378788663196]
We propose Subspace Prompt Tuning (SubPT) to project the gradients in back-propagation onto the low-rank subspace spanned by the early-stage gradient flow eigenvectors during the entire training process.
We equip CoOp with Novel Learner Feature (NFL) to enhance the generalization ability of the learned prompts onto novel categories beyond the training set.
arXiv Detail & Related papers (2022-11-04T02:06:22Z)
- Partial Is Better Than All: Revisiting Fine-tuning Strategy for Few-shot Learning [76.98364915566292]
A common practice is to train a model on the base set first and then transfer to novel classes through fine-tuning.
We propose to transfer partial knowledge by freezing or fine-tuning particular layer(s) in the base model.
We conduct extensive experiments on CUB and mini-ImageNet to demonstrate the effectiveness of our proposed method.
arXiv Detail & Related papers (2021-02-08T03:27:05Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.