Alleviating Representational Shift for Continual Fine-tuning
- URL: http://arxiv.org/abs/2204.10535v1
- Date: Fri, 22 Apr 2022 06:58:20 GMT
- Title: Alleviating Representational Shift for Continual Fine-tuning
- Authors: Shibo Jie, Zhi-Hong Deng, Ziheng Li
- Abstract summary: We study a practical setting of continual learning: fine-tuning on a pre-trained model continually.
We propose ConFiT, a fine-tuning method incorporating two components, cross-convolution batch normalization (Xconv BN) and hierarchical fine-tuning.
Xconv BN maintains pre-convolution running means instead of post-convolution, and recovers post-convolution ones before testing, which corrects the inaccurate estimates of means under IRS.
- Score: 13.335957004592407
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We study a practical setting of continual learning: fine-tuning on a
pre-trained model continually. Previous work has found that, when training on
new tasks, the features (penultimate layer representations) of previous data
will change, a phenomenon called representational shift. Besides the shift of features, we
reveal that the intermediate layers' representational shift (IRS) also matters
since it disrupts batch normalization, which is another crucial cause of
catastrophic forgetting. Motivated by this, we propose ConFiT, a fine-tuning
method incorporating two components, cross-convolution batch normalization
(Xconv BN) and hierarchical fine-tuning. Xconv BN maintains pre-convolution
running means instead of post-convolution, and recovers post-convolution ones
before testing, which corrects the inaccurate estimates of means under IRS.
Hierarchical fine-tuning leverages a multi-stage strategy to fine-tune the
pre-trained network, preventing massive changes in Conv layers and thus
alleviating IRS. Experimental results on four datasets show that our method
remarkably outperforms several state-of-the-art methods with lower storage
overhead.
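The abstract describes Xconv BN only at a high level. Below is a minimal PyTorch-style sketch of one way the idea could be realized, assuming a conv -> BN layout; the class and method names (XconvBNBlock, recover_post_conv_mean) are illustrative and not the authors' code, and padding/border effects on the mean are ignored.

```python
import torch
import torch.nn as nn

class XconvBNBlock(nn.Module):
    """Minimal sketch of the Xconv BN idea from the abstract (conv -> BN).
    During training we additionally track the running mean of the *input*
    to the convolution; before testing, the post-conv running mean is
    recovered by pushing that pre-conv mean through the current conv
    weights. Because convolution is linear, a spatially constant
    per-channel mean satisfies E[conv(x)] ~= conv(E[x]) up to border
    effects, which this sketch ignores. Names are illustrative only.
    """

    def __init__(self, conv: nn.Conv2d, momentum: float = 0.1):
        super().__init__()
        self.conv = conv
        self.bn = nn.BatchNorm2d(conv.out_channels, momentum=momentum)
        self.momentum = momentum
        # Running mean of the conv *input*, one value per input channel.
        self.register_buffer("pre_running_mean", torch.zeros(conv.in_channels))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            with torch.no_grad():
                m = x.mean(dim=(0, 2, 3))  # per-channel mean of the conv input
                self.pre_running_mean.mul_(1 - self.momentum).add_(self.momentum * m)
        return self.bn(self.conv(x))

    @torch.no_grad()
    def recover_post_conv_mean(self):
        """Call once before evaluation: re-derive the post-conv running mean
        from the stored pre-conv mean and the *current* conv weights, so the
        BN statistics stay consistent after the conv is updated on new tasks."""
        w = self.conv.weight.sum(dim=(2, 3))  # (out_channels, in_channels)
        post = w @ self.pre_running_mean
        if self.conv.bias is not None:
            post = post + self.conv.bias
        self.bn.running_mean.copy_(post)
```

Calling recover_post_conv_mean() on every such block right before evaluation corresponds to "recovering post-convolution ones before testing" in the abstract; the hierarchical fine-tuning component is orthogonal and limits how much the Conv layers change across stages.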
Related papers
- Adversarial Vulnerability as a Consequence of On-Manifold Inseparibility [16.998477658358773]
We consider classification tasks and characterize the data distribution as a low-dimensional manifold.
We argue that clean training experiences poor convergence in the off-manifold direction caused by the ill-conditioning.
We perform experiments and exhibit tremendous robustness improvements in clean training through long training and the employment of second-order methods.
arXiv Detail & Related papers (2024-10-09T14:18:52Z)
- DR-Tune: Improving Fine-tuning of Pretrained Visual Models by Distribution Regularization with Semantic Calibration [38.4461170690033]
We propose a novel fine-tuning framework, namely distribution regularization with semantic calibration (DR-Tune)
DR-Tune employs distribution regularization by enforcing the downstream task head to decrease its classification error on the pretrained feature distribution.
To alleviate interference from semantic drift, we develop the semantic calibration (SC) module.
arXiv Detail & Related papers (2023-08-23T10:59:20Z)
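A rough sketch of how the distribution regularization described in the DR-Tune summary above might look, assuming it amounts to also penalizing the downstream head's classification error on features from the frozen pretrained model; function and argument names are placeholders, not the DR-Tune reference code, and the semantic calibration (SC) module is omitted.

```python
import torch
import torch.nn.functional as F

def dr_tune_style_loss(backbone, frozen_pretrained, head, x, y, reg_weight=1.0):
    """Task loss on the fine-tuned features, plus a regularizer that asks the
    same head to also classify features drawn from the frozen pretrained
    model (a stand-in for 'the pretrained feature distribution')."""
    task_loss = F.cross_entropy(head(backbone(x)), y)
    with torch.no_grad():
        pretrained_feats = frozen_pretrained(x)  # frozen pretrained features
    reg_loss = F.cross_entropy(head(pretrained_feats), y)
    return task_loss + reg_weight * reg_loss
```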
- Towards Practical Control of Singular Values of Convolutional Layers [65.25070864775793]
Convolutional neural networks (CNNs) are easy to train, but their essential properties, such as generalization error and adversarial robustness, are hard to control.
Recent research demonstrated that singular values of convolutional layers significantly affect such elusive properties.
We offer a principled approach to alleviating constraints of the prior art at the expense of an insignificant reduction in layer expressivity.
arXiv Detail & Related papers (2022-11-24T19:09:44Z)
- Surgical Fine-Tuning Improves Adaptation to Distribution Shifts [114.17184775397067]
A common approach to transfer learning under distribution shift is to fine-tune the last few layers of a pre-trained model.
This paper shows that in such settings, selectively fine-tuning a subset of layers matches or outperforms commonly used fine-tuning approaches.
arXiv Detail & Related papers (2022-10-20T17:59:15Z)
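A minimal sketch of the "selectively fine-tune a subset of layers" recipe summarized above, assuming a PyTorch model whose parameter names identify the blocks to tune; the prefixes and the optimizer call are illustrative, not the paper's exact protocol.

```python
import torch.nn as nn

def freeze_all_but(model: nn.Module, trainable_prefixes):
    """Keep only parameters whose names start with one of the given prefixes
    trainable and freeze the rest; returns the trainable parameters."""
    for name, param in model.named_parameters():
        param.requires_grad = any(name.startswith(p) for p in trainable_prefixes)
    return [p for p in model.parameters() if p.requires_grad]

# Example (prefixes are hypothetical): tune only an early block, or only the
# classifier head, depending on the kind of distribution shift:
#   params = freeze_all_but(model, ["layer1."])
#   optimizer = torch.optim.SGD(params, lr=1e-3)
```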
- Counterfactual Intervention Feature Transfer for Visible-Infrared Person Re-identification [69.45543438974963]
We find that graph-based methods for visible-infrared person re-identification (VI-ReID) suffer from poor generalization because of two issues.
The well-trained input features weaken the learning of the graph topology, so it does not generalize well during inference.
We propose a Counterfactual Intervention Feature Transfer (CIFT) method to tackle these problems.
arXiv Detail & Related papers (2022-08-01T16:15:31Z)
- Conditional Variational Autoencoder with Balanced Pre-training for Generative Adversarial Networks [11.46883762268061]
Class imbalance occurs in many real-world applications, including image classification, where the number of images in each class differs significantly.
With imbalanced data, generative adversarial networks (GANs) lean toward majority-class samples.
We propose a novel Conditional Variational Autoencoder with Balanced Pre-training for Generative Adversarial Networks (CAPGAN) as an augmentation tool to generate realistic synthetic images.
arXiv Detail & Related papers (2022-01-13T06:52:58Z)
- Bi-tuning of Pre-trained Representations [79.58542780707441]
Bi-tuning is a general learning framework to fine-tune both supervised and unsupervised pre-trained representations to downstream tasks.
Bi-tuning generalizes the vanilla fine-tuning by integrating two heads upon the backbone of pre-trained representations.
Bi-tuning achieves state-of-the-art results for fine-tuning tasks of both supervised and unsupervised pre-trained models by large margins.
arXiv Detail & Related papers (2020-11-12T03:32:25Z)
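The Bi-tuning summary above says only that two heads are integrated upon the pre-trained backbone, without specifying them; the following is therefore just an illustrative two-head layout (a classification head plus an auxiliary projection head), not the paper's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoHeadFineTuner(nn.Module):
    """Illustrative two-head setup on a shared pre-trained backbone: a
    standard classification head and an auxiliary projection head (e.g.
    for a contrastive objective). Names and head choices are assumptions."""

    def __init__(self, backbone: nn.Module, feat_dim: int, num_classes: int, proj_dim: int = 128):
        super().__init__()
        self.backbone = backbone  # pre-trained feature extractor
        self.cls_head = nn.Linear(feat_dim, num_classes)
        self.proj_head = nn.Sequential(
            nn.Linear(feat_dim, feat_dim), nn.ReLU(), nn.Linear(feat_dim, proj_dim)
        )

    def forward(self, x: torch.Tensor):
        feats = self.backbone(x)
        logits = self.cls_head(feats)                       # supervised head
        proj = F.normalize(self.proj_head(feats), dim=-1)   # auxiliary head
        return logits, proj
```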
- SuperDeConFuse: A Supervised Deep Convolutional Transform based Fusion Framework for Financial Trading Systems [29.411173536818477]
This work proposes a supervised multi-channel time-series learning framework for financial stock trading.
Our approach consists of processing the data channels through separate 1-D convolution layers, then fusing the outputs with a series of fully-connected layers, and finally applying a softmax classification layer.
Numerical experiments confirm that the proposed model yields considerably better results than state-of-the-art deep learning techniques on the real-world problem of stock trading.
arXiv Detail & Related papers (2020-11-09T11:58:12Z)
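A compact sketch of the pipeline described in the summary above (per-channel 1-D convolutions, fully-connected fusion, softmax classification); layer widths and kernel sizes are illustrative placeholders, not the SuperDeConFuse configuration.

```python
import torch
import torch.nn as nn

class ChannelwiseConvFusion(nn.Module):
    """Each input channel of a multi-channel time series passes through its
    own 1-D convolution branch; the branch outputs are flattened and fused
    by fully-connected layers whose final logits feed a softmax classifier
    (applied inside the cross-entropy loss)."""

    def __init__(self, num_channels: int, seq_len: int, num_classes: int, conv_dim: int = 16):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Conv1d(1, conv_dim, kernel_size=3, padding=1), nn.ReLU())
            for _ in range(num_channels)
        )
        self.fuse = nn.Sequential(
            nn.Linear(num_channels * conv_dim * seq_len, 128), nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, x: torch.Tensor):            # x: (batch, channels, seq_len)
        outs = [branch(x[:, c:c + 1, :]) for c, branch in enumerate(self.branches)]
        return self.fuse(torch.cat(outs, dim=1).flatten(1))
```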
- On Robustness and Transferability of Convolutional Neural Networks [147.71743081671508]
Modern deep convolutional networks (CNNs) are often criticized for not generalizing under distributional shifts.
We study the interplay between out-of-distribution and transfer performance of modern image classification CNNs for the first time.
We find that increasing both the training set and model sizes significantly improves robustness to distributional shift.
arXiv Detail & Related papers (2020-07-16T18:39:04Z)
- Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning [61.32992639292889]
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks.
We introduce a new scoring method that casts a plausibility ranking task in a full-text format.
We show that our method provides a much more stable training phase across random restarts.
arXiv Detail & Related papers (2020-04-29T10:54:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.