Alleviating Representational Shift for Continual Fine-tuning
- URL: http://arxiv.org/abs/2204.10535v1
- Date: Fri, 22 Apr 2022 06:58:20 GMT
- Title: Alleviating Representational Shift for Continual Fine-tuning
- Authors: Shibo Jie, Zhi-Hong Deng, Ziheng Li
- Abstract summary: We study a practical setting of continual learning: fine-tuning on a pre-trained model continually.
We propose ConFiT, a fine-tuning method incorporating two components, cross-convolution batch normalization (Xconv BN) and hierarchical fine-tuning.
Xconv BN maintains pre-convolution running means instead of post-convolution, and recovers post-convolution ones before testing, which corrects the inaccurate estimates of means under IRS.
- Score: 13.335957004592407
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We study a practical setting of continual learning: fine-tuning on a
pre-trained model continually. Previous work has found that, when training on
new tasks, the features (penultimate layer representations) of previous data
will change, a phenomenon called representational shift. Besides the shift of features, we
reveal that the intermediate layers' representational shift (IRS) also matters
since it disrupts batch normalization, which is another crucial cause of
catastrophic forgetting. Motivated by this, we propose ConFiT, a fine-tuning
method incorporating two components, cross-convolution batch normalization
(Xconv BN) and hierarchical fine-tuning. Xconv BN maintains pre-convolution
running means instead of post-convolution, and recovers post-convolution ones
before testing, which corrects the inaccurate estimates of means under IRS.
Hierarchical fine-tuning leverages a multi-stage strategy to fine-tune the
pre-trained network, preventing massive changes in Conv layers and thus
alleviating IRS. Experimental results on four datasets show that our method
remarkably outperforms several state-of-the-art methods with lower storage
overhead.
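The abstract describes Xconv BN only at a high level. Below is a minimal PyTorch-style sketch of one way the idea could be realized, assuming a conv -> BN layout; the class and method names (XconvBNBlock, recover_post_conv_mean) are illustrative and not the authors' code, and padding/border effects on the mean are ignored.

```python
import torch
import torch.nn as nn

class XconvBNBlock(nn.Module):
    """Minimal sketch of the Xconv BN idea from the abstract (conv -> BN).
    During training we additionally track the running mean of the *input*
    to the convolution; before testing, the post-conv running mean is
    recovered by pushing that pre-conv mean through the current conv
    weights. Because convolution is linear, a spatially constant
    per-channel mean satisfies E[conv(x)] ~= conv(E[x]) up to border
    effects, which this sketch ignores. Names are illustrative only.
    """

    def __init__(self, conv: nn.Conv2d, momentum: float = 0.1):
        super().__init__()
        self.conv = conv
        self.bn = nn.BatchNorm2d(conv.out_channels, momentum=momentum)
        self.momentum = momentum
        # Running mean of the conv *input*, one value per input channel.
        self.register_buffer("pre_running_mean", torch.zeros(conv.in_channels))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            with torch.no_grad():
                m = x.mean(dim=(0, 2, 3))  # per-channel mean of the conv input
                self.pre_running_mean.mul_(1 - self.momentum).add_(self.momentum * m)
        return self.bn(self.conv(x))

    @torch.no_grad()
    def recover_post_conv_mean(self):
        """Call once before evaluation: re-derive the post-conv running mean
        from the stored pre-conv mean and the *current* conv weights, so the
        BN statistics stay consistent after the conv is updated on new tasks."""
        w = self.conv.weight.sum(dim=(2, 3))  # (out_channels, in_channels)
        post = w @ self.pre_running_mean
        if self.conv.bias is not None:
            post = post + self.conv.bias
        self.bn.running_mean.copy_(post)
```

Calling recover_post_conv_mean() on every such block right before evaluation corresponds to "recovering post-convolution ones before testing" in the abstract; the hierarchical fine-tuning component is orthogonal and limits how much the Conv layers change across stages.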
Related papers
- Adversarial Vulnerability as a Consequence of On-Manifold Inseparibility [16.998477658358773]
We consider classification tasks and characterize the data distribution as a low-dimensional manifold.
We argue that clean training experiences poor convergence in the off-manifold direction caused by the ill-conditioning.
We perform experiments and exhibit tremendous robustness improvements in clean training through long training and the employment of second-order methods.
arXiv Detail & Related papers (2024-10-09T14:18:52Z)
- DR-Tune: Improving Fine-tuning of Pretrained Visual Models by Distribution Regularization with Semantic Calibration [38.4461170690033]
We propose a novel fine-tuning framework, namely distribution regularization with semantic calibration (DR-Tune)
DR-Tune employs distribution regularization by enforcing the downstream task head to decrease its classification error on the pretrained feature distribution.
To alleviate interference from semantic drift, we develop the semantic calibration (SC) module.
arXiv Detail & Related papers (2023-08-23T10:59:20Z)
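A rough sketch of how the distribution regularization described in the DR-Tune summary above might look, assuming it amounts to also penalizing the downstream head's classification error on features from the frozen pretrained model; function and argument names are placeholders, not the DR-Tune reference code, and the semantic calibration (SC) module is omitted.

```python
import torch
import torch.nn.functional as F

def dr_tune_style_loss(backbone, frozen_pretrained, head, x, y, reg_weight=1.0):
    """Task loss on the fine-tuned features, plus a regularizer that asks the
    same head to also classify features drawn from the frozen pretrained
    model (a stand-in for 'the pretrained feature distribution')."""
    task_loss = F.cross_entropy(head(backbone(x)), y)
    with torch.no_grad():
        pretrained_feats = frozen_pretrained(x)  # frozen pretrained features
    reg_loss = F.cross_entropy(head(pretrained_feats), y)
    return task_loss + reg_weight * reg_loss
```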
- Towards Practical Control of Singular Values of Convolutional Layers [65.25070864775793]
Convolutional neural networks (CNNs) are easy to train, but their essential properties, such as generalization error and adversarial robustness, are hard to control.
Recent research demonstrated that singular values of convolutional layers significantly affect such elusive properties.
We offer a principled approach to alleviating constraints of the prior art at the expense of an insignificant reduction in layer expressivity.
arXiv Detail & Related papers (2022-11-24T19:09:44Z)
- Surgical Fine-Tuning Improves Adaptation to Distribution Shifts [114.17184775397067]
A common approach to transfer learning under distribution shift is to fine-tune the last few layers of a pre-trained model.
This paper shows that in such settings, selectively fine-tuning a subset of layers matches or outperforms commonly used fine-tuning approaches.
arXiv Detail & Related papers (2022-10-20T17:59:15Z)
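A minimal sketch of the "selectively fine-tune a subset of layers" recipe summarized above, assuming a PyTorch model whose parameter names identify the blocks to tune; the prefixes and the optimizer call are illustrative, not the paper's exact protocol.

```python
import torch.nn as nn

def freeze_all_but(model: nn.Module, trainable_prefixes):
    """Keep only parameters whose names start with one of the given prefixes
    trainable and freeze the rest; returns the trainable parameters."""
    for name, param in model.named_parameters():
        param.requires_grad = any(name.startswith(p) for p in trainable_prefixes)
    return [p for p in model.parameters() if p.requires_grad]

# Example (prefixes are hypothetical): tune only an early block, or only the
# classifier head, depending on the kind of distribution shift:
#   params = freeze_all_but(model, ["layer1."])
#   optimizer = torch.optim.SGD(params, lr=1e-3)
```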
- Counterfactual Intervention Feature Transfer for Visible-Infrared Person Re-identification [69.45543438974963]
We find that graph-based methods for visible-infrared person re-identification (VI-ReID) suffer from poor generalization because of two issues.
The well-trained input features weaken the learning of the graph topology, so it does not generalize well during inference.
We propose a Counterfactual Intervention Feature Transfer (CIFT) method to tackle these problems.
arXiv Detail & Related papers (2022-08-01T16:15:31Z)
- Conditional Variational Autoencoder with Balanced Pre-training for Generative Adversarial Networks [11.46883762268061]
Class imbalance occurs in many real-world applications, including image classification, where the number of images in each class differs significantly.
With imbalanced data, generative adversarial networks (GANs) lean toward majority-class samples.
We propose a novel Conditional Variational Autoencoder with Balanced Pre-training for Generative Adversarial Networks (CAPGAN) as an augmentation tool to generate realistic synthetic images.
arXiv Detail & Related papers (2022-01-13T06:52:58Z)
- Bi-tuning of Pre-trained Representations [79.58542780707441]
Bi-tuning is a general learning framework to fine-tune both supervised and unsupervised pre-trained representations to downstream tasks.
Bi-tuning generalizes the vanilla fine-tuning by integrating two heads upon the backbone of pre-trained representations.
Bi-tuning achieves state-of-the-art results for fine-tuning tasks of both supervised and unsupervised pre-trained models by large margins.
arXiv Detail & Related papers (2020-11-12T03:32:25Z)
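The Bi-tuning summary above says only that two heads are integrated upon the pre-trained backbone, without specifying them; the following is therefore just an illustrative two-head layout (a classification head plus an auxiliary projection head), not the paper's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoHeadFineTuner(nn.Module):
    """Illustrative two-head setup on a shared pre-trained backbone: a
    standard classification head and an auxiliary projection head (e.g.
    for a contrastive objective). Names and head choices are assumptions."""

    def __init__(self, backbone: nn.Module, feat_dim: int, num_classes: int, proj_dim: int = 128):
        super().__init__()
        self.backbone = backbone  # pre-trained feature extractor
        self.cls_head = nn.Linear(feat_dim, num_classes)
        self.proj_head = nn.Sequential(
            nn.Linear(feat_dim, feat_dim), nn.ReLU(), nn.Linear(feat_dim, proj_dim)
        )

    def forward(self, x: torch.Tensor):
        feats = self.backbone(x)
        logits = self.cls_head(feats)                       # supervised head
        proj = F.normalize(self.proj_head(feats), dim=-1)   # auxiliary head
        return logits, proj
```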
- SuperDeConFuse: A Supervised Deep Convolutional Transform based Fusion Framework for Financial Trading Systems [29.411173536818477]
This work proposes a supervised multi-channel time-series learning framework for financial stock trading.
Our approach consists of processing the data channels through separate 1-D convolution layers, then fusing the outputs with a series of fully-connected layers, and finally applying a softmax classification layer.
Numerical experiments confirm that the proposed model yields considerably better results than state-of-the-art deep learning techniques on the real-world problem of stock trading.
arXiv Detail & Related papers (2020-11-09T11:58:12Z)
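A compact sketch of the pipeline described in the summary above (per-channel 1-D convolutions, fully-connected fusion, softmax classification); layer widths and kernel sizes are illustrative placeholders, not the SuperDeConFuse configuration.

```python
import torch
import torch.nn as nn

class ChannelwiseConvFusion(nn.Module):
    """Each input channel of a multi-channel time series passes through its
    own 1-D convolution branch; the branch outputs are flattened and fused
    by fully-connected layers whose final logits feed a softmax classifier
    (applied inside the cross-entropy loss)."""

    def __init__(self, num_channels: int, seq_len: int, num_classes: int, conv_dim: int = 16):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Conv1d(1, conv_dim, kernel_size=3, padding=1), nn.ReLU())
            for _ in range(num_channels)
        )
        self.fuse = nn.Sequential(
            nn.Linear(num_channels * conv_dim * seq_len, 128), nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, x: torch.Tensor):            # x: (batch, channels, seq_len)
        outs = [branch(x[:, c:c + 1, :]) for c, branch in enumerate(self.branches)]
        return self.fuse(torch.cat(outs, dim=1).flatten(1))
```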
- On Robustness and Transferability of Convolutional Neural Networks [147.71743081671508]
Modern deep convolutional networks (CNNs) are often criticized for not generalizing under distributional shifts.
We study the interplay between out-of-distribution and transfer performance of modern image classification CNNs for the first time.
We find that increasing both the training set and model sizes significantly improves robustness to distributional shift.
arXiv Detail & Related papers (2020-07-16T18:39:04Z)
- Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning [61.32992639292889]
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks.
We introduce a new scoring method that casts a plausibility ranking task in a full-text format.
We show that our method provides a much more stable training phase across random restarts.
arXiv Detail & Related papers (2020-04-29T10:54:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.