Revisiting Continual Semantic Segmentation with Pre-trained Vision Models
- URL: http://arxiv.org/abs/2508.04267v1
- Date: Wed, 06 Aug 2025 09:51:46 GMT
- Title: Revisiting Continual Semantic Segmentation with Pre-trained Vision Models
- Authors: Duzhen Zhang, Yong Ren, Wei Cong, Junhao Zheng, Qiaoyi Su, Shuncheng Jia, Zhong-Zhi Li, Xuanle Zhao, Ye Bai, Feilong Chen, Qi Tian, Tielin Zhang
- Abstract summary: Continual Semantic Segmentation (CSS) seeks to incrementally learn to segment novel classes while preserving knowledge of previously encountered ones. Recent advancements in CSS have been driven by the adoption of Pre-trained Vision Models (PVMs) as backbones. Among existing strategies, Direct Fine-Tuning (DFT), which sequentially fine-tunes the model across classes, remains the most straightforward approach.
- Score: 53.56065605992639
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continual Semantic Segmentation (CSS) seeks to incrementally learn to segment novel classes while preserving knowledge of previously encountered ones. Recent advancements in CSS have been largely driven by the adoption of Pre-trained Vision Models (PVMs) as backbones. Among existing strategies, Direct Fine-Tuning (DFT), which sequentially fine-tunes the model across classes, remains the most straightforward approach. Prior work often regards DFT as a performance lower bound due to its presumed vulnerability to severe catastrophic forgetting, leading to the development of numerous complex mitigation techniques. However, we contend that this prevailing assumption is flawed. In this paper, we systematically revisit forgetting in DFT across two standard benchmarks, Pascal VOC 2012 and ADE20K, under eight CSS settings using two representative PVM backbones: ResNet101 and Swin-B. Through a detailed probing analysis, our findings reveal that existing methods significantly underestimate the inherent anti-forgetting capabilities of PVMs. Even under DFT, PVMs retain previously learned knowledge with minimal forgetting. Further investigation of the feature space indicates that the observed forgetting primarily arises from the classifier's drift away from the PVM, rather than from degradation of the backbone representations. Based on this insight, we propose DFT*, a simple yet effective enhancement to DFT that incorporates strategies such as freezing the PVM backbone and previously learned classifiers, as well as pre-allocating future classifiers. Extensive experiments show that DFT* consistently achieves competitive or superior performance compared to sixteen state-of-the-art CSS methods, while requiring substantially fewer trainable parameters and less training time.
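To make the DFT* recipe described in the abstract concrete, below is a minimal PyTorch-style sketch (not the authors' code): the PVM backbone is frozen, classifier heads are pre-allocated for all incremental steps, and only the current step's head is trained while previously learned heads stay frozen. The class and method names, the toy backbone, the 1x1-conv heads, and the class split are illustrative assumptions.

```python
import torch
import torch.nn as nn


class DFTStarSegmenter(nn.Module):
    """Illustrative DFT*-style model: frozen PVM backbone + pre-allocated classifier heads."""

    def __init__(self, backbone: nn.Module, feat_dim: int, classes_per_step: list):
        super().__init__()
        self.backbone = backbone
        # Pre-allocate one 1x1-conv classifier head per incremental step,
        # including steps that have not been trained yet.
        self.heads = nn.ModuleList(
            [nn.Conv2d(feat_dim, c, kernel_size=1) for c in classes_per_step]
        )
        # Freeze the PVM backbone: per the paper, forgetting stems mainly from
        # classifier drift, not from degraded backbone representations.
        self.backbone.requires_grad_(False)

    def start_step(self, step: int) -> None:
        # Train only the classifier of the current step; keep all other heads frozen.
        for i, head in enumerate(self.heads):
            head.requires_grad_(i == step)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(x)
        # Per-pixel logits from all heads, concatenated along the class dimension.
        return torch.cat([head(feats) for head in self.heads], dim=1)


# Toy usage: a placeholder backbone stands in for ResNet101 / Swin-B features.
backbone = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU())
model = DFTStarSegmenter(backbone, feat_dim=64, classes_per_step=[16, 5])  # illustrative split
model.start_step(1)  # second incremental step
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.01)
logits = model(torch.randn(2, 3, 64, 64))  # shape: (2, 21, 64, 64)
```

With the backbone and earlier heads frozen, only the current head's parameters remain trainable, which is consistent with the abstract's claim of substantially fewer trainable parameters and less training time.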
Related papers
- Reinforcement Fine-Tuning Naturally Mitigates Forgetting in Continual Post-Training [23.99424961055015]
This paper presents a comparative analysis of two core post-training paradigms: supervised fine-tuning (SFT) and reinforcement fine-tuning (RFT). Our experiments are conducted on a benchmark comprising seven diverse multimodal tasks.
arXiv Detail & Related papers (2025-07-07T18:17:06Z)
- R-TPT: Improving Adversarial Robustness of Vision-Language Models through Test-Time Prompt Tuning [97.49610356913874]
We propose robust test-time prompt tuning (R-TPT) for vision-language models (VLMs). R-TPT mitigates the impact of adversarial attacks during the inference stage. We introduce a plug-and-play reliability-based weighted ensembling strategy to strengthen the defense.
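The reliability-based weighted ensembling is only named in this summary; as a hedged illustration, the sketch below weights predictions from augmented test-time views by negative predictive entropy, which is assumed here as the reliability proxy rather than taken from the paper.

```python
import torch


def reliability_weighted_ensemble(view_logits: torch.Tensor) -> torch.Tensor:
    """view_logits: (num_views, num_classes) logits from augmented test-time views."""
    probs = view_logits.softmax(dim=-1)
    # Assumed reliability proxy: lower predictive entropy = more reliable view.
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1)
    weights = torch.softmax(-entropy, dim=0)
    return (weights.unsqueeze(-1) * probs).sum(dim=0)  # reliability-weighted average


pooled = reliability_weighted_ensemble(torch.randn(8, 10))  # 8 views, 10 classes
```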
arXiv Detail & Related papers (2025-04-15T13:49:31Z)
- PTMs-TSCIL Pre-Trained Models Based Class-Incremental Learning [7.784244204592032]
Class-incremental learning (CIL) for time series data faces challenges in balancing stability (against catastrophic forgetting) and plasticity (for new knowledge acquisition). We present the first exploration of PTM-based Time Series Class-Incremental Learning (TSCIL).
arXiv Detail & Related papers (2025-03-10T10:27:21Z)
- SLCA++: Unleash the Power of Sequential Fine-tuning for Continual Learning with Pre-training [68.7896349660824]
We present an in-depth analysis of the progressive overfitting problem through the lens of sequential fine-tuning (Seq FT).
Considering that overly fast representation learning and a biased classification layer jointly constitute this problem, we introduce the advanced Slow Learner with Classifier Alignment (SLCA++) framework.
Our approach involves a Slow Learner that selectively reduces the learning rate of backbone parameters, and a Classifier Alignment step that aligns the disjoint classification layers in a post-hoc fashion.
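As a rough sketch of the Slow Learner idea (not the SLCA++ code), PyTorch parameter groups can give the pre-trained backbone a much smaller learning rate than the classification layer; the toy modules and learning rates below are assumptions made purely for illustration.

```python
import torch
import torch.nn as nn

backbone = nn.Linear(128, 128)    # stands in for a pre-trained backbone
classifier = nn.Linear(128, 10)   # stands in for the classification layer

optimizer = torch.optim.SGD(
    [
        {"params": backbone.parameters(), "lr": 1e-4},    # slow updates for pre-trained weights
        {"params": classifier.parameters(), "lr": 1e-2},  # faster updates for the new classifier
    ],
    momentum=0.9,
)
```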
arXiv Detail & Related papers (2024-08-15T17:50:07Z)
- Multi-Epoch learning with Data Augmentation for Deep Click-Through Rate Prediction [53.88231294380083]
We introduce a novel Multi-Epoch learning with Data Augmentation (MEDA) framework, suitable for both non-continual and continual learning scenarios.
MEDA minimizes overfitting by reducing the dependency of the embedding layer on subsequent training data.
Our findings confirm that pre-trained layers can adapt to new embedding spaces, enhancing performance without overfitting.
arXiv Detail & Related papers (2024-06-27T04:00:15Z)
- Majorization-Minimization for sparse SVMs [46.99165837639182]
Support Vector Machines (SVMs) were introduced several decades ago for performing binary classification tasks under a supervised framework.
They often outperform other supervised methods and remain one of the most popular approaches in the machine learning arena.
In this work, we investigate the training of SVMs through the minimization of a squared hinge loss with a smooth, sparsity-promoting regularizer.
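Read literally, that objective can be written as follows (notation assumed here, not taken from the paper), with a squared hinge data term and a smooth sparsity-promoting penalty $\Psi$ weighted by $\lambda > 0$:

$$\min_{w \in \mathbb{R}^d} \; \sum_{i=1}^{n} \max\!\bigl(0,\, 1 - y_i\, w^{\top} x_i\bigr)^{2} \; + \; \lambda\, \Psi(w)$$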
arXiv Detail & Related papers (2023-08-31T17:03:16Z)
- Uncovering the Hidden Cost of Model Compression [43.62624133952414]
Visual Prompting has emerged as a pivotal method for transfer learning in computer vision.
Model compression detrimentally impacts the performance of visual prompting-based transfer.
However, negative effects on calibration are not present when models are compressed via quantization.
arXiv Detail & Related papers (2023-08-29T01:47:49Z)
- RanPAC: Random Projections and Pre-trained Models for Continual Learning [59.07316955610658]
Continual learning (CL) aims to learn different tasks (such as classification) in a non-stationary data stream without forgetting old ones.
We propose a concise and effective approach for CL with pre-trained models.
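The summary does not detail the approach; as a hedged reading of the title, the sketch below applies a fixed (non-trained) random projection to frozen pre-trained features before a simple classifier, which is one common way random projections are combined with pre-trained models. The dimensions and the nearest-class-mean classifier are assumptions, not the paper's method.

```python
import torch

torch.manual_seed(0)
feat_dim, proj_dim, num_classes = 768, 2048, 10

# Fixed random projection (never trained) applied to frozen pre-trained features.
W = torch.randn(feat_dim, proj_dim) / feat_dim**0.5


def project(features: torch.Tensor) -> torch.Tensor:
    return torch.relu(features @ W)  # nonlinearity after the random projection


# Toy continual "training": accumulate class means of projected features.
class_means = torch.zeros(num_classes, proj_dim)
feats, labels = torch.randn(100, feat_dim), torch.randint(0, num_classes, (100,))
for c in range(num_classes):
    mask = labels == c
    if mask.any():
        class_means[c] = project(feats[mask]).mean(dim=0)

# Prediction: nearest class mean in the projected space.
pred = torch.cdist(project(torch.randn(5, feat_dim)), class_means).argmin(dim=1)
```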
arXiv Detail & Related papers (2023-07-05T12:49:02Z)
- Strong Baselines for Parameter Efficient Few-Shot Fine-tuning [50.83426196335385]
Few-shot classification (FSC) entails learning novel classes given only a few examples per class after a pre-training (or meta-training) phase.
Recent works have shown that simply fine-tuning a pre-trained Vision Transformer (ViT) on new test classes is a strong approach for FSC.
Fine-tuning ViTs, however, is expensive in time, compute and storage.
This has motivated the design of parameter efficient fine-tuning (PEFT) methods which fine-tune only a fraction of the Transformer's parameters.
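One common PEFT pattern, used here only as an illustration (the paper's specific baselines may differ), is to freeze all Transformer weights and re-enable gradients for a small subset such as the LayerNorm parameters:

```python
import torch.nn as nn


def layernorm_only_peft(model: nn.Module) -> list:
    """Freeze everything, then make only LayerNorm parameters trainable (illustrative PEFT choice)."""
    model.requires_grad_(False)
    trainable = []
    for module in model.modules():
        if isinstance(module, nn.LayerNorm):
            module.requires_grad_(True)
            trainable.extend(module.parameters())
    return trainable
```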
arXiv Detail & Related papers (2023-04-04T16:14:39Z)
- Prototypical Contrastive Learning of Unsupervised Representations [171.3046900127166]
Prototypical Contrastive Learning (PCL) is an unsupervised representation learning method.
PCL implicitly encodes semantic structures of the data into the learned embedding space.
PCL outperforms state-of-the-art instance-wise contrastive learning methods on multiple benchmarks.
arXiv Detail & Related papers (2020-05-11T09:53:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.