BOFA: Bridge-Layer Orthogonal Low-Rank Fusion for CLIP-Based Class-Incremental Learning
- URL: http://arxiv.org/abs/2511.11421v1
- Date: Fri, 14 Nov 2025 15:51:40 GMT
- Title: BOFA: Bridge-Layer Orthogonal Low-Rank Fusion for CLIP-Based Class-Incremental Learning
- Authors: Lan Li, Tao Hu, Da-Wei Zhou, Han-Jia Ye, De-Chuan Zhan
- Abstract summary: Class-Incremental Learning (CIL) aims to continually learn new categories without forgetting previously acquired knowledge. Applying vision-language models such as CLIP to CIL poses two major challenges: (1) adapting to downstream tasks often requires additional learnable modules, increasing model complexity and susceptibility to forgetting; and (2) while multi-modal representations offer complementary strengths, existing methods have yet to fully realize their potential in effectively integrating visual and textual modalities.
- Score: 84.56022893225422
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Class-Incremental Learning (CIL) aims to continually learn new categories without forgetting previously acquired knowledge. Vision-language models such as CLIP offer strong transferable representations via multi-modal supervision, making them promising for CIL. However, applying CLIP to CIL poses two major challenges: (1) adapting to downstream tasks often requires additional learnable modules, increasing model complexity and susceptibility to forgetting; and (2) while multi-modal representations offer complementary strengths, existing methods have yet to fully realize their potential in effectively integrating visual and textual modalities. To address these issues, we propose BOFA (Bridge-layer Orthogonal Fusion for Adaptation), a novel framework for CIL. BOFA confines all model adaptation exclusively to CLIP's existing cross-modal bridge-layer, thereby adding no extra parameters or inference cost. To prevent forgetting within this layer, it leverages Orthogonal Low-Rank Fusion, a mechanism that constrains parameter updates to a low-rank "safe subspace" mathematically constructed to be orthogonal to past task features. This ensures stable knowledge accumulation without data replay. Furthermore, BOFA employs a cross-modal hybrid prototype that synergizes stable textual prototypes with visual counterparts derived from our stably adapted bridge-layer, enhancing classification performance. Extensive experiments on standard benchmarks show that BOFA achieves superior accuracy and efficiency compared to existing methods.
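The abstract does not spell out how the low-rank "safe subspace" is constructed, but the general idea of projecting a weight update so it is (approximately) orthogonal to features seen in earlier tasks can be sketched briefly. The snippet below is a minimal, hypothetical NumPy illustration assuming a LoRA-style factorized update to the bridge layer and an SVD-derived basis of past-task inputs; the function names, energy threshold, and rank are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def past_feature_subspace(features, energy=0.95):
    """Orthonormal basis spanning the dominant directions of past-task features.

    features: (n_samples, d) matrix of bridge-layer inputs collected on previous tasks.
    energy:   fraction of spectral energy to retain (hypothetical threshold).
    """
    # Right singular vectors span the row space of the feature matrix.
    _, s, vt = np.linalg.svd(features, full_matrices=False)
    cum = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = int(np.searchsorted(cum, energy)) + 1
    return vt[:k].T  # (d, k) basis of the subspace "occupied" by old tasks

def project_update_to_safe_subspace(delta_w, basis):
    """Remove the component of a weight update lying in the past-task subspace.

    delta_w: (d_out, d_in) low-rank update, e.g. B @ A from a LoRA-style factorization.
    basis:   (d_in, k) orthonormal basis of past-task feature directions.
    """
    # After projection, delta_w @ x_old ≈ 0 for any x_old in span(basis),
    # so old-task inputs pass through the bridge layer (almost) unchanged.
    return delta_w - (delta_w @ basis) @ basis.T

# Toy usage (shapes only; not the authors' actual training code).
d_in, d_out, rank = 512, 512, 8
old_feats = np.random.randn(1000, d_in)
B, A = np.random.randn(d_out, rank), np.random.randn(rank, d_in)
safe_update = project_update_to_safe_subspace(B @ A, past_feature_subspace(old_feats))
```

Because the projected update annihilates the directions spanned by old-task features, those features map through the adapted layer essentially as before, which is the intuition behind accumulating new knowledge without data replay.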
Related papers
- Unlocking Prototype Potential: An Efficient Tuning Framework for Few-Shot Class-Incremental Learning [69.28860905525057]
Few-shot class-incremental learning (FSCIL) seeks to continuously learn new classes from very limited samples. We introduce an efficient prototype fine-tuning framework that evolves static centroids into dynamic, learnable components.
arXiv Detail & Related papers (2026-02-05T03:50:53Z) - Causal Disentanglement and Cross-Modal Alignment for Enhanced Few-Shot Learning [11.752632557524969]
Causal CLIP Adapter (CCA) is a novel framework that explicitly disentangles visual features extracted from CLIP. Our method consistently outperforms state-of-the-art approaches in terms of few-shot performance and robustness to distributional shifts.
arXiv Detail & Related papers (2025-08-05T05:30:42Z) - CL-LoRA: Continual Low-Rank Adaptation for Rehearsal-Free Class-Incremental Learning [8.81873424028249]
Class-Incremental Learning (CIL) aims to learn new classes sequentially while retaining the knowledge of previously learned classes. We propose a novel dual-adapter architecture combining task-shared adapters to learn cross-task knowledge and task-specific adapters to capture unique features of each new task. We demonstrate that CL-LoRA consistently achieves promising performance under multiple benchmarks with reduced training and inference computation.
arXiv Detail & Related papers (2025-05-30T17:19:52Z) - Low-Complexity Inference in Continual Learning via Compressed Knowledge Transfer [5.079602839359523]
Continual learning (CL) aims to train models that can learn a sequence of tasks without forgetting previously acquired knowledge. Recently, large pre-trained models have been widely adopted in CL for their ability to support both. We propose two efficient frameworks tailored for class-incremental learning.
arXiv Detail & Related papers (2025-05-13T08:07:40Z) - Continuous Knowledge-Preserving Decomposition with Adaptive Layer Selection for Few-Shot Class-Incremental Learning [73.59672160329296]
CKPD-FSCIL is a unified framework that unlocks the underutilized capacity of pretrained weights. Our method consistently outperforms state-of-the-art approaches in both adaptability and knowledge retention.
arXiv Detail & Related papers (2025-01-09T07:18:48Z) - Read Between the Layers: Leveraging Multi-Layer Representations for Rehearsal-Free Continual Learning with Pre-Trained Models [15.847302755988506]
We address the Continual Learning problem, wherein a model must learn a sequence of tasks from non-stationary distributions.
We propose LayUP, a new prototype-based approach to CL that leverages second-order feature statistics from multiple intermediate layers of a pre-trained network.
Our results demonstrate that fully exhausting the representational capacities of pre-trained models in CL goes well beyond their final embeddings.
arXiv Detail & Related papers (2023-12-13T13:11:44Z) - RanPAC: Random Projections and Pre-trained Models for Continual Learning [59.07316955610658]
Continual learning (CL) aims to learn different tasks (such as classification) in a non-stationary data stream without forgetting old ones.
We propose a concise and effective approach for CL with pre-trained models.
arXiv Detail & Related papers (2023-07-05T12:49:02Z) - Revisiting Class-Incremental Learning with Pre-Trained Models: Generalizability and Adaptivity are All You Need [84.3507610522086]
Class-incremental learning (CIL) aims to adapt to emerging new classes without forgetting old ones.
Recent pre-training has achieved substantial progress, making vast pre-trained models (PTMs) accessible for CIL.
We argue that the core factors in CIL are adaptivity for model updating and generalizability for knowledge transferring.
arXiv Detail & Related papers (2023-03-13T17:59:02Z) - CLIPood: Generalizing CLIP to Out-of-Distributions [73.86353105017076]
Contrastive language-image pre-training (CLIP) models have shown impressive zero-shot ability, but the further adaptation of CLIP on downstream tasks undesirably degrades OOD performances.
We propose CLIPood, a fine-tuning method that can adapt CLIP models to OOD situations where both domain shifts and open classes may occur on unseen test data.
Experiments on diverse datasets with different OOD scenarios show that CLIPood consistently outperforms existing generalization techniques.
arXiv Detail & Related papers (2023-02-02T04:27:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.