Consistency-Guided Asynchronous Contrastive Tuning for Few-Shot Class-Incremental Tuning of Foundation Models
- URL: http://arxiv.org/abs/2405.16625v2
- Date: Tue, 01 Apr 2025 19:28:44 GMT
- Title: Consistency-Guided Asynchronous Contrastive Tuning for Few-Shot Class-Incremental Tuning of Foundation Models
- Authors: Shuvendu Roy, Elham Dolatabadi, Arash Afkanpour, Ali Etemad,
- Abstract summary: CoACT consists of three key components: (i) asynchronous contrastive tuning, (ii) controlled fine-tuning, and (iii) consistency-guided incremental tuning.<n>We evaluate our proposed solution on Few-Shot Class-Incremental Learning (FSCIL) and a new and more challenging setup called Few-Shot Class-Incremental Tuning (FSCIT)<n>CoACT outperforms existing methods by up to 5.02% in FSCIL and up to 12.51% in FSCIT for individual datasets, with an average improvement of 2.47%.
- Score: 19.165004570789755
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We propose Consistency-guided Asynchronous Contrastive Tuning (CoACT), a novel method for continuously tuning foundation models to learn new classes in few-shot settings. CoACT consists of three key components:(i) asynchronous contrastive tuning, which learns new classes by including LoRA modules in the pre-trained encoder while enforcing consistency between two asynchronous encoders; (ii) controlled fine-tuning, which facilitates effective tuning of a subset of the foundation model; and (iii) consistency-guided incremental tuning, which enforces additional regularization during later sessions to reduce forgetting of the learned classes. We evaluate our proposed solution on Few-Shot Class-Incremental Learning (FSCIL) as well as a new and more challenging setup called Few-Shot Class-Incremental Tuning (FSCIT), which facilitates the continual tuning of vision foundation models to learn new classes with only a few samples per class. Unlike traditional FSCIL, FSCIT does not require a large in-distribution base session for initial fully supervised training prior to the incremental few-shot sessions. We conduct extensive evaluations across 16 diverse datasets, demonstrating the effectiveness of CoACT in both FSCIL and FSCIT setups. CoACT outperforms existing methods by up to 5.02% in FSCIL and up to 12.51% in FSCIT for individual datasets, with an average improvement of 2.47%. Furthermore, CoACT exhibits reduced forgetting and enhanced robustness in low-shot experiments. Detailed ablation and sensitivity studies highlight the contribution of each component of CoACT. We make our code publicly available at https://github.com/ShuvenduRoy/CoACT-FSCIL.
Related papers
- Killing Two Birds with One Stone: Unifying Retrieval and Ranking with a Single Generative Recommendation Model [71.45491434257106]
Unified Generative Recommendation Framework (UniGRF) is a novel approach that integrates retrieval and ranking into a single generative model.
To enhance inter-stage collaboration, UniGRF introduces a ranking-driven enhancer module.
UniGRF significantly outperforms existing models on benchmark datasets.
arXiv Detail & Related papers (2025-04-23T06:43:54Z) - Unbiased Max-Min Embedding Classification for Transductive Few-Shot Learning: Clustering and Classification Are All You Need [83.10178754323955]
Few-shot learning enables models to generalize from only a few labeled examples.
We propose the Unbiased Max-Min Embedding Classification (UMMEC) Method, which addresses the key challenges in few-shot learning.
Our method significantly improves classification performance with minimal labeled data, advancing the state-of-the-art in annotatedL.
arXiv Detail & Related papers (2025-03-28T07:23:07Z) - Singular Value Fine-tuning for Few-Shot Class-Incremental Learning [38.777602828340356]
Class-Incremental Learning (CIL) aims to prevent catastrophic forgetting of previously learned classes while incorporating new ones.
We propose the Singular Value Finetuning for FSCIL (SVFCL)
SVFCL applies singular value decomposition to the foundation model weights, keeping the singular vectors fixed while fine-tuning the singular values for each task, and then merging them.
arXiv Detail & Related papers (2025-03-13T09:57:28Z) - High-Performance Few-Shot Segmentation with Foundation Models: An Empirical Study [64.06777376676513]
We develop a few-shot segmentation (FSS) framework based on foundation models.
To be specific, we propose a simple approach to extract implicit knowledge from foundation models to construct coarse correspondence.
Experiments on two widely used datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-09-10T08:04:11Z) - SMILe: Leveraging Submodular Mutual Information For Robust Few-Shot Object Detection [2.0755366440393743]
Confusion and forgetting of object classes have been challenges of prime interest in Few-Shot Object Detection (FSOD)
We introduce a novel Submodular Mutual Information Learning framework which adopts mutual information functions.
Our proposed approach generalizes to several existing approaches in FSOD, agnostic of the backbone architecture.
arXiv Detail & Related papers (2024-07-02T20:53:43Z) - Combining Denoising Autoencoders with Contrastive Learning to fine-tune Transformer Models [0.0]
This work proposes a 3 Phase technique to adjust a base model for a classification task.
We adapt the model's signal to the data distribution by performing further training with a Denoising Autoencoder (DAE)
In addition, we introduce a new data augmentation approach for Supervised Contrastive Learning to correct the unbalanced datasets.
arXiv Detail & Related papers (2024-05-23T11:08:35Z) - Rethinking Few-shot 3D Point Cloud Semantic Segmentation [62.80639841429669]
This paper revisits few-shot 3D point cloud semantic segmentation (FS-PCS)
We focus on two significant issues in the state-of-the-art: foreground leakage and sparse point distribution.
To address these issues, we introduce a standardized FS-PCS setting, upon which a new benchmark is built.
arXiv Detail & Related papers (2024-03-01T15:14:47Z) - Read Between the Layers: Leveraging Multi-Layer Representations for Rehearsal-Free Continual Learning with Pre-Trained Models [15.847302755988506]
We address the Continual Learning problem, wherein a model must learn a sequence of tasks from non-stationary distributions.
We propose LayUP, a new prototype-based approach to CL that leverages second-order feature statistics from multiple intermediate layers of a pre-trained network.
Our results demonstrate that fully exhausting the representational capacities of pre-trained models in CL goes well beyond their final embeddings.
arXiv Detail & Related papers (2023-12-13T13:11:44Z) - Generalized Logit Adjustment: Calibrating Fine-tuned Models by Removing Label Bias in Foundation Models [75.9543301303586]
Foundation models like CLIP allow zero-shot transfer on various tasks without additional training data.
Fine-tuning and ensembling are also commonly adopted to better fit the downstream tasks.
However, we argue that prior work has overlooked the inherent biases in foundation models.
arXiv Detail & Related papers (2023-10-12T08:01:11Z) - Knowledge Transfer-Driven Few-Shot Class-Incremental Learning [23.163459923345556]
Few-shot class-incremental learning (FSCIL) aims to continually learn new classes using a few samples while not forgetting the old classes.
Despite the advance of existing FSCIL methods, the proposed knowledge transfer learning schemes are sub-optimal due to the insufficient optimization for the model's plasticity.
We propose a Random Episode Sampling and Augmentation (RESA) strategy that relies on diverse pseudo incremental tasks as agents to achieve the knowledge transfer.
arXiv Detail & Related papers (2023-06-19T14:02:45Z) - Consistency-guided Prompt Learning for Vision-Language Models [23.4909421082857]
We propose Consistency-guided Prompt learning (CoPrompt), a new fine-tuning method for vision-language models.
Our approach improves the generalization of large foundation models when fine-tuned on downstream tasks in a few-shot setting.
arXiv Detail & Related papers (2023-06-01T23:20:47Z) - Universal Domain Adaptation from Foundation Models: A Baseline Study [58.51162198585434]
We make empirical studies of state-of-the-art UniDA methods using foundation models.
We introduce textitCLIP distillation, a parameter-free method specifically designed to distill target knowledge from CLIP models.
Although simple, our method outperforms previous approaches in most benchmark tasks.
arXiv Detail & Related papers (2023-05-18T16:28:29Z) - TWINS: A Fine-Tuning Framework for Improved Transferability of
Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks.
We propose a novel statistics-based approach, Two-WIng NormliSation (TWINS) fine-tuning framework.
TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z) - Modeling Inter-Class and Intra-Class Constraints in Novel Class
Discovery [20.67503042774617]
Novel class discovery (NCD) aims at learning a model that transfers the common knowledge from a class-disjoint labelled dataset to another unlabelled dataset.
We propose to model both inter-class and intra-class constraints in NCD based on the symmetric Kullback-Leibler divergence (sKLD)
arXiv Detail & Related papers (2022-10-07T14:46:32Z) - Rethinking Few-Shot Class-Incremental Learning with Open-Set Hypothesis
in Hyperbolic Geometry [21.38183613466714]
Few-Shot Class-Incremental Learning (FSCIL) aims at incrementally learning novel classes from a few labeled samples.
In this paper, we rethink the configuration of FSCIL with the open-set hypothesis by reserving the possibility in the first session for incoming categories.
To assign better performances on both close-set and open-set recognition to the model, Hyperbolic Reciprocal Point Learning module (Hyper-RPL) is built on Reciprocal Point Learning (RPL) with hyperbolic neural networks.
arXiv Detail & Related papers (2022-07-20T15:13:48Z) - Improving Calibration for Long-Tailed Recognition [68.32848696795519]
We propose two methods to improve calibration and performance in such scenarios.
For dataset bias due to different samplers, we propose shifted batch normalization.
Our proposed methods set new records on multiple popular long-tailed recognition benchmark datasets.
arXiv Detail & Related papers (2021-04-01T13:55:21Z) - Contrastive Prototype Learning with Augmented Embeddings for Few-Shot
Learning [58.2091760793799]
We propose a novel contrastive prototype learning with augmented embeddings (CPLAE) model.
With a class prototype as an anchor, CPL aims to pull the query samples of the same class closer and those of different classes further away.
Extensive experiments on several benchmarks demonstrate that our proposed CPLAE achieves new state-of-the-art.
arXiv Detail & Related papers (2021-01-23T13:22:44Z) - One-Shot Object Detection without Fine-Tuning [62.39210447209698]
We introduce a two-stage model consisting of a first stage Matching-FCOS network and a second stage Structure-Aware Relation Module.
We also propose novel training strategies that effectively improve detection performance.
Our method exceeds the state-of-the-art one-shot performance consistently on multiple datasets.
arXiv Detail & Related papers (2020-05-08T01:59:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.