Big-model Driven Few-shot Continual Learning
- URL: http://arxiv.org/abs/2309.00862v1
- Date: Sat, 2 Sep 2023 08:39:46 GMT
- Title: Big-model Driven Few-shot Continual Learning
- Authors: Ziqi Gu, Chunyan Xu, Zihan Lu, Xin Liu, Anbo Dai, Zhen Cui
- Abstract summary: We propose a Big-model driven Few-shot Continual Learning (B-FSCL) framework to gradually evolve the model.
Experimental results on three popular datasets show that B-FSCL consistently surpasses state-of-the-art FSCL methods.
- Score: 24.392821997721295
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Few-shot continual learning (FSCL) has attracted intensive attention and
achieved some advances in recent years, but further gains in accuracy are now
difficult because only few-shot incremental samples are available. Inspired by
the distinctive human cognitive ability for lifelong learning, we propose a
novel Big-model driven Few-shot Continual Learning (B-FSCL) framework that
gradually evolves the model under the traction of the world's big models
(analogous to accumulated human knowledge). Specifically, we perform big-model
driven transfer learning to leverage the powerful encoding capability of
existing big models, which adapts the continual model to the few newly added
samples while avoiding over-fitting. Considering that the big model and the
continual model may perceive identical images differently, we introduce an
instance-level adaptive decision mechanism that provides flexible, high-level
cognitive support adjusted to varying samples. In turn, the adaptive decision
is further used to optimize the parameters of the continual model, performing
adaptive distillation of the big model's knowledge. Experimental results of
the proposed B-FSCL on three popular datasets (CIFAR100, miniImageNet and
CUB200) consistently surpass all state-of-the-art FSCL methods.
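The abstract does not specify how the instance-level decision weights the two supervision signals. Below is a minimal sketch of one plausible form: a small learned gate produces a per-sample weight that blends temperature-scaled distillation from the big model with the ordinary label loss. The `gate` network, temperature, and weighting scheme are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def adaptive_distill_loss(student_logits, teacher_logits, labels, gate, T=2.0):
    """Hypothetical instance-level adaptive distillation (assumed form).

    `gate` is an assumed small network mapping both models' predictions to a
    per-sample weight in [0, 1] deciding how much to trust the big model.
    """
    # Per-sample weight derived from both models' predictions: shape (B, 1).
    w = torch.sigmoid(gate(torch.cat([student_logits, teacher_logits], dim=-1)))
    # Temperature-scaled soft-label distillation term, kept per sample.
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="none",
    ).sum(dim=-1) * (T * T)
    # Standard supervised term on the few-shot labels, also per sample.
    ce = F.cross_entropy(student_logits, labels, reduction="none")
    # Blend the two losses sample by sample, then average over the batch.
    w = w.squeeze(-1)
    return (w * kd + (1.0 - w) * ce).mean()
```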
Related papers
- Modular Memory is the Key to Continual Learning Agents [100.09688599754465]
We argue that combining the strengths of In-Weight Learning (IWL) and the newly emerged capabilities of In-Context Learning (ICL) through the design of modular memory is the missing piece for continual adaptation at scale.
We outline a conceptual framework for modular memory-centric architectures that leverage ICL for rapid adaptation and knowledge accumulation, and IWL for stable updates to model capabilities.
arXiv Detail & Related papers (2026-03-02T11:40:05Z)
- Partitioned Memory Storage Inspired Few-Shot Class-Incremental learning [2.9845592719739127]
Few-Shot Class-Incremental Learning (FSCIL) focuses on continuous learning of new categories with limited samples without forgetting old knowledge.
Our paper develops a method that learns an independent model for each session, which inherently prevents catastrophic forgetting.
Our method provides a fresh viewpoint for FSCIL and demonstrates state-of-the-art performance on the CIFAR-100 and mini-ImageNet datasets.
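One plausible reading of "independent models for each session" is a shared frozen backbone with one new classification head per session, so earlier sessions are never overwritten. The sketch below is an assumed minimal version, not the paper's actual design:

```python
import torch

class SessionEnsemble(torch.nn.Module):
    """Hypothetical FSCIL setup: one frozen head per incremental session."""

    def __init__(self, backbone, feat_dim):
        super().__init__()
        self.backbone = backbone            # shared, frozen feature extractor
        self.heads = torch.nn.ModuleList()  # one linear head per session
        self.feat_dim = feat_dim

    def add_session(self, num_new_classes):
        # A fresh head is trained only on the new session's few-shot data,
        # so earlier heads (and their knowledge) stay untouched.
        self.heads.append(torch.nn.Linear(self.feat_dim, num_new_classes))

    def forward(self, x):
        feats = self.backbone(x)
        # Concatenate per-session logits into one growing label space.
        return torch.cat([head(feats) for head in self.heads], dim=-1)
```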
arXiv Detail & Related papers (2025-04-29T14:11:06Z)
- Parameter-Efficient Continual Fine-Tuning: A Survey [5.59258786465086]
We believe the next breakthrough in AI lies in enabling efficient adaptation to evolving environments.
One alternative for efficiently adapting these large-scale models is known as Parameter-Efficient Fine-Tuning (PEFT).
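As a generic illustration of the PEFT idea (not a specific method from the survey), a LoRA-style layer freezes the pre-trained weight and trains only a small low-rank update:

```python
import torch

class LoRALinear(torch.nn.Module):
    """Minimal LoRA sketch: y = W0 x + (alpha / r) * B A x, with W0 frozen."""

    def __init__(self, base: torch.nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # frozen pre-trained weight
        # Low-rank factors: A is small-random, B is zero so training starts
        # from the unmodified pre-trained behavior.
        self.A = torch.nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = torch.nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```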
arXiv Detail & Related papers (2025-04-18T17:51:51Z)
- An experimental approach on Few Shot Class Incremental Learning [0.0]
Few-Shot Class-Incremental Learning (FSCIL) represents a cutting-edge paradigm within the broader scope of machine learning.
The paper presents different solutions, supported by extensive experiments across large-scale datasets.
We highlight their advantages and then present an experimental approach aimed at improving the most promising one.
arXiv Detail & Related papers (2025-03-14T12:36:15Z)
- Mamba-FSCIL: Dynamic Adaptation with Selective State Space Model for Few-Shot Class-Incremental Learning [113.89327264634984]
Few-shot class-incremental learning (FSCIL) confronts the challenge of integrating new classes into a model with minimal training samples.
Traditional methods widely adopt static adaptation relying on a fixed parameter space to learn from data that arrive sequentially.
We propose a dual selective SSM projector that dynamically adjusts the projection parameters based on the intermediate features for dynamic adaptation.
arXiv Detail & Related papers (2024-07-08T17:09:39Z)
- Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters [65.15700861265432]
We present a parameter-efficient continual learning framework to alleviate long-term forgetting in incremental learning with vision-language models.
Our approach involves the dynamic expansion of a pre-trained CLIP model, through the integration of Mixture-of-Experts (MoE) adapters.
To preserve the zero-shot recognition capability of vision-language models, we introduce a Distribution Discriminative Auto-Selector.
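A hedged sketch of what an MoE adapter of this kind can look like: a router softly combines small bottleneck experts around a residual connection. The expert count, bottleneck width, and soft routing are assumptions, and the paper's Distribution Discriminative Auto-Selector is not modeled here.

```python
import torch

class MoEAdapter(torch.nn.Module):
    """Assumed MoE adapter: a router softly mixes small bottleneck experts."""

    def __init__(self, dim, num_experts=4, bottleneck=64):
        super().__init__()
        self.router = torch.nn.Linear(dim, num_experts)
        self.experts = torch.nn.ModuleList(
            torch.nn.Sequential(
                torch.nn.Linear(dim, bottleneck),
                torch.nn.GELU(),
                torch.nn.Linear(bottleneck, dim),
            )
            for _ in range(num_experts)
        )

    def forward(self, x):
        weights = torch.softmax(self.router(x), dim=-1)            # (B, E)
        outs = torch.stack([e(x) for e in self.experts], dim=-1)   # (B, D, E)
        # Residual connection keeps the frozen CLIP features intact.
        return x + (outs * weights.unsqueeze(-2)).sum(dim=-1)
```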
arXiv Detail & Related papers (2024-03-18T08:00:23Z)
- Diffusion-Based Neural Network Weights Generation [80.89706112736353]
D2NWG is a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning.
Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation.
Our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
arXiv Detail & Related papers (2024-02-28T08:34:23Z)
- Mixtures of Experts Unlock Parameter Scaling for Deep RL [54.26191237981469]
In this paper, we demonstrate that incorporating Mixture-of-Experts (MoE) modules into value-based networks results in more parameter-scalable models.
This work thus provides strong empirical evidence towards developing scaling laws for reinforcement learning.
arXiv Detail & Related papers (2024-02-13T17:18:56Z)
- Enhanced Few-Shot Class-Incremental Learning via Ensemble Models [34.84881941101568]
Few-shot class-incremental learning aims to continually fit new classes with limited training data.
The main challenges are overfitting to the rare new training samples and forgetting old classes.
We propose a new ensemble model framework cooperating with data augmentation to boost generalization.
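At inference time, such an ensemble plausibly averages the members' predictions over differently augmented views of the input; a minimal assumed version (the pairing of members with augmentations is an illustrative choice, not the paper's design):

```python
import torch

@torch.no_grad()
def ensemble_predict(models, augment_fns, x):
    """Hypothetical ensemble inference: average each member's softmax output
    over its own augmented view of the input batch."""
    probs = []
    for model, aug in zip(models, augment_fns):
        model.eval()
        probs.append(torch.softmax(model(aug(x)), dim=-1))
    return torch.stack(probs).mean(dim=0)  # averaged class probabilities
```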
arXiv Detail & Related papers (2024-01-14T06:07:07Z)
- Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model [74.62272538148245]
We show that for arbitrary pairings of pretrained models, one model extracts significant data context unavailable in the other.
We investigate if it is possible to transfer such "complementary" knowledge from one model to another without performance degradation.
arXiv Detail & Related papers (2023-10-26T17:59:46Z)
- Mitigating Forgetting in Online Continual Learning via Contrasting Semantically Distinct Augmentations [22.289830907729705]
Online continual learning (OCL) aims to enable model learning from a non-stationary data stream to continuously acquire new knowledge as well as retain the learnt one.
The main challenge comes from the "catastrophic forgetting" issue -- the inability to retain previously learnt knowledge while acquiring new knowledge.
arXiv Detail & Related papers (2022-11-10T05:29:43Z)
- FOSTER: Feature Boosting and Compression for Class-Incremental Learning [52.603520403933985]
Deep neural networks suffer from catastrophic forgetting when learning new categories.
We propose a novel two-stage learning paradigm FOSTER, empowering the model to learn new categories adaptively.
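Read through the gradient-boosting analogy, the two stages could be organized roughly as below: first expand with a new module trained to fit what the old model misses, then compress back to a single model. This is a schematic sketch with assumed helper callables, not FOSTER's actual training code.

```python
def foster_update(old_model, new_module, student, new_data, train_fn, distill_fn):
    """Schematic two-stage step: feature boosting, then compression."""
    # Stage 1 (boosting): freeze the previous model and fit only the new
    # module, so the expansion learns the new categories' residual signal.
    for p in old_model.parameters():
        p.requires_grad_(False)

    def boosted(x):
        # Residual-style expansion in the spirit of gradient boosting.
        return old_model(x) + new_module(x)

    train_fn(boosted, new_module.parameters(), new_data)

    # Stage 2 (compression): distill the expanded predictor back into one
    # compact student so the model size stays constant across tasks.
    distill_fn(student, boosted, new_data)
    return student
```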
arXiv Detail & Related papers (2022-04-10T11:38:33Z)
- Closed-form Continuous-Depth Models [99.40335716948101]
Continuous-depth neural models rely on advanced numerical differential equation solvers.
We present a new family of models, termed Closed-form Continuous-depth (CfC) networks, that are simple to describe and at least one order of magnitude faster.
arXiv Detail & Related papers (2021-06-25T22:08:51Z)
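For the CfC models above, "closed-form" means the hidden state at time t is evaluated directly rather than by numerically integrating an ODE. A simplified gated update in that spirit (an assumed approximation for illustration, not the exact CfC equation):

```python
import torch

def cfc_like_state(x, inputs, t, f, g, h):
    """Assumed closed-form update: a time-dependent sigmoid gate blends two
    learned branches, so no numerical ODE solver is needed.

    f, g, h are learned networks over the concatenated state and input.
    """
    z = torch.cat([x, inputs], dim=-1)
    gate = torch.sigmoid(-f(z) * t)       # time-decaying gate
    return gate * g(z) + (1.0 - gate) * h(z)
```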
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.