CD^2: Constrained Dataset Distillation for Few-Shot Class-Incremental Learning
- URL: http://arxiv.org/abs/2601.08519v1
- Date: Tue, 13 Jan 2026 13:01:14 GMT
- Title: CD^2: Constrained Dataset Distillation for Few-Shot Class-Incremental Learning
- Authors: Kexin Bao, Daichi Zhang, Hansong Zhang, Yong Li, Yutao Yue, Shiming Ge
- Abstract summary: Few-shot class-incremental learning (FSCIL) has received significant attention. We propose a framework termed Constrained Dataset Distillation (CD$^2$) to facilitate FSCIL.
- Score: 24.299542011394298
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Few-shot class-incremental learning (FSCIL), which performs classification continuously from only a few training samples per new class, has received significant attention, but it suffers from the key problem of catastrophic forgetting. Existing methods usually employ an external memory to store previous knowledge and treat it equally with the incremental classes, which cannot properly preserve essential previous knowledge. To solve this problem, and inspired by recent distillation work on knowledge transfer, we propose a framework termed \textbf{C}onstrained \textbf{D}ataset \textbf{D}istillation (\textbf{CD$^2$}) to facilitate FSCIL, which includes a dataset distillation module (\textbf{DDM}) and a distillation constraint module (\textbf{DCM}). Specifically, the DDM synthesizes highly condensed samples guided by the classifier, forcing the model to learn compact, essential class-related clues from the few incremental samples. The DCM introduces a designed loss that constrains the previously learned class distribution, preserving the distilled knowledge more fully. Extensive experiments on three public datasets show the superiority of our method over other state-of-the-art competitors.
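The abstract describes the two modules only at a high level. The PyTorch sketch below is one plausible reading of it, not the authors' released code: the function names, the cross-entropy form of the classifier guidance in the DDM, and the KL form of the distribution constraint in the DCM are all assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def ddm_step(model, synthetic_images, synthetic_labels, lr=0.1):
    """DDM (assumed form): optimize the synthetic samples, not the model,
    so the frozen classifier assigns them confidently to their classes."""
    synthetic_images.requires_grad_(True)
    logits = model(synthetic_images)
    loss = F.cross_entropy(logits, synthetic_labels)      # classifier-guided condensation
    grad, = torch.autograd.grad(loss, synthetic_images)
    with torch.no_grad():
        synthetic_images -= lr * grad                     # one gradient step on the data
    return synthetic_images.detach()

def dcm_loss(model, distilled_samples, old_distribution):
    """DCM (assumed KL form): keep the model's distribution on previously
    distilled samples close to the distribution recorded earlier."""
    logits = model(distilled_samples)
    return F.kl_div(F.log_softmax(logits, dim=1), old_distribution,
                    reduction="batchmean")
```

In this reading, the DDM updates the synthetic samples themselves while the model is held fixed, and the DCM term is added to the incremental-session training loss to anchor the old-class distribution.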
Related papers
- Prompt Tuning for Few-Shot Continual Learning Named Entity Recognition [0.4662017507844857]
In Few-Shot CLNER (FS-CLNER) tasks, the scarcity of new-class entities makes it difficult for the trained model to generalize. We address the above challenges through a prompt tuning paradigm and memory demonstration template strategy.
arXiv Detail & Related papers (2025-08-10T09:02:53Z) - Restoration Score Distillation: From Corrupted Diffusion Pretraining to One-Step High-Quality Generation [82.39763984380625]
We propose Restoration Score Distillation (RSD), a principled generalization of Denoising Score Distillation (DSD). RSD accommodates a broader range of corruption types, such as blurred, incomplete, or low-resolution images. It consistently surpasses its teacher model across diverse restoration tasks on both natural and scientific datasets.
arXiv Detail & Related papers (2025-05-19T17:21:03Z) - On Distilling the Displacement Knowledge for Few-Shot Class-Incremental Learning [17.819582979803286]
Few-shot Class-Incremental Learning (FSCIL) addresses the challenges of evolving data distributions and the difficulty of data acquisition in real-world scenarios. To counteract the catastrophic forgetting typically encountered in FSCIL, knowledge distillation is employed as a way to maintain the knowledge from the learned data distribution.
arXiv Detail & Related papers (2024-12-15T02:10:18Z) - Multi-Granularity Semantic Revision for Large Language Model Distillation [66.03746866578274]
We propose a multi-granularity semantic revision method for LLM distillation.
At the sequence level, we propose a sequence correction and re-generation strategy.
At the token level, we design a distribution adaptive clipping Kullback-Leibler loss as the distillation objective function.
At the span level, we leverage the span priors of a sequence to compute the probability correlations within spans, and constrain the teacher and student's probability correlations to be consistent.
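The abstract does not give the exact form of the distribution adaptive clipping Kullback-Leibler loss; the snippet below is a hedged sketch of what a clipped token-level KL between teacher and student might look like, with `clip_ratio` and the clipping rule chosen purely for illustration.

```python
import torch.nn.functional as F

def clipped_kl_distill(student_logits, teacher_logits, clip_ratio=5.0):
    """Token-level KL(teacher || student) with a clipped log-ratio
    (illustrative clipping rule, not the paper's exact formulation)."""
    t = F.softmax(teacher_logits, dim=-1)               # teacher token distribution
    log_s = F.log_softmax(student_logits, dim=-1)
    log_t = F.log_softmax(teacher_logits, dim=-1)
    log_ratio = (log_t - log_s).clamp(max=clip_ratio)   # cap extreme per-entry ratios
    return (t * log_ratio).sum(dim=-1).mean()           # average KL over tokens
```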
arXiv Detail & Related papers (2024-07-14T03:51:49Z) - Small Scale Data-Free Knowledge Distillation [37.708282211941416]
We propose Small Scale Data-free Knowledge Distillation (SSD-KD).
SSD-KD balances the synthetic samples and uses a priority sampling function to select proper samples.
It can perform distillation training conditioned on an extremely small scale of synthetic samples.
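The priority sampling function itself is not specified in the abstract; the following sketch only illustrates the general idea of drawing distillation batches from a small synthetic pool in proportion to priority scores, where `priorities` is a hypothetical per-sample score tensor.

```python
import torch

def priority_sample(synthetic_pool, priorities, batch_size):
    """Draw a distillation batch from a small synthetic pool, favoring
    high-priority samples (the priority function itself is assumed)."""
    probs = torch.softmax(priorities, dim=0)                      # scores -> sampling weights
    idx = torch.multinomial(probs, batch_size, replacement=True)  # weighted draw
    return synthetic_pool[idx]
```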
arXiv Detail & Related papers (2024-06-12T05:09:41Z) - f-Divergence Minimization for Sequence-Level Knowledge Distillation [23.513372304624486]
Knowledge distillation (KD) is the process of transferring knowledge from a large model to a small one.
We propose an f-DISTILL framework, which formulates sequence-level knowledge distillation as minimizing a generalized f-divergence function.
Experiments across four datasets show that our methods outperform existing KD approaches.
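As a rough illustration of the f-divergence view (standard definitions, not the paper's implementation): choosing different convex functions f in D_f(p||q) = sum_i q_i f(p_i/q_i) recovers familiar distillation objectives such as forward KL, reverse KL, and total variation.

```python
import torch

# D_f(p || q) = sum_i q_i * f(p_i / q_i); each f below recovers a familiar objective.
F_FUNCTIONS = {
    "forward_kl": lambda t: t * torch.log(t),               # KL(p || q)
    "reverse_kl": lambda t: -torch.log(t),                  # KL(q || p)
    "total_variation": lambda t: 0.5 * torch.abs(t - 1.0),  # TV distance
}

def f_divergence(p, q, name="forward_kl", eps=1e-12):
    """p, q: probability vectors over the vocabulary at one decoding step."""
    ratio = (p + eps) / (q + eps)
    return (q * F_FUNCTIONS[name](ratio)).sum()
```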
arXiv Detail & Related papers (2023-07-27T20:39:06Z) - Large-scale Pre-trained Models are Surprisingly Strong in Incremental Novel Class Discovery [76.63807209414789]
We challenge the status quo in class-iNCD and propose a learning paradigm where class discovery occurs continuously and in a truly unsupervised manner.
We propose simple baselines, composed of a frozen PTM backbone and a learnable linear classifier, that are not only simple to implement but also resilient under longer learning scenarios.
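A minimal sketch of such a baseline, assuming a generic pre-trained backbone and a single learnable linear head; the class and attribute names here are illustrative rather than the paper's code.

```python
import torch
import torch.nn as nn

class FrozenPTMBaseline(nn.Module):
    """Frozen pre-trained backbone with a single learnable linear classifier."""
    def __init__(self, backbone, feat_dim, num_classes):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False                          # pre-trained model stays fixed
        self.classifier = nn.Linear(feat_dim, num_classes)   # only trainable part

    def forward(self, x):
        with torch.no_grad():
            feats = self.backbone(x)                         # frozen features
        return self.classifier(feats)
```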
arXiv Detail & Related papers (2023-03-28T13:47:16Z) - Uncertainty-Aware Distillation for Semi-Supervised Few-Shot Class-Incremental Learning [16.90277839119862]
We present a framework named Uncertainty-aware Distillation with Class-Equilibrium (UaD-CE).
We introduce the CE module, which employs class-balanced self-training to avoid the gradual dominance of easily classified classes over pseudo-label generation.
Comprehensive experiments on three benchmark datasets demonstrate that our method can boost the adaptability of unlabeled data.
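The CE module's exact selection rule is not given in the abstract; the sketch below shows one common way to make pseudo-labeling class-balanced, keeping at most a fixed quota of high-confidence samples per predicted class (the quota and the confidence criterion are assumptions).

```python
import torch

def class_balanced_pseudo_labels(probs, per_class_quota):
    """probs: [N, C] softmax outputs on unlabeled data. Keep at most
    `per_class_quota` highest-confidence samples per predicted class so
    easy classes cannot dominate pseudo-label generation."""
    conf, preds = probs.max(dim=1)
    keep = torch.zeros(probs.size(0), dtype=torch.bool)
    for c in range(probs.size(1)):
        idx = (preds == c).nonzero(as_tuple=True)[0]
        if idx.numel() == 0:
            continue
        top = conf[idx].topk(min(per_class_quota, idx.numel())).indices
        keep[idx[top]] = True
    return preds, keep      # pseudo-labels and the balanced selection mask
```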
arXiv Detail & Related papers (2023-01-24T12:53:06Z) - Prompting to Distill: Boosting Data-Free Knowledge Distillation via Reinforced Prompt [52.6946016535059]
Data-free knowledge distillation (DFKD) performs knowledge distillation without relying on the original training data.
We propose a prompt-based method, termed PromptDFD, that takes advantage of learned language priors.
As shown in our experiments, the proposed method substantially improves synthesis quality and achieves considerable improvements in distillation performance.
arXiv Detail & Related papers (2022-05-16T08:56:53Z) - Always Be Dreaming: A New Approach for Data-Free Class-Incremental
Learning [73.24988226158497]
We consider the high-impact problem of Data-Free Class-Incremental Learning (DFCIL).
We propose a novel incremental distillation strategy for DFCIL, contributing a modified cross-entropy training and importance-weighted feature distillation.
Our method results in up to a 25.1% increase in final task accuracy (absolute difference) compared to SOTA DFCIL methods for common class-incremental benchmarks.
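The abstract names importance-weighted feature distillation without specifying the weights; the sketch below assumes an arbitrary non-negative per-channel importance vector and a squared-error feature match, purely for illustration.

```python
import torch

def importance_weighted_feat_distill(student_feat, old_model_feat, importance):
    """Match the student's features to the frozen old model's features,
    weighting each channel by a non-negative importance score
    (the weighting used in the paper is not specified in the abstract)."""
    diff = (student_feat - old_model_feat) ** 2      # [B, D] squared feature error
    return (importance.unsqueeze(0) * diff).mean()   # importance: [D]
```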
arXiv Detail & Related papers (2021-06-17T17:56:08Z) - Contrastive Model Inversion for Data-Free Knowledge Distillation [60.08025054715192]
We propose Contrastive Model Inversion (CMI), where data diversity is explicitly modeled as an optimizable objective.
Our main observation is that, under the constraint of the same amount of data, higher data diversity usually indicates stronger instance discrimination.
Experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet demonstrate that CMI achieves significantly superior performance when the generated data are used for knowledge distillation.
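To make the stated observation concrete, here is a simplified repulsion-style objective that rewards instance discrimination among synthesized samples; the paper's actual contrastive formulation is not reproduced here and may differ (for example, in how positives are defined).

```python
import torch
import torch.nn.functional as F

def diversity_repulsion_loss(features, temperature=0.1):
    """Push embeddings of synthesized samples apart: lower pairwise similarity
    means stronger instance discrimination, hence more diverse samples."""
    z = F.normalize(features, dim=1)                          # [N, D] unit-norm embeddings
    sim = z @ z.t() / temperature                             # pairwise similarities
    mask = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    sim = sim.masked_fill(mask, float("-inf"))                # ignore self-similarity
    return torch.logsumexp(sim, dim=1).mean()                 # minimize similarity to others
```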
arXiv Detail & Related papers (2021-05-18T15:13:00Z) - Residual Knowledge Distillation [96.18815134719975]
This work proposes Residual Knowledge Distillation (RKD), which further distills the knowledge by introducing an assistant (A).
In this way, S is trained to mimic the feature maps of T, and A aids this process by learning the residual error between them.
Experiments show that our approach achieves appealing results on popular classification datasets, CIFAR-100 and ImageNet.
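A minimal sketch of the described setup, assuming simple mean-squared-error feature matching: the student mimics the teacher's features while the assistant fits the remaining residual. The loss forms and names are assumptions for illustration, not the paper's code.

```python
import torch.nn.functional as F

def rkd_losses(teacher_feat, student_feat, assistant_feat):
    """Student S mimics teacher T's feature maps; assistant A fits the residual."""
    residual = (teacher_feat - student_feat).detach()        # error the student still makes
    student_loss = F.mse_loss(student_feat, teacher_feat)    # S -> T
    assistant_loss = F.mse_loss(assistant_feat, residual)    # A -> (T - S)
    return student_loss, assistant_loss
```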
arXiv Detail & Related papers (2020-02-21T07:49:26Z)