D3Former: Debiased Dual Distilled Transformer for Incremental Learning
- URL: http://arxiv.org/abs/2208.00777v3
- Date: Sat, 3 Jun 2023 11:48:54 GMT
- Title: D3Former: Debiased Dual Distilled Transformer for Incremental Learning
- Authors: Abdelrahman Mohamed, Rushali Grandhe, K J Joseph, Salman Khan, Fahad
Khan
- Abstract summary: In the class incremental learning (CIL) setting, groups of classes are introduced to a model in each learning phase.
The goal is to learn a unified model performant on all the classes observed so far.
We develop a Debiased Dual Distilled Transformer for CIL dubbed $\textrm{D}^3\textrm{Former}$.
- Score: 25.65032941918354
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the class incremental learning (CIL) setting, groups of classes are introduced
to a model in each learning phase. The goal is to learn a unified model
performant on all the classes observed so far. Given the recent popularity of
Vision Transformers (ViTs) in conventional classification settings, an
interesting question is to study their continual learning behaviour. In this
work, we develop a Debiased Dual Distilled Transformer for CIL dubbed
$\textrm{D}^3\textrm{Former}$. The proposed model leverages a hybrid nested ViT
design to ensure data efficiency and scalability to small as well as large
datasets. In contrast to a recent ViT based CIL approach, our
$\textrm{D}^3\textrm{Former}$ does not dynamically expand its architecture when
new tasks are learned and remains suitable for a large number of incremental
tasks. The improved CIL behaviour of $\textrm{D}^3\textrm{Former}$ owes to two
fundamental changes to the ViT design. First, we treat the incremental learning
as a long-tail classification problem where the majority samples from new
classes vastly outnumber the limited exemplars available for old classes. To
avoid the bias against the minority old classes, we propose to dynamically
adjust logits to emphasize retaining the representations relevant to old
tasks. Second, we propose to preserve the configuration of spatial attention
maps as the learning progresses across tasks. This helps in reducing
catastrophic forgetting by constraining the model to retain the attention on
the most discriminative regions. $\textrm{D}^3\textrm{Former}$ obtains
favorable results on incremental versions of CIFAR-100, MNIST, SVHN, and
ImageNet datasets. Code is available at https://tinyurl.com/d3former
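The two changes described above can be pictured with a short PyTorch sketch. This is not the released D3Former code (see the repository linked above for that); the log-prior form of the logit adjustment, the MSE penalty on attention maps, and all function names are assumptions made for illustration.

```python
# Minimal sketch of the two ideas in the abstract: (1) logit adjustment to
# counter the new-class/old-exemplar imbalance, and (2) distillation of
# spatial attention maps from the previous-phase (frozen) model.
import torch
import torch.nn.functional as F


def logit_adjusted_ce(logits, targets, class_counts, tau=1.0):
    """Cross-entropy with logits shifted by the log class priors.

    Old classes with few exemplars get a smaller additive prior, so the
    abundant new-class samples do not dominate the decision boundaries.
    """
    priors = class_counts.float() / class_counts.sum()
    adjusted = logits + tau * torch.log(priors + 1e-12)  # broadcast over the batch
    return F.cross_entropy(adjusted, targets)


def attention_distill_loss(attn_old, attn_new):
    """Penalize drift in spatial attention maps across incremental phases.

    attn_old / attn_new: (batch, heads, tokens, tokens) attention weights
    from the frozen old model and the current model on the same inputs.
    """
    return F.mse_loss(attn_new, attn_old.detach())


if __name__ == "__main__":
    # Toy example: 100 classes, old classes (0-49) have only a few exemplars.
    counts = torch.cat([torch.full((50,), 20), torch.full((50,), 500)])
    logits = torch.randn(8, 100)
    targets = torch.randint(0, 100, (8,))
    cls_loss = logit_adjusted_ce(logits, targets, counts)

    attn_old = torch.softmax(torch.randn(8, 12, 197, 197), dim=-1)
    attn_new = torch.softmax(torch.randn(8, 12, 197, 197), dim=-1)
    distill = attention_distill_loss(attn_old, attn_new)
    total = cls_loss + 0.5 * distill  # the trade-off weight is a placeholder
    print(float(total))
```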
Related papers
- Learning Dynamics of Meta-Learning in Small Model Pretraining [2.6684726101845]
We integrate first-order MAML with subset-masked LM pretraining. We produce four LLaMA-style decoder-only models (11M-570M params) and evaluate them on a fundamental NLP task with many settings and real-world applications.
arXiv Detail & Related papers (2025-08-04T08:34:30Z)
- H$^3$DP: Triply-Hierarchical Diffusion Policy for Visuomotor Learning [25.65324419553667]
We introduce $\textbf{Triply-Hierarchical Diffusion Policy}$ ($\textbf{H}^3\textbf{DP}$), a novel visuomotor learning framework that explicitly incorporates hierarchical structures to strengthen the integration between visual features and action generation. Extensive experiments demonstrate that H$^3$DP yields a $\mathbf{+27.5\%}$ average relative improvement over baselines across $\mathbf{44}$ simulation tasks and achieves superior performance in $\mathbf{4}$ challenging bimanual real-world manipulation tasks.
arXiv Detail & Related papers (2025-05-12T17:59:43Z)
- Simple Semi-supervised Knowledge Distillation from Vision-Language Models via $\mathbf{\texttt{D}}$ual-$\mathbf{\texttt{H}}$ead $\mathbf{\texttt{O}}$ptimization [49.2338910653152]
Vision-language models (VLMs) have achieved remarkable success across diverse tasks by leveraging rich textual information with minimal labeled data. Knowledge distillation (KD) offers a well-established solution to this problem; however, recent KD approaches from VLMs often involve multi-stage training or additional tuning. We propose $\mathbf{\texttt{DHO}}$ -- a simple yet effective KD framework that transfers knowledge from VLMs to compact, task-specific models in semi-supervised settings.
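A rough sketch of a dual-head student as summarized here, assuming one head fits the scarce labels while the other matches the VLM teacher's soft predictions; the architecture, temperature, and loss weighting are illustrative guesses rather than the paper's DHO recipe.

```python
# Rough dual-head KD sketch: one linear head fits the few labels, the other
# matches a frozen vision-language teacher's soft predictions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DualHeadStudent(nn.Module):
    def __init__(self, feat_dim=512, num_classes=100):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(784, feat_dim), nn.ReLU())
        self.ce_head = nn.Linear(feat_dim, num_classes)  # supervised head
        self.kd_head = nn.Linear(feat_dim, num_classes)  # distillation head

    def forward(self, x):
        z = self.backbone(x)
        return self.ce_head(z), self.kd_head(z)


def dual_head_loss(ce_logits, kd_logits, labels, teacher_logits, T=2.0):
    ce = F.cross_entropy(ce_logits, labels)
    kd = F.kl_div(F.log_softmax(kd_logits / T, dim=-1),
                  F.softmax(teacher_logits / T, dim=-1),
                  reduction="batchmean") * T * T
    return ce + kd


if __name__ == "__main__":
    model = DualHeadStudent()
    x = torch.randn(4, 784)
    labels = torch.randint(0, 100, (4,))
    teacher_logits = torch.randn(4, 100)  # stand-in for frozen VLM outputs
    ce_logits, kd_logits = model(x)
    print(float(dual_head_loss(ce_logits, kd_logits, labels, teacher_logits)))
```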
arXiv Detail & Related papers (2025-05-12T15:39:51Z)
- Foundation Model-Powered 3D Few-Shot Class Incremental Learning via Training-free Adaptor [9.54964908165465]
This paper introduces a new method to tackle the Few-Shot Class Incremental Learning (FSCIL) problem in 3D point cloud environments.
We leverage a foundational 3D model trained extensively on point cloud data.
Our approach uses a dual cache system: it retains previous test samples, selected by the model's prediction confidence, to prevent forgetting, and it keeps a small number of new-task samples to prevent overfitting.
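One way to picture such a dual cache is sketched below; the confidence threshold, cache capacities, and the choice to store features are assumptions, not the paper's design.

```python
# Illustrative confidence-gated dual cache; thresholds, capacities, and the
# decision to store features rather than raw points are assumptions.
import torch


class DualCache:
    def __init__(self, conf_threshold=0.9, max_old=512, max_new=32):
        self.conf_threshold = conf_threshold
        self.max_old, self.max_new = max_old, max_new
        self.old_cache = []  # confident past test samples, to counter forgetting
        self.new_cache = []  # a handful of new-task samples, to counter overfitting

    def maybe_store_test_sample(self, feature, logits):
        confidence = torch.softmax(logits, dim=-1).max().item()
        if confidence >= self.conf_threshold and len(self.old_cache) < self.max_old:
            self.old_cache.append((feature.detach(), int(logits.argmax())))

    def store_new_task_sample(self, feature, label):
        if len(self.new_cache) < self.max_new:
            self.new_cache.append((feature.detach(), label))


cache = DualCache()
cache.maybe_store_test_sample(torch.randn(256), torch.randn(10))
```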
arXiv Detail & Related papers (2024-10-11T20:23:00Z)
- Class-Incremental Learning with CLIP: Adaptive Representation Adjustment and Parameter Fusion [10.322832012497722]
Class-incremental learning is a challenging problem, where the goal is to train a model that can classify data from an increasing number of classes over time.
Vision-language pre-trained models such as CLIP demonstrate good generalization ability.
However, further adaptation to downstream tasks by simply fine-tuning the model leads to severe forgetting.
Most existing works with pre-trained models assume that the forgetting of old classes is uniform when the model acquires new knowledge.
arXiv Detail & Related papers (2024-07-19T09:20:33Z)
- Inheritune: Training Smaller Yet More Attentive Language Models [61.363259848264725]
Inheritune is a simple yet effective training recipe for developing smaller, high-performing language models.
We demonstrate that Inheritune enables the training of various sizes of GPT-2 models on datasets like OpenWebText-9B and FineWeb_edu.
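The layer-inheritance idea can be sketched roughly as follows, with generic transformer blocks standing in for GPT-2; the number of inherited layers and the copying scheme are assumptions.

```python
# Loose sketch of layer inheritance: initialize a smaller encoder from the
# first k blocks of a larger (pretrained) one, then keep training it.
import copy
import torch.nn as nn


def build_encoder(num_layers=12, d_model=256, nhead=8):
    layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
    return nn.TransformerEncoder(layer, num_layers=num_layers)


def inherit_first_k_layers(large_model, k=3):
    """Return a smaller encoder whose blocks copy the first k blocks of the
    larger encoder; remaining training would continue from this init."""
    small = build_encoder(num_layers=k)
    for i in range(k):
        small.layers[i] = copy.deepcopy(large_model.layers[i])
    return small


if __name__ == "__main__":
    teacher = build_encoder(num_layers=12)  # pretend this is pretrained
    student = inherit_first_k_layers(teacher, k=3)
    print(sum(p.numel() for p in student.parameters()))
```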
arXiv Detail & Related papers (2024-04-12T17:53:34Z)
- Generative Multi-modal Models are Good Class-Incremental Learners [51.5648732517187]
We propose a novel generative multi-modal model (GMM) framework for class-incremental learning.
Our approach directly generates labels for images using an adapted generative model.
Under the Few-shot CIL setting, we have improved by at least 14% accuracy over all the current state-of-the-art methods with significantly less forgetting.
arXiv Detail & Related papers (2024-03-27T09:21:07Z)
- Expandable Subspace Ensemble for Pre-Trained Model-Based Class-Incremental Learning [65.57123249246358]
We propose ExpAndable Subspace Ensemble (EASE) for PTM-based CIL.
We train a distinct lightweight adapter module for each new task, aiming to create task-specific subspaces.
Our prototype complement strategy synthesizes old classes' new features without using any old class instance.
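A bare-bones version of keeping one lightweight adapter per task on top of a frozen pre-trained backbone might look like the following; the bottleneck design and dimensions are assumptions, not the EASE implementation.

```python
# Per-task adapter pool over a frozen backbone; the residual bottleneck
# adapter and its sizes are assumptions for illustration.
import torch
import torch.nn as nn


class Adapter(nn.Module):
    def __init__(self, dim=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))  # residual adapter


class TaskAdapterPool(nn.Module):
    def __init__(self, backbone, dim=768):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad_(False)  # the pre-trained backbone stays frozen
        self.adapters = nn.ModuleList()
        self.dim = dim

    def add_task(self):
        self.adapters.append(Adapter(self.dim))  # one new subspace per task

    def forward(self, x, task_id):
        feats = self.backbone(x)
        return self.adapters[task_id](feats)


pool = TaskAdapterPool(nn.Linear(32, 768), dim=768)
pool.add_task()
out = pool(torch.randn(4, 32), task_id=0)
```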
arXiv Detail & Related papers (2024-03-18T17:58:13Z)
- CEAT: Continual Expansion and Absorption Transformer for Non-Exemplar Class-Incremental Learning [34.59310641291726]
In real-world applications, dynamic scenarios require the models to possess the capability to learn new tasks continuously without forgetting the old knowledge.
We propose a new architecture, named Continual Expansion and Absorption Transformer (CEAT).
The model can learn the novel knowledge by extending the expanded-fusion layers in parallel with the frozen previous parameters.
To improve the learning ability of the model, we designed a novel prototype contrastive loss to reduce the overlap between old and new classes in the feature space.
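One plausible form of such a prototype contrastive loss, pulling features toward their class prototype and away from the others, is sketched below; the temperature and exact formulation are assumptions.

```python
# Plausible prototype contrastive loss: features are attracted to their own
# class prototype and repelled from the others (InfoNCE over prototypes).
import torch
import torch.nn.functional as F


def prototype_contrastive_loss(features, labels, prototypes, temperature=0.1):
    """features: (B, D), labels: (B,), prototypes: (C, D)."""
    features = F.normalize(features, dim=-1)
    prototypes = F.normalize(prototypes, dim=-1)
    logits = features @ prototypes.t() / temperature  # (B, C) similarities
    return F.cross_entropy(logits, labels)


feats = torch.randn(8, 128)
labels = torch.randint(0, 10, (8,))
protos = torch.randn(10, 128)
print(float(prototype_contrastive_loss(feats, labels, protos)))
```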
arXiv Detail & Related papers (2024-03-11T12:40:12Z)
- A streamlined Approach to Multimodal Few-Shot Class Incremental Learning for Fine-Grained Datasets [23.005760505169803]
Few-shot Class-Incremental Learning (FSCIL) poses the challenge of retaining prior knowledge while learning from limited new data streams.
We propose two modules. The first, Session-Specific Prompts (SSP), enhances the separability of image-text embeddings across sessions.
The second, hyperbolic distance, compresses representations of image-text pairs within the same class while expanding those from different classes, leading to better representations.
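For reference, the hyperbolic distance usually meant in this context is the Poincaré-ball distance; a small sketch follows, with the assumption that the paper uses this particular model of hyperbolic space.

```python
# Poincare-ball distance, the common choice of "hyperbolic distance" in
# representation learning; inputs must lie inside the unit ball.
import torch


def poincare_distance(u, v, eps=1e-6):
    """u, v: (..., D) points with norm < 1."""
    sq_u = (u * u).sum(dim=-1)
    sq_v = (v * v).sum(dim=-1)
    sq_diff = ((u - v) ** 2).sum(dim=-1)
    x = 1 + 2 * sq_diff / ((1 - sq_u).clamp_min(eps) * (1 - sq_v).clamp_min(eps))
    return torch.acosh(x.clamp_min(1 + eps))


u = 0.1 * torch.randn(4, 32)
v = 0.1 * torch.randn(4, 32)
print(poincare_distance(u, v))
```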
arXiv Detail & Related papers (2024-03-10T19:50:03Z)
- Two Independent Teachers are Better Role Model [7.001845833295753]
We propose a new deep learning model called 3D-DenseUNet.
It uses adaptable global aggregation blocks during down-sampling to address the issue of spatial information loss.
We also propose a new method, called Two Independent Teachers, which summarizes the model weights instead of the label predictions.
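A sketch of fusing two teachers by averaging their weights rather than their predictions is given below; equal weighting and identical architectures are assumptions.

```python
# Fuse two teachers by averaging their parameters instead of ensembling
# their label predictions; equal 0.5/0.5 weighting is assumed.
import copy
import torch
import torch.nn as nn


def average_weights(teacher_a, teacher_b):
    fused = copy.deepcopy(teacher_a)
    state_a, state_b = teacher_a.state_dict(), teacher_b.state_dict()
    fused_state = {k: 0.5 * (state_a[k] + state_b[k]) for k in state_a}
    fused.load_state_dict(fused_state)
    return fused


t1, t2 = nn.Linear(16, 4), nn.Linear(16, 4)
fused = average_weights(t1, t2)
print(fused(torch.randn(2, 16)).shape)
```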
arXiv Detail & Related papers (2023-06-09T08:22:41Z)
- Cross-Modal Adapter for Text-Video Retrieval [91.9575196703281]
We present a novel $\textbf{Cross-Modal Adapter}$ for parameter-efficient fine-tuning.
Inspired by adapter-based methods, we adjust the pre-trained model with a few parameterized layers.
It achieves superior or comparable performance compared to fully fine-tuned methods on MSR-VTT, MSVD, VATEX, ActivityNet, and DiDeMo datasets.
arXiv Detail & Related papers (2022-11-17T16:15:30Z)
- Not All Models Are Equal: Predicting Model Transferability in a Self-challenging Fisher Space [51.62131362670815]
This paper addresses the problem of ranking the pre-trained deep neural networks and screening the most transferable ones for downstream tasks.
It proposes a new transferability metric called $\textbf{S}$elf-challenging $\textbf{F}$isher $\textbf{D}$iscriminant $\textbf{A}$nalysis ($\textbf{SFDA}$).
arXiv Detail & Related papers (2022-07-07T01:33:25Z)
- Few-Shot Class-Incremental Learning by Sampling Multi-Phase Tasks [59.12108527904171]
A model should recognize new classes and maintain discriminability over old classes.
The task of recognizing few-shot new classes without forgetting old classes is called few-shot class-incremental learning (FSCIL).
We propose a new paradigm for FSCIL based on meta-learning by LearnIng Multi-phase Incremental Tasks (LIMIT).
arXiv Detail & Related papers (2022-03-31T13:46:41Z)
- Shared and Private VAEs with Generative Replay for Continual Learning [1.90365714903665]
Continual learning tries to learn new tasks without forgetting previously learned ones.
Most existing artificial neural network (ANN) models fail at this, whereas humans manage it by remembering previously learned tasks throughout their lives.
We show our hybrid model effectively avoids forgetting and achieves state-of-the-art results on visual continual learning benchmarks such as MNIST, Permuted MNIST (QMNIST), CIFAR100, and miniImageNet datasets.
arXiv Detail & Related papers (2021-05-17T06:18:36Z)
- Improving Calibration for Long-Tailed Recognition [68.32848696795519]
We propose two methods to improve calibration and performance in such scenarios.
For dataset bias due to different samplers, we propose shifted batch normalization.
Our proposed methods set new records on multiple popular long-tailed recognition benchmark datasets.
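In the spirit of the shifted batch normalization mentioned above, one common recipe is to freeze the learned weights and re-estimate the BatchNorm running statistics under the new, class-balanced sampler; whether this matches the paper's exact procedure is an assumption.

```python
# Re-estimate BatchNorm running statistics under a different (balanced)
# sampler while all learned weights stay frozen; only in the spirit of the
# paper's shifted batch normalization, not its exact procedure.
import torch
import torch.nn as nn


@torch.no_grad()
def recalibrate_bn_stats(model, balanced_loader, device="cpu"):
    # Reset running stats, then let forward passes under the balanced sampler
    # repopulate them; parameters are never updated.
    for m in model.modules():
        if isinstance(m, nn.modules.batchnorm._BatchNorm):
            m.reset_running_stats()
    model.train()  # BN updates running stats only in train mode
    for images, _ in balanced_loader:
        model(images.to(device))
    model.eval()
    return model


model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU())
loader = [(torch.randn(4, 3, 16, 16), torch.zeros(4))]  # stand-in for a balanced loader
recalibrate_bn_stats(model, loader)
```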
arXiv Detail & Related papers (2021-04-01T13:55:21Z)
- Adjoint Rigid Transform Network: Task-conditioned Alignment of 3D Shapes [86.2129580231191]
Adjoint Rigid Transform (ART) Network is a neural module which can be integrated with a variety of 3D networks.
ART learns to rotate input shapes to a learned canonical orientation, which is crucial for a lot of tasks.
We will release our code and pre-trained models for further research.
arXiv Detail & Related papers (2021-02-01T20:58:45Z)
- MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers [117.67424061746247]
We present a simple and effective approach to compress large Transformer based pre-trained models.
We propose distilling the self-attention module of the last Transformer layer of the teacher, which is effective and flexible for the student.
Experimental results demonstrate that our monolingual model outperforms state-of-the-art baselines across different parameter sizes of student models.
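The last-layer attention distillation can be sketched as a KL divergence between teacher and student attention distributions; MiniLM additionally distills value relations, which this sketch omits.

```python
# Minimal sketch of last-layer self-attention distillation: KL divergence
# between teacher and student attention distributions. The value-relation
# term used alongside it is omitted here for brevity.
import torch


def attention_distillation_loss(teacher_attn, student_attn, eps=1e-12):
    """teacher_attn, student_attn: (batch, heads, seq, seq) attention
    probabilities from the last layer of each model (same head count)."""
    kl = teacher_attn * (torch.log(teacher_attn + eps) - torch.log(student_attn + eps))
    return kl.sum(dim=-1).mean()  # KL per query position, averaged


teacher_attn = torch.softmax(torch.randn(2, 12, 64, 64), dim=-1)
student_attn = torch.softmax(torch.randn(2, 12, 64, 64), dim=-1)
print(float(attention_distillation_loss(teacher_attn, student_attn)))
```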
arXiv Detail & Related papers (2020-02-25T15:21:10Z)