D3Former: Debiased Dual Distilled Transformer for Incremental Learning
- URL: http://arxiv.org/abs/2208.00777v3
- Date: Sat, 3 Jun 2023 11:48:54 GMT
- Title: D3Former: Debiased Dual Distilled Transformer for Incremental Learning
- Authors: Abdelrahman Mohamed, Rushali Grandhe, K J Joseph, Salman Khan, Fahad
Khan
- Abstract summary: In the class incremental learning (CIL) setting, groups of classes are introduced to a model in each learning phase.
The goal is to learn a unified model performant on all the classes observed so far.
We develop a Debiased Dual Distilled Transformer for CIL dubbed $\textrm{D}^3\textrm{Former}$.
- Score: 25.65032941918354
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the class incremental learning (CIL) setting, groups of classes are introduced
to a model in each learning phase. The goal is to learn a unified model
performant on all the classes observed so far. Given the recent popularity of
Vision Transformers (ViTs) in conventional classification settings, an
interesting question is to study their continual learning behaviour. In this
work, we develop a Debiased Dual Distilled Transformer for CIL dubbed
$\textrm{D}^3\textrm{Former}$. The proposed model leverages a hybrid nested ViT
design to ensure data efficiency and scalability to small as well as large
datasets. In contrast to a recent ViT based CIL approach, our
$\textrm{D}^3\textrm{Former}$ does not dynamically expand its architecture when
new tasks are learned and remains suitable for a large number of incremental
tasks. The improved CIL behaviour of $\textrm{D}^3\textrm{Former}$ owes to two
fundamental changes to the ViT design. First, we treat the incremental learning
as a long-tail classification problem where the majority samples from new
classes vastly outnumber the limited exemplars available for old classes. To
avoid the bias against the minority old classes, we propose to dynamically
adjust logits to emphasize retaining the representations relevant to old
tasks. Second, we propose to preserve the configuration of spatial attention
maps as the learning progresses across tasks. This helps in reducing
catastrophic forgetting by constraining the model to retain the attention on
the most discriminative regions. $\textrm{D}^3\textrm{Former}$ obtains
favorable results on incremental versions of CIFAR-100, MNIST, SVHN, and
ImageNet datasets. Code is available at https://tinyurl.com/d3former
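The two changes described above can be pictured with a short PyTorch sketch. This is not the released D3Former code (see the repository linked above for that); the log-prior form of the logit adjustment, the MSE penalty on attention maps, and all function names are assumptions made for illustration.

```python
# Minimal sketch of the two ideas in the abstract: (1) logit adjustment to
# counter the new-class/old-exemplar imbalance, and (2) distillation of
# spatial attention maps from the previous-phase (frozen) model.
import torch
import torch.nn.functional as F


def logit_adjusted_ce(logits, targets, class_counts, tau=1.0):
    """Cross-entropy with logits shifted by the log class priors.

    Old classes with few exemplars get a smaller additive prior, so the
    abundant new-class samples do not dominate the decision boundaries.
    """
    priors = class_counts.float() / class_counts.sum()
    adjusted = logits + tau * torch.log(priors + 1e-12)  # broadcast over the batch
    return F.cross_entropy(adjusted, targets)


def attention_distill_loss(attn_old, attn_new):
    """Penalize drift in spatial attention maps across incremental phases.

    attn_old / attn_new: (batch, heads, tokens, tokens) attention weights
    from the frozen old model and the current model on the same inputs.
    """
    return F.mse_loss(attn_new, attn_old.detach())


if __name__ == "__main__":
    # Toy example: 100 classes, old classes (0-49) have only a few exemplars.
    counts = torch.cat([torch.full((50,), 20), torch.full((50,), 500)])
    logits = torch.randn(8, 100)
    targets = torch.randint(0, 100, (8,))
    cls_loss = logit_adjusted_ce(logits, targets, counts)

    attn_old = torch.softmax(torch.randn(8, 12, 197, 197), dim=-1)
    attn_new = torch.softmax(torch.randn(8, 12, 197, 197), dim=-1)
    distill = attention_distill_loss(attn_old, attn_new)
    total = cls_loss + 0.5 * distill  # the trade-off weight is a placeholder
    print(float(total))
```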
Related papers
- Learning Dynamics of Meta-Learning in Small Model Pretraining [2.6684726101845]
We integrate first-order MAML with subset-masked LM pretraining. We produce four LLaMA-style decoder-only models (11M-570M params) and evaluate them on a fundamental NLP task with many settings and real-world applications.
arXiv Detail & Related papers (2025-08-04T08:34:30Z)
- H$^3$DP: Triply-Hierarchical Diffusion Policy for Visuomotor Learning [25.65324419553667]
We introduce $\textbf{Triply-Hierarchical Diffusion Policy}$ ($\textbf{H}^3\textbf{DP}$), a novel visuomotor learning framework that explicitly incorporates hierarchical structures to strengthen the integration between visual features and action generation. Extensive experiments demonstrate that H$^3$DP yields a $\mathbf{+27.5\%}$ average relative improvement over baselines across $\mathbf{44}$ simulation tasks and achieves superior performance in $\mathbf{4}$ challenging bimanual real-world manipulation tasks.
arXiv Detail & Related papers (2025-05-12T17:59:43Z)
- Simple Semi-supervised Knowledge Distillation from Vision-Language Models via $\mathbf{\texttt{D}}$ual-$\mathbf{\texttt{H}}$ead $\mathbf{\texttt{O}}$ptimization [49.2338910653152]
Vision-language models (VLMs) have achieved remarkable success across diverse tasks by leveraging rich textual information with minimal labeled data. Knowledge distillation (KD) offers a well-established solution to this problem; however, recent KD approaches from VLMs often involve multi-stage training or additional tuning. We propose $\mathbf{\texttt{DHO}}$ -- a simple yet effective KD framework that transfers knowledge from VLMs to compact, task-specific models in semi-supervised settings.
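A rough sketch of a dual-head student as summarized here, assuming one head fits the scarce labels while the other matches the VLM teacher's soft predictions; the architecture, temperature, and loss weighting are illustrative guesses rather than the paper's DHO recipe.

```python
# Rough dual-head KD sketch: one linear head fits the few labels, the other
# matches a frozen vision-language teacher's soft predictions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DualHeadStudent(nn.Module):
    def __init__(self, feat_dim=512, num_classes=100):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(784, feat_dim), nn.ReLU())
        self.ce_head = nn.Linear(feat_dim, num_classes)  # supervised head
        self.kd_head = nn.Linear(feat_dim, num_classes)  # distillation head

    def forward(self, x):
        z = self.backbone(x)
        return self.ce_head(z), self.kd_head(z)


def dual_head_loss(ce_logits, kd_logits, labels, teacher_logits, T=2.0):
    ce = F.cross_entropy(ce_logits, labels)
    kd = F.kl_div(F.log_softmax(kd_logits / T, dim=-1),
                  F.softmax(teacher_logits / T, dim=-1),
                  reduction="batchmean") * T * T
    return ce + kd


if __name__ == "__main__":
    model = DualHeadStudent()
    x = torch.randn(4, 784)
    labels = torch.randint(0, 100, (4,))
    teacher_logits = torch.randn(4, 100)  # stand-in for frozen VLM outputs
    ce_logits, kd_logits = model(x)
    print(float(dual_head_loss(ce_logits, kd_logits, labels, teacher_logits)))
```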
arXiv Detail & Related papers (2025-05-12T15:39:51Z)
- Foundation Model-Powered 3D Few-Shot Class Incremental Learning via Training-free Adaptor [9.54964908165465]
This paper introduces a new method to tackle the Few-Shot Class Incremental Learning (FSCIL) problem in 3D point cloud environments.
We leverage a foundational 3D model trained extensively on point cloud data.
Our approach uses a dual cache system: it retains previous test samples, selected by the model's prediction confidence, to prevent forgetting, and it keeps a small number of new-task samples to prevent overfitting.
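One way to picture such a dual cache is sketched below; the confidence threshold, cache capacities, and the choice to store features are assumptions, not the paper's design.

```python
# Illustrative confidence-gated dual cache; thresholds, capacities, and the
# decision to store features rather than raw points are assumptions.
import torch


class DualCache:
    def __init__(self, conf_threshold=0.9, max_old=512, max_new=32):
        self.conf_threshold = conf_threshold
        self.max_old, self.max_new = max_old, max_new
        self.old_cache = []  # confident past test samples, to counter forgetting
        self.new_cache = []  # a handful of new-task samples, to counter overfitting

    def maybe_store_test_sample(self, feature, logits):
        confidence = torch.softmax(logits, dim=-1).max().item()
        if confidence >= self.conf_threshold and len(self.old_cache) < self.max_old:
            self.old_cache.append((feature.detach(), int(logits.argmax())))

    def store_new_task_sample(self, feature, label):
        if len(self.new_cache) < self.max_new:
            self.new_cache.append((feature.detach(), label))


cache = DualCache()
cache.maybe_store_test_sample(torch.randn(256), torch.randn(10))
```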
arXiv Detail & Related papers (2024-10-11T20:23:00Z)
- Class-Incremental Learning with CLIP: Adaptive Representation Adjustment and Parameter Fusion [10.322832012497722]
Class-incremental learning is a challenging problem, where the goal is to train a model that can classify data from an increasing number of classes over time.
Vision-language pre-trained models such as CLIP demonstrate good generalization ability.
However, further adaptation to downstream tasks by simply fine-tuning the model leads to severe forgetting.
Most existing works with pre-trained models assume that the forgetting of old classes is uniform when the model acquires new knowledge.
arXiv Detail & Related papers (2024-07-19T09:20:33Z)
- Inheritune: Training Smaller Yet More Attentive Language Models [61.363259848264725]
Inheritune is a simple yet effective training recipe for developing smaller, high-performing language models.
We demonstrate that Inheritune enables the training of various sizes of GPT-2 models on datasets like OpenWebText-9B and FineWeb_edu.
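The layer-inheritance idea can be sketched roughly as follows, with generic transformer blocks standing in for GPT-2; the number of inherited layers and the copying scheme are assumptions.

```python
# Loose sketch of layer inheritance: initialize a smaller encoder from the
# first k blocks of a larger (pretrained) one, then keep training it.
import copy
import torch.nn as nn


def build_encoder(num_layers=12, d_model=256, nhead=8):
    layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
    return nn.TransformerEncoder(layer, num_layers=num_layers)


def inherit_first_k_layers(large_model, k=3):
    """Return a smaller encoder whose blocks copy the first k blocks of the
    larger encoder; remaining training would continue from this init."""
    small = build_encoder(num_layers=k)
    for i in range(k):
        small.layers[i] = copy.deepcopy(large_model.layers[i])
    return small


if __name__ == "__main__":
    teacher = build_encoder(num_layers=12)  # pretend this is pretrained
    student = inherit_first_k_layers(teacher, k=3)
    print(sum(p.numel() for p in student.parameters()))
```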
arXiv Detail & Related papers (2024-04-12T17:53:34Z)
- Generative Multi-modal Models are Good Class-Incremental Learners [51.5648732517187]
We propose a novel generative multi-modal model (GMM) framework for class-incremental learning.
Our approach directly generates labels for images using an adapted generative model.
Under the Few-shot CIL setting, we have improved by at least 14% accuracy over all the current state-of-the-art methods with significantly less forgetting.
arXiv Detail & Related papers (2024-03-27T09:21:07Z)
- Expandable Subspace Ensemble for Pre-Trained Model-Based Class-Incremental Learning [65.57123249246358]
We propose ExpAndable Subspace Ensemble (EASE) for PTM-based CIL.
We train a distinct lightweight adapter module for each new task, aiming to create task-specific subspaces.
Our prototype complement strategy synthesizes old classes' new features without using any old class instance.
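A bare-bones version of keeping one lightweight adapter per task on top of a frozen pre-trained backbone might look like the following; the bottleneck design and dimensions are assumptions, not the EASE implementation.

```python
# Per-task adapter pool over a frozen backbone; the residual bottleneck
# adapter and its sizes are assumptions for illustration.
import torch
import torch.nn as nn


class Adapter(nn.Module):
    def __init__(self, dim=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))  # residual adapter


class TaskAdapterPool(nn.Module):
    def __init__(self, backbone, dim=768):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad_(False)  # the pre-trained backbone stays frozen
        self.adapters = nn.ModuleList()
        self.dim = dim

    def add_task(self):
        self.adapters.append(Adapter(self.dim))  # one new subspace per task

    def forward(self, x, task_id):
        feats = self.backbone(x)
        return self.adapters[task_id](feats)


pool = TaskAdapterPool(nn.Linear(32, 768), dim=768)
pool.add_task()
out = pool(torch.randn(4, 32), task_id=0)
```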
arXiv Detail & Related papers (2024-03-18T17:58:13Z)
- CEAT: Continual Expansion and Absorption Transformer for Non-Exemplar Class-Incremental Learning [34.59310641291726]
In real-world applications, dynamic scenarios require the models to possess the capability to learn new tasks continuously without forgetting the old knowledge.
We propose a new architecture, named Continual Expansion and Absorption Transformer (CEAT).
The model can learn the novel knowledge by extending the expanded-fusion layers in parallel with the frozen previous parameters.
To improve the learning ability of the model, we designed a novel prototype contrastive loss to reduce the overlap between old and new classes in the feature space.
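One plausible form of such a prototype contrastive loss, pulling features toward their class prototype and away from the others, is sketched below; the temperature and exact formulation are assumptions.

```python
# Plausible prototype contrastive loss: features are attracted to their own
# class prototype and repelled from the others (InfoNCE over prototypes).
import torch
import torch.nn.functional as F


def prototype_contrastive_loss(features, labels, prototypes, temperature=0.1):
    """features: (B, D), labels: (B,), prototypes: (C, D)."""
    features = F.normalize(features, dim=-1)
    prototypes = F.normalize(prototypes, dim=-1)
    logits = features @ prototypes.t() / temperature  # (B, C) similarities
    return F.cross_entropy(logits, labels)


feats = torch.randn(8, 128)
labels = torch.randint(0, 10, (8,))
protos = torch.randn(10, 128)
print(float(prototype_contrastive_loss(feats, labels, protos)))
```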
arXiv Detail & Related papers (2024-03-11T12:40:12Z)
- A streamlined Approach to Multimodal Few-Shot Class Incremental Learning for Fine-Grained Datasets [23.005760505169803]
Few-shot Class-Incremental Learning (FSCIL) poses the challenge of retaining prior knowledge while learning from limited new data streams.
We propose two modules. The first, Session-Specific Prompts (SSP), enhances the separability of image-text embeddings across sessions.
The second, hyperbolic distance, compresses representations of image-text pairs within the same class while expanding those from different classes, leading to better representations.
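For reference, the hyperbolic distance usually meant in this context is the Poincaré-ball distance; a small sketch follows, with the assumption that the paper uses this particular model of hyperbolic space.

```python
# Poincare-ball distance, the common choice of "hyperbolic distance" in
# representation learning; inputs must lie inside the unit ball.
import torch


def poincare_distance(u, v, eps=1e-6):
    """u, v: (..., D) points with norm < 1."""
    sq_u = (u * u).sum(dim=-1)
    sq_v = (v * v).sum(dim=-1)
    sq_diff = ((u - v) ** 2).sum(dim=-1)
    x = 1 + 2 * sq_diff / ((1 - sq_u).clamp_min(eps) * (1 - sq_v).clamp_min(eps))
    return torch.acosh(x.clamp_min(1 + eps))


u = 0.1 * torch.randn(4, 32)
v = 0.1 * torch.randn(4, 32)
print(poincare_distance(u, v))
```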
arXiv Detail & Related papers (2024-03-10T19:50:03Z)
- Two Independent Teachers are Better Role Model [7.001845833295753]
We propose a new deep learning model called 3D-DenseUNet.
It uses adaptable global aggregation blocks during down-sampling to address the issue of spatial information loss.
We also propose a new method, called Two Independent Teachers, which summarizes the model weights instead of the label predictions.
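A sketch of fusing two teachers by averaging their weights rather than their predictions is given below; equal weighting and identical architectures are assumptions.

```python
# Fuse two teachers by averaging their parameters instead of ensembling
# their label predictions; equal 0.5/0.5 weighting is assumed.
import copy
import torch
import torch.nn as nn


def average_weights(teacher_a, teacher_b):
    fused = copy.deepcopy(teacher_a)
    state_a, state_b = teacher_a.state_dict(), teacher_b.state_dict()
    fused_state = {k: 0.5 * (state_a[k] + state_b[k]) for k in state_a}
    fused.load_state_dict(fused_state)
    return fused


t1, t2 = nn.Linear(16, 4), nn.Linear(16, 4)
fused = average_weights(t1, t2)
print(fused(torch.randn(2, 16)).shape)
```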
arXiv Detail & Related papers (2023-06-09T08:22:41Z)
- Cross-Modal Adapter for Text-Video Retrieval [91.9575196703281]
We present a novel $\textbf{Cross-Modal Adapter}$ for parameter-efficient fine-tuning.
Inspired by adapter-based methods, we adjust the pre-trained model with a few parameterized layers.
It achieves superior or comparable performance compared to fully fine-tuned methods on MSR-VTT, MSVD, VATEX, ActivityNet, and DiDeMo datasets.
arXiv Detail & Related papers (2022-11-17T16:15:30Z)
- Not All Models Are Equal: Predicting Model Transferability in a Self-challenging Fisher Space [51.62131362670815]
This paper addresses the problem of ranking the pre-trained deep neural networks and screening the most transferable ones for downstream tasks.
It proposes a new transferability metric called $\textbf{S}$elf-challenging $\textbf{F}$isher $\textbf{D}$iscriminant $\textbf{A}$nalysis ($\textbf{SFDA}$).
arXiv Detail & Related papers (2022-07-07T01:33:25Z)
- Few-Shot Class-Incremental Learning by Sampling Multi-Phase Tasks [59.12108527904171]
A model should recognize new classes and maintain discriminability over old classes.
The task of recognizing few-shot new classes without forgetting old classes is called few-shot class-incremental learning (FSCIL).
We propose a new paradigm for FSCIL based on meta-learning by LearnIng Multi-phase Incremental Tasks (LIMIT).
arXiv Detail & Related papers (2022-03-31T13:46:41Z)
- Shared and Private VAEs with Generative Replay for Continual Learning [1.90365714903665]
Continual learning tries to learn new tasks without forgetting previously learned ones.
Most existing artificial neural network (ANN) models fail at this, whereas humans manage it by remembering previously learned tasks throughout their lives.
We show our hybrid model effectively avoids forgetting and achieves state-of-the-art results on visual continual learning benchmarks such as MNIST, Permuted MNIST (QMNIST), CIFAR100, and miniImageNet datasets.
arXiv Detail & Related papers (2021-05-17T06:18:36Z)
- Improving Calibration for Long-Tailed Recognition [68.32848696795519]
We propose two methods to improve calibration and performance in such scenarios.
For dataset bias due to different samplers, we propose shifted batch normalization.
Our proposed methods set new records on multiple popular long-tailed recognition benchmark datasets.
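In the spirit of the shifted batch normalization mentioned above, one common recipe is to freeze the learned weights and re-estimate the BatchNorm running statistics under the new, class-balanced sampler; whether this matches the paper's exact procedure is an assumption.

```python
# Re-estimate BatchNorm running statistics under a different (balanced)
# sampler while all learned weights stay frozen; only in the spirit of the
# paper's shifted batch normalization, not its exact procedure.
import torch
import torch.nn as nn


@torch.no_grad()
def recalibrate_bn_stats(model, balanced_loader, device="cpu"):
    # Reset running stats, then let forward passes under the balanced sampler
    # repopulate them; parameters are never updated.
    for m in model.modules():
        if isinstance(m, nn.modules.batchnorm._BatchNorm):
            m.reset_running_stats()
    model.train()  # BN updates running stats only in train mode
    for images, _ in balanced_loader:
        model(images.to(device))
    model.eval()
    return model


model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU())
loader = [(torch.randn(4, 3, 16, 16), torch.zeros(4))]  # stand-in for a balanced loader
recalibrate_bn_stats(model, loader)
```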
arXiv Detail & Related papers (2021-04-01T13:55:21Z)
- Adjoint Rigid Transform Network: Task-conditioned Alignment of 3D Shapes [86.2129580231191]
Adjoint Rigid Transform (ART) Network is a neural module which can be integrated with a variety of 3D networks.
ART learns to rotate input shapes to a learned canonical orientation, which is crucial for a lot of tasks.
We will release our code and pre-trained models for further research.
arXiv Detail & Related papers (2021-02-01T20:58:45Z)
- MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers [117.67424061746247]
We present a simple and effective approach to compress large Transformer based pre-trained models.
We propose distilling the self-attention module of the last Transformer layer of the teacher, which is effective and flexible for the student.
Experimental results demonstrate that our monolingual model outperforms state-of-the-art baselines across different parameter sizes of student models.
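The last-layer attention distillation can be sketched as a KL divergence between teacher and student attention distributions; MiniLM additionally distills value relations, which this sketch omits.

```python
# Minimal sketch of last-layer self-attention distillation: KL divergence
# between teacher and student attention distributions. The value-relation
# term used alongside it is omitted here for brevity.
import torch


def attention_distillation_loss(teacher_attn, student_attn, eps=1e-12):
    """teacher_attn, student_attn: (batch, heads, seq, seq) attention
    probabilities from the last layer of each model (same head count)."""
    kl = teacher_attn * (torch.log(teacher_attn + eps) - torch.log(student_attn + eps))
    return kl.sum(dim=-1).mean()  # KL per query position, averaged


teacher_attn = torch.softmax(torch.randn(2, 12, 64, 64), dim=-1)
student_attn = torch.softmax(torch.randn(2, 12, 64, 64), dim=-1)
print(float(attention_distillation_loss(teacher_attn, student_attn)))
```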
arXiv Detail & Related papers (2020-02-25T15:21:10Z)