Heterogeneous Continual Learning
- URL: http://arxiv.org/abs/2306.08593v1
- Date: Wed, 14 Jun 2023 15:54:42 GMT
- Title: Heterogeneous Continual Learning
- Authors: Divyam Madaan, Hongxu Yin, Wonmin Byeon, Jan Kautz, Pavlo Molchanov
- Abstract summary: We propose a novel framework to tackle the continual learning (CL) problem with changing network architectures.
We build on top of the distillation family of techniques and modify it to a new setting where a weaker model takes the role of a teacher.
We also propose Quick Deep Inversion (QDI) to recover prior task visual features to support knowledge transfer.
- Score: 88.53038822561197
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We propose a novel framework and a solution to tackle the continual learning
(CL) problem with changing network architectures. Most CL methods focus on
adapting a single architecture to a new task/class by modifying its weights.
However, with rapid progress in architecture design, the problem of adapting
existing solutions to novel architectures becomes relevant. To address this
limitation, we propose Heterogeneous Continual Learning (HCL), where a wide
range of evolving network architectures emerge continually together with novel
data/tasks. As a solution, we build on top of the distillation family of
techniques and modify it to a new setting where a weaker model takes the role
of a teacher; meanwhile, a new stronger architecture acts as a student.
Furthermore, we consider a setup of limited access to previous data and propose
Quick Deep Inversion (QDI) to recover prior task visual features to support
knowledge transfer. QDI significantly reduces computational costs compared to
previous solutions and improves overall performance. In summary, we propose a
new setup for CL with a modified knowledge distillation paradigm and design a
quick data inversion method to enhance distillation. Our evaluation on various
benchmarks shows a significant improvement in accuracy over state-of-the-art
methods across various network architectures.
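As a concrete (though unofficial) illustration of the two ingredients described above, the sketch below combines reverse knowledge distillation, where the frozen, weaker previous-architecture model supplies soft targets for the new, stronger student, with a simplified DeepInversion-style input optimization started from current-task images, in the spirit of Quick Deep Inversion. All function names, hyperparameters, and the exact inversion objective are assumptions for illustration; this is not the authors' implementation.

```python
# Illustrative sketch only: reverse knowledge distillation (the weaker old model
# teaches the stronger new one) plus a simplified, DeepInversion-style input
# optimization that starts from current-task images instead of random noise.
import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, T=2.0):
    # Standard KD objective; here the *previous, weaker* network provides the targets.
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)

def quick_inversion(teacher, x_current, steps=10, lr=0.1):
    # Recover prior-task-like inputs by optimizing copies of current-task images
    # so that the frozen old teacher becomes confident on them (a cheap stand-in
    # for replay data when previous samples are unavailable).
    x = x_current.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        logits = teacher(x)
        # Minimize prediction entropy under the old model (a confidence objective);
        # the full method also uses batch-norm statistic matching and image priors.
        entropy = -(F.softmax(logits, dim=1) * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
        entropy.backward()
        opt.step()
    return x.detach()

def hcl_step(student, teacher, x, y, optimizer, alpha=1.0):
    # One training step: cross-entropy on the current task plus distillation from
    # the old architecture on both real and inverted inputs.
    teacher.eval()
    x_inv = quick_inversion(teacher, x)           # synthetic prior-task stand-ins
    x_all = torch.cat([x, x_inv], dim=0)
    with torch.no_grad():
        t_logits = teacher(x_all)
    s_logits = student(x_all)
    loss = F.cross_entropy(s_logits[: x.size(0)], y) + alpha * distill_loss(s_logits, t_logits)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The entropy term above is deliberately minimal; starting the optimization from current-task images rather than noise is what keeps the inversion "quick" in this sketch.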
Related papers
- One-for-All: Bridge the Gap Between Heterogeneous Architectures in Knowledge Distillation [69.65734716679925]
Knowledge distillation has proven to be a highly effective approach for enhancing model performance through a teacher-student training scheme.
Most existing distillation methods are designed under the assumption that the teacher and student models belong to the same model family.
We propose a simple yet effective one-for-all KD framework, called OFA-KD, which significantly improves distillation performance between heterogeneous architectures; a generic sketch of the cross-architecture projection idea follows below.
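For context, the sketch below illustrates one common way to bridge architecture families: project an intermediate student feature map into the logit space and match the teacher's logits there. It is a generic illustration only; the projector design and temperature are assumptions, not the OFA-KD implementation.

```python
# Generic cross-architecture distillation sketch: project an intermediate
# student feature map into the logit space and match the teacher's logits.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LogitProjector(nn.Module):
    def __init__(self, in_channels, num_classes):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(in_channels, num_classes)

    def forward(self, feat):                 # feat: (B, C, H, W)
        return self.fc(self.pool(feat).flatten(1))

def branch_kd_loss(student_feat, teacher_logits, projector, T=4.0):
    # Distill in logit space so the teacher and student need not share
    # feature shapes or even the same model family.
    branch_logits = projector(student_feat)
    return F.kl_div(
        F.log_softmax(branch_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
```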
arXiv Detail & Related papers (2023-10-30T11:13:02Z)
- Multiplicative update rules for accelerating deep learning training and increasing robustness [69.90473612073767]
We propose an optimization framework that fits to a wide range of machine learning algorithms and enables one to apply alternative update rules.
We claim that the proposed framework accelerates training while leading to more robust models than the traditionally used additive update rule; a rough contrast of the two update forms is sketched below.
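For intuition, the sketch below contrasts the usual additive SGD step with one possible multiplicative (exponentiated-gradient-style) update; the exact rule proposed in the paper differs, so the scaling form here is only an assumption.

```python
# Sketch of an additive vs. a multiplicative parameter update.
# The multiplicative form (scaling each weight by a gradient-dependent factor)
# is illustrative only, not the paper's exact rule.
import torch

@torch.no_grad()
def additive_step(params, lr=0.01):
    for p in params:
        if p.grad is not None:
            p -= lr * p.grad                      # w <- w - lr * g

@torch.no_grad()
def multiplicative_step(params, lr=0.01):
    for p in params:
        if p.grad is not None:
            # w <- w * exp(-lr * g * sign(w)): shrink or grow each weight by a
            # factor that moves it in the descent direction, with step size
            # proportional to |w|. Weights that are exactly zero stay zero,
            # a characteristic property of purely multiplicative rules.
            p *= torch.exp(-lr * p.grad * torch.sign(p))
```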
arXiv Detail & Related papers (2023-07-14T06:44:43Z)
- Conceptual Expansion Neural Architecture Search (CENAS) [1.3464152928754485]
We present an approach called Conceptual Expansion Neural Architecture Search (CENAS).
It combines a sample-efficient, computational creativity-inspired transfer learning approach with neural architecture search.
It finds models faster than naive architecture search by transferring existing weights to approximate the parameters of the new model; a generic warm-start sketch follows below.
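The weight-transfer idea can be illustrated with a generic warm-start routine that copies overlapping parameter slices from a trained model into a new architecture; this is only one plausible realization, not the CENAS operators themselves.

```python
# Generic warm-start sketch: copy overlapping slices of a trained model's
# parameters into a new (possibly wider/deeper) architecture by matching
# parameter names and taking the intersection of each dimension.
# Illustration of weight transfer only, not the CENAS procedure.
import torch
import torch.nn as nn

@torch.no_grad()
def transfer_overlapping_weights(old_model: nn.Module, new_model: nn.Module):
    old_state = old_model.state_dict()
    new_state = new_model.state_dict()
    for name, new_param in new_state.items():
        old_param = old_state.get(name)
        if old_param is None or old_param.dim() != new_param.dim():
            continue  # no counterpart in the old model; keep the fresh init
        # Copy the overlapping region along every dimension.
        slices = tuple(slice(0, min(o, n)) for o, n in zip(old_param.shape, new_param.shape))
        new_param[slices].copy_(old_param[slices])
    new_model.load_state_dict(new_state)
```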
arXiv Detail & Related papers (2021-10-07T02:29:26Z)
- SIRe-Networks: Skip Connections over Interlaced Multi-Task Learning and Residual Connections for Structure Preserving Object Classification [28.02302915971059]
In this paper, we introduce an interlaced multi-task learning strategy, named SIRe, to reduce the vanishing gradient problem for the object classification task.
The presented methodology directly improves a convolutional neural network (CNN) by enforcing preservation of the input image structure through auto-encoders.
To validate the presented methodology, a simple CNN and various implementations of well-known networks are extended via the SIRe strategy and extensively tested on the CIFAR100 dataset; a minimal two-head sketch of the idea follows below.
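A minimal two-head sketch of the structure-preservation idea, assuming a shared encoder with a classification head and a small reconstruction decoder; layer sizes and the loss weight are illustrative, not the SIRe configuration.

```python
# Sketch of structure-preserving training: a shared convolutional encoder feeds
# both a classifier head and a small decoder that must reconstruct the input.
# Assumes 32x32 inputs (e.g. CIFAR-100); sizes are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class StructurePreservingCNN(nn.Module):
    def __init__(self, num_classes=100):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, num_classes)
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.classifier(z), self.decoder(z)

def multitask_loss(model, x, y, beta=0.5):
    logits, x_rec = model(x)
    # Classification plus reconstruction: the auxiliary term forces the encoder
    # to keep enough information to rebuild the input image.
    return F.cross_entropy(logits, y) + beta * F.mse_loss(x_rec, x)
```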
arXiv Detail & Related papers (2021-10-06T13:54:49Z)
- Differentiable Architecture Pruning for Transfer Learning [6.935731409563879]
We propose a gradient-based approach for extracting sub-architectures from a given large model.
Our architecture-pruning scheme produces transferable new structures that can be successfully retrained to solve different tasks.
We provide theoretical convergence guarantees and validate the proposed transfer-learning strategy on real data; a generic gating sketch follows below.
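One standard way to make sub-architecture extraction gradient-based is to attach a learnable gate to each prunable block and penalize open gates; the sketch below shows that generic mechanism under assumed names, not the paper's exact formulation.

```python
# Generic differentiable-pruning sketch: a sigmoid gate scales each prunable
# block, and a penalty on open gates drives some of them toward zero so the
# remaining blocks form the extracted sub-architecture.
import torch
import torch.nn as nn

class GatedBlock(nn.Module):
    def __init__(self, block: nn.Module):
        super().__init__()
        self.block = block
        self.gate_logit = nn.Parameter(torch.zeros(1))   # sigmoid(0) = 0.5 at init

    def forward(self, x):
        # Scale the block's output by a learnable gate; a gate near zero means
        # the block can be dropped from the extracted sub-architecture.
        return torch.sigmoid(self.gate_logit) * self.block(x)

    def keep(self, threshold=0.5):
        return torch.sigmoid(self.gate_logit).item() > threshold

def gate_sparsity_penalty(model, weight=1e-3):
    # Sum of open-gate magnitudes; adding this to the task loss prunes blocks
    # whose contribution does not justify their cost.
    return weight * sum(
        torch.sigmoid(m.gate_logit) for m in model.modules() if isinstance(m, GatedBlock)
    )
```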
arXiv Detail & Related papers (2021-07-07T17:44:59Z)
- Learn to Bind and Grow Neural Structures [0.3553493344868413]
We present a new framework, Learn to Bind and Grow, which learns a neural architecture for a new task incrementally.
Central to our approach is a novel, interpretable, parameterization of the shared, multi-task architecture space.
Experiments on continual learning benchmarks show that our framework performs comparably with earlier expansion-based approaches.
arXiv Detail & Related papers (2020-11-21T09:40:26Z)
- Stage-Wise Neural Architecture Search [65.03109178056937]
Modern convolutional networks such as ResNet and NASNet have achieved state-of-the-art results in many computer vision applications.
These networks consist of stages, which are sets of layers that operate on representations in the same resolution.
It has been demonstrated that increasing the number of layers in each stage improves the prediction ability of the network.
However, the resulting architecture becomes computationally expensive in terms of floating point operations, memory requirements and inference time.
arXiv Detail & Related papers (2020-04-23T14:16:39Z)
- Disturbance-immune Weight Sharing for Neural Architecture Search [96.93812980299428]
We propose a disturbance-immune update strategy for model updating.
We theoretically analyze the effectiveness of our strategy in alleviating the performance disturbance risk.
arXiv Detail & Related papers (2020-03-29T17:54:49Z)
- RC-DARTS: Resource Constrained Differentiable Architecture Search [162.7199952019152]
We propose the resource constrained differentiable architecture search (RC-DARTS) method to learn architectures that are significantly smaller and faster.
We show that the RC-DARTS method learns lightweight neural architectures which have smaller model size and lower computational complexity; a generic resource-penalty sketch follows below.
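Resource constraints are commonly folded into differentiable search as an expected-cost penalty over the architecture weights; the operation costs and penalty weight in the sketch below are placeholders rather than the RC-DARTS formulation.

```python
# DARTS-style mixed edge with a resource penalty: the architecture weights
# (alphas) define a softmax mixture over candidate ops, and the expected cost
# of that mixture is added to the search loss. Costs and weights are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    def __init__(self, ops, op_costs):
        super().__init__()
        self.ops = nn.ModuleList(ops)
        self.register_buffer("op_costs", torch.tensor(op_costs, dtype=torch.float32))
        self.alpha = nn.Parameter(torch.zeros(len(ops)))   # architecture weights

    def forward(self, x):
        w = F.softmax(self.alpha, dim=0)
        return sum(wi * op(x) for wi, op in zip(w, self.ops))

    def expected_cost(self):
        # Differentiable surrogate for the resource usage of this edge
        # (e.g. FLOPs or latency per candidate operation).
        return (F.softmax(self.alpha, dim=0) * self.op_costs).sum()

def resource_loss(model, weight=1e-2):
    # Added to the task loss so the search prefers cheaper operations.
    return weight * sum(m.expected_cost() for m in model.modules() if isinstance(m, MixedOp))
```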
arXiv Detail & Related papers (2019-12-30T05:02:38Z)