Heterogeneous Continual Learning
- URL: http://arxiv.org/abs/2306.08593v1
- Date: Wed, 14 Jun 2023 15:54:42 GMT
- Title: Heterogeneous Continual Learning
- Authors: Divyam Madaan, Hongxu Yin, Wonmin Byeon, Jan Kautz, Pavlo Molchanov
- Abstract summary: We propose a novel framework to tackle the continual learning (CL) problem with changing network architectures.
We build on top of the distillation family of techniques and modify it to a new setting where a weaker model takes the role of a teacher.
We also propose Quick Deep Inversion (QDI) to recover prior task visual features to support knowledge transfer.
- Score: 88.53038822561197
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We propose a novel framework and a solution to tackle the continual learning
(CL) problem with changing network architectures. Most CL methods focus on
adapting a single architecture to a new task/class by modifying its weights.
However, with rapid progress in architecture design, the problem of adapting
existing solutions to novel architectures becomes relevant. To address this
limitation, we propose Heterogeneous Continual Learning (HCL), where a wide
range of evolving network architectures emerge continually together with novel
data/tasks. As a solution, we build on top of the distillation family of
techniques and modify it to a new setting where a weaker model takes the role
of a teacher; meanwhile, a new stronger architecture acts as a student.
Furthermore, we consider a setup of limited access to previous data and propose
Quick Deep Inversion (QDI) to recover prior task visual features to support
knowledge transfer. QDI significantly reduces computational costs compared to
previous solutions and improves overall performance. In summary, we propose a
new setup for CL with a modified knowledge distillation paradigm and design a
quick data inversion method to enhance distillation. Our evaluation on various
benchmarks shows a significant improvement in accuracy over state-of-the-art
methods across various network architectures.
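As a concrete (though unofficial) illustration of the two ingredients described above, the sketch below combines reverse knowledge distillation, where the frozen, weaker previous-architecture model supplies soft targets for the new, stronger student, with a simplified DeepInversion-style input optimization started from current-task images, in the spirit of Quick Deep Inversion. All function names, hyperparameters, and the exact inversion objective are assumptions for illustration; this is not the authors' implementation.

```python
# Illustrative sketch only: reverse knowledge distillation (the weaker old model
# teaches the stronger new one) plus a simplified, DeepInversion-style input
# optimization that starts from current-task images instead of random noise.
import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, T=2.0):
    # Standard KD objective; here the *previous, weaker* network provides the targets.
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)

def quick_inversion(teacher, x_current, steps=10, lr=0.1):
    # Recover prior-task-like inputs by optimizing copies of current-task images
    # so that the frozen old teacher becomes confident on them (a cheap stand-in
    # for replay data when previous samples are unavailable).
    x = x_current.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        logits = teacher(x)
        # Minimize prediction entropy under the old model (a confidence objective);
        # the full method also uses batch-norm statistic matching and image priors.
        entropy = -(F.softmax(logits, dim=1) * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
        entropy.backward()
        opt.step()
    return x.detach()

def hcl_step(student, teacher, x, y, optimizer, alpha=1.0):
    # One training step: cross-entropy on the current task plus distillation from
    # the old architecture on both real and inverted inputs.
    teacher.eval()
    x_inv = quick_inversion(teacher, x)           # synthetic prior-task stand-ins
    x_all = torch.cat([x, x_inv], dim=0)
    with torch.no_grad():
        t_logits = teacher(x_all)
    s_logits = student(x_all)
    loss = F.cross_entropy(s_logits[: x.size(0)], y) + alpha * distill_loss(s_logits, t_logits)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The entropy term above is deliberately minimal; starting the optimization from current-task images rather than noise is what keeps the inversion "quick" in this sketch.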
Related papers
- One-for-All: Bridge the Gap Between Heterogeneous Architectures in Knowledge Distillation [69.65734716679925]
Knowledge distillation has proven to be a highly effective approach for enhancing model performance through a teacher-student training scheme.
Most existing distillation methods are designed under the assumption that the teacher and student models belong to the same model family.
We propose a simple yet effective one-for-all KD framework, called OFA-KD, which significantly improves distillation performance between heterogeneous architectures; a generic sketch of the cross-architecture projection idea follows below.
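For context, the sketch below illustrates one common way to bridge architecture families: project an intermediate student feature map into the logit space and match the teacher's logits there. It is a generic illustration only; the projector design and temperature are assumptions, not the OFA-KD implementation.

```python
# Generic cross-architecture distillation sketch: project an intermediate
# student feature map into the logit space and match the teacher's logits.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LogitProjector(nn.Module):
    def __init__(self, in_channels, num_classes):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(in_channels, num_classes)

    def forward(self, feat):                 # feat: (B, C, H, W)
        return self.fc(self.pool(feat).flatten(1))

def branch_kd_loss(student_feat, teacher_logits, projector, T=4.0):
    # Distill in logit space so the teacher and student need not share
    # feature shapes or even the same model family.
    branch_logits = projector(student_feat)
    return F.kl_div(
        F.log_softmax(branch_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
```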
arXiv Detail & Related papers (2023-10-30T11:13:02Z)
- Multiplicative update rules for accelerating deep learning training and increasing robustness [69.90473612073767]
We propose an optimization framework that fits to a wide range of machine learning algorithms and enables one to apply alternative update rules.
We claim that the proposed framework accelerates training while leading to more robust models than the traditionally used additive update rule; a rough contrast of the two update forms is sketched below.
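For intuition, the sketch below contrasts the usual additive SGD step with one possible multiplicative (exponentiated-gradient-style) update; the exact rule proposed in the paper differs, so the scaling form here is only an assumption.

```python
# Sketch of an additive vs. a multiplicative parameter update.
# The multiplicative form (scaling each weight by a gradient-dependent factor)
# is illustrative only, not the paper's exact rule.
import torch

@torch.no_grad()
def additive_step(params, lr=0.01):
    for p in params:
        if p.grad is not None:
            p -= lr * p.grad                      # w <- w - lr * g

@torch.no_grad()
def multiplicative_step(params, lr=0.01):
    for p in params:
        if p.grad is not None:
            # w <- w * exp(-lr * g * sign(w)): shrink or grow each weight by a
            # factor that moves it in the descent direction, with step size
            # proportional to |w|. Weights that are exactly zero stay zero,
            # a characteristic property of purely multiplicative rules.
            p *= torch.exp(-lr * p.grad * torch.sign(p))
```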
arXiv Detail & Related papers (2023-07-14T06:44:43Z)
- Conceptual Expansion Neural Architecture Search (CENAS) [1.3464152928754485]
We present an approach called Conceptual Expansion Neural Architecture Search (CENAS).
It combines a sample-efficient, computational creativity-inspired transfer learning approach with neural architecture search.
It finds models faster than naive architecture search by transferring existing weights to approximate the parameters of the new model; a generic warm-start sketch follows below.
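The weight-transfer idea can be illustrated with a generic warm-start routine that copies overlapping parameter slices from a trained model into a new architecture; this is only one plausible realization, not the CENAS operators themselves.

```python
# Generic warm-start sketch: copy overlapping slices of a trained model's
# parameters into a new (possibly wider/deeper) architecture by matching
# parameter names and taking the intersection of each dimension.
# Illustration of weight transfer only, not the CENAS procedure.
import torch
import torch.nn as nn

@torch.no_grad()
def transfer_overlapping_weights(old_model: nn.Module, new_model: nn.Module):
    old_state = old_model.state_dict()
    new_state = new_model.state_dict()
    for name, new_param in new_state.items():
        old_param = old_state.get(name)
        if old_param is None or old_param.dim() != new_param.dim():
            continue  # no counterpart in the old model; keep the fresh init
        # Copy the overlapping region along every dimension.
        slices = tuple(slice(0, min(o, n)) for o, n in zip(old_param.shape, new_param.shape))
        new_param[slices].copy_(old_param[slices])
    new_model.load_state_dict(new_state)
```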
arXiv Detail & Related papers (2021-10-07T02:29:26Z)
- SIRe-Networks: Skip Connections over Interlaced Multi-Task Learning and Residual Connections for Structure Preserving Object Classification [28.02302915971059]
In this paper, we introduce an interlaced multi-task learning strategy, named SIRe, to reduce the vanishing gradient problem for the object classification task.
The presented methodology directly improves a convolutional neural network (CNN) by enforcing preservation of the input image structure through auto-encoders.
To validate the presented methodology, a simple CNN and various implementations of well-known networks are extended via the SIRe strategy and extensively tested on the CIFAR100 dataset; a minimal two-head sketch of the idea follows below.
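A minimal two-head sketch of the structure-preservation idea, assuming a shared encoder with a classification head and a small reconstruction decoder; layer sizes and the loss weight are illustrative, not the SIRe configuration.

```python
# Sketch of structure-preserving training: a shared convolutional encoder feeds
# both a classifier head and a small decoder that must reconstruct the input.
# Assumes 32x32 inputs (e.g. CIFAR-100); sizes are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class StructurePreservingCNN(nn.Module):
    def __init__(self, num_classes=100):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, num_classes)
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.classifier(z), self.decoder(z)

def multitask_loss(model, x, y, beta=0.5):
    logits, x_rec = model(x)
    # Classification plus reconstruction: the auxiliary term forces the encoder
    # to keep enough information to rebuild the input image.
    return F.cross_entropy(logits, y) + beta * F.mse_loss(x_rec, x)
```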
arXiv Detail & Related papers (2021-10-06T13:54:49Z)
- Differentiable Architecture Pruning for Transfer Learning [6.935731409563879]
We propose a gradient-based approach for extracting sub-architectures from a given large model.
Our architecture-pruning scheme produces transferable new structures that can be successfully retrained to solve different tasks.
We provide theoretical convergence guarantees and validate the proposed transfer-learning strategy on real data; a generic gating sketch follows below.
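One standard way to make sub-architecture extraction gradient-based is to attach a learnable gate to each prunable block and penalize open gates; the sketch below shows that generic mechanism under assumed names, not the paper's exact formulation.

```python
# Generic differentiable-pruning sketch: a sigmoid gate scales each prunable
# block, and a penalty on open gates drives some of them toward zero so the
# remaining blocks form the extracted sub-architecture.
import torch
import torch.nn as nn

class GatedBlock(nn.Module):
    def __init__(self, block: nn.Module):
        super().__init__()
        self.block = block
        self.gate_logit = nn.Parameter(torch.zeros(1))   # sigmoid(0) = 0.5 at init

    def forward(self, x):
        # Scale the block's output by a learnable gate; a gate near zero means
        # the block can be dropped from the extracted sub-architecture.
        return torch.sigmoid(self.gate_logit) * self.block(x)

    def keep(self, threshold=0.5):
        return torch.sigmoid(self.gate_logit).item() > threshold

def gate_sparsity_penalty(model, weight=1e-3):
    # Sum of open-gate magnitudes; adding this to the task loss prunes blocks
    # whose contribution does not justify their cost.
    return weight * sum(
        torch.sigmoid(m.gate_logit) for m in model.modules() if isinstance(m, GatedBlock)
    )
```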
arXiv Detail & Related papers (2021-07-07T17:44:59Z)
- Learn to Bind and Grow Neural Structures [0.3553493344868413]
We present a new framework, Learn to Bind and Grow, which learns a neural architecture for a new task incrementally.
Central to our approach is a novel, interpretable, parameterization of the shared, multi-task architecture space.
Experiments on continual learning benchmarks show that our framework performs comparably with earlier expansion-based approaches.
arXiv Detail & Related papers (2020-11-21T09:40:26Z)
- Stage-Wise Neural Architecture Search [65.03109178056937]
Modern convolutional networks such as ResNet and NASNet have achieved state-of-the-art results in many computer vision applications.
These networks consist of stages, which are sets of layers that operate on representations in the same resolution.
It has been demonstrated that increasing the number of layers in each stage improves the prediction ability of the network.
However, the resulting architecture becomes computationally expensive in terms of floating point operations, memory requirements and inference time.
arXiv Detail & Related papers (2020-04-23T14:16:39Z)
- Disturbance-immune Weight Sharing for Neural Architecture Search [96.93812980299428]
We propose a disturbance-immune update strategy for model updating.
We theoretically analyze the effectiveness of our strategy in alleviating the performance disturbance risk.
arXiv Detail & Related papers (2020-03-29T17:54:49Z)
- RC-DARTS: Resource Constrained Differentiable Architecture Search [162.7199952019152]
We propose the resource constrained differentiable architecture search (RC-DARTS) method to learn architectures that are significantly smaller and faster.
We show that the RC-DARTS method learns lightweight neural architectures which have smaller model size and lower computational complexity; a generic resource-penalty sketch follows below.
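Resource constraints are commonly folded into differentiable search as an expected-cost penalty over the architecture weights; the operation costs and penalty weight in the sketch below are placeholders rather than the RC-DARTS formulation.

```python
# DARTS-style mixed edge with a resource penalty: the architecture weights
# (alphas) define a softmax mixture over candidate ops, and the expected cost
# of that mixture is added to the search loss. Costs and weights are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    def __init__(self, ops, op_costs):
        super().__init__()
        self.ops = nn.ModuleList(ops)
        self.register_buffer("op_costs", torch.tensor(op_costs, dtype=torch.float32))
        self.alpha = nn.Parameter(torch.zeros(len(ops)))   # architecture weights

    def forward(self, x):
        w = F.softmax(self.alpha, dim=0)
        return sum(wi * op(x) for wi, op in zip(w, self.ops))

    def expected_cost(self):
        # Differentiable surrogate for the resource usage of this edge
        # (e.g. FLOPs or latency per candidate operation).
        return (F.softmax(self.alpha, dim=0) * self.op_costs).sum()

def resource_loss(model, weight=1e-2):
    # Added to the task loss so the search prefers cheaper operations.
    return weight * sum(m.expected_cost() for m in model.modules() if isinstance(m, MixedOp))
```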
arXiv Detail & Related papers (2019-12-30T05:02:38Z)