DyTox: Transformers for Continual Learning with DYnamic TOken eXpansion
- URL: http://arxiv.org/abs/2111.11326v1
- Date: Mon, 22 Nov 2021 16:29:06 GMT
- Title: DyTox: Transformers for Continual Learning with DYnamic TOken eXpansion
- Authors: Arthur Douillard, Alexandre Ramé, Guillaume Couairon, Matthieu Cord
- Abstract summary: We propose a transformer architecture based on a dedicated encoder/decoder framework.
Through a dynamic expansion of special tokens, we specialize each forward pass of our decoder network on a task distribution.
Our strategy scales to a large number of tasks while having negligible memory and time overheads.
- Score: 89.92242000948026
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep network architectures struggle to continually learn new tasks without
forgetting the previous tasks. A recent trend indicates that dynamic
architectures based on an expansion of the parameters can reduce catastrophic
forgetting efficiently in continual learning. However, existing approaches
often require a task identifier at test-time, need complex tuning to balance
the growing number of parameters, and barely share any information across
tasks. As a result, they struggle to scale to a large number of tasks without
significant overhead. In this paper, we propose a transformer architecture
based on a dedicated encoder/decoder framework. Critically, the encoder and
decoder are shared among all tasks. Through a dynamic expansion of special tokens, we specialize each forward pass of our decoder network on a task distribution. Our strategy scales to a large number of tasks while having negligible memory and time overheads thanks to strict control of the parameter expansion. Moreover, this efficient strategy requires no hyperparameter
tuning to control the network's expansion. Our model reaches excellent results
on CIFAR100 and state-of-the-art performance on the large-scale ImageNet100 and ImageNet1000, while using fewer parameters than competing dynamic frameworks.
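As a rough illustration of the token-expansion mechanism described above, here is a minimal sketch in PyTorch. It assumes a plain transformer encoder as the shared encoder and a single cross-attention block as the shared decoder; the names (TaskTokenDecoder, add_task) and all sizes are illustrative, not the authors' implementation.

```python
# Minimal sketch of a DyTox-style task-token decoder (hedged: illustrative
# names and sizes, not the authors' API). One learned task token is added per
# task; each forward pass of the shared decoder is specialized by one token.
import torch
import torch.nn as nn


class TaskTokenDecoder(nn.Module):
    """Shared decoder specialized per task by a learned task token."""

    def __init__(self, dim: int = 192, num_heads: int = 3):
        super().__init__()
        self.task_tokens = nn.ParameterList()   # grows by one token per task
        self.heads = nn.ModuleList()             # grows by one small head per task
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.dim = dim

    def add_task(self, num_classes: int) -> None:
        """Dynamic expansion: one task token + one classifier per new task."""
        self.task_tokens.append(nn.Parameter(torch.randn(1, 1, self.dim) * 0.02))
        self.heads.append(nn.Linear(self.dim, num_classes))

    def forward(self, patch_tokens: torch.Tensor) -> torch.Tensor:
        """Run the shared decoder once per task token; concatenate all logits."""
        logits = []
        for token, head in zip(self.task_tokens, self.heads):
            query = token.expand(patch_tokens.size(0), -1, -1)   # (B, 1, dim)
            out, _ = self.cross_attn(query, patch_tokens, patch_tokens)
            logits.append(head(self.norm(out.squeeze(1))))
        return torch.cat(logits, dim=1)


# Usage: a shared encoder produces patch tokens; per-task growth is limited to
# one token and one linear head, which keeps the memory overhead negligible.
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=192, nhead=3, batch_first=True),
    num_layers=5,
)
decoder = TaskTokenDecoder(dim=192)
decoder.add_task(num_classes=10)   # task 1
decoder.add_task(num_classes=10)   # task 2
patches = encoder(torch.randn(4, 64, 192))   # (batch, tokens, dim)
print(decoder(patches).shape)                # torch.Size([4, 20])
```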
Related papers
- AsCAN: Asymmetric Convolution-Attention Networks for Efficient Recognition and Generation [48.82264764771652]
We introduce AsCAN -- a hybrid architecture, combining both convolutional and transformer blocks.
AsCAN supports a variety of tasks: recognition, segmentation, class-conditional image generation.
We then scale the same architecture to solve a large-scale text-to-image task and show state-of-the-art performance.
arXiv Detail & Related papers (2024-11-07T18:43:17Z)
- Efficient Controllable Multi-Task Architectures [85.76598445904374]
We propose a multi-task model consisting of a shared encoder and task-specific decoders where both encoder and decoder channel widths are slimmable.
Our key idea is to control the task importance by varying the capacities of task-specific decoders, while controlling the total computational cost.
This improves overall accuracy by allowing a stronger encoder for a given budget, increases control over computational cost, and delivers high-quality slimmed sub-architectures.
arXiv Detail & Related papers (2023-08-22T19:09:56Z)
- Multi-task neural networks by learned contextual inputs [0.0]
It is a multi-task learning architecture based on a fully shared neural network and an augmented input vector containing trainable task parameters.
The architecture is interesting due to its powerful task mechanism, which facilitates a low-dimensional task parameter space.
The architecture's performance is compared to similar neural network architectures on ten datasets; a minimal sketch of the contextual-input mechanism is given after this entry.
arXiv Detail & Related papers (2023-03-01T19:25:52Z)
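A minimal sketch of this contextual-input idea, assuming a fully shared MLP whose input is concatenated with a small trainable per-task embedding; the class name (ContextualInputNet) and all sizes are illustrative, not taken from the paper.

```python
# Sketch of multi-task learning via learned contextual inputs: a single shared
# network receives the regular input concatenated with a low-dimensional
# trainable task-specific parameter vector (all names and sizes illustrative).
import torch
import torch.nn as nn


class ContextualInputNet(nn.Module):
    def __init__(self, in_dim: int, task_dim: int, num_tasks: int, out_dim: int):
        super().__init__()
        # One low-dimensional trainable "context" vector per task.
        self.task_params = nn.Embedding(num_tasks, task_dim)
        self.shared = nn.Sequential(
            nn.Linear(in_dim + task_dim, 128), nn.ReLU(),
            nn.Linear(128, out_dim),
        )

    def forward(self, x: torch.Tensor, task_id: torch.Tensor) -> torch.Tensor:
        context = self.task_params(task_id)                  # (batch, task_dim)
        return self.shared(torch.cat([x, context], dim=-1))


model = ContextualInputNet(in_dim=16, task_dim=4, num_tasks=10, out_dim=1)
x = torch.randn(8, 16)
task_id = torch.full((8,), 3, dtype=torch.long)   # all samples from task 3
print(model(x, task_id).shape)                    # torch.Size([8, 1])
```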
- PAD-Net: An Efficient Framework for Dynamic Networks [72.85480289152719]
Common practice in implementing dynamic networks is to convert the given static layers into fully dynamic ones.
We propose a partially dynamic network, namely PAD-Net, to transform the redundant dynamic parameters into static ones.
Our method is comprehensively supported by large-scale experiments with two typical advanced dynamic architectures.
arXiv Detail & Related papers (2022-11-10T12:42:43Z)
- DiSparse: Disentangled Sparsification for Multitask Model Compression [92.84435347164435]
DiSparse is a simple, effective, and first-of-its-kind multitask pruning and sparse training scheme.
Our experimental results demonstrate superior performance on various configurations and settings.
arXiv Detail & Related papers (2022-06-09T17:57:46Z)
- Efficient Retrieval Optimized Multi-task Learning [16.189136169520424]
We propose a novel Retrieval Optimized Multi-task (ROM) framework for jointly training self-supervised tasks, knowledge retrieval, and extractive question answering.
Our ROM approach presents a unified and generalizable framework that enables scaling efficiently to multiple tasks.
Using our framework, we achieve comparable or better performance than recent methods on QA, while drastically reducing the number of parameters.
arXiv Detail & Related papers (2021-04-20T17:16:34Z)
- Efficient Feature Transformations for Discriminative and Generative Continual Learning [98.10425163678082]
We propose a simple task-specific feature map transformation strategy for continual learning.
These transformations provide powerful flexibility for learning new tasks while adding minimal parameters to the base architecture; a hedged sketch of one such per-task transform follows this entry.
We demonstrate the efficacy and efficiency of our method with an extensive set of experiments in discriminative (CIFAR-100 and ImageNet-1K) and generative sequences of tasks.
arXiv Detail & Related papers (2021-03-25T01:48:14Z)
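A hedged sketch of a task-specific feature-map transformation: a shared convolution followed by a tiny per-task 1x1 convolution, used here as one simple instantiation. The paper's actual transformations may differ, and the names (TaskTransformedBlock, add_task) are illustrative.

```python
# Sketch of per-task feature-map transforms for continual learning: the shared
# block is reused by all tasks, while each new task adds only a 1x1 conv
# (channels*channels + bias parameters). Names and sizes are illustrative.
import torch
import torch.nn as nn


class TaskTransformedBlock(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        self.base = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.task_transforms = nn.ModuleList()   # grows by one tiny conv per task
        self.channels = channels

    def add_task(self) -> None:
        # Dynamic expansion: a lightweight 1x1 conv per new task.
        self.task_transforms.append(nn.Conv2d(self.channels, self.channels, 1))

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        return self.task_transforms[task_id](torch.relu(self.base(x)))


block = TaskTransformedBlock(channels=64)
block.add_task()                      # task 0
block.add_task()                      # task 1
x = torch.randn(2, 64, 32, 32)
print(block(x, task_id=1).shape)      # torch.Size([2, 64, 32, 32])
```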
- MSCFNet: A Lightweight Network With Multi-Scale Context Fusion for Real-Time Semantic Segmentation [27.232578592161673]
We devise a novel lightweight network using a multi-scale context fusion scheme (MSCFNet).
The proposed MSCFNet contains only 1.15M parameters, achieves 71.9% Mean IoU and can run at over 50 FPS on a single Titan XP GPU configuration.
arXiv Detail & Related papers (2021-03-24T08:28:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.