Knowledge Amalgamation for Object Detection with Transformers
- URL: http://arxiv.org/abs/2203.03187v1
- Date: Mon, 7 Mar 2022 07:45:22 GMT
- Title: Knowledge Amalgamation for Object Detection with Transformers
- Authors: Haofei Zhang, Feng Mao, Mengqi Xue, Gongfan Fang, Zunlei Feng, Jie
Song, Mingli Song
- Abstract summary: Knowledge amalgamation (KA) is a novel deep model reusing task aiming to transfer knowledge from several well-trained teachers to a compact student.
We propose to dissolve the KA into two aspects: sequence-level amalgamation (SA) and task-level amalgamation (TA).
In particular, a hint is generated within the sequence-level amalgamation by concatenating teacher sequences instead of redundantly aggregating them to a fixed-size one.
- Score: 36.7897364648987
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge amalgamation (KA) is a novel deep model reusing task aiming to
transfer knowledge from several well-trained teachers to a multi-talented and
compact student. Currently, most of these approaches are tailored for
convolutional neural networks (CNNs). However, transformers, built on a
completely different architecture, have begun to challenge the dominance of
CNNs in many computer vision tasks. Nevertheless,
directly applying the previous KA methods to transformers leads to severe
performance degradation. In this work, we explore a more effective KA scheme
for transformer-based object detection models. Specifically, considering the
architecture characteristics of transformers, we propose to dissolve the KA
into two aspects: sequence-level amalgamation (SA) and task-level amalgamation
(TA). In particular, within the sequence-level amalgamation, a hint is
generated by concatenating the teacher sequences instead of redundantly
aggregating them into a fixed-size one, as previous KA works do. In addition,
in the task-level amalgamation, the student efficiently learns heterogeneous
detection tasks through soft targets. Extensive experiments on PASCAL VOC and
COCO show that the sequence-level amalgamation significantly boosts the
performance of students, whereas the previous methods impair them. Moreover,
the transformer-based students excel at learning the amalgamated knowledge:
they master heterogeneous detection tasks rapidly and achieve performance
superior or at least comparable to that of the teachers in their
specializations.
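To make the two amalgamation aspects concrete, here is a minimal PyTorch-style sketch, not the authors' implementation: the hint for sequence-level amalgamation is formed by concatenating the teacher token sequences, and task-level amalgamation is written as standard soft-target distillation. All names (`sequence_level_hint`, `proj`, `temperature`) and the tensor shapes are illustrative assumptions.

```python
import torch
import torch.nn.functional as F


def sequence_level_hint(teacher_tokens_a, teacher_tokens_b):
    """Form the SA hint by concatenating teacher sequences along the token axis.

    Shapes are assumed to be (batch, num_tokens, dim). Unlike earlier KA methods
    that aggregate teachers into a single fixed-size feature, the hint keeps the
    two token sequences side by side.
    """
    return torch.cat([teacher_tokens_a, teacher_tokens_b], dim=1)


def sequence_amalgamation_loss(student_tokens, hint, proj):
    """Match projected student tokens to the concatenated teacher hint.

    Assumes the student sequence already has as many tokens as the hint and
    that `proj` is a learnable linear layer aligning channel dimensions; this
    alignment scheme is an assumption of the sketch, not the paper's recipe.
    """
    return F.mse_loss(proj(student_tokens), hint)


def task_amalgamation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-target (task-level) distillation from one teacher's detection head."""
    t = temperature
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)
    log_student = F.log_softmax(student_logits / t, dim=-1)
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * (t * t)
```

A training objective would presumably combine the usual detection loss with weighted SA and TA terms (e.g., `det_loss + w_sa * sa_loss + w_ta * ta_loss`); the weighting is likewise an assumption, not taken from the paper.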
Related papers
- Weight Copy and Low-Rank Adaptation for Few-Shot Distillation of Vision Transformers [22.1372572833618]
We propose a novel few-shot feature distillation approach for vision transformers.
We first copy the weights from intermittent layers of existing vision transformers into shallower architectures (students).
Next, we employ an enhanced version of Low-Rank Adaptation (LoRA) to distill knowledge into the student in a few-shot scenario.
arXiv Detail & Related papers (2024-04-14T18:57:38Z)
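The low-rank adaptation referenced in this entry can be illustrated with a plain LoRA wrapper around a linear layer. This is generic LoRA, not the paper's "enhanced version", and the class name, rank, and scaling defaults below are assumptions.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """A frozen linear layer with a trainable low-rank update (plain LoRA).

    y = W x + (alpha / r) * B A x, where W is copied from the teacher and kept
    frozen, while A (r x in_features) and B (out_features x r) are the only
    trainable parameters. Names and defaults are illustrative.
    """

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # teacher-copied weights stay frozen
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.lora_a.T) @ self.lora_b.T
```

In a few-shot distillation setting like the one described, layers copied from the teacher would be wrapped this way so that only the low-rank factors are trained on the small transfer set.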
- Remembering Transformer for Continual Learning [9.879896956915598]
We propose Remembering Transformer, inspired by the brain's Complementary Learning Systems.
Remembering Transformer employs a mixture-of-adapters architecture and a generative model-based novelty detection mechanism.
We conducted extensive experiments, including ablation studies on the novelty detection mechanism and model capacity of the mixture-of-adapters.
arXiv Detail & Related papers (2024-04-11T07:22:14Z)
- Associative Transformer [26.967506484952214]
We propose Associative Transformer (AiT) to enhance the association among sparsely attended input patches.
AiT requires significantly fewer parameters and attention layers while outperforming Vision Transformers and a broad range of sparse Transformers.
arXiv Detail & Related papers (2023-09-22T13:37:10Z)
- MASTER: Multi-task Pre-trained Bottlenecked Masked Autoencoders are Better Dense Retrievers [140.0479479231558]
In this work, we aim to unify a variety of pre-training tasks into a multi-task pre-trained model, namely MASTER.
MASTER utilizes a shared-encoder multi-decoder architecture that can construct a representation bottleneck to compress the abundant semantic information across tasks into dense vectors.
arXiv Detail & Related papers (2022-12-15T13:57:07Z)
- A Neural ODE Interpretation of Transformer Layers [8.839601328192957]
Transformer layers, which use an alternating pattern of multi-head attention and multi-layer perceptron (MLP) layers, provide an effective tool for a variety of machine learning problems.
We build upon this connection and propose a modification of the internal architecture of a transformer layer.
Our experiments show that this simple modification improves the performance of transformer networks in multiple tasks.
arXiv Detail & Related papers (2022-12-12T16:18:58Z)
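The connection exploited in this entry rests on a standard observation: a residual update x <- x + f(x) is one explicit Euler step of dx/dt = f(x). Below is a minimal pre-norm transformer block written with an explicit step size to make that reading visible; it illustrates the ODE view only and is not the modified layer proposed in the paper (the `dt` parameter and module sizes are assumptions).

```python
import torch.nn as nn


class EulerTransformerBlock(nn.Module):
    """Pre-norm transformer block written as explicit Euler steps.

    Each residual update x <- x + dt * f(x) is one Euler step of dx/dt = f(x);
    a standard block corresponds to dt = 1. Purely illustrative of the ODE
    view, not the paper's proposed modification.
    """

    def __init__(self, dim: int, heads: int = 8, dt: float = 1.0):
        super().__init__()
        self.dt = dt
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.dt * self.attn(h, h, h, need_weights=False)[0]  # attention step
        x = x + self.dt * self.mlp(self.norm2(x))  # MLP step
        return x
```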
- E2-AEN: End-to-End Incremental Learning with Adaptively Expandable Network [57.87240860624937]
We propose an end-to-end trainable adaptively expandable network named E2-AEN.
It dynamically generates lightweight structures for new tasks without any accuracy drop in previous tasks.
E2-AEN reduces cost and can be built upon any feed-forward architectures in an end-to-end manner.
arXiv Detail & Related papers (2022-07-14T09:04:51Z)
- Continual Object Detection via Prototypical Task Correlation Guided Gating Mechanism [120.1998866178014]
We present a flexible framework for continual object detection via pRotOtypical taSk corrElaTion guided gaTing mechAnism (ROSETTA).
Concretely, a unified framework is shared by all tasks while task-aware gates are introduced to automatically select sub-models for specific tasks.
Experiments on COCO-VOC, KITTI-Kitchen, class-incremental detection on VOC and sequential learning of four tasks show that ROSETTA yields state-of-the-art performance.
arXiv Detail & Related papers (2022-05-06T07:31:28Z)
- CvT-ASSD: Convolutional vision-Transformer Based Attentive Single Shot MultiBox Detector [15.656374849760734]
We present a novel object detection architecture, named Convolutional vision Transformer Based Attentive Single Shot MultiBox Detector (CvT-ASSD).
Our model CvT-ASSD can lead to good system efficiency and performance while being pretrained on large-scale detection datasets such as PASCAL VOC and MS COCO.
arXiv Detail & Related papers (2021-10-24T06:45:33Z)
- Distilling Knowledge via Knowledge Review [69.15050871776552]
We study the factor of connection path cross levels between teacher and student networks, and reveal its great importance.
For the first time in knowledge distillation, cross-stage connection paths are proposed.
Our finally designed nested and compact framework requires negligible overhead, and outperforms other methods on a variety of tasks.
arXiv Detail & Related papers (2021-04-19T04:36:24Z)
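The cross-stage connection paths mentioned in this entry can be pictured with a schematic feature-distillation loss in which student stage i is matched against teacher stages i, i-1, ..., 0. This is only a sketch of the general idea under assumed shapes and adapters; the paper's nested fusion and loss design is more elaborate.

```python
import torch.nn.functional as F


def cross_stage_kd_loss(student_feats, teacher_feats, adapters):
    """Schematic cross-stage feature distillation.

    `student_feats` and `teacher_feats` are lists of (batch, C, H, W) feature
    maps ordered from shallow to deep. `adapters` is assumed to be an
    nn.ModuleDict of 1x1 convs keyed by "i->j" that map student stage i
    channels to teacher stage j channels. Illustrative only.
    """
    loss = 0.0
    for i, s in enumerate(student_feats):
        for j in range(i + 1):  # connection paths crossing stages
            t = teacher_feats[j]
            s_ij = adapters[f"{i}->{j}"](s)
            # align spatial resolution before matching features
            s_ij = F.interpolate(
                s_ij, size=t.shape[-2:], mode="bilinear", align_corners=False
            )
            loss = loss + F.mse_loss(s_ij, t)
    return loss
```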
- UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers [108.92194081987967]
We make the first attempt to explore a universal multi-agent reinforcement learning pipeline, designing a single architecture to fit different tasks.
Unlike previous RNN-based models, we utilize a transformer-based model to generate a flexible policy.
The proposed model, named as Universal Policy Decoupling Transformer (UPDeT), further relaxes the action restriction and makes the multi-agent task's decision process more explainable.
arXiv Detail & Related papers (2021-01-20T07:24:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.