RIFormer: Keep Your Vision Backbone Effective While Removing Token Mixer
- URL: http://arxiv.org/abs/2304.05659v1
- Date: Wed, 12 Apr 2023 07:34:13 GMT
- Title: RIFormer: Keep Your Vision Backbone Effective While Removing Token Mixer
- Authors: Jiahao Wang, Songyang Zhang, Yong Liu, Taiqiang Wu, Yujiu Yang, Xihui
Liu, Kai Chen, Ping Luo, Dahua Lin
- Abstract summary: This paper studies how to keep a vision backbone effective while removing token mixers in its basic building blocks.
Token mixers, such as self-attention in vision transformers (ViTs), are intended to perform information communication between different spatial tokens but suffer from considerable computational cost and latency.
- Score: 95.71132572688143
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies how to keep a vision backbone effective while removing
token mixers in its basic building blocks. Token mixers, such as self-attention
in vision transformers (ViTs), are intended to perform information communication
between different spatial tokens but suffer from considerable computational cost
and latency. However, directly removing them leads to an incomplete structural
prior for the model and thus brings a significant accuracy drop. To this end, we
first develop a RepIdentityFormer based on the re-parameterizing idea to study
the token-mixer-free model architecture. We then explore an improved learning
paradigm to break the limitations of the simple token-mixer-free backbone, and
summarize the empirical practice into 5 guidelines. Equipped with the proposed
optimization strategy, we are able to build an extremely simple vision backbone
with encouraging performance while enjoying high efficiency during inference.
Extensive experiments and ablative analysis also demonstrate that the inductive
bias of a network architecture can be incorporated into a simple network
structure with an appropriate optimization strategy. We hope this work can serve
as a starting point for the exploration of optimization-driven efficient network
design. Project page: https://techmonsterwang.github.io/RIFormer/.
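The re-parameterizing idea mentioned in the abstract can be illustrated with a small sketch: train a residual branch whose "token mixer" is only a per-channel affine transform after LayerNorm, then fold that affine into the LayerNorm weights so the deployed block carries no separate mixer parameters. The PyTorch code below is a minimal illustrative sketch under these assumptions (the names AffineMixerBlock and merge_affine_into_norm are hypothetical, and this is not necessarily the paper's exact RIFormer block):

```python
import torch
import torch.nn as nn

class AffineMixerBlock(nn.Module):
    """Training-time residual branch: LayerNorm followed by a per-channel
    affine transform standing in for the removed token mixer (illustrative)."""
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.scale = nn.Parameter(torch.ones(dim))
        self.shift = nn.Parameter(torch.zeros(dim))

    def forward(self, x):  # x: (batch, tokens, dim)
        return x + self.norm(x) * self.scale + self.shift

def merge_affine_into_norm(block):
    """Fold the affine parameters into the LayerNorm so that, at inference,
    the branch computes the same function as plain `x + LayerNorm(x)`."""
    with torch.no_grad():
        # LN(x) * s + t == normalize(x) * (W * s) + (B * s + t)
        block.norm.bias.copy_(block.norm.bias * block.scale + block.shift)
        block.norm.weight.copy_(block.norm.weight * block.scale)
        block.scale.fill_(1.0)
        block.shift.fill_(0.0)
    return block
```

After merging, the scale/shift parameters are no-ops and can be dropped, so the inference-time block contains no token mixer while computing exactly the same function as the trained one.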
Related papers
- Unveiling the Backbone-Optimizer Coupling Bias in Visual Representation Learning [54.956037293979506]
This paper delves into the interplay between vision backbones and optimizers and their inter-dependent phenomenon, termed backbone-optimizer coupling bias (BOCB).
We observe that canonical CNNs, such as VGG and ResNet, exhibit a marked co-dependency with the SGD family, while recent architectures like ViTs and ConvNeXt share a tight coupling with adaptive learning-rate optimizers.
arXiv Detail & Related papers (2024-10-08T21:14:23Z) - PRANCE: Joint Token-Optimization and Structural Channel-Pruning for Adaptive ViT Inference [44.77064952091458]
PRANCE is a Vision Transformer compression framework that jointly optimizes the activated channels and reduces tokens, based on the characteristics of the inputs.
We introduce a novel "Result-to-Go" training mechanism that models ViTs' inference process as a sequential decision process.
Our framework is shown to be compatible with various token optimization techniques such as pruning, merging, and pruning-merging strategies.
arXiv Detail & Related papers (2024-07-06T09:04:27Z) - Neural Network Pruning by Gradient Descent [7.427858344638741]
We introduce a novel and straightforward neural network pruning framework that incorporates the Gumbel-Softmax technique.
We demonstrate its exceptional compression capability, maintaining high accuracy on the MNIST dataset with only 0.15% of the original network parameters.
We believe our method opens a promising new avenue for deep learning pruning and the creation of interpretable machine learning systems. (A minimal sketch of the Gumbel-Softmax masking idea appears after this list.)
arXiv Detail & Related papers (2023-11-21T11:12:03Z) - End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures.
We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z) - LegoNet: A Fast and Exact Unlearning Architecture [59.49058450583149]
Machine unlearning aims to erase the impact of specific training samples upon deleted requests from a trained model.
We present a novel network, namely LegoNet, which adopts the framework of "fixed encoder + multiple adapters".
We show that LegoNet accomplishes fast and exact unlearning while maintaining acceptable performance, synthetically outperforming unlearning baselines.
arXiv Detail & Related papers (2022-10-28T09:53:05Z) - Multi-Agent Feedback Enabled Neural Networks for Intelligent
Communications [28.723523146324002]
In this paper, a novel multi-agent feedback enabled neural network (MAFENN) framework is proposed.
The MAFENN framework is theoretically formulated into a three-player Feedback Stackelberg game, and the game is proved to converge to the Feedback Stackelberg equilibrium.
To verify the MAFENN framework's feasibility in wireless communications, a multi-agent MAFENN-based equalizer (MAFENN-E) is developed.
arXiv Detail & Related papers (2022-05-22T05:28:43Z) - Backbone is All Your Need: A Simplified Architecture for Visual Object
Tracking [69.08903927311283]
Existing tracking approaches rely on customized sub-modules and need prior knowledge for architecture selection.
This paper presents a simplified tracking architecture (SimTrack) by leveraging a transformer backbone for joint feature extraction and interaction.
Our SimTrack improves the baseline with 2.5%/2.6% AUC gains on LaSOT/TNL2K and gets results competitive with other specialized tracking algorithms without bells and whistles.
arXiv Detail & Related papers (2022-03-10T12:20:58Z) - Cream of the Crop: Distilling Prioritized Paths For One-Shot Neural
Architecture Search [60.965024145243596]
One-shot weight sharing methods have recently drawn great attention in neural architecture search due to high efficiency and competitive performance.
To alleviate this problem, we present a simple yet effective architecture distillation method.
We introduce the concept of prioritized path, which refers to the architecture candidates exhibiting superior performance during training.
Since the prioritized paths are changed on the fly depending on their performance and complexity, the final obtained paths are the cream of the crop.
arXiv Detail & Related papers (2020-10-29T17:55:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
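The "Neural Network Pruning by Gradient Descent" entry above describes making keep/prune decisions trainable via the Gumbel-Softmax relaxation. The PyTorch fragment below is a minimal, hypothetical sketch of that general idea and not the cited paper's implementation; the class name GumbelPrunedLinear, the per-weight two-logit mask, and the sparsity penalty are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GumbelPrunedLinear(nn.Module):
    """Linear layer whose weights are gated by a binary mask relaxed with
    Gumbel-Softmax, so the mask itself is trained by gradient descent."""
    def __init__(self, in_features, out_features, tau=1.0):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        # Two logits per weight: index 0 = "keep", index 1 = "prune".
        self.mask_logits = nn.Parameter(torch.zeros(out_features, in_features, 2))
        self.tau = tau

    def forward(self, x):
        # Straight-through Gumbel-Softmax sample; slice 0 is the keep gate.
        keep = F.gumbel_softmax(self.mask_logits, tau=self.tau, hard=True)[..., 0]
        return F.linear(x, self.linear.weight * keep, self.linear.bias)

    def sparsity_loss(self):
        # Penalise the expected keep rate to encourage pruning.
        return torch.softmax(self.mask_logits, dim=-1)[..., 0].mean()
```

A training loop would then minimise something like task_loss + lambda * layer.sparsity_loss(), annealing tau so the relaxed mask approaches a hard 0/1 pruning decision.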