Related papers: Dyn-Adapter: Towards Disentangled Representation for Efficient Visual Recognition

Dyn-Adapter: Towards Disentangled Representation for Efficient Visual Recognition

URL: http://arxiv.org/abs/2407.14302v2
Date: Tue, 23 Jul 2024 07:57:17 GMT
Title: Dyn-Adapter: Towards Disentangled Representation for Efficient Visual Recognition
Authors: Yurong Zhang, Honghao Chen, Xinyu Zhang, Xiangxiang Chu, Li Song,
Abstract summary: This paper presents an efficient visual recognition paradigm, called Dynamic Adapter (Dyn-Adapter) We devise a dynamic architecture with balanced early heads for multi-level feature extraction, along with adaptive training strategy. We reduce FLOPs during inference by 50%, while maintaining or even yielding higher recognition accuracy.
Score: 22.615830919860777
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Parameter-efficient transfer learning (PETL) is a promising task, aiming to adapt the large-scale pre-trained model to downstream tasks with a relatively modest cost. However, current PETL methods struggle in compressing computational complexity and bear a heavy inference burden due to the complete forward process. This paper presents an efficient visual recognition paradigm, called Dynamic Adapter (Dyn-Adapter), that boosts PETL efficiency by subtly disentangling features in multiple levels. Our approach is simple: first, we devise a dynamic architecture with balanced early heads for multi-level feature extraction, along with adaptive training strategy. Second, we introduce a bidirectional sparsity strategy driven by the pursuit of powerful generalization ability. These qualities enable us to fine-tune efficiently and effectively: we reduce FLOPs during inference by 50%, while maintaining or even yielding higher recognition accuracy. Extensive experiments on diverse datasets and pretrained backbones demonstrate the potential of Dyn-Adapter serving as a general efficiency booster for PETL in vision recognition tasks.

Related papers

SelaVPR++: Towards Seamless Adaptation of Foundation Models for Efficient Place Recognition [69.58329995485158]
Recent studies show that the visual place recognition (VPR) method using pre-trained visual foundation models can achieve promising performance. We propose a novel method to realize seamless adaptation of foundation models to VPR. In pursuit of higher efficiency and better performance, we propose an extension of the SelaVPR, called SelaVPR++.
arXiv Detail & Related papers (2025-02-23T15:01:09Z)
Efficient Reinforcement Learning Through Adaptively Pretrained Visual Encoder [12.310140622800372]
We propose APE: efficient reinforcement learning through Adaptively Pretrained visual. APE uses adaptive augmentation strategy during the pretraining phase and extracts generalizable features with only a few interactions within the task environments in the policy learning period. Results show that mainstream RL methods, such as DreamerV3 and DrQ-v2, achieve state-of-the-art performance when equipped with APE.
arXiv Detail & Related papers (2025-02-08T12:57:02Z)
big.LITTLE Vision Transformer for Efficient Visual Recognition [34.015778625984055]
big.LITTLE Vision Transformer is an innovative architecture aimed at achieving efficient visual recognition. System is composed of two distinct blocks: the big performance block and the LITTLE efficiency block. When processing an image, our system determines the importance of each token and allocates them accordingly.
arXiv Detail & Related papers (2024-10-14T08:21:00Z)
iConFormer: Dynamic Parameter-Efficient Tuning with Input-Conditioned Adaptation [15.97351561456467]
In this paper, we propose a novel PEFT approach, input-Conditioned transFormer, termed iConFormer. We introduce an input-Conditioned Network (iCoN) in the dynamic adapter that enables instance-level feature transformation. To be specific, iCoN generates channel-wise convolutional kernels for each feature and transform it using adaptive convolution process to effectively capture task-specific and fine-grained details tailor to downstream tasks.
arXiv Detail & Related papers (2024-09-04T16:06:23Z)
Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation [67.13876021157887]
Dynamic Tuning (DyT) is a novel approach to improve both parameter and inference efficiency for ViT adaptation. DyT achieves superior performance compared to existing PEFT methods while evoking only 71% of their FLOPs on the VTAB-1K benchmark.
arXiv Detail & Related papers (2024-03-18T14:05:52Z)
PYRA: Parallel Yielding Re-Activation for Training-Inference Efficient Task Adaptation [61.57833648734164]
We propose a novel Parallel Yielding Re-Activation (PYRA) method for training-inference efficient task adaptation. PYRA outperforms all competing methods under both low compression rate and high compression rate.
arXiv Detail & Related papers (2024-03-14T09:06:49Z)
Efficient Adaptation of Large Vision Transformer via Adapter Re-Composing [8.88477151877883]
High-capacity pre-trained models have revolutionized problem-solving in computer vision. We propose a novel Adapter Re-Composing (ARC) strategy that addresses efficient pre-trained model adaptation. Our approach considers the reusability of adaptation parameters and introduces a parameter-sharing scheme.
arXiv Detail & Related papers (2023-10-10T01:04:15Z)
Approximated Prompt Tuning for Vision-Language Pre-trained Models [54.326232586461614]
In vision-language pre-trained models, prompt tuning often requires a large number of learnable tokens to bridge the gap between the pre-training and downstream tasks. We propose a novel Approximated Prompt Tuning (APT) approach towards efficient VL transfer learning.
arXiv Detail & Related papers (2023-06-27T05:43:47Z)
Towards Efficient Visual Adaption via Structural Re-parameterization [76.57083043547296]
We propose a parameter-efficient and computational friendly adapter for giant vision models, called RepAdapter. RepAdapter outperforms full tuning by +7.2% on average and saves up to 25% training time, 20% GPU memory, and 94.6% storage cost of ViT-B/16 on VTAB-1k.
arXiv Detail & Related papers (2023-02-16T06:14:15Z)
Effective Adaptation in Multi-Task Co-Training for Unified Autonomous Driving [103.745551954983]
In this paper, we investigate the transfer performance of various types of self-supervised methods, including MoCo and SimCLR, on three downstream tasks. We find that their performances are sub-optimal or even lag far behind the single-task baseline. We propose a simple yet effective pretrain-adapt-finetune paradigm for general multi-task training.
arXiv Detail & Related papers (2022-09-19T12:15:31Z)
Efficient Few-Shot Object Detection via Knowledge Inheritance [62.36414544915032]
Few-shot object detection (FSOD) aims at learning a generic detector that can adapt to unseen tasks with scarce training samples. We present an efficient pretrain-transfer framework (PTF) baseline with no computational increment. We also propose an adaptive length re-scaling (ALR) strategy to alleviate the vector length inconsistency between the predicted novel weights and the pretrained base weights.
arXiv Detail & Related papers (2022-03-23T06:24:31Z)

This list is automatically generated from the titles and abstracts of the papers in this site.