Minimal Interaction Edge Tuning: A New Paradigm for Visual Adaptation
- URL: http://arxiv.org/abs/2406.17559v2
- Date: Wed, 26 Jun 2024 02:23:32 GMT
- Title: Minimal Interaction Edge Tuning: A New Paradigm for Visual Adaptation
- Authors: Ningyuan Tang, Minghao Fu, Jianxin Wu
- Abstract summary: We explore a new visual adaptation paradigm called edge tuning, which treats large pretrained models as standalone feature extractors that run on powerful cloud servers.
Fine-tuning is carried out on edge devices with small networks that require low computational resources.
We propose Minimal Interaction Edge Tuning, or MIET, which reveals that the sum of intermediate features from pretrained models requires minimal information transfer while retaining high adaptation capability.
- Score: 11.656632975033476
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rapid scaling of large vision pretrained models makes fine-tuning increasingly difficult on edge devices with low computational resources. We explore a new visual adaptation paradigm called edge tuning, which treats large pretrained models as standalone feature extractors that run on powerful cloud servers, while fine-tuning is carried out on edge devices with small networks that require low computational resources. We discuss existing methods that are potentially suitable for this paradigm, but three major drawbacks hinder their application to edge tuning: low adaptation capability, large adapter networks, and high information transfer overhead. To address these issues, we propose Minimal Interaction Edge Tuning, or MIET, which reveals that the sum of intermediate features from pretrained models requires minimal information transfer while retaining high adaptation capability. With a lightweight attention-based adapter network, MIET achieves information-transfer, parameter, computational, and memory efficiency, and at the same time demonstrates competitive results on various visual adaptation benchmarks.
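The abstract suggests a simple division of labor: a frozen backbone on the cloud ships one summed feature tensor, and the edge device trains a small attention-based adapter on top of it. Below is a minimal sketch of that data flow; every class, dimension, and parameter name is an illustrative assumption, not the authors' released code.

```python
import torch
import torch.nn as nn

class CloudFeatureExtractor(nn.Module):
    """Frozen pretrained backbone on the cloud server: returns the SUM of
    intermediate block outputs, so a single tensor crosses the network."""
    def __init__(self, backbone_blocks):
        super().__init__()
        self.blocks = nn.ModuleList(backbone_blocks)  # pretrained, frozen
        for p in self.parameters():
            p.requires_grad_(False)

    def forward(self, x):                # x: (B, tokens, dim)
        summed = torch.zeros_like(x)
        for blk in self.blocks:
            x = blk(x)
            summed = summed + x          # accumulate intermediate features
        return summed                    # one tensor -> minimal transfer

class EdgeAdapter(nn.Module):
    """Lightweight attention-based adapter trained on the edge device."""
    def __init__(self, dim, num_classes, num_heads=4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, feats):            # feats received from the cloud
        h = self.norm(feats)
        h, _ = self.attn(h, h, h)        # self-attention over tokens
        return self.head((feats + h).mean(dim=1))  # residual + pooled logits
```

Only the summed tensor crosses the cloud-edge link, and no gradients flow back to the server, which is what keeps the interaction minimal.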
Related papers
- RECAST: Reparameterized, Compact weight Adaptation for Sequential Tasks [16.512587987753967]
RECAST is a novel method that dramatically reduces task-specific trainable parameters to fewer than 50.
We show that RECAST outperforms the state-of-the-art by up to 3% across various scales, architectures, and parameter spaces.
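"Reparameterized, compact weight adaptation" reads as expressing a layer's weight through shared frozen templates so that only a handful of mixing coefficients are trained per task; with 8 templates that is 8 trainable scalars per layer, consistent with parameter counts below 50. A hedged sketch of that general idea (the template mechanism and names are assumptions, not RECAST's exact formulation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemplateLinear(nn.Module):
    """Weight = mixture of frozen shared templates; only the mixing
    coefficients are task-specific trainable parameters."""
    def __init__(self, in_dim, out_dim, num_templates=8):
        super().__init__()
        self.templates = nn.Parameter(
            torch.randn(num_templates, out_dim, in_dim), requires_grad=False)
        self.coeffs = nn.Parameter(torch.full((num_templates,),
                                              1.0 / num_templates))

    def forward(self, x):
        # Recompose the effective weight from templates, then apply it.
        weight = torch.einsum("t,toi->oi", self.coeffs, self.templates)
        return F.linear(x, weight)
```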
arXiv Detail & Related papers (2024-11-25T19:08:38Z)
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses the resource constraints of IoVT systems by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework that jointly optimizes the neural network architecture and its edge deployment.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
- ReduceFormer: Attention with Tensor Reduction by Summation [4.985969607297595]
We introduce ReduceFormer, a family of models optimized for efficiency in the spirit of attention.
ReduceFormer leverages only simple operations such as reduction and element-wise multiplication, leading to greatly simplified architecture and improved inference performance.
The proposed model family is suitable for edge devices where compute resource and memory bandwidth are limited, as well as for cloud computing where high throughput is sought after.
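A minimal sketch of attention-style global mixing built from nothing but a summation reduction and element-wise multiplications, in the spirit described above; the exact formulation is an assumption, not the paper's:

```python
import torch
import torch.nn as nn

class ReduceMix(nn.Module):
    """Token mixing without softmax or an N x N attention matrix: a global
    context vector is formed by summing over tokens, then each token is
    modulated by it element-wise."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.out = nn.Linear(dim, dim)

    def forward(self, x):                       # x: (B, N, D)
        q, k, v = self.q(x), self.k(x), self.v(x)
        ctx = (k * v).sum(dim=1, keepdim=True)  # reduction: (B, 1, D)
        return self.out(q * ctx)                # element-wise modulation
```

Because the reduction is a plain sum, both memory traffic and compute stay linear in the number of tokens, which matches the efficiency claim for edge and cloud alike.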
arXiv Detail & Related papers (2024-06-11T17:28:09Z)
- Mini but Mighty: Finetuning ViTs with Mini Adapters [7.175668563148084]
Adapters perform poorly when their dimension is small.
We propose MiMi, a training framework that addresses this issue.
Our method outperforms existing methods in finding the best trade-off between accuracy and trained parameters.
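For context, this is the standard bottleneck adapter the summary refers to: down-project, nonlinearity, up-project, residual. Very small bottleneck widths are exactly where such naive adapters degrade, the regime MiMi's training framework targets; the sketch is illustrative, not MiMi itself.

```python
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Classic residual bottleneck adapter inserted into a frozen backbone."""
    def __init__(self, dim, bottleneck=8):    # tiny `bottleneck` is the
        super().__init__()                    # hard case discussed above
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)        # adapter starts as identity,
        nn.init.zeros_(self.up.bias)          # preserving pretrained behavior

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))
```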
arXiv Detail & Related papers (2023-11-07T10:41:27Z)
- Efficient Adaptation of Large Vision Transformer via Adapter Re-Composing [8.88477151877883]
High-capacity pre-trained models have revolutionized problem-solving in computer vision.
We propose a novel Adapter Re-Composing (ARC) strategy that addresses efficient pre-trained model adaptation.
Our approach considers the reusability of adaptation parameters and introduces a parameter-sharing scheme.
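A hedged sketch of the parameter-sharing idea: one pair of bottleneck projections reused by every layer, with only a cheap per-layer rescaling learned separately. The details are assumptions rather than ARC's exact scheme.

```python
import torch
import torch.nn as nn

class SharedAdapterBank(nn.Module):
    """One down/up projection pair shared across all layers; each layer
    contributes only a small vector of re-composing coefficients."""
    def __init__(self, dim, bottleneck, num_layers):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)   # shared across layers
        self.up = nn.Linear(bottleneck, dim)     # shared across layers
        self.scales = nn.Parameter(torch.ones(num_layers, bottleneck))

    def forward(self, x, layer_idx):
        h = self.down(x) * self.scales[layer_idx]  # layer-specific rescale
        return x + self.up(h)
```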
arXiv Detail & Related papers (2023-10-10T01:04:15Z)
- Consolidator: Mergeable Adapter with Grouped Connections for Visual Adaptation [53.835365470800916]
We show how to efficiently and effectively transfer knowledge in a vision transformer.
We propose consolidator to modify the pre-trained model with the addition of a small set of tunable parameters.
Our consolidator can reach up to 7.56 points better accuracy than full fine-tuning with merely 0.35% of the parameters.
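The "mergeable" property can be illustrated by a tunable branch running parallel to a frozen linear layer that is folded into the weight once training ends, so inference pays no extra cost. The sketch omits the grouped connections and is an assumption, not the paper's method.

```python
import torch
import torch.nn as nn

class MergeableLinear(nn.Module):
    """Frozen base layer plus a trainable delta branch that can be merged."""
    def __init__(self, frozen_linear: nn.Linear):
        super().__init__()
        self.base = frozen_linear
        for p in self.base.parameters():
            p.requires_grad_(False)
        self.delta = nn.Linear(frozen_linear.in_features,
                               frozen_linear.out_features, bias=False)
        nn.init.zeros_(self.delta.weight)     # start at the pretrained model

    def forward(self, x):
        return self.base(x) + self.delta(x)   # two branches while training

    @torch.no_grad()
    def merge(self):
        # Fold the tuned branch into the frozen weight: inference then runs
        # a single ordinary linear layer.
        self.base.weight += self.delta.weight
        nn.init.zeros_(self.delta.weight)
```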
arXiv Detail & Related papers (2023-04-30T23:59:02Z)
- Tiny-Attention Adapter: Contexts Are More Important Than the Number of Parameters [25.958600375299735]
Adapter-tuning is a paradigm that transfers a pretrained language model to downstream tasks by adding and tuning a small number of new parameters.
In this paper, we investigate the effectiveness of using tiny-attention -- i.e., attention with extremely small per-head dimensionality -- as adapters.
Our tiny-attention adapter learns to modify the hidden states at each position directly conditioned on the hidden states at all the other positions.
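A minimal sketch of such an adapter: ordinary attention across all positions, but with a per-head dimensionality as small as one, added residually to the hidden states. Sizes and names are illustrative.

```python
import torch
import torch.nn as nn

class TinyAttentionAdapter(nn.Module):
    """Single attention head with an extremely small head dimension."""
    def __init__(self, dim, head_dim=1):        # "tiny": per-head dim of 1
        super().__init__()
        self.q = nn.Linear(dim, head_dim)
        self.k = nn.Linear(dim, head_dim)
        self.v = nn.Linear(dim, head_dim)
        self.out = nn.Linear(head_dim, dim)

    def forward(self, x):                       # x: (B, N, D)
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = q @ k.transpose(1, 2) / (k.shape[-1] ** 0.5)
        attn = torch.softmax(scores, dim=-1)    # every position attends to
        return x + self.out(attn @ v)           # every other position
```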
arXiv Detail & Related papers (2022-10-18T15:20:44Z)
- Pro-tuning: Unified Prompt Tuning for Vision Tasks [133.12978197265596]
Fine-tuning is the de facto approach to leverage pre-trained vision models to perform downstream tasks.
In this work, we propose parameter-efficient Prompt tuning (Pro-tuning) to adapt frozen vision models to various downstream vision tasks.
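Pro-tuning builds its prompts with lightweight task-specific blocks; the sketch below shows only the generic prompt-tuning pattern the summary describes: keep the vision backbone frozen and learn a small set of prompt vectors prepended to the token sequence. Treat all details as assumptions.

```python
import torch
import torch.nn as nn

class PromptedBackbone(nn.Module):
    """Frozen backbone adapted by a few learnable prompt tokens."""
    def __init__(self, frozen_backbone, dim, num_prompts=10):
        super().__init__()
        self.backbone = frozen_backbone
        for p in self.backbone.parameters():
            p.requires_grad_(False)
        self.prompts = nn.Parameter(torch.zeros(1, num_prompts, dim))

    def forward(self, tokens):                 # tokens: (B, N, D) patch embeds
        prompts = self.prompts.expand(tokens.shape[0], -1, -1)
        return self.backbone(torch.cat([prompts, tokens], dim=1))
```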
arXiv Detail & Related papers (2022-07-28T21:09:31Z)
- AdaViT: Adaptive Tokens for Efficient Vision Transformer [91.88404546243113]
We introduce AdaViT, a method that adaptively adjusts the inference cost of vision transformer (ViT) for images of different complexity.
AdaViT achieves this by automatically reducing the number of tokens in vision transformers that are processed in the network as inference proceeds.
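A hedged sketch of the token-reduction mechanic: a tiny scorer ranks tokens and only the top fraction is kept before the next block, so easy images get processed with fewer tokens. AdaViT's actual decision networks and training procedure are more involved.

```python
import torch
import torch.nn as nn

def keep_topk_tokens(x, scorer, keep_ratio=0.7):
    """x: (B, N, D); scorer: e.g. nn.Linear(D, 1). Keeps the top
    `keep_ratio` fraction of tokens by predicted importance."""
    scores = scorer(x).squeeze(-1)                   # (B, N)
    k = max(1, int(x.shape[1] * keep_ratio))
    idx = scores.topk(k, dim=1).indices              # (B, k)
    return x.gather(1, idx.unsqueeze(-1).expand(-1, -1, x.shape[-1]))
```

Applied before each block, this shrinks the token sequence progressively, so inference cost adapts to image complexity.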
arXiv Detail & Related papers (2021-12-14T18:56:07Z)
- Multi-Exit Semantic Segmentation Networks [78.44441236864057]
We propose a framework for converting state-of-the-art segmentation models to MESS networks: specially trained CNNs that employ parametrised early exits along their depth to save computation during inference on easier samples.
We co-optimise the number, placement and architecture of the attached segmentation heads, along with the exit policy, to adapt to the device capabilities and application-specific requirements.
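A minimal sketch of anytime inference with early exits: each attached head makes a prediction and computation stops once it is confident enough. MESS's searched head architectures and learned exit policy are more sophisticated; this assumes a batch size of one for simplicity.

```python
import torch

def early_exit_forward(stages, exit_heads, x, threshold=0.9):
    """stages / exit_heads: matched lists of modules. Returns the first
    sufficiently confident prediction (assumes batch size 1)."""
    probs = None
    for stage, head in zip(stages, exit_heads):
        x = stage(x)
        probs = torch.softmax(head(x), dim=-1)
        if probs.max() >= threshold:      # easy sample: exit early
            return probs
    return probs                          # hard sample: ran to the last exit
```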
arXiv Detail & Related papers (2021-06-07T11:37:03Z)
- When Vision Transformers Outperform ResNets without Pretraining or Strong Data Augmentations [111.44860506703307]
Vision Transformers (ViTs) and MLPs signal further efforts on replacing hand-wired features or inductive biases with general-purpose neural architectures.
This paper investigates ViTs and MLP-Mixers from the lens of loss geometry, intending to improve the models' data efficiency at training and generalization at inference.
We show that the improved robustness is attributable to sparser active neurons in the first few layers.
The resultant ViTs outperform ResNets of similar size and throughput when trained from scratch on ImageNet without large-scale pretraining or strong data augmentations.
arXiv Detail & Related papers (2021-06-03T02:08:03Z)