PETAH: Parameter Efficient Task Adaptation for Hybrid Transformers in a resource-limited Context
- URL: http://arxiv.org/abs/2410.17661v1
- Date: Wed, 23 Oct 2024 08:24:47 GMT
- Title: PETAH: Parameter Efficient Task Adaptation for Hybrid Transformers in a resource-limited Context
- Authors: Maximilian Augustin, Syed Shakib Sarwar, Mostafa Elhoushi, Sai Qian Zhang, Yuecheng Li, Barbara De Salvo,
- Abstract summary: We show how to achieve the best task-adaptation performance and introduce PETAH: Efficient Task Adaptation for Hybrid Transformers.
Our PETAH-adapted hybrid models outperform established task-adaptation techniques for ViTs while requiring fewer parameters and being more efficient on mobile hardware.
- Score: 9.235131774252416
- License:
- Abstract: Following their success in natural language processing (NLP), there has been a shift towards transformer models in computer vision. While transformers perform well and offer promising multi-tasking performance, due to their high compute requirements, many resource-constrained applications still rely on convolutional or hybrid models that combine the benefits of convolution and attention layers and achieve the best results in the sub 100M parameter range. Simultaneously, task adaptation techniques that allow for the use of one shared transformer backbone for multiple downstream tasks, resulting in great storage savings at negligible cost in performance, have not yet been adopted for hybrid transformers. In this work, we investigate how to achieve the best task-adaptation performance and introduce PETAH: Parameter Efficient Task Adaptation for Hybrid Transformers. We further combine PETAH adaptation with pruning to achieve highly performant and storage friendly models for multi-tasking. In our extensive evaluation on classification and other vision tasks, we demonstrate that our PETAH-adapted hybrid models outperform established task-adaptation techniques for ViTs while requiring fewer parameters and being more efficient on mobile hardware.
Related papers
- ETHER: Efficient Finetuning of Large-Scale Models with Hyperplane Reflections [59.839926875976225]
We propose the ETHER transformation family, which performs Efficient fineTuning via HypErplane Reflections.
In particular, we introduce ETHER and its relaxation ETHER+, which match or outperform existing PEFT methods with significantly fewer parameters.
arXiv Detail & Related papers (2024-05-30T17:26:02Z) - MoEUT: Mixture-of-Experts Universal Transformers [75.96744719516813]
Universal Transformers (UTs) have advantages over standard Transformers in learning compositional generalizations.
Layer-sharing drastically reduces the parameter count compared to the non-shared model with the same dimensionality.
No previous work has succeeded in proposing a shared-layer Transformer design that is competitive in parameter count-dominated tasks such as language modeling.
arXiv Detail & Related papers (2024-05-25T03:24:32Z) - Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation [67.13876021157887]
Dynamic Tuning (DyT) is a novel approach to improve both parameter and inference efficiency for ViT adaptation.
DyT achieves superior performance compared to existing PEFT methods while evoking only 71% of their FLOPs on the VTAB-1K benchmark.
arXiv Detail & Related papers (2024-03-18T14:05:52Z) - Prototype-based HyperAdapter for Sample-Efficient Multi-task Tuning [30.251155072822055]
Prototype-based HyperAdapter (PHA) is a novel framework built on the adapter-tuning and hypernetwork.
It introduces an instance-dense retriever and prototypical hypernetwork to generate conditional modules in a sample-efficient manner.
We show that PHA strikes a better trade-off between trainable parameters, accuracy on stream tasks, and sample efficiency.
arXiv Detail & Related papers (2023-10-18T02:42:17Z) - Selective Feature Adapter for Dense Vision Transformers [30.409313135985528]
selective feature adapter (SFA) achieves comparable or better performance than fully fine-tuned models across various dense tasks.
SFA consists of external adapters and internal adapters which are sequentially operated over a transformer model.
Experiments show that the dual adapter module, a.k.a SFA, is essential to achieve the best trade-off on dense vision tasks.
arXiv Detail & Related papers (2023-10-03T07:17:58Z) - Consolidator: Mergeable Adapter with Grouped Connections for Visual
Adaptation [53.835365470800916]
We show how to efficiently and effectively transfer knowledge in a vision transformer.
We propose consolidator to modify the pre-trained model with the addition of a small set of tunable parameters.
Our consolidator can reach up to 7.56 better accuracy than full fine-tuning with merely 0.35% parameters.
arXiv Detail & Related papers (2023-04-30T23:59:02Z) - Q-HyViT: Post-Training Quantization of Hybrid Vision Transformers with Bridge Block Reconstruction for IoT Systems [23.261607952479377]
Vision transformers (ViTs) have superseded convolutional neural networks in numerous applications, including classification, detection, and segmentation.
We propose a new post-training quantization method, which is the first to quantize efficient hybrid ViTs.
We achieve a significant improvement of 17.73% for 8-bit and 29.75% for 6-bit on average, compared with existing PTQ methods.
arXiv Detail & Related papers (2023-03-22T13:41:22Z) - Polyhistor: Parameter-Efficient Multi-Task Adaptation for Dense Vision
Tasks [36.34331439747556]
We propose Polyhistor and Polyhistor-Lite to share information across different tasks with a few trainable parameters.
Specifically, Polyhistor achieves competitive accuracy compared to the state-of-the-art while only using 10% of their trainable parameters.
arXiv Detail & Related papers (2022-10-07T00:25:02Z) - AdapterBias: Parameter-efficient Token-dependent Representation Shift
for Adapters in NLP Tasks [55.705355299065474]
Transformer-based pre-trained models with millions of parameters require large storage.
Recent approaches tackle this shortcoming by training adapters, but these approaches still require a relatively large number of parameters.
In this study, AdapterBias, a surprisingly simple yet effective adapter architecture, is proposed.
arXiv Detail & Related papers (2022-04-30T16:49:41Z) - Towards Lightweight Transformer via Group-wise Transformation for
Vision-and-Language Tasks [126.33843752332139]
We introduce Group-wise Transformation towards a universal yet lightweight Transformer for vision-and-language tasks, termed as LW-Transformer.
We apply LW-Transformer to a set of Transformer-based networks, and quantitatively measure them on three vision-and-language tasks and six benchmark datasets.
Experimental results show that while saving a large number of parameters and computations, LW-Transformer achieves very competitive performance against the original Transformer networks for vision-and-language tasks.
arXiv Detail & Related papers (2022-04-16T11:30:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.