TAP-ViTs: Task-Adaptive Pruning for On-Device Deployment of Vision Transformers
- URL: http://arxiv.org/abs/2601.02437v1
- Date: Mon, 05 Jan 2026 09:00:08 GMT
- Title: TAP-ViTs: Task-Adaptive Pruning for On-Device Deployment of Vision Transformers
- Authors: Zhibo Wang, Zuoyuan Zhang, Xiaoyi Pang, Qile Zhang, Xuanyi Hao, Shuguo Zhuo, Peng Sun,
- Abstract summary: Vision Transformers (ViTs) have demonstrated strong performance across a wide range of vision tasks, yet their substantial computational and memory demands hinder efficient deployment on resource-constrained mobile and edge devices. This paper introduces TAP-ViTs, a novel task-adaptive pruning framework that generates device-specific pruned ViT models without requiring access to any raw local data.
- Score: 9.270463842274394
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision Transformers (ViTs) have demonstrated strong performance across a wide range of vision tasks, yet their substantial computational and memory demands hinder efficient deployment on resource-constrained mobile and edge devices. Pruning has emerged as a promising direction for reducing ViT complexity. However, existing approaches either (i) produce a single pruned model shared across all devices, ignoring device heterogeneity, or (ii) rely on fine-tuning with device-local data, which is often infeasible due to limited on-device resources and strict privacy constraints. As a result, current methods fall short of enabling task-customized ViT pruning in privacy-preserving mobile computing settings. This paper introduces TAP-ViTs, a novel task-adaptive pruning framework that generates device-specific pruned ViT models without requiring access to any raw local data. Specifically, to infer device-level task characteristics under privacy constraints, we propose a Gaussian Mixture Model (GMM)-based metric dataset construction mechanism. Each device fits a lightweight GMM to approximate its private data distribution and uploads only the GMM parameters. Using these parameters, the cloud selects distribution-consistent samples from public data to construct a task-representative metric dataset for each device. Based on this proxy dataset, we further develop a dual-granularity importance evaluation-based pruning strategy that jointly measures composite neuron importance and adaptive layer importance, enabling fine-grained, task-aware pruning tailored to each device's computational budget. Extensive experiments across multiple ViT backbones and datasets demonstrate that TAP-ViTs consistently outperforms state-of-the-art pruning methods under comparable compression ratios.
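As a rough illustration of the GMM-based metric dataset construction described in the abstract, the sketch below fits a diagonal-covariance GMM to on-device feature vectors, uploads only its parameters, and lets the cloud rank public samples by likelihood under that GMM. The feature extractor, component count, and selection size are illustrative assumptions, not settings from the paper.

```python
# Hypothetical sketch of GMM-based metric dataset construction.
# Component counts, feature dimensions, and selection sizes are assumptions.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_device_gmm(device_features: np.ndarray, n_components: int = 8) -> dict:
    """On-device step: fit a lightweight GMM to local feature vectors and
    return only its parameters (no raw samples leave the device)."""
    gmm = GaussianMixture(n_components=n_components, covariance_type="diag",
                          random_state=0)
    gmm.fit(device_features)
    return {"weights": gmm.weights_, "means": gmm.means_,
            "covariances": gmm.covariances_}

def build_metric_dataset(gmm_params: dict, public_features: np.ndarray,
                         n_select: int = 1000) -> np.ndarray:
    """Cloud step: rebuild the device GMM from the uploaded parameters and
    pick the public samples most likely under the device distribution."""
    n_components = gmm_params["means"].shape[0]
    gmm = GaussianMixture(n_components=n_components, covariance_type="diag")
    gmm.weights_ = gmm_params["weights"]
    gmm.means_ = gmm_params["means"]
    gmm.covariances_ = gmm_params["covariances"]
    # precisions_cholesky_ must be set when parameters are assigned manually
    # (diag case: 1 / sqrt(covariances)).
    gmm.precisions_cholesky_ = 1.0 / np.sqrt(gmm_params["covariances"])
    scores = gmm.score_samples(public_features)   # per-sample log-likelihood
    return np.argsort(scores)[-n_select:]         # most distribution-consistent indices

# Toy usage with random "features" standing in for backbone embeddings.
rng = np.random.default_rng(0)
device_feats = rng.normal(size=(500, 64))
public_feats = rng.normal(size=(5000, 64))
params = fit_device_gmm(device_feats)
selected = build_metric_dataset(params, public_feats)
```

The selected proxy samples would then serve as the metric dataset on which the dual-granularity neuron and layer importance scores are evaluated before pruning to each device's budget.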
Related papers
- Stay Unique, Stay Efficient: Preserving Model Personality in Multi-Task Merging [62.61159948488935]
Decomposition, Thresholding, and Scaling (DTS) is an approximation-based personalized merging framework.
DTS preserves task-specific information with minimal storage overhead.
We extend DTS with a variant that fuses task-specific information in a data-free manner based on the semantic similarity of task characteristics.
arXiv Detail & Related papers (2025-12-01T09:47:17Z) - Tackling Device Data Distribution Real-time Shift via Prototype-based Parameter Editing [101.07855433979519]
We introduce Persona, a novel personalized method to enhance model generalization without post-deployment retraining.
Persona employs a neural adapter in the cloud to generate a parameter editing matrix based on real-time device data.
The prototypes are dynamically refined via the parameter editing matrix, facilitating efficient evolution.
arXiv Detail & Related papers (2025-09-08T11:06:50Z) - VAE-based Feature Disentanglement for Data Augmentation and Compression in Generalized GNSS Interference Classification [42.14439854721613]
We propose variational autoencoders (VAEs) for disentanglement to extract essential latent features that enable accurate classification of interferences.
Our proposed VAE achieves a data compression rate ranging from 512 to 8,192 and an accuracy of up to 99.92%.
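As a generic illustration of how a VAE compresses inputs into a small latent code (not this paper's exact architecture; the input and latent sizes below are assumptions chosen so the compression ratio lands at the 512 end of the reported range):

```python
# Generic VAE sketch: the encoder maps a high-dimensional input to a small
# latent code, which is what gives the compression; the latent features can
# then feed a downstream classifier. Sizes are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyVAE(nn.Module):
    def __init__(self, in_dim: int = 4096, latent_dim: int = 8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent_dim)
        self.logvar = nn.Linear(256, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                 nn.Linear(256, in_dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return self.dec(z), mu, logvar

def vae_loss(recon, x, mu, logvar):
    # Reconstruction term plus KL divergence to the standard normal prior.
    rec = F.mse_loss(recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kl

x = torch.randn(16, 4096)              # stand-in for high-dimensional snapshots
model = TinyVAE()
recon, mu, logvar = model(x)
loss = vae_loss(recon, x, mu, logvar)  # compression ratio here: 4096 / 8 = 512
```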
arXiv Detail & Related papers (2025-04-14T13:38:00Z) - STAMP: Scalable Task And Model-agnostic Collaborative Perception [24.890993164334766]
STAMP is a task- and model-agnostic, collaborative perception pipeline for heterogeneous agents.
It minimizes computational overhead, enhances scalability, and preserves model security.
As a first-of-its-kind framework, STAMP aims to advance research in scalable and secure mobility systems towards Level 5 autonomy.
arXiv Detail & Related papers (2025-01-24T16:27:28Z) - Efficient Partitioning Vision Transformer on Edge Devices for Distributed Inference [13.533267828812455]
We propose a novel framework, ED-ViT, designed to efficiently split and execute complex Vision Transformers across multiple edge devices.
Our approach involves partitioning Vision Transformer models into several sub-models, with each dedicated to handling a specific subset of data classes.
We demonstrate that our method significantly reduces inference latency on edge devices and decreases model size, by up to 28.9 times and 34.1 times, respectively.
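A hypothetical sketch of the class-based partitioning idea described above: each device hosts a sub-model responsible for a subset of classes, and one simple way to merge device outputs is to keep the most confident prediction. The sub-models and the merge rule are assumptions, not ED-ViT's exact design.

```python
# Hypothetical class-based partitioning across edge devices.
from typing import Callable, Dict, List
import numpy as np

def partition_classes(num_classes: int, num_devices: int) -> List[List[int]]:
    """Split class indices into roughly equal subsets, one per device."""
    return [list(chunk) for chunk in np.array_split(np.arange(num_classes), num_devices)]

def distributed_predict(x: np.ndarray,
                        sub_models: Dict[int, Callable[[np.ndarray], np.ndarray]],
                        class_subsets: List[List[int]]) -> int:
    """Each device scores only its own classes; the highest score wins."""
    best_class, best_score = -1, -np.inf
    for device_id, subset in enumerate(class_subsets):
        scores = sub_models[device_id](x)     # one score per class in `subset`
        i = int(np.argmax(scores))
        if scores[i] > best_score:
            best_class, best_score = int(subset[i]), float(scores[i])
    return best_class

# Toy usage: 10 classes over 3 devices, with random stand-in "sub-models".
subsets = partition_classes(10, 3)
rng = np.random.default_rng(1)
models = {d: (lambda x, n=len(s): rng.normal(size=n)) for d, s in enumerate(subsets)}
print(distributed_predict(np.zeros(8), models, subsets))
```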
arXiv Detail & Related papers (2024-10-15T14:38:14Z) - UL-VIO: Ultra-lightweight Visual-Inertial Odometry with Noise Robust Test-time Adaptation [12.511829774226113]
We propose an ultra-lightweight (1M) visual-inertial odometry (VIO) network capable of test-time adaptation (TTA) based on visual-inertial consistency.
It achieves 36X smaller network size than state-of-the-art with a minute increase in error -- 1% on the KITTI dataset.
arXiv Detail & Related papers (2024-09-19T22:24:14Z) - Hierarchical Side-Tuning for Vision Transformers [33.536948382414316]
Fine-tuning pre-trained Vision Transformers (ViTs) has showcased significant promise in enhancing visual recognition tasks.
Parameter-efficient transfer learning (PETL) has shown potential for achieving high performance with fewer parameter updates compared to full fine-tuning.
This paper introduces Hierarchical Side-Tuning (HST), an innovative PETL method facilitating the transfer of ViT models to diverse downstream tasks.
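As a rough sketch of the side-tuning idea (not HST's exact architecture): the pre-trained ViT stays frozen while a small trainable side network consumes its intermediate features and produces the task prediction. The layer choices and fusion below are illustrative assumptions.

```python
# Illustrative side-tuning setup: frozen backbone, trainable side network.
import torch
import torch.nn as nn

class FrozenBackbone(nn.Module):
    """Stand-in for a pre-trained ViT encoder with frozen parameters."""
    def __init__(self, dim=192, depth=4):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
             for _ in range(depth)])
        for p in self.parameters():
            p.requires_grad_(False)

    def forward(self, tokens):
        feats = []
        for blk in self.blocks:
            tokens = blk(tokens)
            feats.append(tokens)          # expose every intermediate feature map
        return feats

class SideNetwork(nn.Module):
    """Lightweight trainable head that fuses intermediate backbone features."""
    def __init__(self, dim=192, depth=4, num_classes=10):
        super().__init__()
        self.proj = nn.ModuleList([nn.Linear(dim, dim) for _ in range(depth)])
        self.head = nn.Linear(dim, num_classes)

    def forward(self, feats):
        fused = sum(p(f.mean(dim=1)) for p, f in zip(self.proj, feats))
        return self.head(fused)

tokens = torch.randn(2, 197, 192)          # (batch, tokens, dim), toy values
backbone, side = FrozenBackbone(), SideNetwork()
logits = side(backbone(tokens))            # only the side network is trained
```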
arXiv Detail & Related papers (2023-10-09T04:16:35Z) - DeGMix: Efficient Multi-Task Dense Prediction with Deformable and Gating Mixer [129.61363098633782]
We present DeGMix, an efficient multi-task dense prediction method with a deformable and gating mixer.
The proposed DeGMix uses fewer GFLOPs and significantly outperforms current Transformer-based and CNN-based competitive models.
arXiv Detail & Related papers (2023-08-10T17:37:49Z) - Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization for maximal data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z) - Task-Oriented Over-the-Air Computation for Multi-Device Edge AI [57.50247872182593]
6G networks for supporting edge AI feature task-oriented techniques that focus on the effective and efficient execution of AI tasks.
A task-oriented over-the-air computation (AirComp) scheme is proposed in this paper for multi-device split-inference systems.
arXiv Detail & Related papers (2022-11-02T16:35:14Z) - Pruning Self-attentions into Convolutional Layers in Single Path [89.55361659622305]
Vision Transformers (ViTs) have achieved impressive performance over various computer vision tasks.
We propose Single-Path Vision Transformer pruning (SPViT) to efficiently and automatically compress the pre-trained ViTs.
Our SPViT can trim 52.0% of the FLOPs of DeiT-B while simultaneously achieving a 0.6% top-1 accuracy gain.
arXiv Detail & Related papers (2021-11-23T11:35:54Z) - SensiX++: Bringing MLOPs and Multi-tenant Model Serving to Sensory Edge Devices [69.1412199244903]
We present a multi-tenant runtime for adaptive model execution with integrated MLOps on edge devices, e.g., a camera, a microphone, or IoT sensors.
SensiX++ operates on two fundamental principles: highly modular componentisation to externalise data operations with clear abstractions, and document-centric manifestation for system-wide orchestration.
We report on the overall throughput and quantified benefits of various automation components of SensiX++ and demonstrate its efficacy in significantly reducing operational complexity and lowering the effort to deploy, upgrade, reconfigure, and serve embedded models on edge devices.
arXiv Detail & Related papers (2021-09-08T22:06:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.