The Effects of Grouped Structural Global Pruning of Vision Transformers on Domain Generalisation
- URL: http://arxiv.org/abs/2504.04196v1
- Date: Sat, 05 Apr 2025 15:05:36 GMT
- Title: The Effects of Grouped Structural Global Pruning of Vision Transformers on Domain Generalisation
- Authors: Hamza Riaz, Alan F. Smeaton
- Abstract summary: This paper introduces a novel grouped structural pruning method for pre-trained vision transformers (ViT, BeiT, and DeiT). Our method uses dependency graph analysis to identify and remove redundant groups of neurons, weights, filters, or attention heads within transformers. Results show significant improvements in inference speed and fine-tuning time with minimal trade-offs in accuracy and DG task performance.
- Score: 2.2124795371148616
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the growing sizes of AI models like large language models (LLMs) and vision transformers, deploying them on devices with limited computational resources is a significant challenge, particularly when addressing domain generalisation (DG) tasks. This paper introduces a novel grouped structural pruning method for pre-trained vision transformers (ViT, BeiT, and DeiT), evaluated on the PACS and Office-Home DG benchmarks. Our method uses dependency graph analysis to identify and remove redundant groups of neurons, weights, filters, or attention heads within transformers, using a range of selection metrics. Grouped structural pruning is applied at pruning ratios of 50%, 75%, and 95%, and the models are then fine-tuned on selected distributions from the DG benchmarks to evaluate their overall performance in DG tasks. Results show significant improvements in inference speed and fine-tuning time with minimal trade-offs in accuracy and DG task performance. For instance, on the PACS benchmark, pruning ViT, BeiT, and DeiT models by 50% using the Hessian metric resulted in accuracy drops of only 2.94%, 1.42%, and 1.72%, respectively, while achieving speed boosts of 2.5x, 1.81x, and 2.15x. These findings demonstrate the effectiveness of our approach in balancing model efficiency with domain generalisation performance.
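The pipeline the abstract describes (trace a dependency graph over the network, score coupled groups of parameters with a selection metric, remove the lowest-scoring groups, then fine-tune) maps closely onto what the open-source Torch-Pruning library implements. The sketch below is a minimal illustration under that assumption, not the authors' released code: the timm model name, the L2-magnitude importance metric (a simpler stand-in for the paper's Hessian metric), and the 50% ratio are illustrative, and argument names such as `pruning_ratio` may differ across Torch-Pruning versions.

```python
# Minimal sketch of dependency-graph-based grouped structural pruning of a ViT,
# using the open-source Torch-Pruning library. Illustrative only: not the
# authors' code, and kwargs may vary across library versions.
import torch
import timm
import torch_pruning as tp

model = timm.create_model("vit_base_patch16_224", pretrained=True)
example_inputs = torch.randn(1, 3, 224, 224)

# L2 magnitude importance as a simple stand-in for the paper's Hessian metric.
importance = tp.importance.MagnitudeImportance(p=2)

pruner = tp.pruner.MetaPruner(
    model,
    example_inputs,               # used to trace the dependency graph
    importance=importance,
    pruning_ratio=0.5,            # 50% grouped structural pruning
    global_pruning=True,          # rank groups across the whole network
    ignored_layers=[model.head],  # keep the classifier's output dimension
)
pruner.step()  # removes coupled groups of weights/heads in one shot

# The pruned model would then be fine-tuned on a DG benchmark split
# (e.g. PACS) before measuring accuracy and inference speed.
print(f"params after pruning: {sum(p.numel() for p in model.parameters()):,}")
```

The dependency graph is what makes the pruning "grouped": removing one attention head or MLP channel forces matching removals in every layer that shares that dimension, so the pruner deletes whole consistent groups rather than individual weights.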
Related papers
- Resilience of Vision Transformers for Domain Generalisation in the Presence of Out-of-Distribution Noisy Images [2.2124795371148616]
We evaluate vision transformers pre-trained with masked image modelling (MIM) against synthetic out-of-distribution (OOD) benchmarks. Experiments demonstrate BEiT's robustness, maintaining 94% accuracy on PACS and 87% on Office-Home despite significant occlusions. These insights bridge the gap between lab-trained models and real-world deployment, offering a blueprint for building AI systems that generalise reliably under uncertainty.
arXiv Detail & Related papers (2025-04-05T16:25:34Z) - GPT Meets Graphs and KAN Splines: Testing Novel Frameworks on Multitask Fine-Tuned GPT-2 with LoRA [0.0]
We explore the potential of integrating learnable and interpretable modules--specifically Kolmogorov-Arnold Networks (KAN) and graph-based representations--within a pre-trained GPT-2 model.
arXiv Detail & Related papers (2025-03-25T19:58:25Z) - Generative adversarial learning with optimal input dimension and its adaptive generator architecture [3.9043107077488797]
We introduce a novel framework called generalized GANs (G-GANs).
By incorporating the group penalty and the architecture penalty, G-GANs have several intriguing features.
Extensive experiments conducted with simulated and benchmark data demonstrate the superior performance of G-GANs.
arXiv Detail & Related papers (2024-05-06T03:30:02Z) - Controllable Prompt Tuning For Balancing Group Distributional Robustness [53.336515056479705]
We introduce an optimization scheme to achieve good performance across groups and find a good solution for all without severely sacrificing performance on any of them.
We propose Controllable Prompt Tuning (CPT), which couples our approach with prompt-tuning techniques.
On spurious correlation benchmarks, our procedures achieve state-of-the-art results across both transformer and non-transformer architectures, as well as unimodal and multimodal data.
arXiv Detail & Related papers (2024-03-05T06:23:55Z) - Generative Parameter-Efficient Fine-Tuning [8.481707805559589]
GIFT learns to generate the fine-tuned weights for a layer directly from its pretrained weights.
We show this formulation bridges parameter-efficient fine-tuning and representation fine-tuning.
arXiv Detail & Related papers (2023-12-01T16:33:57Z) - Hierarchical Side-Tuning for Vision Transformers [33.536948382414316]
Fine-tuning pre-trained Vision Transformers (ViTs) has showcased significant promise in enhancing visual recognition tasks.
Parameter-efficient transfer learning (PETL) has shown potential for achieving high performance with fewer parameter updates compared to full fine-tuning.
This paper introduces Hierarchical Side-Tuning (HST), an innovative PETL method facilitating the transfer of ViT models to diverse downstream tasks.
arXiv Detail & Related papers (2023-10-09T04:16:35Z) - Sub-token ViT Embedding via Stochastic Resonance Transformers [51.12001699637727]
Vision Transformer (ViT) architectures represent images as collections of high-dimensional vectorized tokens, each corresponding to a rectangular non-overlapping patch.
We propose a training-free method inspired by "stochastic resonance".
The resulting "Stochastic Resonance Transformer" (SRT) retains the rich semantic information of the original representation, but grounds it on a finer-scale spatial domain, partly mitigating the coarse effect of spatial tokenization.
arXiv Detail & Related papers (2023-10-06T01:53:27Z) - Robust Representation Learning with Self-Distillation for Domain Generalization [2.0817769887373245]
We propose a novel domain generalization technique called Robust Representation Learning with Self-Distillation.
We observe an average accuracy improvement in the range of 1.2% to 2.3% over the state-of-the-art on three datasets.
arXiv Detail & Related papers (2023-02-14T07:39:37Z) - GOHSP: A Unified Framework of Graph and Optimization-based Heterogeneous Structured Pruning for Vision Transformer [76.2625311630021]
Vision transformers (ViTs) have shown very impressive empirical performance in various computer vision tasks.
To mitigate this challenging problem, structured pruning is a promising solution to compress model size and enable practical efficiency.
We propose GOHSP, a unified framework of Graph and Optimization-based Structured Pruning for ViT models.
arXiv Detail & Related papers (2023-01-13T00:40:24Z) - Evolving Pareto-Optimal Actor-Critic Algorithms for Generalizability and Stability [67.8426046908398]
Generalizability and stability are two key objectives for operating reinforcement learning (RL) agents in the real world.
This paper presents MetaPG, an evolutionary method for automated design of actor-critic loss functions.
arXiv Detail & Related papers (2022-04-08T20:46:16Z) - ERNIE-SPARSE: Learning Hierarchical Efficient Transformer Through Regularized Self-Attention [48.697458429460184]
Two factors, information bottleneck sensitivity and inconsistency between different attention topologies, could affect the performance of the Sparse Transformer.
This paper proposes a well-designed model named ERNIE-Sparse.
It consists of two distinctive parts: (i) Hierarchical Sparse Transformer (HST) to sequentially unify local and global information, and (ii) Self-Attention Regularization (SAR) to minimize the distance for transformers with different attention topologies.
arXiv Detail & Related papers (2022-03-23T08:47:01Z) - Global Vision Transformer Pruning with Hessian-Aware Saliency [93.33895899995224]
This work challenges the common design philosophy of the Vision Transformer (ViT) model with uniform dimension across all the stacked blocks in a model stage.
We derive a novel Hessian-based structural pruning criteria comparable across all layers and structures, with latency-aware regularization for direct latency reduction.
Performing iterative pruning on the DeiT-Base model leads to a new architecture family called NViT (Novel ViT), with a novel parameter redistribution that uses parameters more efficiently.
arXiv Detail & Related papers (2021-10-10T18:04:59Z) - Pruning Redundant Mappings in Transformer Models via Spectral-Normalized Identity Prior [54.629850694790036]
Spectral-normalized identity prior (SNIP) is a structured pruning approach that penalizes an entire residual module in a Transformer model toward an identity mapping.
We conduct experiments with BERT on 5 GLUE benchmark tasks to demonstrate that SNIP achieves effective pruning results while maintaining comparable performance.
arXiv Detail & Related papers (2020-10-05T05:40:56Z)