The silence of the weights: an investigation of structural pruning strategies for attention-based audio signal architectures
- URL: http://arxiv.org/abs/2509.26207v1
- Date: Tue, 30 Sep 2025 13:10:19 GMT
- Title: The silence of the weights: an investigation of structural pruning strategies for attention-based audio signal architectures
- Authors: Andrea Diecidue, Carlo Alberto Barbano, Piero Fraternali, Mathieu Fontaine, Enzo Tartaglione
- Abstract summary: We propose a novel pruning technique targeted explicitly at the attention mechanism. We decouple the pruning of the four layers in the attention block, namely the query, key, value, and output projection matrices. Our results show that even when pruning 50% of the attention parameters, we incur a performance degradation of less than 1%.
- Score: 21.334985032433774
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformer-based models have become the state of the art across multiple domains, from natural language processing to machine listening, thanks to attention mechanisms. However, the attention layers require a large number of parameters and high-end hardware for both training and inference. We propose a novel pruning technique targeted explicitly at the attention mechanism, where we decouple the pruning of the four layers in the attention block, namely the query, key, value, and output projection matrices. We also investigate pruning strategies along the head and channel dimensions, and compare the performance of the Audio Spectrogram Transformer (AST) model under different pruning scenarios. Our results show that even when pruning 50% of the attention parameters, we incur a performance degradation of less than 1%.
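The abstract does not give implementation details, so the following is a minimal PyTorch sketch of the idea rather than the authors' code: each of the four attention projections (query, key, value, output) is pruned independently (the "decoupling"), either by removing whole heads or individual channels. The L2-norm importance score and the zeroing-out of rows (instead of physically shrinking the matrices) are assumptions made purely for illustration.

```python
import torch


def prune_projection(weight: torch.Tensor, num_heads: int, ratio: float,
                     dim: str = "head") -> torch.Tensor:
    """Zero out the least important structures of one projection matrix.

    weight: (embed_dim, embed_dim) projection weight, e.g. W_q, W_k, W_v or W_out.
    dim:    "head"    -> prune whole attention heads,
            "channel" -> prune individual output channels (rows).
    """
    embed_dim = weight.shape[0]
    head_dim = embed_dim // num_heads
    pruned = weight.clone()

    if dim == "head":
        # Group rows into heads and score each head by its L2 norm
        # (magnitude criterion assumed here for illustration).
        per_head = pruned.view(num_heads, head_dim, embed_dim)
        scores = per_head.pow(2).sum(dim=(1, 2)).sqrt()
        n_drop = int(ratio * num_heads)
        per_head[scores.argsort()[:n_drop]] = 0.0
    else:
        # Channel-wise: score and drop individual rows of the projection.
        scores = pruned.norm(dim=1)
        n_drop = int(ratio * embed_dim)
        pruned[scores.argsort()[:n_drop]] = 0.0

    return pruned


# Example of the decoupled setting: each projection of one attention block is
# pruned independently at a 50% ratio along the head dimension. The weights are
# random stand-ins; the paper applies this to the Audio Spectrogram Transformer.
num_heads, embed_dim = 12, 768
projections = {name: torch.randn(embed_dim, embed_dim)
               for name in ("query", "key", "value", "output")}
pruned = {name: prune_projection(w, num_heads, ratio=0.5, dim="head")
          for name, w in projections.items()}
```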
Related papers
- Nexus: Higher-Order Attention Mechanisms in Transformers [82.07756094886552]
Transformers have achieved significant success across various domains, relying on self-attention to capture dependencies. In this paper, we propose Nexus, a novel architecture designed to enhance representational power through a recursion framework. We provide theoretical analysis demonstrating that our method breaks the linear bottleneck of standard attention. Empirically, Nexus outperforms standard Transformers on multiple benchmarks.
arXiv Detail & Related papers (2025-12-03T02:25:38Z) - PAROAttention: Pattern-Aware ReOrdering for Efficient Sparse and Quantized Attention in Visual Generation Models [14.14413223631804]
In visual generation, the quadratic complexity of attention mechanisms results in high memory and computational costs. We propose an alternative strategy: *reorganizing* the attention pattern to alleviate these challenges. Inspired by the local aggregation nature of visual feature extraction, we design a novel **Pattern-Aware token ReOrdering (PARO)** technique.
arXiv Detail & Related papers (2025-06-19T06:25:02Z) - LASER: Attention with Exponential Transformation [20.1832156343096]
We analyze the gradients backpropagated through the softmax operation in the attention mechanism and observe that these gradients can often be small. We introduce a new attention mechanism called LASER, which we analytically show to admit a larger gradient signal. We show that LASER attention can be implemented by making small modifications to existing attention implementations.
arXiv Detail & Related papers (2024-11-05T20:18:28Z) - DAPE V2: Process Attention Score as Feature Map for Length Extrapolation [63.87956583202729]
We conceptualize attention as a feature map and apply the convolution operator to mimic the processing methods in computer vision.
The novel insight, which can be adapted to various attention-related models, reveals that the current Transformer architecture has the potential for further evolution.
arXiv Detail & Related papers (2024-10-07T07:21:49Z) - EchoAtt: Attend, Copy, then Adjust for More Efficient Large Language Models [29.57891007810509]
Large Language Models (LLMs) have demonstrated outstanding performance across a variety of natural language processing tasks.
We introduce EchoAtt, a novel framework aimed at optimizing transformer-based models by analyzing and leveraging the similarity of attention patterns across layers.
Our best results with TinyLLaMA-1.1B demonstrate that EchoAtt improves inference speed by 15%, training speed by 25%, and reduces the number of parameters by approximately 4%, all while improving zero-shot performance.
arXiv Detail & Related papers (2024-09-22T21:08:37Z) - Visual Transformers for Primates Classification and Covid Detection [8.747840760772268]
We apply the vision transformer, a deep machine learning model built around the attention mechanism, to mel-spectrogram representations of raw audio recordings.
When adding mel-based data augmentation techniques and sample weighting, we achieve comparable performance on both tasks (the PRS and CCS challenges) of ComParE21.
arXiv Detail & Related papers (2022-12-20T09:10:25Z) - How Much Does Attention Actually Attend? Questioning the Importance of Attention in Pretrained Transformers [59.57128476584361]
We introduce PAPA, a new probing method that replaces the input-dependent attention matrices with constant ones.
We find that without any input-dependent attention, all models achieve competitive performance.
We show that better-performing models lose more from applying our method than weaker models, suggesting that the utilization of the input-dependent attention mechanism might be a factor in their success.
arXiv Detail & Related papers (2022-11-07T12:37:54Z) - Combiner: Full Attention Transformer with Sparse Computation Cost [142.10203598824964]
We propose Combiner, which provides full attention capability in each attention head while maintaining low computational complexity.
We show that most sparse attention patterns used in existing sparse transformers are able to inspire the design of such factorization for full attention.
An experimental evaluation on both autoregressive and bidirectional sequence tasks demonstrates the effectiveness of this approach.
arXiv Detail & Related papers (2021-07-12T22:43:11Z) - Single-Layer Vision Transformers for More Accurate Early Exits with Less Overhead [88.17413955380262]
We introduce a novel architecture for early exiting based on the vision transformer architecture.
We show that our method works for both classification and regression problems.
We also introduce a novel method for integrating audio and visual modalities within early exits in audiovisual data analysis.
arXiv Detail & Related papers (2021-05-19T13:30:34Z) - Input-independent Attention Weights Are Expressive Enough: A Study of Attention in Self-supervised Audio Transformers [55.40032342541187]
We pre-train transformer-based models with different attention algorithms in a self-supervised fashion and treat them as feature extractors on downstream tasks.
Our approach shows comparable performance to the typical self-attention yet requires 20% less time in both training and inference.
arXiv Detail & Related papers (2020-06-09T10:40:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.