Adaptive Pruning of Pretrained Transformer via Differential Inclusions
- URL: http://arxiv.org/abs/2501.03289v1
- Date: Mon, 06 Jan 2025 06:34:52 GMT
- Title: Adaptive Pruning of Pretrained Transformer via Differential Inclusions
- Authors: Yizhuo Ding, Ke Fan, Yikai Wang, Xinwei Sun, Yanwei Fu
- Abstract summary: Current compression algorithms prune transformers at fixed compression ratios, requiring a unique pruning process for each ratio.
We propose pruning of pretrained transformers at any desired ratio within a single pruning stage, based on a differential inclusion for a mask parameter.
This dynamic can generate the whole regularization solution path of the mask parameter, whose support set identifies the network structure.
- Score: 48.47890215458465
- License:
- Abstract: Large transformers have demonstrated remarkable success, making it necessary to compress these models to reduce inference costs while preserving their performance. Current compression algorithms prune transformers at fixed compression ratios, requiring a unique pruning process for each ratio, which results in high computational costs. In contrast, we propose pruning of pretrained transformers at any desired ratio within a single pruning stage, based on a differential inclusion for a mask parameter. This dynamic can generate the whole regularization solution path of the mask parameter, whose support set identifies the network structure. Therefore, the solution path identifies a Transformer weight family with various sparsity levels, offering greater flexibility and customization. In this paper, we introduce such an effective pruning method, termed SPP (Solution Path Pruning). To achieve effective pruning, we segment the transformers into paired modules, including query-key pairs, value-projection pairs, and sequential linear layers, and apply low-rank compression to these pairs, maintaining the output structure while enabling structural compression within the inner states. Extensive experiments conducted on various well-known transformer backbones have demonstrated the efficacy of SPP.
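The core idea, tracing the whole regularization path of a pruning mask so that a single run yields subnetworks at every sparsity level, can be sketched with a toy linearized-Bregman-style iteration (a standard discretization of such differential inclusions). The mask size, loss, and importance scores below are made up for illustration; this is not the authors' SPP implementation:

```python
import numpy as np

def soft_threshold(z, lam=1.0):
    # Shrinkage operator: entries with |z| <= lam map to zero.
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def mask_path(grad_fn, dim, steps=150, alpha=0.05, kappa=5.0):
    """Trace a sparse regularization path for a mask parameter.

    Returns the support set (indices of nonzero mask entries) at
    every step; each support defines one subnetwork of the family.
    """
    z = np.zeros(dim)                    # augmented (dual) variable
    supports = []
    for _ in range(steps):
        m = kappa * soft_threshold(z)    # mask = shrinkage of z
        z = z - alpha * grad_fn(m)       # gradient step on the loss
        supports.append(np.flatnonzero(soft_threshold(z)))
    return supports

# Toy quadratic loss 0.5 * ||m - s||^2, where s plays the role of
# per-unit importance scores (hypothetical values).
s = np.array([3.0, 2.0, 1.0, 0.05])
path = mask_path(lambda m: m - s, dim=4)
```

More important entries enter the support earlier along the path, so choosing the step whose support size matches a target ratio gives that compression level from a single run; the paper applies this kind of dynamic to paired transformer modules (e.g. query-key pairs) with low-rank compression rather than to a toy quadratic.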
Related papers
- Provable optimal transport with transformers: The essence of depth and prompt engineering [2.8597439883196953]
We prove that a transformer with fixed parameters can effectively solve the optimal transport problem in Wasserstein-2 with entropic regularization for an arbitrary number of points.
Our results rely on an engineered prompt that enables the transformer to implement gradient descent with adaptive stepsizes on the dual optimal transport.
arXiv Detail & Related papers (2024-10-25T19:07:29Z) - Transformers Provably Learn Sparse Token Selection While Fully-Connected Nets Cannot [50.16171384920963]
The transformer architecture has prevailed in various deep learning settings.
A one-layer transformer trained with gradient descent provably learns the sparse token selection task.
arXiv Detail & Related papers (2024-06-11T02:15:53Z) - DSFormer: Effective Compression of Text-Transformers by Dense-Sparse Weight Factorization [12.277820111814691]
DSFormer is a simple alternative factorization scheme which expresses a target weight matrix as the product of a small dense and a semi-structured sparse matrix.
Our approach is also complementary to mainstream compressors and offers up to 50% additional compression when added to popular distilled, layer-shared and quantized transformers.
arXiv Detail & Related papers (2023-12-20T17:27:25Z) - Progressive Learning with Visual Prompt Tuning for Variable-Rate Image Compression [60.689646881479064]
We propose a progressive learning paradigm for transformer-based variable-rate image compression.
Inspired by visual prompt tuning, we use LPM to extract prompts for input images and hidden features at the encoder side and decoder side, respectively.
Our model outperforms all current variable-rate image compression methods in terms of rate-distortion performance and approaches state-of-the-art fixed-rate image compression methods trained from scratch.
arXiv Detail & Related papers (2023-11-23T08:29:32Z) - SPION: Layer-Wise Sparse Training of Transformer via Convolutional Flood Filling [1.0128808054306186]
We propose a novel sparsification scheme for the Transformer that integrates convolution filters and the flood filling method.
Our sparsification approach reduces the computational complexity and memory footprint of the Transformer during training.
SPION achieves up to a 3.08X speedup over existing state-of-the-art sparse Transformer models.
arXiv Detail & Related papers (2023-09-22T02:14:46Z) - Transformer-based Variable-rate Image Compression with Region-of-interest Control [24.794581811606445]
This paper proposes a transformer-based learned image compression system.
It is capable of achieving variable-rate compression with a single model while supporting the region-of-interest functionality.
arXiv Detail & Related papers (2023-05-18T08:40:34Z) - HEAT: Hardware-Efficient Automatic Tensor Decomposition for Transformer Compression [69.36555801766762]
We propose a hardware-aware tensor decomposition framework, dubbed HEAT, that enables efficient exploration of the exponential space of possible decompositions.
We experimentally show that our hardware-aware factorized BERT variants reduce the energy-delay product by 5.7x with less than 1.1% accuracy loss.
arXiv Detail & Related papers (2022-11-30T05:31:45Z) - The Lighter The Better: Rethinking Transformers in Medical Image Segmentation Through Adaptive Pruning [26.405243756778606]
We propose applying adaptive pruning to transformers for medical image segmentation and introduce a lightweight network, APFormer.
To the best of our knowledge, this is the first work on transformer pruning for medical image analysis tasks.
We show, through ablation studies, that adaptive pruning can work as a plug-and-play module for performance improvement on other hybrid- and transformer-based methods.
arXiv Detail & Related papers (2022-06-29T05:49:36Z) - CSformer: Bridging Convolution and Transformer for Compressive Sensing [65.22377493627687]
This paper proposes a hybrid framework that integrates the advantages of leveraging detailed spatial information from CNN and the global context provided by transformer for enhanced representation learning.
The proposed approach is an end-to-end compressive image sensing method, composed of adaptive sampling and recovery.
The experimental results demonstrate the effectiveness of the dedicated transformer-based architecture for compressive sensing.
arXiv Detail & Related papers (2021-12-31T04:37:11Z) - Pruning Redundant Mappings in Transformer Models via Spectral-Normalized Identity Prior [54.629850694790036]
SNIP (spectral-normalized identity priors) is a structured pruning approach that penalizes an entire residual module in a Transformer model toward an identity mapping.
We conduct experiments with BERT on 5 GLUE benchmark tasks to demonstrate that SNIP achieves effective pruning results while maintaining comparable performance.
arXiv Detail & Related papers (2020-10-05T05:40:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.