A Fast Transformer-based General-Purpose Lossless Compressor
- URL: http://arxiv.org/abs/2203.16114v2
- Date: Fri, 1 Apr 2022 14:41:36 GMT
- Title: A Fast Transformer-based General-Purpose Lossless Compressor
- Authors: Yu Mao, Yufei Cui, Tei-Wei Kuo, Chun Jason Xue
- Abstract summary: We introduce transformers into deep-learning compressors to build history dependencies in parallel.
Existing transformers are too computationally heavy and ill-suited to compression tasks.
Byte-grouping and Shared-ffn schemes are proposed to fully utilize the capacity of a single-layer transformer.
- Score: 19.5544227045828
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep-learning-based compressors have recently attracted interest due to their much
improved compression ratios. However, modern approaches suffer from long
execution times. To ease this problem, this paper targets cutting down the
execution time of deep-learning-based compressors. Building
history dependencies sequentially (e.g., with recurrent neural networks) is
responsible for long inference latency. Instead, we introduce the transformer into
deep-learning compressors to build history dependencies in parallel. However,
existing transformers are too computationally heavy and ill-suited to
compression tasks.
This paper proposes TRACE, a fast general-purpose lossless compressor, by
designing a compression-friendly structure based on a single-layer transformer.
We first design a new metric to guide the selection of compression-model
structures. Byte-grouping and Shared-ffn schemes are further proposed to fully
utilize the capacity of the single-layer transformer. These features allow
TRACE to achieve a competitive compression ratio at a much faster speed. In
addition, we further accelerate the compression procedure by designing a
controller that reduces the parameter-updating overhead. Experiments show that
TRACE achieves an overall $\sim$3x speedup while keeping a compression
ratio comparable to state-of-the-art compressors. The source code for TRACE and links
to the datasets are available at
https://github.com/mynotwo/A-Fast-Transformer-based-General-Purpose-LosslessCompressor.
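The byte-grouping idea can be sketched minimally (this is an illustration, not the paper's actual implementation; the function name and zero-padding scheme are assumptions): partitioning the byte stream so that several consecutive bytes form one model step shortens the sequence the single-layer transformer must attend over.

```python
# Hedged sketch of byte-grouping: G consecutive bytes become one model step,
# so a stream of N bytes yields only ceil(N / G) transformer positions.
# Padding with zero bytes is an assumption made for the illustration.

def group_bytes(data: bytes, group_size: int) -> list[tuple[int, ...]]:
    """Split a byte stream into fixed-size groups (last group zero-padded)."""
    pad = (-len(data)) % group_size
    padded = data + b"\x00" * pad
    return [tuple(padded[i:i + group_size])
            for i in range(0, len(padded), group_size)]

groups = group_bytes(b"ABCDEFG", 4)
# groups == [(65, 66, 67, 68), (69, 70, 71, 0)]
```

Each group would then be embedded as a single token, which is how a one-layer model can cover a longer history without a proportionally longer attention window.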
Related papers
- What Operations can be Performed Directly on Compressed Arrays, and with What Error? [1.3307486544794784]
We develop a lossy compressor that allows a dozen fairly fundamental operations to be performed directly on compressed data.
We evaluate it on three non-trivial applications, choosing different number systems for internal representation.
arXiv Detail & Related papers (2024-06-17T05:01:09Z)
- A Survey on Transformer Compression [84.18094368700379]
Transformer models play a vital role in the realms of natural language processing (NLP) and computer vision (CV).
Model compression methods reduce the memory and computational cost of Transformer.
This survey provides a comprehensive review of recent compression methods, with a specific focus on their application to Transformer-based models.
arXiv Detail & Related papers (2024-02-05T12:16:28Z)
- Variator: Accelerating Pre-trained Models with Plug-and-Play Compression Modules [111.98205411431402]
Variator is a parameter-efficient acceleration method that enhances computational efficiency through plug-and-play compression plugins.
We show that Variator can save 53% computational costs using only 0.9% additional parameters with a performance drop of less than 2%.
arXiv Detail & Related papers (2023-10-24T11:00:07Z)
- Ultra Dual-Path Compression For Joint Echo Cancellation And Noise Suppression [38.09558772881095]
Under fixed compression ratios, dual-path compression combining both time- and frequency-domain methods yields further performance improvement.
Proposed models show competitive performance compared with fast FullSubNet and DeepNetFilter.
arXiv Detail & Related papers (2023-08-21T21:36:56Z)
- DiffRate: Differentiable Compression Rate for Efficient Vision Transformers [98.33906104846386]
Token compression aims to speed up large-scale vision transformers (e.g. ViTs) by pruning (dropping) or merging tokens.
DiffRate is a novel token compression method that has several appealing properties prior arts do not have.
arXiv Detail & Related papers (2023-05-29T10:15:19Z)
- GraVAC: Adaptive Compression for Communication-Efficient Distributed DL Training [0.0]
Distributed data-parallel (DDP) training improves overall application throughput as multiple devices train on a subset of data and aggregate updates to produce a globally shared model.
GraVAC is a framework to dynamically adjust compression factor throughout training by evaluating model progress and assessing information loss associated with compression.
As opposed to using a static compression factor, GraVAC reduces end-to-end training time for ResNet101, VGG16 and LSTM by 4.32x, 1.95x and 6.67x respectively.
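GraVAC's concrete compressor is not specified in the summary above; as a generic, hypothetical illustration of a gradient compressor with a tunable compression factor, top-k sparsification keeps only the largest-magnitude fraction of gradient entries:

```python
# Illustrative only: not GraVAC's exact scheme. Top-k sparsification is one
# common gradient compressor whose "compression factor" (the fraction of
# entries kept) could be adjusted dynamically during training.

def topk_sparsify(grad: list[float], keep_fraction: float) -> dict[int, float]:
    """Keep only the largest-magnitude fraction of gradient entries."""
    k = max(1, int(len(grad) * keep_fraction))
    idx = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)[:k]
    return {i: grad[i] for i in sorted(idx)}

sparse = topk_sparsify([0.1, -2.0, 0.05, 0.7], keep_fraction=0.5)
# keeps the two largest-magnitude entries: {1: -2.0, 3: 0.7}
```

A framework like the one described could raise or lower `keep_fraction` as it measures how much information the compression discards at each stage of training.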
arXiv Detail & Related papers (2023-05-20T14:25:17Z)
- Compressing Transformer-based self-supervised models for speech processing [45.254624876127124]
We study several commonly used compression techniques, including weight pruning, head pruning, low-rank approximation, and knowledge distillation.
We report the trade-offs at various compression rates, including wall-clock time, the number of parameters, and the number of multiply-accumulate operations.
Our results lead to a simple combination of compression techniques that improves the trade-off over recent approaches.
arXiv Detail & Related papers (2022-11-17T23:53:52Z)
- The Devil Is in the Details: Window-based Attention for Image Compression [58.1577742463617]
Most existing learned image compression models are based on Convolutional Neural Networks (CNNs).
In this paper, we study the effects of multiple kinds of attention mechanisms for local features learning, then introduce a more straightforward yet effective window-based local attention block.
The proposed window-based attention is very flexible and can work as a plug-and-play component to enhance CNN and Transformer models.
arXiv Detail & Related papers (2022-03-16T07:55:49Z)
- iFlow: Numerically Invertible Flows for Efficient Lossless Compression via a Uniform Coder [38.297114268193]
iFlow is a new method for achieving efficient lossless compression.
iFlow achieves state-of-the-art compression ratios and is $5\times$ quicker than other high-performance schemes.
arXiv Detail & Related papers (2021-11-01T14:15:58Z)
- Towards Compact CNNs via Collaborative Compression [166.86915086497433]
We propose a Collaborative Compression scheme, which jointly applies channel pruning and tensor decomposition to compress CNN models.
We achieve 52.9% FLOPs reduction by removing 48.4% parameters on ResNet-50 with only a Top-1 accuracy drop of 0.56% on ImageNet 2012.
arXiv Detail & Related papers (2021-05-24T12:07:38Z)
- PowerGossip: Practical Low-Rank Communication Compression in Decentralized Deep Learning [62.440827696638664]
We introduce a simple algorithm that directly compresses the model differences between neighboring workers.
Inspired by PowerSGD for centralized deep learning, this algorithm uses power steps to maximize the information transferred per bit.
arXiv Detail & Related papers (2020-08-04T09:14:52Z)
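A hedged sketch of the power step that PowerSGD-style methods use (illustrative only; the names and details below are assumptions, not PowerGossip's exact algorithm): one power-iteration step compresses a model-difference matrix into a rank-1 pair of vectors, so only two vectors need to cross the wire instead of the full matrix.

```python
# Sketch of a single power step in the spirit of PowerSGD: compress the
# model-difference matrix M to a rank-1 pair (p, q). Function names are
# illustrative, not from the paper's code.
import numpy as np

def power_compress(M: np.ndarray, q: np.ndarray):
    """One power-iteration step: returns rank-1 factors approximating M."""
    p = M @ q
    p /= np.linalg.norm(p) + 1e-12   # normalize the left factor
    q_new = M.T @ p                  # best right factor for this p
    return p, q_new

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 3))          # stand-in for a weight difference
p, q = power_compress(M, rng.standard_normal(3))
approx = np.outer(p, q)                  # rank-1 reconstruction of M
```

Reusing the previous round's `q` as the starting vector (warm-starting) is what lets repeated cheap power steps approach the best rank-1 approximation over time.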
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.