A Fast Transformer-based General-Purpose Lossless Compressor
- URL: http://arxiv.org/abs/2203.16114v2
- Date: Fri, 1 Apr 2022 14:41:36 GMT
- Title: A Fast Transformer-based General-Purpose Lossless Compressor
- Authors: Yu Mao, Yufei Cui, Tei-Wei Kuo, Chun Jason Xue
- Abstract summary: We introduce the transformer into deep-learning compressors to build history dependencies in parallel.
However, the existing transformer is too computationally heavy and ill-suited to compression tasks.
Byte-grouping and Shared-ffn schemes are proposed to fully utilize the capacity of the single-layer transformer.
- Score: 19.5544227045828
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep-learning-based compressors have received interest recently due to their
much improved compression ratios. However, modern approaches suffer from long
execution times. To ease this problem, this paper targets cutting down the
execution time of deep-learning-based compressors. Building
history dependencies sequentially (e.g., with recurrent neural networks) is
responsible for the long inference latency. Instead, we introduce the transformer
into deep-learning compressors to build history dependencies in parallel. However,
the existing transformer is too computationally heavy and ill-suited to
compression tasks.
This paper proposes a fast general-purpose lossless compressor, TRACE, by
designing a compression-friendly structure based on a single-layer transformer.
We first design a new metric to guide the selection of compression model
structures. Byte-grouping and Shared-ffn schemes are further proposed to fully
utilize the capacity of the single-layer transformer. These features allow
TRACE to achieve competitive compression ratio and a much faster speed. In
addition, we further accelerate the compression procedure by designing a
controller to reduce the parameter updating overhead. Experiments show that
TRACE achieves an overall $\sim$3x speedup while keeping a compression
ratio comparable to state-of-the-art compressors. The source code for TRACE and links
to the datasets are available at
https://github.com/mynotwo/A-Fast-Transformer-based-General-Purpose-LosslessCompressor.
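For intuition, the following is a minimal PyTorch sketch of how a TRACE-style probability model could be organized. It is an illustration under assumptions, not the authors' implementation: the class name SingleLayerByteModel and all hyperparameters are hypothetical, the entropy coder that would turn predicted probabilities into bits is omitted, and byte-grouping/Shared-ffn are rendered loosely as channel groups passed through one shared feed-forward network, whereas the paper's precise scheme may differ.

```python
# Minimal sketch (assumptions, not the authors' code) of a TRACE-style model:
# a single-layer transformer predicts a distribution over the next byte; an
# entropy coder (omitted) would turn those probabilities into compressed bits.
import torch
import torch.nn as nn

class SingleLayerByteModel(nn.Module):  # hypothetical name
    def __init__(self, vocab=256, d_model=256, n_heads=4, n_groups=4, ctx=64):
        super().__init__()
        assert d_model % n_groups == 0
        self.embed = nn.Embedding(vocab, d_model)
        self.pos = nn.Embedding(ctx, d_model)
        # One encoder layer with a causal mask stands in for the paper's
        # single-layer, compression-friendly transformer.
        self.layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=2 * d_model, batch_first=True)
        # Byte-grouping rendered loosely as channel groups, each passed
        # through ONE shared feed-forward network ("Shared-ffn").
        self.n_groups = n_groups
        g = d_model // n_groups
        self.shared_ffn = nn.Sequential(
            nn.Linear(g, 2 * g), nn.GELU(), nn.Linear(2 * g, g))
        self.head = nn.Linear(d_model, vocab)

    def forward(self, x):  # x: (batch, seq) int64 byte values in [0, 255]
        t = x.size(1)
        h = self.embed(x) + self.pos(torch.arange(t, device=x.device))
        mask = nn.Transformer.generate_square_subsequent_mask(t).to(x.device)
        h = self.layer(h, src_mask=mask)
        # Apply the shared FFN group-wise, then merge groups back together.
        h = torch.cat([self.shared_ffn(g) for g in h.chunk(self.n_groups, -1)], -1)
        return self.head(h).log_softmax(-1)  # per-position next-byte log-probs

model = SingleLayerByteModel()
data = torch.randint(0, 256, (1, 64))
log_probs = model(data)  # (1, 64, 256)
# Cross-entropy in bits estimates the size an arithmetic coder would emit.
nll = -log_probs[0, :-1].gather(-1, data[0, 1:, None]).sum()
print(nll / torch.log(torch.tensor(2.0)), "bits for", data.numel() - 1, "bytes")
```

Keeping the transformer to a single layer is what the abstract credits for the speed advantage; the cross-entropy computed at the end approximates, in bits, the output size an arithmetic coder driven by this model would achieve.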
Related papers
- Fast Feedforward 3D Gaussian Splatting Compression [55.149325473447384]
Fast Feedforward 3D Gaussian Splatting Compression (FCGS) is an optimization-free model that compresses 3DGS representations rapidly in a single feed-forward pass.
FCGS achieves a compression ratio of over 20X while maintaining fidelity, surpassing most per-scene SOTA optimization-based methods.
arXiv Detail & Related papers (2024-10-10T15:13:08Z) - LoRC: Low-Rank Compression for LLMs KV Cache with a Progressive Compression Strategy [59.1298692559785]
The Key-Value (KV) cache is a crucial component in serving transformer-based autoregressive large language models (LLMs).
Existing approaches to mitigate this issue include: (1) efficient attention variants integrated in upcycling stages; and (2) KV cache compression at test time.
We propose a low-rank approximation of KV weight matrices, allowing plug-in integration with existing transformer-based LLMs without model retraining (a generic sketch of such a factorization appears after this list).
Our method is designed to function without model tuning in upcycling stages or task-specific profiling in test stages.
arXiv Detail & Related papers (2024-10-04T03:10:53Z) - HyCoT: A Transformer-Based Autoencoder for Hyperspectral Image Compression [6.0163252984457145]
Hyperspectral Compression Transformer (HyCoT) is a transformer-based autoencoder for pixelwise HSI compression.
Experimental results on the HySpecNet-11k dataset demonstrate that HyCoT surpasses the state of the art across various compression ratios by over 1 dB of PSNR.
arXiv Detail & Related papers (2024-08-16T12:27:46Z) - Token Compensator: Altering Inference Cost of Vision Transformer without Re-Tuning [63.43972993473501]
Token compression expedites the training and inference of Vision Transformers (ViTs).
However, when applied to downstream tasks, compression degrees are mismatched between training and inference stages.
We propose a model arithmetic framework to decouple the compression degrees between the two stages.
arXiv Detail & Related papers (2024-08-13T10:36:43Z) - What Operations can be Performed Directly on Compressed Arrays, and with What Error? [1.3307486544794784]
We develop a lossy compressor that allows a dozen fairly fundamental operations directly on compressed data.
We evaluate it on three non-trivial applications, choosing different number systems for internal representation.
arXiv Detail & Related papers (2024-06-17T05:01:09Z) - Variator: Accelerating Pre-trained Models with Plug-and-Play Compression
Modules [111.98205411431402]
Variator is a parameter-efficient acceleration method that enhances computational efficiency through plug-and-play compression plugins.
We show that Variator can save 53% of computational costs using only 0.9% additional parameters, with a performance drop of less than 2%.
arXiv Detail & Related papers (2023-10-24T11:00:07Z) - Ultra Dual-Path Compression For Joint Echo Cancellation And Noise
Suppression [38.09558772881095]
Under fixed compression ratios, dual-path compression combining both time- and frequency-domain methods yields further performance improvements.
The proposed models show performance competitive with Fast FullSubNet and DeepFilterNet.
arXiv Detail & Related papers (2023-08-21T21:36:56Z) - DiffRate : Differentiable Compression Rate for Efficient Vision
Transformers [98.33906104846386]
Token compression aims to speed up large-scale vision transformers (e.g., ViTs) by pruning (dropping) or merging tokens.
DiffRate is a novel token compression method with several appealing properties that prior arts lack.
arXiv Detail & Related papers (2023-05-29T10:15:19Z) - Compressing Transformer-based self-supervised models for speech
processing [45.254624876127124]
We study several commonly used compression techniques, including weight pruning, head pruning, low-rank approximation, and knowledge distillation.
We report trade-offs at various compression rates, including wall-clock time, the number of parameters, and the number of multiply-accumulate operations.
Our results lead to a simple combination of compression techniques that improves trade-off over recent approaches.
arXiv Detail & Related papers (2022-11-17T23:53:52Z) - iFlow: Numerically Invertible Flows for Efficient Lossless Compression
via a Uniform Coder [38.297114268193]
iFlow is a new method for achieving efficient lossless compression.
iFlow achieves state-of-the-art compression ratios and is $5\times$ quicker than other high-performance schemes.
arXiv Detail & Related papers (2021-11-01T14:15:58Z) - Towards Compact CNNs via Collaborative Compression [166.86915086497433]
We propose a Collaborative Compression scheme, which jointly applies channel pruning and tensor decomposition to compress CNN models.
We achieve 52.9% FLOPs reduction by removing 48.4% parameters on ResNet-50 with only a Top-1 accuracy drop of 0.56% on ImageNet 2012.
arXiv Detail & Related papers (2021-05-24T12:07:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.