A Fast Transformer-based General-Purpose Lossless Compressor
- URL: http://arxiv.org/abs/2203.16114v2
- Date: Fri, 1 Apr 2022 14:41:36 GMT
- Title: A Fast Transformer-based General-Purpose Lossless Compressor
- Authors: Yu Mao, Yufei Cui, Tei-Wei Kuo, Chun Jason Xue
- Abstract summary: We introduce the transformer into deep-learning compressors to build history dependencies in parallel.
However, the existing transformer is too computationally heavy and ill-suited to compression tasks.
Byte-grouping and Shared-ffn schemes are proposed to fully utilize the capacity of the single-layer transformer.
- Score: 19.5544227045828
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep-learning-based compressors have received interest recently due to their
much improved compression ratios. However, modern approaches suffer from long
execution times. To ease this problem, this paper targets cutting down the
execution time of deep-learning-based compressors. Building
history dependencies sequentially (e.g., with recurrent neural networks) is
responsible for the long inference latency. Instead, we introduce the transformer
into deep-learning compressors to build history dependencies in parallel. However,
the existing transformer is too computationally heavy and ill-suited to
compression tasks.
This paper proposes a fast general-purpose lossless compressor, TRACE, by
designing a compression-friendly structure based on a single-layer transformer.
We first design a new metric to guide the selection of compression model
structures. Byte-grouping and Shared-ffn schemes are further proposed to fully
utilize the capacity of the single-layer transformer. These features allow
TRACE to achieve competitive compression ratio and a much faster speed. In
addition, we further accelerate the compression procedure by designing a
controller to reduce the parameter updating overhead. Experiments show that
TRACE achieves an overall $\sim$3x speedup while keeping a compression
ratio comparable to state-of-the-art compressors. The source code for TRACE and links
to the datasets are available at
https://github.com/mynotwo/A-Fast-Transformer-based-General-Purpose-LosslessCompressor.
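For intuition, the following is a minimal PyTorch sketch of how a TRACE-style probability model could be organized. It is an illustration under assumptions, not the authors' implementation: the class name SingleLayerByteModel and all hyperparameters are hypothetical, the entropy coder that would turn predicted probabilities into bits is omitted, and byte-grouping/Shared-ffn are rendered loosely as channel groups passed through one shared feed-forward network, whereas the paper's precise scheme may differ.

```python
# Minimal sketch (assumptions, not the authors' code) of a TRACE-style model:
# a single-layer transformer predicts a distribution over the next byte; an
# entropy coder (omitted) would turn those probabilities into compressed bits.
import torch
import torch.nn as nn

class SingleLayerByteModel(nn.Module):  # hypothetical name
    def __init__(self, vocab=256, d_model=256, n_heads=4, n_groups=4, ctx=64):
        super().__init__()
        assert d_model % n_groups == 0
        self.embed = nn.Embedding(vocab, d_model)
        self.pos = nn.Embedding(ctx, d_model)
        # One encoder layer with a causal mask stands in for the paper's
        # single-layer, compression-friendly transformer.
        self.layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=2 * d_model, batch_first=True)
        # Byte-grouping rendered loosely as channel groups, each passed
        # through ONE shared feed-forward network ("Shared-ffn").
        self.n_groups = n_groups
        g = d_model // n_groups
        self.shared_ffn = nn.Sequential(
            nn.Linear(g, 2 * g), nn.GELU(), nn.Linear(2 * g, g))
        self.head = nn.Linear(d_model, vocab)

    def forward(self, x):  # x: (batch, seq) int64 byte values in [0, 255]
        t = x.size(1)
        h = self.embed(x) + self.pos(torch.arange(t, device=x.device))
        mask = nn.Transformer.generate_square_subsequent_mask(t).to(x.device)
        h = self.layer(h, src_mask=mask)
        # Apply the shared FFN group-wise, then merge groups back together.
        h = torch.cat([self.shared_ffn(g) for g in h.chunk(self.n_groups, -1)], -1)
        return self.head(h).log_softmax(-1)  # per-position next-byte log-probs

model = SingleLayerByteModel()
data = torch.randint(0, 256, (1, 64))
log_probs = model(data)  # (1, 64, 256)
# Cross-entropy in bits estimates the size an arithmetic coder would emit.
nll = -log_probs[0, :-1].gather(-1, data[0, 1:, None]).sum()
print(nll / torch.log(torch.tensor(2.0)), "bits for", data.numel() - 1, "bytes")
```

Keeping the transformer to a single layer is what the abstract credits for the speed advantage; the cross-entropy computed at the end approximates, in bits, the output size an arithmetic coder driven by this model would achieve.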
Related papers
- Fast Feedforward 3D Gaussian Splatting Compression [55.149325473447384]
Fast Feedforward 3D Gaussian Splatting Compression (FCGS) is an optimization-free model that compresses 3DGS representations rapidly in a single feed-forward pass.
FCGS achieves a compression ratio of over 20X while maintaining fidelity, surpassing most per-scene SOTA optimization-based methods.
arXiv Detail & Related papers (2024-10-10T15:13:08Z) - LoRC: Low-Rank Compression for LLMs KV Cache with a Progressive Compression Strategy [59.1298692559785]
The Key-Value (KV) cache is a crucial component in serving transformer-based autoregressive large language models (LLMs).
Existing approaches to mitigate this issue include: (1) efficient attention variants integrated in upcycling stages; and (2) KV cache compression at test time.
We propose a low-rank approximation of KV weight matrices, allowing plug-in integration with existing transformer-based LLMs without model retraining (a generic sketch of such a factorization appears after this list).
Our method is designed to function without model tuning in upcycling stages or task-specific profiling in test stages.
arXiv Detail & Related papers (2024-10-04T03:10:53Z) - HyCoT: A Transformer-Based Autoencoder for Hyperspectral Image Compression [6.0163252984457145]
Hyperspectral Compression Transformer (HyCoT) is a transformer-based autoencoder for pixelwise HSI compression.
Experimental results on the HySpecNet-11k dataset demonstrate that HyCoT surpasses the state of the art across various compression ratios by over 1 dB of PSNR.
arXiv Detail & Related papers (2024-08-16T12:27:46Z) - Token Compensator: Altering Inference Cost of Vision Transformer without Re-Tuning [63.43972993473501]
Token compression expedites the training and inference of Vision Transformers (ViTs).
However, when applied to downstream tasks, compression degrees are mismatched between training and inference stages.
We propose a model arithmetic framework to decouple the compression degrees between the two stages.
arXiv Detail & Related papers (2024-08-13T10:36:43Z) - What Operations can be Performed Directly on Compressed Arrays, and with What Error? [1.3307486544794784]
We develop a lossy compressor that allows a dozen fairly fundamental operations directly on compressed data.
We evaluate it on three non-trivial applications, choosing different number systems for internal representation.
arXiv Detail & Related papers (2024-06-17T05:01:09Z) - Variator: Accelerating Pre-trained Models with Plug-and-Play Compression
Modules [111.98205411431402]
Variator is a parameter-efficient acceleration method that enhances computational efficiency through plug-and-play compression plugins.
We show that Variator can save 53% of computational costs using only 0.9% additional parameters, with a performance drop of less than 2%.
arXiv Detail & Related papers (2023-10-24T11:00:07Z) - Ultra Dual-Path Compression For Joint Echo Cancellation And Noise
Suppression [38.09558772881095]
Under fixed compression ratios, dual-path compression combining both time- and frequency-domain methods yields further performance improvements.
The proposed models show performance competitive with Fast FullSubNet and DeepFilterNet.
arXiv Detail & Related papers (2023-08-21T21:36:56Z) - DiffRate : Differentiable Compression Rate for Efficient Vision
Transformers [98.33906104846386]
Token compression aims to speed up large-scale vision transformers (e.g., ViTs) by pruning (dropping) or merging tokens.
DiffRate is a novel token compression method with several appealing properties that prior arts lack.
arXiv Detail & Related papers (2023-05-29T10:15:19Z) - Compressing Transformer-based self-supervised models for speech
processing [45.254624876127124]
We study several commonly used compression techniques, including weight pruning, head pruning, low-rank approximation, and knowledge distillation.
We report trade-offs at various compression rates, including wall-clock time, the number of parameters, and the number of multiply-accumulate operations.
Our results lead to a simple combination of compression techniques that improves trade-off over recent approaches.
arXiv Detail & Related papers (2022-11-17T23:53:52Z) - iFlow: Numerically Invertible Flows for Efficient Lossless Compression
via a Uniform Coder [38.297114268193]
iFlow is a new method for achieving efficient lossless compression.
iFlow achieves state-of-the-art compression ratios and is $5\times$ quicker than other high-performance schemes.
arXiv Detail & Related papers (2021-11-01T14:15:58Z) - Towards Compact CNNs via Collaborative Compression [166.86915086497433]
We propose a Collaborative Compression scheme, which jointly applies channel pruning and tensor decomposition to compress CNN models.
We achieve 52.9% FLOPs reduction by removing 48.4% parameters on ResNet-50 with only a Top-1 accuracy drop of 0.56% on ImageNet 2012.
arXiv Detail & Related papers (2021-05-24T12:07:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.