L3TC: Leveraging RWKV for Learned Lossless Low-Complexity Text Compression
- URL: http://arxiv.org/abs/2412.16642v2
- Date: Tue, 24 Dec 2024 04:20:18 GMT
- Title: L3TC: Leveraging RWKV for Learned Lossless Low-Complexity Text Compression
- Authors: Junxuan Zhang, Zhengxue Cheng, Yan Zhao, Shihao Wang, Dajiang Zhou, Guo Lu, Li Song
- Abstract summary: We introduce a novel Learned Lossless Low-complexity Text Compression method (L3TC).
RWKV models achieve the fastest decoding speed with a moderate compression ratio.
We propose an outlier-aware tokenizer that uses a limited vocabulary to cover frequent tokens.
- Score: 23.179381396167084
- Abstract: Learning-based probabilistic models can be combined with an entropy coder for data compression. However, due to the high complexity of learning-based models, their practical application as text compressors has been largely overlooked. To address this issue, our work focuses on a low-complexity design while maintaining compression performance. We introduce a novel Learned Lossless Low-complexity Text Compression method (L3TC). First, we conduct extensive experiments demonstrating that RWKV models achieve the fastest decoding speed with a moderate compression ratio, making them the most suitable backbone for our method. Second, we propose an outlier-aware tokenizer that uses a limited vocabulary to cover frequent tokens while allowing outliers to bypass the prediction and encoding. Third, we propose a novel high-rank reparameterization strategy that enhances the learning capability during training without increasing complexity during inference. Experimental results validate that our method achieves a 48% bit saving compared to the gzip compressor. Besides, L3TC offers compression performance comparable to other learned compressors, with a 50x reduction in model parameters. More importantly, L3TC is the fastest among all learned compressors, providing real-time decoding speeds of up to megabytes per second. Our code is available at https://github.com/alipay/L3TC-leveraging-rwkv-for-learned-lossless-low-complexity-text-compression.git.
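To make the outlier-aware tokenization concrete, here is a minimal sketch assuming word-level granularity and placeholder names (`FREQUENT_VOCAB`, `OUTLIER_ID`); the paper's actual tokenizer construction may differ. Frequent tokens are predicted and entropy-coded, while outliers carry their raw bytes and bypass the model:

```python
# Hedged sketch of an outlier-aware tokenizer: frequent symbols get vocabulary
# IDs and are entropy-coded under model probabilities; rare symbols are flagged
# with a single escape ID and their raw bytes bypass prediction entirely.
# The word granularity and names are illustrative, not the paper's design.

FREQUENT_VOCAB = {"the": 0, "and": 1, "of": 2}   # tiny stand-in vocabulary
OUTLIER_ID = len(FREQUENT_VOCAB)                 # single "escape" symbol

def tokenize(words):
    """Map each word to (token_id, raw_bytes_or_None)."""
    stream = []
    for w in words:
        if w in FREQUENT_VOCAB:
            stream.append((FREQUENT_VOCAB[w], None))        # model + entropy coder
        else:
            stream.append((OUTLIER_ID, w.encode("utf-8")))  # stored verbatim
    return stream

print(tokenize(["the", "RWKV", "of"]))
# -> [(0, None), (3, b'RWKV'), (2, None)]
```

The high-rank reparameterization can likewise be pictured as training a layer with an auxiliary parallel branch that is folded into a single weight matrix before deployment, so inference cost is unchanged. A hedged sketch with illustrative shapes, not the authors' exact formulation:

```python
import numpy as np

d = 8
W = np.random.randn(d, d)   # base weight
A = np.random.randn(d, d)   # auxiliary branch, kept high-rank during training
B = np.random.randn(d, d)

x = np.random.randn(d)
y_train = W @ x + B @ (A @ x)   # training-time forward: two branches

W_merged = W + B @ A            # one-time offline merge after training
y_infer = W_merged @ x          # inference: a single matmul, same output

assert np.allclose(y_train, y_infer)
```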
Related papers
- Fast Feedforward 3D Gaussian Splatting Compression [55.149325473447384]
FCGS is an optimization-free model that compresses 3D Gaussian Splatting (3DGS) representations rapidly in a single feed-forward pass.
FCGS achieves a compression ratio of over 20x while maintaining fidelity, surpassing most per-scene SOTA optimization-based methods.
arXiv Detail & Related papers (2024-10-10T15:13:08Z)
- HyCoT: A Transformer-Based Autoencoder for Hyperspectral Image Compression [6.0163252984457145]
The Hyperspectral Compression Transformer (HyCoT) is a transformer-based autoencoder for pixelwise hyperspectral image (HSI) compression.
Experimental results on the HySpecNet-11k dataset demonstrate that HyCoT surpasses the state of the art across various compression ratios by over 1 dB of PSNR.
arXiv Detail & Related papers (2024-08-16T12:27:46Z)
- Training LLMs over Neurally Compressed Text [55.11828645767342]
This paper explores the idea of training large language models (LLMs) over highly compressed text.
We propose Equal-Info Windows, a novel compression technique whereby text is segmented into blocks that each compress to the same bit length (a toy sketch follows this entry).
We demonstrate effective learning over neurally compressed text that improves with scale, and outperforms byte-level baselines by a wide margin on perplexity and inference speed benchmarks.
arXiv Detail & Related papers (2024-04-04T17:48:28Z)
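As a rough illustration of the Equal-Info Windows idea: grow each window until its compressed size would exceed a fixed bit budget, then start a new one. This sketch uses zlib as a stand-in for the learned compressor the paper actually uses, and the budget value is arbitrary:

```python
import zlib

def equal_info_windows(text: str, bit_budget: int = 512):
    """Greedily grow a window until its compressed size would exceed the
    bit budget, then start a new window. zlib stands in for the learned
    compressor; the actual method makes every block compress to exactly
    the same bit length rather than just staying under a cap."""
    windows, start = [], 0
    for end in range(1, len(text) + 1):
        bits = 8 * len(zlib.compress(text[start:end].encode()))
        if bits > bit_budget and end - 1 > start:
            windows.append(text[start:end - 1])
            start = end - 1
    windows.append(text[start:])       # final (possibly short) window
    return windows

print(equal_info_windows("to be or not to be, that is the question " * 8))
```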
- Activations and Gradients Compression for Model-Parallel Training [85.99744701008802]
We study how simultaneous compression of activations and gradients in a model-parallel distributed training setup affects convergence.
We find that gradients require milder compression rates than activations.
Experiments also show that models trained with TopK perform well only when compression is also applied during inference (a minimal TopK sketch follows this entry).
arXiv Detail & Related papers (2024-01-15T15:54:54Z)
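The TopK compressor referenced above keeps only the k largest-magnitude coordinates and communicates just their indices and values; a minimal sketch (k and the helper names are illustrative):

```python
import numpy as np

def topk_compress(x: np.ndarray, k: int):
    """Return the indices and values of the k largest-magnitude entries;
    this pair is the only payload that needs to be communicated."""
    idx = np.argpartition(np.abs(x), -k)[-k:]
    return idx, x[idx]

def topk_decompress(idx, vals, n):
    out = np.zeros(n)
    out[idx] = vals        # everything else is reconstructed as zero
    return out

g = np.array([0.1, -2.0, 0.05, 3.0, -0.4])
idx, vals = topk_compress(g, k=2)
print(topk_decompress(idx, vals, g.size))   # [ 0. -2.  0.  3.  0.]
```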
- Long Context Compression with Activation Beacon [22.054232261437186]
Activation Beacon is a plug-in module for transformer-based LLMs.
It targets effective, efficient, and flexible compression of long contexts.
It achieves a 2x acceleration in inference time and an 8x reduction of memory costs for KV cache.
arXiv Detail & Related papers (2024-01-07T11:57:40Z)
- LeCo: Lightweight Compression via Learning Serial Correlations [9.108815508920882]
Lightweight data compression is a key technique that allows column stores to exhibit superior performance for analytical queries.
We propose LeCo (Learned Compression), a framework that uses machine learning to automatically remove serial redundancy in a value sequence (a simplified sketch follows this entry).
We observe up to a 5.2x speedup in data analytical queries in the Arrow columnar execution engine and a 16% increase in RocksDB's throughput.
arXiv Detail & Related papers (2023-06-27T10:46:36Z)
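To see what removing serial redundancy can look like, here is a simplified sketch in the spirit of LeCo: fit a linear model to an integer column and store only the small residuals, which need far fewer bits than the raw values. The model choice and bit accounting are simplified relative to the actual framework:

```python
import numpy as np

values = np.array([1000, 1103, 1201, 1298, 1405, 1499], dtype=np.int64)
pos = np.arange(len(values))

slope, intercept = np.polyfit(pos, values, 1)             # the "learned" model
pred = np.rint(slope * pos + intercept).astype(np.int64)
residuals = values - pred                                 # small numbers

bits_raw = values.size * 64
bits_resid = values.size * (int(np.abs(residuals).max()).bit_length() + 1)
print(residuals, f"{bits_raw} -> {bits_resid} bits (plus two model params)")

assert np.array_equal(pred + residuals, values)           # lossless round trip
```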
- Context-Based Trit-Plane Coding for Progressive Image Compression [31.396712329965005]
Trit-plane coding enables deep progressive image compression, but it cannot use autoregressive context models (a toy trit-plane decomposition follows this entry).
First, we develop a context-based rate reduction module to accurately estimate the trit probabilities of latent elements.
Second, we develop a context-based distortion reduction module to refine partial latent tensors from the trit-planes.
Third, we propose a retraining scheme for the decoder to attain better rate-distortion tradeoffs.
arXiv Detail & Related papers (2023-03-10T05:46:25Z)
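For intuition on trit planes: each quantized latent can be expanded into base-3 digits and transmitted most-significant plane first, so decoding any prefix of planes yields a progressively refined reconstruction. A toy decomposition with non-negative latents; the paper's context models and actual latent handling are omitted:

```python
import numpy as np

def to_trit_planes(latents: np.ndarray, n_planes: int):
    """Expand non-negative integer latents into base-3 digit planes,
    most significant plane first."""
    return [(latents // 3**p) % 3 for p in reversed(range(n_planes))]

def from_trit_planes(planes, n_planes):
    """Reconstruct from a prefix of planes; absent planes act as zeros,
    giving a coarse-to-fine progressive decode."""
    x = np.zeros_like(planes[0])
    for i, plane in enumerate(planes):
        x += plane * 3**(n_planes - 1 - i)
    return x

z = np.array([0, 7, 14, 26])              # toy quantized latents in [0, 27)
planes = to_trit_planes(z, n_planes=3)
print(from_trit_planes(planes[:2], 3))    # partial decode: [ 0  6 12 24]
print(from_trit_planes(planes, 3))        # full decode:    [ 0  7 14 26]
```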
- Unrolled Compressed Blind-Deconvolution [77.88847247301682]
Sparse multichannel blind deconvolution (S-MBD) arises frequently in engineering applications such as radar, sonar, and ultrasound imaging.
We propose a compression method that enables blind recovery from far fewer measurements than the full received time-domain signal.
arXiv Detail & Related papers (2022-09-28T15:16:58Z)
- Deep Lossy Plus Residual Coding for Lossless and Near-lossless Image Compression [85.93207826513192]
We propose a unified and powerful deep lossy plus residual (DLPR) coding framework for both lossless and near-lossless image compression.
We solve the joint lossy and residual compression problem using a variational autoencoder (VAE) approach.
In the near-lossless mode, we quantize the original residuals to satisfy a given $\ell_\infty$ error bound (a sketch of such a quantizer follows this entry).
arXiv Detail & Related papers (2022-09-11T12:11:56Z)
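The $\ell_\infty$ constraint of the near-lossless mode can be met with the textbook uniform residual quantizer: with error bound tau, mapping residuals to multiples of 2*tau + 1 guarantees no element deviates by more than tau. A sketch of that standard rule, not code from the paper:

```python
import numpy as np

def quantize_residual(r: np.ndarray, tau: int):
    """Uniform quantizer with step 2*tau + 1; every element's
    reconstruction error is at most tau (an l-infinity bound)."""
    step = 2 * tau + 1
    return np.sign(r) * ((np.abs(r) + tau) // step)

def dequantize_residual(q: np.ndarray, tau: int):
    return q * (2 * tau + 1)

r = np.array([-7, -3, 0, 2, 5, 9])
q = quantize_residual(r, tau=2)
r_hat = dequantize_residual(q, tau=2)
assert np.abs(r - r_hat).max() <= 2      # the bound holds
print(q, r_hat)   # fewer distinct symbols left for the entropy coder
```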
- Towards Compact CNNs via Collaborative Compression [166.86915086497433]
We propose a Collaborative Compression scheme that combines channel pruning and tensor decomposition to compress CNN models.
We achieve a 52.9% FLOPs reduction by removing 48.4% of the parameters of ResNet-50, with a Top-1 accuracy drop of only 0.56% on ImageNet 2012.
arXiv Detail & Related papers (2021-05-24T12:07:38Z)
- Optimal Gradient Compression for Distributed and Federated Learning [9.711326718689492]
Communication between computing nodes in distributed learning is typically an unavoidable burden.
Recent advances in communication-efficient training algorithms have reduced this bottleneck by using compression techniques.
In this paper, we investigate the fundamental trade-off between the number of bits needed to encode compressed vectors and the compression error (a toy illustration follows this entry).
arXiv Detail & Related papers (2020-10-07T07:58:59Z)
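The bits-versus-error trade-off can be felt with a simple unbiased stochastic quantizer: fewer quantization levels means fewer bits per coordinate but higher error. This QSGD-style toy is only an illustration; the paper's contribution is the general trade-off analysis, not this particular scheme:

```python
import numpy as np

def quantize(v: np.ndarray, levels: int, rng):
    """Unbiased stochastic uniform quantization onto `levels` points per
    coordinate, costing about log2(levels) bits each plus one norm."""
    norm = np.abs(v).max()
    scaled = np.abs(v) / norm * (levels - 1)
    low = np.floor(scaled)
    up = low + (rng.random(v.shape) < (scaled - low))  # stochastic rounding
    return np.sign(v) * up / (levels - 1) * norm

rng = np.random.default_rng(0)
v = rng.standard_normal(10_000)
for levels in (2, 4, 16):
    mse = np.mean([(quantize(v, levels, rng) - v) ** 2 for _ in range(20)])
    print(f"{levels:>3} levels ~ {np.log2(levels):.0f} bits/coord, MSE {mse:.4f}")
```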