Test-Time Steering for Lossless Text Compression via Weighted Product of Experts
- URL: http://arxiv.org/abs/2511.10660v1
- Date: Tue, 04 Nov 2025 16:37:56 GMT
- Title: Test-Time Steering for Lossless Text Compression via Weighted Product of Experts
- Authors: Qihang Zhang, Muchen Li, Ziao Wang, Renjie Liao, Lele Wang
- Abstract summary: We propose a novel framework that performs Test-Time Steering via a Weighted Product of Experts (wPoE). At inference, our method adaptively combines a universal compression model with a pretrained neural language model, ensuring the compression rate is at least as good as that of the best individual model. It seamlessly integrates with any autoregressive language model, providing a practical solution for enhancing text compression across diverse data distributions.
- Score: 27.679089540901007
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Lossless compression techniques are crucial in an era of rapidly growing data. Traditional universal compressors like gzip offer low computational overhead, high speed, and broad applicability across data distributions. However, they often lead to worse compression rates than modern neural compressors, which leverage large-scale training data to model data distributions more effectively. Despite their advantages, neural compressors struggle to generalize to unseen data. To address this limitation, we propose a novel framework that performs Test-Time Steering via a Weighted Product of Experts (wPoE). At inference, our method adaptively combines a universal compression model with a pretrained neural language model, ensuring the compression rate is at least as good as that of the best individual model. Extensive experiments demonstrate that our approach improves the performance of text compression without requiring fine-tuning. Furthermore, it seamlessly integrates with any autoregressive language model, providing a practical solution for enhancing text compression across diverse data distributions.
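The combination step described in the abstract can be illustrated with a minimal sketch. This listing does not give the paper's exact formulation, so the code below assumes the generic weighted product of experts form, q(x) ∝ p_universal(x)^α · p_lm(x)^(1−α), with a single weight α chosen by a small grid search. The function names (`wpoe`, `total_bits`, `pick_alpha`), the toy distributions, and the grid search itself are hypothetical and are not taken from the paper.

```python
import math

def wpoe(experts, weights):
    """Normalized weighted product of next-symbol distributions.

    experts: list of probability lists over the same alphabet (all entries > 0).
    weights: one non-negative exponent per expert (assumed form, not the paper's).
    """
    n = len(experts[0])
    # Work in log space: log q(x) = sum_k w_k * log p_k(x) - log Z
    log_q = [sum(w * math.log(p[x]) for p, w in zip(experts, weights))
             for x in range(n)]
    m = max(log_q)
    log_z = m + math.log(sum(math.exp(v - m) for v in log_q))
    return [math.exp(v - log_z) for v in log_q]

def total_bits(symbols, experts, weights):
    """Ideal code length (in bits) of a symbol sequence under the combined model.

    For simplicity the same toy distributions are reused at every position;
    a real compressor would recompute them from the context at each step.
    """
    q = wpoe(experts, weights)
    return sum(-math.log2(q[s]) for s in symbols)

def pick_alpha(symbols, p_universal, p_lm, grid=11):
    """Hypothetical weight selection: grid search over alpha in [0, 1]."""
    candidates = [i / (grid - 1) for i in range(grid)]
    return min(candidates,
               key=lambda a: total_bits(symbols, [p_universal, p_lm], [a, 1 - a]))

# Toy 4-symbol alphabet: a flat "universal" model and a peaked "language model".
p_universal = [0.25, 0.25, 0.25, 0.25]
p_lm = [0.70, 0.10, 0.10, 0.10]
symbols = [0, 0, 1, 0, 2, 3, 0]

alpha = pick_alpha(symbols, p_universal, p_lm)
bits_mix = total_bits(symbols, [p_universal, p_lm], [alpha, 1 - alpha])
bits_universal = total_bits(symbols, [p_universal, p_lm], [1.0, 0.0])
bits_lm = total_bits(symbols, [p_universal, p_lm], [0.0, 1.0])
print(alpha, bits_mix, bits_universal, bits_lm)
```

In this sketch, setting α = 1 or α = 0 recovers one of the two experts exactly, so the best weight on the grid can never give a longer code than either expert alone. That mirrors the "at least as good as the best individual model" property stated in the abstract, although the paper's own argument and weight-selection procedure may differ.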
Related papers
- Seq2Seq2Seq: Lossless Data Compression via Discrete Latent Transformers and Reinforcement Learning [3.2641459166493405]
We propose a novel compression method based on Reinforcement Learning applied to a T5 language model architecture. This approach enables the compression of data into sequences of tokens rather than traditional vector representations. By leveraging the latent information within language models, our system effectively compresses data without requiring explicit content understanding.
arXiv Detail & Related papers (2026-02-12T16:30:55Z) - Arbitrary Ratio Feature Compression via Next Token Prediction [52.10426317889982]
The Arbitrary Ratio Feature Compression (ARFC) framework supports any compression ratio with a single model. ARC is an auto-regressive model that performs compression via next-token prediction. The MoS module refines the compressed tokens by utilizing multiple compression results. ERGC is integrated into the training process to preserve semantic and structural relationships during compression.
arXiv Detail & Related papers (2026-02-12T02:38:57Z) - Proxy Compression for Language Modeling [58.904023114033954]
Proxy compression is an alternative training scheme that preserves the efficiency benefits of compressed inputs. Experiments on code language modeling demonstrate that proxy compression substantially improves training efficiency. As model scale increases, proxy-trained models eventually match or rival tokenizer approaches.
arXiv Detail & Related papers (2026-02-04T07:36:46Z) - Simple Context Compression: Mean-Pooling and Multi-Ratio Training [12.049015994907629]
We develop a lightweight and simple mean-pooling approach that consistently outperforms the widely used compression-tokens architecture. We conduct extensive experiments across in-domain and out-of-domain QA datasets, as well as across model families, scales, and compression ratios. Overall, our simple mean-pooling approach achieves the strongest performance, with a relatively small drop when training for multiple compression ratios.
arXiv Detail & Related papers (2025-10-23T17:57:23Z) - OpenZL: A Graph-Based Model for Compression [1.9508265730898475]
Application-specific compressor systems outperform even the best generic compressors. We show that these challenges can be overcome with a new compression strategy. OpenZL compresses data into a self-describing wire format, any configuration of which can be decompressed by a universal decoder.
arXiv Detail & Related papers (2025-10-03T17:40:29Z) - Breaking the Compression Ceiling: Data-Free Pipeline for Ultra-Efficient Delta Compression [57.71917274869577]
UltraDelta is a data-free delta compression pipeline that achieves both ultra-high compression and strong performance. UltraDelta is designed to minimize redundancy, maximize information, and stabilize performance across inter-layer, intra-layer, and global dimensions.
arXiv Detail & Related papers (2025-05-19T10:37:22Z) - L3TC: Leveraging RWKV for Learned Lossless Low-Complexity Text Compression [23.179381396167084]
We introduce a novel Learned Lossless Low-complexity Text Compression method (L3TC). RWKV models achieve the fastest decoding speed with a moderate compression ratio. We propose an outlier-aware tokenizer that uses a limited vocabulary to cover frequent tokens.
arXiv Detail & Related papers (2024-12-21T14:24:32Z) - AlphaZip: Neural Network-Enhanced Lossless Text Compression [0.0]
This paper introduces a lossless text compression approach using a Large Language Model (LLM).
The method involves two key steps: first, prediction using a dense neural network architecture, such as a transformer block; second, compressing the predicted ranks with standard compression algorithms like Adaptive Huffman, LZ77, or Gzip.
arXiv Detail & Related papers (2024-09-23T14:21:06Z) - Activations and Gradients Compression for Model-Parallel Training [85.99744701008802]
We study how simultaneous compression of activations and gradients in model-parallel distributed training setup affects convergence.
We find that gradients require milder compression rates than activations.
Experiments also show that models trained with TopK perform well only when compression is also applied during inference.
arXiv Detail & Related papers (2024-01-15T15:54:54Z) - Learning Accurate Performance Predictors for Ultrafast Automated Model Compression [86.22294249097203]
We propose an ultrafast automated model compression framework called SeerNet for flexible network deployment.
Our method achieves competitive accuracy-complexity trade-offs with significant reduction of the search cost.
arXiv Detail & Related papers (2023-04-13T10:52:49Z) - What do Compressed Large Language Models Forget? Robustness Challenges in Model Compression [68.82486784654817]
We study two popular model compression techniques including knowledge distillation and pruning.
We show that compressed models are significantly less robust than their PLM counterparts on adversarial test sets.
We develop a regularization strategy for model compression based on sample uncertainty.
arXiv Detail & Related papers (2021-10-16T00:20:04Z) - Towards Compact CNNs via Collaborative Compression [166.86915086497433]
We propose a Collaborative Compression scheme, which jointly applies channel pruning and tensor decomposition to compress CNN models.
We achieve 52.9% FLOPs reduction by removing 48.4% parameters on ResNet-50 with only a Top-1 accuracy drop of 0.56% on ImageNet 2012.
arXiv Detail & Related papers (2021-05-24T12:07:38Z)