Compression Hacking: A Supplementary Perspective on Informatics Properties of Language Models from Geometric Distortion
- URL: http://arxiv.org/abs/2505.17793v2
- Date: Tue, 15 Jul 2025 10:57:33 GMT
- Title: Compression Hacking: A Supplementary Perspective on Informatics Properties of Language Models from Geometric Distortion
- Authors: Jianxiang Zang, Meiling Ning, Yongda Wei, Shihan Dou, Jiazheng Zhang, Nijia Mo, Binhong Li, Tao Gui, Qi Zhang, Xuanjing Huang,
- Abstract summary: From a geometric standpoint, the word representation space of highly compressed LMs tends to degenerate into a highly anisotropic state.<n>We find this synchronicity is essentially the Compression Hacking'' in LM representations.<n>We propose three refined compression metrics by incorporating geometric distortion analysis and integrate them into a self-evaluation pipeline.
- Score: 36.47982325967706
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, the concept of ``compression as intelligence'' has provided a novel informatics metric perspective for language models (LMs), emphasizing that highly structured representations signify the intelligence level of LMs. However, from a geometric standpoint, the word representation space of highly compressed LMs tends to degenerate into a highly anisotropic state, which hinders the LM's ability to comprehend instructions and directly impacts its performance. We found this compression-anisotropy synchronicity is essentially the ``Compression Hacking'' in LM representations, where noise-dominated directions tend to create the illusion of high compression rates by sacrificing spatial uniformity. Based on this, we propose three refined compression metrics by incorporating geometric distortion analysis and integrate them into a self-evaluation pipeline. The refined metrics exhibit strong alignment with the LM's comprehensive capabilities, achieving Spearman correlation coefficients above 0.9, significantly outperforming both the original compression and other internal structure-based metrics. This confirms that compression hacking substantially enhances the informatics interpretation of LMs by incorporating geometric distortion of representations.
Related papers
- UniComp: Rethinking Video Compression Through Informational Uniqueness [16.98296446798904]
UniComp aims to maximize the information fidelity of video representations under constrained computational budgets.<n>We introduce the notion of information uniqueness to measure intrinsic redundancy among tokens to link with reconstruction error.
arXiv Detail & Related papers (2025-12-03T08:56:23Z) - Variational Geometric Information Bottleneck: Learning the Shape of Understanding [0.0]
Variational Geometric Information Bottleneck (V-GIB) is a variational estimator that unifies mutual-information compression and curvature regularization.<n>V-GIB provides a principled and measurable route to representations that are geometrically coherent, data-efficient, and aligned with human-understandable structure.
arXiv Detail & Related papers (2025-11-04T11:33:54Z) - Knowledge-Informed Neural Network for Complex-Valued SAR Image Recognition [51.03674130115878]
We introduce the Knowledge-Informed Neural Network (KINN), a lightweight framework built upon a novel "compression-aggregation-compression" architecture.<n>KINN establishes a state-of-the-art in parameter-efficient recognition, offering exceptional generalization in data-scarce and out-of-distribution scenarios.
arXiv Detail & Related papers (2025-10-23T07:12:26Z) - CompGS++: Compressed Gaussian Splatting for Static and Dynamic Scene Representation [60.712165339762116]
CompGS++ is a novel framework that leverages compact Gaussian primitives to achieve accurate 3D modeling.<n>Our design is based on the principle of eliminating redundancy both between and within primitives.<n>Our implementation will be made publicly available on GitHub to facilitate further research.
arXiv Detail & Related papers (2025-04-17T15:33:01Z) - Hierarchical Semantic Compression for Consistent Image Semantic Restoration [62.97519327310638]
We propose a novel hierarchical semantic compression (HSC) framework that purely operates within intrinsic semantic spaces from generative models.<n> Experimental results demonstrate that the proposed HSC framework achieves the state-of-the-art performance on subjective quality and consistency for human vision.
arXiv Detail & Related papers (2025-02-24T03:20:44Z) - UNComp: Can Matrix Entropy Uncover Sparsity? -- A Compressor Design from an Uncertainty-Aware Perspective [85.08718140718707]
UNComp is an uncertainty-aware framework that uncovers sparsity patterns that can be used for adaptive compression.<n>By focusing on uncertainty to analyze the sparsity pattern in detail, UNComp reduces the KV cache size to 4.74% of the original, achieves a 6% prefill speedup, and improves throughput by 6.4x.
arXiv Detail & Related papers (2024-10-04T02:32:36Z) - Language Models as Zero-shot Lossless Gradient Compressors: Towards General Neural Parameter Prior Models [56.00251589760559]
Large language models (LLMs) can act as gradient priors in a zero-shot setting.<n>We introduce LM-GC, a novel method that integrates LLMs with arithmetic coding.<n>Experiments indicate that LM-GC surpasses existing state-of-the-art lossless compression methods.
arXiv Detail & Related papers (2024-09-26T13:38:33Z) - Beyond Perplexity: Multi-dimensional Safety Evaluation of LLM Compression [33.45167213570976]
We investigate the impact of model compression on four dimensions: (1) degeneration harm, i.e., bias and toxicity in generation; (2) representational harm, i.e., biases in discriminative tasks; (3) dialect bias; and(4) language modeling and downstream task performance.
Our analysis reveals that compression can lead to unexpected consequences.
arXiv Detail & Related papers (2024-07-06T05:56:22Z) - Data-free Weight Compress and Denoise for Large Language Models [96.68582094536032]
We propose a novel approach termed Data-free Joint Rank-k Approximation for compressing the parameter matrices.<n>We achieve a model pruning of 80% parameters while retaining 93.43% of the original performance without any calibration data.
arXiv Detail & Related papers (2024-02-26T05:51:47Z) - The Cost of Compression: Investigating the Impact of Compression on
Parametric Knowledge in Language Models [11.156816338995503]
Large language models (LLMs) provide faster inference, smaller memory footprints, and enables local deployment.
Two standard compression techniques are pruning and quantization, with the former eliminating redundant connections in model layers and the latter representing model parameters with fewer bits.
Existing research on LLM compression primarily focuses on performance in terms of general metrics like perplexity or downstream task accuracy.
More fine-grained metrics, such as those measuring parametric knowledge, remain significantly underexplored.
arXiv Detail & Related papers (2023-12-01T22:27:12Z) - White-Box Transformers via Sparse Rate Reduction: Compression Is All There Is? [27.58916930770997]
We show a family of white-box transformer-like deep network architectures, named CRATE, which are mathematically fully interpretable.
Experiments show that these networks, despite their simplicity, indeed learn to compress and sparsify representations of large-scale real-world image and text datasets.
arXiv Detail & Related papers (2023-11-22T02:23:32Z) - Bridging Information-Theoretic and Geometric Compression in Language
Models [11.96710733444808]
For a language model to faithfully model human language, it must compress vast, potentially infinite information into relatively few dimensions.
We show that, in turn, high compression of a linguistic dataset predicts rapid adaptation to that dataset.
As a practical byproduct of our analysis, we evaluate a battery of intrinsic dimension estimators for the first time on linguistic data.
arXiv Detail & Related papers (2023-10-20T16:12:13Z) - Dynamic Kernel-Based Adaptive Spatial Aggregation for Learned Image
Compression [63.56922682378755]
We focus on extending spatial aggregation capability and propose a dynamic kernel-based transform coding.
The proposed adaptive aggregation generates kernel offsets to capture valid information in the content-conditioned range to help transform.
Experimental results demonstrate that our method achieves superior rate-distortion performance on three benchmarks compared to the state-of-the-art learning-based methods.
arXiv Detail & Related papers (2023-08-17T01:34:51Z) - What Do Compressed Multilingual Machine Translation Models Forget? [102.50127671423752]
We show that the performance of under-represented languages drops significantly, while the average BLEU metric only slightly decreases.
We demonstrate that compression amplifies intrinsic gender and semantic biases, even in high-resource languages.
arXiv Detail & Related papers (2022-05-22T13:54:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.