Ranking LLMs by compression
- URL: http://arxiv.org/abs/2406.14171v1
- Date: Thu, 20 Jun 2024 10:23:38 GMT
- Title: Ranking LLMs by compression
- Authors: Peijia Guo, Ziguang Li, Haibo Hu, Chao Huang, Ming Li, Rui Zhang
- Abstract summary: We use five large language models as priors for compression, then compare their performance on challenging natural language processing tasks.
Experimental results show that compression ratio and model performance are positively correlated, so the compression ratio can be used as a general metric to evaluate large language models.
- Score: 13.801767671391604
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We conceptualize the process of understanding as information compression, and propose a method for ranking large language models (LLMs) based on lossless data compression. We demonstrate the equivalence of compression length under arithmetic coding with cumulative negative log probabilities when a large language model is used as the prior; that is, the pre-training phase of the model is essentially a process of learning the optimal coding length. At the same time, the evaluation metric, the compression ratio, can be obtained without performing actual compression, which greatly reduces overhead. In this paper, we use five large language models as priors for compression, then compare their performance on challenging natural language processing tasks, including sentence completion, question answering, and coreference resolution. Experimental results show that compression ratio and model performance are positively correlated, so the compression ratio can be used as a general metric to evaluate large language models.
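The equivalence above means a model's compression ratio can be read directly off its token log-probabilities, with no arithmetic coder in the loop. A minimal sketch of that computation, assuming a Hugging Face causal language model (the `gpt2` default is only a placeholder, not one of the five models evaluated in the paper):

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def compression_ratio(text: str, model_name: str = "gpt2") -> float:
    """Estimate the lossless compression ratio of `text` under an LLM prior.

    Arithmetic coding with the model as prior needs roughly
    sum_i -log2 p(x_i | x_<i) bits, so no encoder actually has to run.
    """
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()

    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits          # (1, seq_len, vocab)

    # Negative log2-probability of each token given its prefix (nats -> bits).
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    token_bits = -log_probs.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1) / math.log(2)

    compressed_bits = token_bits.sum().item()
    original_bits = 8 * len(text.encode("utf-8"))
    return compressed_bits / original_bits   # smaller = better compression
```

Under this reading, ranking models by compression ratio over a shared corpus is the same as ranking them by total negative log-likelihood per byte, which is why the metric adds no extra labeling or decoding overhead.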
Related papers
- Fundamental Limits of Prompt Compression: A Rate-Distortion Framework for Black-Box Language Models [21.025001473355996]
We formalize the problem of prompt compression for large language models (LLMs).
We present a framework to unify token-level prompt compression methods which create hard prompts for black-box models.
We show that there is a large gap between the performance of current prompt compression methods and the optimal strategy.
arXiv Detail & Related papers (2024-07-22T09:40:13Z)
- In-Context Former: Lightning-fast Compressing Context for Large Language Model [48.831304302467004]
In this paper, we propose a new approach to compress the long input contexts of Transformer-based large language models (LLMs).
We use the cross-attention mechanism and a small number of learnable digest tokens to condense information from the contextual word embeddings.
Experimental results indicate that our method requires only 1/32 of the floating-point operations of the baseline during compression and improves processing speed by 68 to 112 times.
arXiv Detail & Related papers (2024-06-19T15:14:55Z)
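A rough sketch of the idea in the In-Context Former entry above: condensing long context embeddings into a handful of learnable digest tokens via cross-attention. The dimensions, digest-token count, and single-layer layout are illustrative assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn

class DigestCompressor(nn.Module):
    """Condense a long context into a few digest vectors with cross-attention."""

    def __init__(self, d_model: int = 768, n_digest: int = 16, n_heads: int = 8):
        super().__init__()
        # Learnable digest tokens serve as the attention queries.
        self.digest = nn.Parameter(torch.randn(n_digest, d_model) * 0.02)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, context: torch.Tensor) -> torch.Tensor:
        # context: (batch, ctx_len, d_model) word embeddings of the long prompt
        queries = self.digest.unsqueeze(0).expand(context.size(0), -1, -1)
        compressed, _ = self.cross_attn(query=queries, key=context, value=context)
        return compressed  # (batch, n_digest, d_model)

# 1024 context embeddings condensed to 16 digest vectors.
print(DigestCompressor()(torch.randn(2, 1024, 768)).shape)  # torch.Size([2, 16, 768])
```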
- Unpacking Tokenization: Evaluating Text Compression and its Correlation with Model Performance [34.641079276516926]
We argue for the theoretical importance of compression, which can be viewed as 0-gram language modeling.
We demonstrate the empirical importance of compression for downstream success of pre-trained language models.
We show that there is a correlation between tokenizers' compression and models' downstream performance.
arXiv Detail & Related papers (2024-03-10T17:02:53Z)
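One simple way to quantify the tokenizer compression that the entry above correlates with downstream performance is average UTF-8 bytes per token over a fixed corpus. A hedged sketch; the tokenizer names and tiny corpus are placeholders, not the paper's setup:

```python
from transformers import AutoTokenizer

def bytes_per_token(tokenizer_name: str, corpus: list[str]) -> float:
    """Average UTF-8 bytes covered by one token (higher = better compression)."""
    tok = AutoTokenizer.from_pretrained(tokenizer_name)
    total_bytes = sum(len(text.encode("utf-8")) for text in corpus)
    total_tokens = sum(len(tok.encode(text)) for text in corpus)
    return total_bytes / total_tokens

corpus = ["Compression is prediction.", "Tokenizers differ in how tightly they pack text."]
for name in ["gpt2", "bert-base-uncased"]:
    print(name, round(bytes_per_token(name, corpus), 2))
```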
- A Survey on Transformer Compression [84.18094368700379]
Transformer plays a vital role in the realms of natural language processing (NLP) and computer vision (CV).
Model compression methods reduce the memory and computational cost of Transformer.
This survey provides a comprehensive review of recent compression methods, with a specific focus on their application to Transformer-based models.
arXiv Detail & Related papers (2024-02-05T12:16:28Z)
- Evaluating Large Language Models for Generalization and Robustness via Data Compression [19.17779153163157]
We propose a data compression based evaluation approach that tests how models' predictive abilities generalize after their training cutoff.
Specifically, we collect comprehensive test data spanning 83 months from 2017 to 2023 and split the data into training and testing periods according to models' training data cutoff.
Our experiments test 14 representative large language models with various sizes on sources including Wikipedia, news articles, code, arXiv papers, and multi-modal data.
arXiv Detail & Related papers (2024-02-01T18:56:18Z)
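The protocol in the entry above, evaluating generalization by compressing data from before and after a model's training cutoff, can be sketched as follows. The cutoff date and document format are illustrative, and `compression_ratio` is the helper sketched under the main abstract:

```python
from datetime import date

def cutoff_evaluation(documents, model_name, cutoff=date(2023, 1, 1)):
    """Compare compression on data before vs. after the training cutoff.

    `documents` is a list of (publication_date, text) pairs; a large gap
    between the two averages suggests poor temporal generalization.
    """
    before = [text for d, text in documents if d < cutoff]
    after = [text for d, text in documents if d >= cutoff]
    avg = lambda docs: sum(compression_ratio(t, model_name) for t in docs) / len(docs)
    return {"before_cutoff": avg(before), "after_cutoff": avg(after)}
```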
- The Cost of Compression: Investigating the Impact of Compression on Parametric Knowledge in Language Models [11.156816338995503]
Compressing large language models (LLMs) provides faster inference, smaller memory footprints, and enables local deployment.
Two standard compression techniques are pruning and quantization, with the former eliminating redundant connections in model layers and the latter representing model parameters with fewer bits.
Existing research on LLM compression primarily focuses on performance in terms of general metrics like perplexity or downstream task accuracy.
More fine-grained metrics, such as those measuring parametric knowledge, remain significantly underexplored.
arXiv Detail & Related papers (2023-12-01T22:27:12Z)
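The two standard techniques named in the entry above can be illustrated with a toy sketch: unstructured magnitude pruning and symmetric int8 quantization of a single weight tensor. Real LLM compression pipelines are of course more involved:

```python
import torch

def magnitude_prune(weight: torch.Tensor, sparsity: float = 0.5) -> torch.Tensor:
    """Zero out the smallest-magnitude fraction of weights (unstructured pruning)."""
    k = max(1, int(weight.numel() * sparsity))
    threshold = weight.abs().flatten().kthvalue(k).values
    return torch.where(weight.abs() > threshold, weight, torch.zeros_like(weight))

def int8_quantize(weight: torch.Tensor):
    """Symmetric 8-bit quantization: int8 values plus a single scale factor."""
    scale = weight.abs().max() / 127.0
    q = torch.clamp((weight / scale).round(), -127, 127).to(torch.int8)
    return q, scale  # dequantize with q.float() * scale

w = torch.randn(4, 4)
print(magnitude_prune(w))
print(int8_quantize(w)[0])
```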
- Approximating Human-Like Few-shot Learning with GPT-based Compression [55.699707962017975]
We seek to equip generative pre-trained models with human-like learning capabilities that enable data compression during inference.
We present a novel approach that utilizes the Generative Pre-trained Transformer (GPT) to approximate Kolmogorov complexity.
arXiv Detail & Related papers (2023-08-14T05:22:33Z)
- Does compressing activations help model parallel training? [64.59298055364336]
We present the first empirical study on the effectiveness of compression methods for model parallelism.
We implement and evaluate three common classes of compression algorithms.
We evaluate these methods across more than 160 settings and 8 popular datasets.
arXiv Detail & Related papers (2023-01-06T18:58:09Z)
- What Do Compressed Multilingual Machine Translation Models Forget? [102.50127671423752]
We show that the performance of under-represented languages drops significantly, while the average BLEU metric only slightly decreases.
We demonstrate that compression amplifies intrinsic gender and semantic biases, even in high-resource languages.
arXiv Detail & Related papers (2022-05-22T13:54:44Z)
- What do Compressed Large Language Models Forget? Robustness Challenges in Model Compression [68.82486784654817]
We study two popular model compression techniques, knowledge distillation and pruning.
We show that compressed models are significantly less robust than their PLM counterparts on adversarial test sets.
We develop a regularization strategy for model compression based on sample uncertainty.
arXiv Detail & Related papers (2021-10-16T00:20:04Z)