Retaining Key Information under High Compression Ratios: Query-Guided Compressor for LLMs
- URL: http://arxiv.org/abs/2406.02376v2
- Date: Mon, 17 Jun 2024 15:02:11 GMT
- Title: Retaining Key Information under High Compression Ratios: Query-Guided Compressor for LLMs
- Authors: Zhiwei Cao, Qian Cao, Yu Lu, Ningxin Peng, Luyang Huang, Shanbo Cheng, Jinsong Su
- Abstract summary: Performance of previous methods degrades dramatically as compression ratios increase, sometimes even falling to the closed-book level.
We introduce the Query-Guided Compressor (QGC), which leverages queries to guide the context compression process.
We validate the effectiveness of the proposed QGC on question answering, covering the NaturalQuestions, TriviaQA, and HotpotQA datasets.
- Score: 35.91962517513945
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The growing popularity of Large Language Models has sparked interest in context compression for Large Language Models (LLMs). However, the performance of previous methods degrades dramatically as compression ratios increase, sometimes even falling to the closed-book level. This decline can be attributed to the loss of key information during the compression process. Our preliminary study supports this hypothesis, emphasizing the significance of retaining key information to maintain model performance under high compression ratios. As a result, we introduce Query-Guided Compressor (QGC), which leverages queries to guide the context compression process, effectively preserving key information within the compressed context. Additionally, we employ a dynamic compression strategy. We validate the effectiveness of our proposed QGC on the Question Answering task, including NaturalQuestions, TriviaQA, and HotpotQA datasets. Experimental results show that QGC can consistently perform well even at high compression ratios, which also offers significant benefits in terms of inference cost and throughput.
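To make the idea concrete, here is a minimal sketch of query-guided compression with a dynamic keep ratio; it is an illustration under our own assumptions, not the authors' QGC implementation, and the scoring heuristic and function names are hypothetical.

```python
import numpy as np

def query_guided_compress(query_vecs, ctx_vecs, ctx_tokens, base_keep=0.25):
    """Illustrative sketch of query-guided compression (not the paper's QGC).

    query_vecs: (q, d) embeddings of query tokens   -- assumed given
    ctx_vecs:   (n, d) embeddings of context tokens -- assumed given
    ctx_tokens: list of n context token strings
    base_keep:  nominal fraction of tokens to retain
    """
    # Normalize embeddings so dot products become cosine similarities.
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    c = ctx_vecs / np.linalg.norm(ctx_vecs, axis=1, keepdims=True)

    # Score each context token by its best match against any query token.
    scores = (c @ q.T).max(axis=1)          # shape (n,)

    # Dynamic compression (assumption): keep more tokens when the context
    # looks highly query-relevant overall, fewer when it does not.
    keep_frac = float(np.clip(base_keep * (0.5 + scores.mean()), 0.05, 1.0))
    k = max(1, int(len(ctx_tokens) * keep_frac))

    # Retain the top-k tokens but preserve their original order.
    kept = np.sort(np.argsort(-scores)[:k])
    return [ctx_tokens[i] for i in kept]
```

The key property the sketch tries to capture is that tokens relevant to the query survive compression even when the overall keep ratio is small.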
Related papers
- Task-agnostic Prompt Compression with Context-aware Sentence Embedding and Reward-guided Task Descriptor [16.830389144259584]
Task-agnostic Prompt Compression (TPC) is a novel framework that generalizes compression across tasks and domains without requiring input questions or templates.
TPC generates a context-relevant task description using a task descriptor trained on a curated dataset of context and query pairs.
We introduce 3 model sizes (Base, Large, and Huge), where the largest model outperforms the existing state-of-the-art methods on LongBench and ZeroSCROLLS benchmarks.
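As a rough illustration of task-agnostic compression (not TPC itself), one can rank context sentences against a generated task description and keep only the top-ranked ones; the `embed` callable below is an assumed interface to any sentence-embedding model.

```python
import numpy as np

def compress_prompt(sentences, task_description, embed, keep_ratio=0.5):
    """Hypothetical sketch: rank context sentences against a generated
    task description and keep the most relevant ones (not TPC's method).

    embed: callable mapping a string to a 1-D numpy vector; any sentence
           embedding model could be plugged in here (assumption).
    """
    task_vec = embed(task_description)
    task_vec = task_vec / np.linalg.norm(task_vec)

    scored = []
    for idx, sent in enumerate(sentences):
        v = embed(sent)
        sim = float(v @ task_vec / np.linalg.norm(v))
        scored.append((sim, idx))

    # Keep the highest-scoring sentences, restored to original order.
    k = max(1, int(len(sentences) * keep_ratio))
    kept = sorted(idx for _, idx in sorted(scored, reverse=True)[:k])
    return " ".join(sentences[i] for i in kept)
```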
arXiv Detail & Related papers (2025-02-19T02:16:29Z) - Learned Data Compression: Challenges and Opportunities for the Future [34.95766887424342]
Recent advances in learned indexes have inspired the development of learned compressors.
These compressors leverage simple yet compact machine learning (ML) models to compress large-scale sorted keys.
This vision paper explores the potential of learned data compression to enhance critical areas in indexes and related domains.
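As a minimal illustration of the idea (not the paper's method), a learned compressor can fit a simple model to the position-to-key mapping of a sorted array and store only small residuals:

```python
import numpy as np

def learned_compress(sorted_keys):
    """Toy learned compressor for sorted integer keys (illustration only).

    Fit a straight line position -> key, then store just the line's two
    parameters plus per-key residuals, which stay small (and cheap to
    encode) when the keys are close to uniformly distributed.
    """
    keys = np.asarray(sorted_keys, dtype=np.int64)
    pos = np.arange(len(keys))
    slope, intercept = np.polyfit(pos, keys, deg=1)     # the "learned" model
    predicted = np.rint(slope * pos + intercept).astype(np.int64)
    residuals = keys - predicted                        # small values to store
    return slope, intercept, residuals

def learned_decompress(model):
    slope, intercept, residuals = model
    pos = np.arange(len(residuals))
    predicted = np.rint(slope * pos + intercept).astype(np.int64)
    return predicted + residuals                        # exact reconstruction

keys = np.cumsum(np.random.randint(1, 10, size=1000))  # synthetic sorted keys
assert np.array_equal(learned_decompress(learned_compress(keys)), keys)
```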
arXiv Detail & Related papers (2024-12-14T09:47:21Z) - Perception Compressor: A Training-Free Prompt Compression Framework in Long Context Scenarios [17.720102137585503]
Perception Compressor is a training-free prompt compression framework for large language models.
It includes a perception retriever that leverages guiding questions and instruction to retrieve the most relevant demonstrations.
We conduct extensive experiments on long-context benchmarks, including LongBench and MuSiQue.
arXiv Detail & Related papers (2024-09-28T07:13:33Z) - Concise and Precise Context Compression for Tool-Using Language Models [60.606281074373136]
We propose two strategies for compressing tool documentation into concise and precise summary sequences for tool-using language models.
Results on API-Bank and APIBench show that our approach performs comparably to the upper-bound baseline at compression ratios of up to 16x.
arXiv Detail & Related papers (2024-07-02T08:17:00Z) - Ranking LLMs by compression [13.801767671391604]
We use five large language models as priors for compression, then compare their performance on challenging natural language processing tasks.
Experimental results show that compression ratio and model performance are positively correlated, so compression ratio can be used as a general metric to evaluate large language models.
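The link between compression and model quality follows from the fact that a language model's average negative log-probability is exactly the code length an arithmetic coder would achieve when using that model as its prior. A minimal sketch of such a compression-based score (the `log_prob` callable is an assumed interface to whatever LLM is being ranked):

```python
import math

def bits_per_character(text, tokens, log_prob):
    """Compression-based score for ranking language models (illustration).

    tokens:   the text split into model tokens
    log_prob: callable(prefix_tokens, next_token) -> natural-log probability
              assigned by the model under evaluation (assumed interface).
    A stronger model assigns higher probabilities, hence fewer bits per
    character when its predictions drive an arithmetic coder.
    """
    total_bits = 0.0
    for i, tok in enumerate(tokens):
        total_bits += -log_prob(tokens[:i], tok) / math.log(2)
    return total_bits / max(1, len(text))   # lower is better
```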
arXiv Detail & Related papers (2024-06-20T10:23:38Z) - In-Context Former: Lightning-fast Compressing Context for Large Language Model [48.831304302467004]
In this paper, we propose a new approach to compressing the long input contexts of Transformer-based large language models (LLMs).
We use the cross-attention mechanism and a small number of learnable digest tokens to condense information from the contextual word embeddings.
Experimental results indicate that our method requires only 1/32 of the floating-point operations of the baseline during compression and improves processing speed by 68 to 112 times.
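A stripped-down sketch of the digest-token idea follows; shapes, initialisation, and names are our own assumptions, not the paper's code. A small number of learnable digest vectors cross-attend over the context token embeddings, and the resulting condensed vectors stand in for the long context.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def digest_context(ctx_embeds, num_digest=16, d_model=None, rng=None):
    """Single-head cross-attention from learnable digest tokens to the
    context (toy sketch; a real model would train these parameters).

    ctx_embeds: (n, d) context word embeddings
    returns:    (num_digest, d_model) condensed representation of the context
    """
    rng = rng or np.random.default_rng(0)
    n, d = ctx_embeds.shape
    d_model = d_model or d

    # "Learnable" parameters, randomly initialised here for illustration.
    digest = rng.normal(scale=0.02, size=(num_digest, d))   # digest tokens
    W_q = rng.normal(scale=0.02, size=(d, d_model))
    W_k = rng.normal(scale=0.02, size=(d, d_model))
    W_v = rng.normal(scale=0.02, size=(d, d_model))

    # Queries come from the digest tokens, keys/values from the context,
    # so attention pulls context information into the few digest slots.
    Q, K, V = digest @ W_q, ctx_embeds @ W_k, ctx_embeds @ W_v
    attn = softmax(Q @ K.T / np.sqrt(d_model))               # (num_digest, n)
    return attn @ V                                          # (num_digest, d_model)
```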
arXiv Detail & Related papers (2024-06-19T15:14:55Z) - The Cost of Compression: Investigating the Impact of Compression on
Parametric Knowledge in Language Models [11.156816338995503]
Compressing large language models (LLMs) provides faster inference, smaller memory footprints, and enables local deployment.
Two standard compression techniques are pruning and quantization, with the former eliminating redundant connections in model layers and the latter representing model parameters with fewer bits.
Existing research on LLM compression primarily focuses on performance in terms of general metrics like perplexity or downstream task accuracy.
More fine-grained metrics, such as those measuring parametric knowledge, remain significantly underexplored.
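The two standard techniques can be illustrated in a few lines; this is a toy sketch applied to a single weight matrix, not a production compression pipeline.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    """Pruning: zero out the smallest-magnitude fraction of connections."""
    w = weights.copy()
    threshold = np.quantile(np.abs(w), sparsity)
    w[np.abs(w) < threshold] = 0.0
    return w

def quantize_int8(weights):
    """Quantization: represent parameters with 8-bit integers plus a scale."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale          # dequantize with q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
w_pruned = magnitude_prune(w, sparsity=0.5)
w_q, s = quantize_int8(w)
print(np.abs(w - w_q.astype(np.float32) * s).max())   # quantization error
```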
arXiv Detail & Related papers (2023-12-01T22:27:12Z) - Cross Modal Compression: Towards Human-comprehensible Semantic
Compression [73.89616626853913]
Cross modal compression (CMC) is a semantic compression framework for visual data.
We show that our proposed CMC can achieve encouraging reconstructed results with an ultrahigh compression ratio.
arXiv Detail & Related papers (2022-09-06T15:31:11Z) - What do Compressed Large Language Models Forget? Robustness Challenges
in Model Compression [68.82486784654817]
We study two popular model compression techniques including knowledge distillation and pruning.
We show that compressed models are significantly less robust than their pre-trained language model (PLM) counterparts on adversarial test sets.
We develop a regularization strategy for model compression based on sample uncertainty.
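One way to read "regularization based on sample uncertainty" is to reweight each example's loss by the predictive entropy of the original, uncompressed model; this is a sketch under our own assumptions, not the authors' exact objective.

```python
import numpy as np

def entropy(probs, eps=1e-12):
    """Predictive entropy of a probability vector (in nats)."""
    p = np.clip(probs, eps, 1.0)
    return float(-(p * np.log(p)).sum())

def uncertainty_weighted_loss(ce_losses, teacher_probs, alpha=1.0):
    """Illustrative uncertainty-aware objective for model compression.

    ce_losses:     per-sample cross-entropy of the compressed (student) model
    teacher_probs: per-sample class probabilities from the original PLM
    Samples the original model is uncertain about receive extra weight,
    nudging the compressed model to stay robust on hard examples (assumption).
    """
    weights = np.array([1.0 + alpha * entropy(p) for p in teacher_probs])
    return float((weights * np.asarray(ce_losses)).mean())
```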
arXiv Detail & Related papers (2021-10-16T00:20:04Z) - Analyzing and Mitigating JPEG Compression Defects in Deep Learning [69.04777875711646]
We present a unified study of the effects of JPEG compression on a range of common tasks and datasets.
We show that there is a significant penalty on common performance metrics for high compression.
arXiv Detail & Related papers (2020-11-17T20:32:57Z)