Related papers: From Reading to Compressing: Exploring the Multi-document Reader for Prompt Compression

From Reading to Compressing: Exploring the Multi-document Reader for Prompt Compression

URL: http://arxiv.org/abs/2410.04139v1
Date: Sat, 5 Oct 2024 12:27:47 GMT
Title: From Reading to Compressing: Exploring the Multi-document Reader for Prompt Compression
Authors: Eunseong Choi, Sunkyung Lee, Minjin Choi, June Park, Jongwuk Lee,
Abstract summary: Large language models (LLMs) have achieved significant performance gains using advanced prompting techniques. Prompt compression has been proposed to alleviate these issues, but it faces challenges in capturing the global context and training the compressor effectively.
Score: 9.5823848981136
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Large language models (LLMs) have achieved significant performance gains using advanced prompting techniques over various tasks. However, the increasing length of prompts leads to high computational costs and often obscures crucial information. Prompt compression has been proposed to alleviate these issues, but it faces challenges in (i) capturing the global context and (ii) training the compressor effectively. To tackle these challenges, we introduce a novel prompt compression method, namely Reading To Compressing (R2C), utilizing the Fusion-in-Decoder (FiD) architecture to identify the important information in the prompt. Specifically, the cross-attention scores of the FiD are used to discern essential chunks and sentences from the prompt. R2C effectively captures the global context without compromising semantic consistency while detouring the necessity of pseudo-labels for training the compressor. Empirical results show that R2C retains key contexts, enhancing the LLM performance by 6% in out-of-domain evaluations while reducing the prompt length by 80%.

Related papers

Understanding and Improving Information Preservation in Prompt Compression for LLMs [10.912320980464571]
In information-intensive tasks, the prompt length can grow fast, leading to increased computational requirements, performance degradation, and induced biases from irrelevant or redundant information. We propose a holistic evaluation framework that allows for in-depth analysis of prompt compression methods.
arXiv Detail & Related papers (2025-03-24T20:06:11Z)
ICPC: In-context Prompt Compression with Faster Inference [0.0]
We propose I CPC (In-context Prompt Compression), a novel and scalable prompt compression method that adaptively reduces the prompt length. The key idea of I CPC is to calculate the probability of each word appearing in the prompt using encoders and calculate information carried by each word through the information function. Empirically, we demonstrate that I CPC can effectively compress long texts of different categories and thus achieve better performance and speed on different types of NLP tasks.
arXiv Detail & Related papers (2025-01-03T03:46:51Z)
Efficient Long Context Language Model Retrieval with Compression [57.09163579304332]
Long Context Language Models (LCLMs) have emerged as a new paradigm to perform Information Retrieval (IR) We propose a new compression approach tailored for LCLM retrieval, which is trained to maximize the retrieval performance while minimizing the length of the compressed passages. We show that CoLoR improves the retrieval performance by 6% while compressing the in-context size by a factor of 1.91.
arXiv Detail & Related papers (2024-12-24T07:30:55Z)
Squeezed Attention: Accelerating Long Context Length LLM Inference [64.11145320159126]
We propose Squeezed Attention as a mechanism to accelerate LLM applications where a large portion of the input prompt is fixed. We use K-means clustering offline to group the keys for the fixed context based on semantic similarity and represent each cluster with a single centroid value. We then compute exact attention using only these important keys from the fixed context, thereby reducing bandwidth and computational costs.
arXiv Detail & Related papers (2024-11-14T18:54:19Z)
BRIEF: Bridging Retrieval and Inference for Multi-hop Reasoning via Compression [91.23933111083389]
BRIEF (Bridging Retrieval and Inference through Evidence Fusion) is a lightweight approach that performs query-aware multi-hop reasoning. Based on our synthetic data built entirely by open-source models, BRIEF generates more concise summaries.
arXiv Detail & Related papers (2024-10-20T04:24:16Z)
Perception Compressor:A training-free prompt compression method in long context scenarios [17.720102137585503]
Perception is a training-free prompt compression method for large language models. It outperforms existing methods by a large margin, achieving state-of-the-art performance.
arXiv Detail & Related papers (2024-09-28T07:13:33Z)
TACO-RL: Task Aware Prompt Compression Optimization with Reinforcement Learning [11.167198972934736]
Large language models (LLMs) such as GPT-4 have led to a surge in the size of prompts required for optimal performance. We propose a novel and efficient reinforcement learning (RL) based task-aware prompt compression method. We demonstrate that our RL-guided compression method improves the task performance by 8% - 260% over state-of-the-art compression techniques.
arXiv Detail & Related papers (2024-09-19T18:11:59Z)
LanguaShrink: Reducing Token Overhead with Psycholinguistics [8.123272461141815]
LanguaShrink is a prompt compression framework for large language models. It reduces prompt length while preserving essential information. Compared to existing prompt compression methods, LanguaShrink improves end-to-end latency by 1.43 times.
arXiv Detail & Related papers (2024-09-01T22:09:20Z)
Concise and Precise Context Compression for Tool-Using Language Models [60.606281074373136]
We propose two strategies for compressing tool documentation into concise and precise summary sequences for tool-using language models. Results on API-Bank and APIBench show that our approach reaches a performance comparable to the upper-bound baseline under up to 16x compression ratio.
arXiv Detail & Related papers (2024-07-02T08:17:00Z)
In-Context Former: Lightning-fast Compressing Context for Large Language Model [48.831304302467004]
In this paper, we propose a new approach to compress the long input contexts of Transformer-based large language models (LLMs) We use the cross-attention mechanism and a small number of learnable digest tokens to condense information from the contextual word embeddings. Experimental results indicate that our method requires only 1/32 of the floating-point operations of the baseline during compression and improves processing speed by 68 to 112 times.
arXiv Detail & Related papers (2024-06-19T15:14:55Z)
LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression [43.048684907893104]
This paper focuses on task-agnostic prompt compression for better generalizability and efficiency. We formulate prompt compression as a token classification problem to guarantee the faithfulness of the compressed prompt to the original one. Our approach leads to lower latency by explicitly learning the compression objective with smaller models such as XLM-RoBERTa-large and mBERT.
arXiv Detail & Related papers (2024-03-19T17:59:56Z)
Long Context Compression with Activation Beacon [22.054232261437186]
Activation Beacon is a plug-in module for transformer-based LLMs. It targets effective, efficient, and flexible compression of long contexts. It achieves a 2x acceleration in inference time and an 8x reduction of memory costs for KV cache.
arXiv Detail & Related papers (2024-01-07T11:57:40Z)
RECOMP: Improving Retrieval-Augmented LMs with Compression and Selective Augmentation [61.53695868960846]
We propose compressing retrieved documents into textual summaries prior to in-context integration. This not only reduces the computational costs but also relieves the burden of LMs to identify relevant information in long retrieved documents. We show that our compressors trained for one LM can transfer to other LMs on the language modeling task and provide summaries largely faithful to the retrieved documents.
arXiv Detail & Related papers (2023-10-06T17:55:36Z)

This list is automatically generated from the titles and abstracts of the papers in this site.