QUITO: Accelerating Long-Context Reasoning through Query-Guided Context Compression
- URL: http://arxiv.org/abs/2408.00274v1
- Date: Thu, 1 Aug 2024 04:28:38 GMT
- Title: QUITO: Accelerating Long-Context Reasoning through Query-Guided Context Compression
- Authors: Wenshan Wang, Yihang Wang, Yixing Fan, Huaming Liao, Jiafeng Guo
- Abstract summary: In this paper, we introduce a novel Query-gUIded aTtention cOmpression (QUITO) method to filter out useless information.
Specifically, we use a trigger token to calculate the attention distribution of the context in response to the question.
We evaluate QUITO on two widely-used datasets, namely NaturalQuestions and ASQA.
- Score: 37.08536175557748
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In-context learning (ICL) capabilities are foundational to the success of large language models (LLMs). Recently, context compression has attracted growing interest since it can substantially reduce the reasoning complexity and computation cost of LLMs. In this paper, we introduce a novel Query-gUIded aTtention cOmpression (QUITO) method, which leverages the attention of the question over the context to filter out useless information. Specifically, we use a trigger token to calculate the attention distribution of the context in response to the question. Based on this distribution, we propose three different filtering methods to satisfy the budget constraints of the context length. We evaluate QUITO on two widely-used datasets, namely NaturalQuestions and ASQA. Experimental results demonstrate that QUITO significantly outperforms established baselines across various datasets and downstream LLMs, underscoring its effectiveness. Our code is available at https://github.com/Wenshansilvia/attention_compressor.
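The pipeline the abstract describes (trigger token → attention distribution over the context → budgeted filtering) is concrete enough to sketch. Below is a minimal illustration in the spirit of QUITO, not the authors' released implementation: the scorer model (GPT-2), the use of the EOS token as the trigger, and plain top-k token filtering are all assumptions on my part; the paper itself proposes three filtering strategies for meeting the length budget.

```python
# Minimal sketch of query-guided attention compression in the spirit of
# QUITO. Assumptions (not from the paper's code): GPT-2 as the scoring LM,
# the tokenizer's EOS token as the "trigger" token, and simple top-k token
# filtering as one of several possible budget strategies.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def compress(context: str, question: str, budget: int) -> str:
    """Keep the `budget` context tokens that the trigger token attends to most."""
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids[0]
    q_ids = tokenizer(question, return_tensors="pt").input_ids[0]
    trigger = torch.tensor([tokenizer.eos_token_id])  # assumed trigger token
    input_ids = torch.cat([ctx_ids, q_ids, trigger]).unsqueeze(0)

    with torch.no_grad():
        out = model(input_ids, output_attentions=True)

    # Attention of the final (trigger) position over the context positions,
    # averaged across the heads of the last layer.
    last_layer = out.attentions[-1][0]                       # (heads, seq, seq)
    scores = last_layer[:, -1, : len(ctx_ids)].mean(dim=0)   # (ctx_len,)

    # Token-level filtering: keep the top-`budget` tokens in original order.
    keep = torch.topk(scores, k=min(budget, len(ctx_ids))).indices.sort().values
    return tokenizer.decode(ctx_ids[keep])

print(compress("Paris is the capital of France. Berlin is the capital of Germany.",
               "What is the capital of France?", budget=8))
```

Swapping the token-level top-k here for phrase- or sentence-level selection would give coarser-grained variants, in line with the multiple filtering methods the abstract mentions.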
Related papers
- LCIRC: A Recurrent Compression Approach for Efficient Long-form Context and Query Dependent Modeling in LLMs [10.84210988032097]
We introduce Long-form Context Injection with Recurrent Compression (LCIRC), a method that enables efficient processing of long-form sequences beyond the model's length limit.
We also introduce query dependent context modeling, which selectively compresses query-relevant information, ensuring that the model retains the most pertinent content.
arXiv Detail & Related papers (2025-02-10T04:02:18Z)
- Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation [81.18701211912779]
We introduce an Adaptive Multi-Aspect Retrieval-augmented framework over KGs (Amar).
This method retrieves knowledge including entities, relations, and subgraphs, and converts each piece of retrieved text into prompt embeddings.
Our method has achieved state-of-the-art performance on two common datasets.
arXiv Detail & Related papers (2024-12-24T16:38:04Z)
- Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks? [36.83397306207386]
We evaluate the capabilities of 17 leading Large Language Models (LLMs).
Strikingly, many models are remarkably thread-safe: capable of simultaneously following multiple threads without significant loss in performance.
We find the effective context limit is significantly shorter than the supported context length, with accuracy decreasing as the context window grows.
arXiv Detail & Related papers (2024-11-07T18:59:27Z)
- What is Wrong with Perplexity for Long-context Language Modeling? [71.34933096461124]
Long-context inputs are crucial for large language models (LLMs) in tasks such as extended conversations, document summarization, and many-shot in-context learning.
Perplexity (PPL) has proven unreliable for assessing long-context capabilities.
We propose LongPPL, a novel metric that focuses on key tokens by employing a long-short context contrastive method to identify them.
arXiv Detail & Related papers (2024-10-31T09:39:28Z)
- Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference [16.830389144259584]
We propose context-aware prompt compression (CPC), a sentence-level prompt compression technique.
Its key innovation is a novel context-aware sentence encoder that provides a relevance score for each sentence with respect to a given question.
Our method considerably outperforms prior prompt compression methods on benchmark datasets.
arXiv Detail & Related papers (2024-09-02T13:02:51Z)
- QUITO-X: A New Perspective on Context Compression from the Information Bottleneck Theory [66.01597794579568]
We introduce information bottleneck theory (IB) to model the problem.
We propose a cross-attention-based approach to approximate mutual information in IB.
Our method achieves a 25% increase in compression rate compared to the state-of-the-art.
arXiv Detail & Related papers (2024-08-20T02:44:45Z)
- LLoCO: Learning Long Contexts Offline [63.3458260335454]
We propose LLoCO, a novel approach to processing long contexts.
LLoCO learns contexts offline through context compression and in-domain parameter-efficient finetuning with LoRA.
Our approach extends the effective context window of a 4k token LLaMA2-7B model to handle up to 128k tokens.
arXiv Detail & Related papers (2024-04-11T17:57:22Z)
- Allies: Prompting Large Language Model with Beam Search [107.38790111856761]
In this work, we propose a novel method called ALLIES.
Given an input query, ALLIES leverages LLMs to iteratively generate new queries related to the original query.
By iteratively refining and expanding the scope of the original query, ALLIES captures and utilizes hidden knowledge that may not be directly obtainable through retrieval.
arXiv Detail & Related papers (2023-05-24T06:16:44Z)
- Unlocking Context Constraints of LLMs: Enhancing Context Efficiency of LLMs with Self-Information-Based Content Filtering [4.1372815372396525]
This paper proposes a method called Selective Context that employs self-information to filter out less informative content.
We demonstrate the effectiveness of our approach on tasks of summarisation and question answering across different data sources.
arXiv Detail & Related papers (2023-04-24T13:55:47Z)
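The Selective Context entry above describes a mechanism simple enough to sketch: score spans of the context by their self-information under a small LM and drop the least informative ones. The following is a minimal illustration under stated assumptions (GPT-2 as the scoring model, sentence-level aggregation, and a fixed keep ratio), not the paper's implementation, which to my understanding also operates at token and phrase granularity.

```python
# Minimal sketch of self-information-based content filtering in the spirit
# of Selective Context. Assumptions (illustrative, not the paper's code):
# GPT-2 as the scoring LM, sentence-level aggregation, a fixed keep ratio.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def self_information(text: str) -> float:
    """Average -log2 p(token | prefix): low values mean 'unsurprising' content."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Log-probability each token was assigned, given its preceding tokens.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    token_lp = log_probs.gather(1, ids[0, 1:].unsqueeze(1)).squeeze(1)
    return float(-token_lp.mean() / math.log(2))  # bits per token

def filter_context(sentences: list[str], keep_ratio: float = 0.5) -> str:
    """Drop the least informative sentences, preserving original order."""
    ranked = sorted(range(len(sentences)),
                    key=lambda i: self_information(sentences[i]),
                    reverse=True)
    kept = sorted(ranked[: max(1, int(len(sentences) * keep_ratio))])
    return " ".join(sentences[i] for i in kept)

docs = ["The mitochondrion generates most of the cell's ATP.",
        "It is a well known fact that things are as they are.",
        "ATP synthase couples a proton gradient to ATP production."]
print(filter_context(docs, keep_ratio=0.66))
```

The design intuition is that content a language model already predicts confidently carries little information for the downstream reader, so it can be pruned first when a context budget must be met.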