QUITO-X: An Information Bottleneck-based Compression Algorithm with Cross-Attention
- URL: http://arxiv.org/abs/2408.10497v1
- Date: Tue, 20 Aug 2024 02:44:45 GMT
- Title: QUITO-X: An Information Bottleneck-based Compression Algorithm with Cross-Attention
- Authors: Yihang Wang, Xu Huang, Bowen Tian, Yixing Fan, Jiafeng Guo
- Abstract summary: We introduce information bottleneck theory to examine the properties required by the metric.
Inspired by this, we use cross-attention in an encoder-decoder architecture as a new metric.
Our simple method leads to significantly better performance in smaller models with lower latency.
- Score: 37.25151458038128
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generative LLMs have achieved significant success in various industrial tasks and can effectively adapt to vertical domains and downstream tasks through in-context learning (ICL). However, as tasks become increasingly complex, the context length required by ICL is also getting longer, and two significant issues arise: (i) the excessively long context leads to high costs and inference delays, and (ii) the substantial amount of task-irrelevant information introduced by long contexts exacerbates the "lost in the middle" problem. Recently, compressing prompts by removing tokens according to a metric obtained from a causal language model, such as LLaMA-7B, has emerged as an effective approach to mitigate these issues. However, the metrics used by prior methods, such as self-information or perplexity (PPL), do not fully align with the objective of distinguishing the most important tokens when conditioning on the query. In this work, we introduce information bottleneck theory to carefully examine the properties required by the metric. Inspired by this, we use cross-attention in an encoder-decoder architecture as a new metric. Our simple method leads to significantly better performance in smaller models with lower latency. We evaluate our method on four datasets: DROP, CoQA, SQuAD, and Quoref. The experimental results show that, while maintaining the same performance, our compression rate can improve by nearly 25% over the previous SOTA. Remarkably, in experiments where 25% of the tokens are removed, our model's EM score for answers sometimes even exceeds that of the control group that uses uncompressed text as context.
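To make the idea concrete, here is a minimal sketch of query-conditioned cross-attention scoring, assuming a T5-style encoder-decoder from Hugging Face transformers. The model name, the layer/head averaging, and the keep ratio are illustrative assumptions, not the paper's exact recipe.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Illustrative backbone choice; the paper's exact model may differ.
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

def compress(context: str, query: str, keep_ratio: float = 0.75) -> str:
    # Encode the context with the encoder; feed the query to the decoder so
    # cross-attention reflects each context token's relevance to the query.
    enc = tokenizer(context, return_tensors="pt", truncation=True)
    dec = tokenizer(query, return_tensors="pt", truncation=True)
    with torch.no_grad():
        out = model(
            input_ids=enc.input_ids,
            attention_mask=enc.attention_mask,
            decoder_input_ids=dec.input_ids,
            output_attentions=True,
        )
    # out.cross_attentions: one tensor per layer, each of shape
    # (batch, heads, query_len, context_len). Averaging over layers, heads,
    # and query positions is one plausible aggregation, not necessarily theirs.
    scores = torch.stack(out.cross_attentions).mean(dim=(0, 2, 3)).squeeze(0)
    k = max(1, int(keep_ratio * scores.numel()))
    keep = torch.topk(scores, k).indices.sort().values  # preserve token order
    return tokenizer.decode(enc.input_ids[0, keep], skip_special_tokens=True)
```

Setting keep_ratio to 0.75 mirrors the 25%-removal setting reported in the abstract.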
Related papers
- Reducing Distraction in Long-Context Language Models by Focused Learning [6.803882766744194]
We propose a novel training method that enhances Large Language Models' ability to discern relevant information.
During fine-tuning with long contexts, we employ a retriever to extract the most relevant segments.
We then introduce an auxiliary contrastive learning objective to explicitly ensure that outputs from the original context and the retrieved sub-context are closely aligned.
arXiv Detail & Related papers (2024-11-08T19:27:42Z)
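As a toy illustration of the auxiliary contrastive objective described in the entry above, the sketch below aligns pooled representations of the full context and the retrieved sub-context with InfoNCE; the pooling and the specific loss are my assumptions, not necessarily the paper's formulation.

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(h_full: torch.Tensor,
                               h_sub: torch.Tensor,
                               temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE pulling each full-context representation toward the one from
    its retrieved sub-context; other batch items serve as negatives.
    h_full, h_sub: (batch, dim) pooled hidden states (pooling is assumed)."""
    z_full = F.normalize(h_full, dim=-1)
    z_sub = F.normalize(h_sub, dim=-1)
    logits = z_full @ z_sub.T / temperature  # (batch, batch) similarities
    labels = torch.arange(logits.size(0), device=logits.device)  # diagonal = positives
    return F.cross_entropy(logits, labels)
```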
- Selection-p: Self-Supervised Task-Agnostic Prompt Compression for Faithfulness and Transferability [67.77534983324229]
In this paper, we investigate the ability of Large Language Models to develop a unified compression method that discretizes uninformative tokens.
Experiments show Selection-p achieves state-of-the-art performance across numerous classification tasks.
It exhibits superior transferability to different models compared to prior work.
arXiv Detail & Related papers (2024-10-15T17:05:25Z)
- QUITO: Accelerating Long-Context Reasoning through Query-Guided Context Compression [37.08536175557748]
In this paper, we introduce a novel Query-gUIded aTtention cOmpression (QUITO) method to filter useless information.
Specifically, we use a trigger token to calculate the attention distribution over the context in response to the question.
We evaluate QUITO on two widely-used datasets, namely NaturalQuestions and ASQA.
arXiv Detail & Related papers (2024-08-01T04:28:38Z)
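For intuition, here is a rough reconstruction of the trigger-token scoring idea with a small causal LM; the model choice, the "?" trigger, and the layer/head averaging are illustrative assumptions rather than QUITO's exact procedure.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # illustrative small model
model = AutoModelForCausalLM.from_pretrained("gpt2")

def score_context(context: str, question: str) -> torch.Tensor:
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    tail_ids = tokenizer(" " + question + "?", return_tensors="pt").input_ids
    input_ids = torch.cat([ctx_ids, tail_ids], dim=1)
    with torch.no_grad():
        out = model(input_ids, output_attentions=True)
    # out.attentions: one tensor per layer, each (batch, heads, seq, seq).
    # Take the final position (the assumed trigger) and average its attention
    # over layers and heads to get one relevance score per context token.
    att = torch.stack(out.attentions).mean(dim=(0, 2))  # (batch, seq, seq)
    return att[0, -1, : ctx_ids.size(1)]
```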
- CompAct: Compressing Retrieved Documents Actively for Question Answering [15.585833125854418]
CompAct is a novel framework that employs an active strategy to condense extensive documents without losing key information.
Our experiments demonstrate that CompAct brings significant improvements in both performance and compression rate on multi-hop question-answering benchmarks.
arXiv Detail & Related papers (2024-07-12T06:06:54Z)
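In the spirit of CompAct's active condensation, a loop like the following could compress retrieved documents sequentially while carrying forward only question-relevant evidence; `llm` is a stand-in text-generation callable and the instruction wording is my own, not the paper's prompts.

```python
from typing import Callable, List

def compact_condense(llm: Callable[[str], str],
                     question: str,
                     documents: List[str]) -> str:
    summary = ""  # running compressed evidence, updated after each document
    for doc in documents:
        prompt = (
            f"Question: {question}\n"
            f"Compressed evidence so far: {summary or '(none)'}\n"
            f"New document: {doc}\n"
            "Update the compressed evidence, keeping only information "
            "needed to answer the question."
        )
        summary = llm(prompt)
    return summary
```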
- KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches [52.02764371205856]
Long context capability is a crucial competency for large language models (LLMs).
This work provides a taxonomy of current methods and evaluates 10+ state-of-the-art approaches across seven categories of long-context tasks.
arXiv Detail & Related papers (2024-07-01T17:59:47Z)
- Thread of Thought Unraveling Chaotic Contexts [133.24935874034782]
"Thread of Thought" (ThoT) strategy draws inspiration from human cognitive processes.
In experiments, ThoT significantly improves reasoning performance compared to other prompting techniques.
arXiv Detail & Related papers (2023-11-15T06:54:44Z)
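Since ThoT is a prompting strategy, a sketch is just a template; the wording below paraphrases the step-by-step instruction the paper proposes and is not an exact quote.

```python
def thot_prompt(context: str, question: str) -> str:
    # ThoT-style instruction: process a chaotic context in manageable parts,
    # summarizing and analyzing each before answering (wording approximate).
    return (
        f"{context}\n\nQ: {question}\n"
        "Walk me through this context in manageable parts step by step, "
        "summarizing and analyzing as we go."
    )
```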
- Disentangled Representation Learning with Transmitted Information Bottleneck [57.22757813140418]
We present DisTIB (Transmitted Information Bottleneck for Disentangled representation learning), a novel objective that navigates the balance between information compression and preservation.
arXiv Detail & Related papers (2023-11-03T03:18:40Z)
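Both DisTIB and QUITO-X build on the classical information-bottleneck objective, which trades off compressing the input X into a representation Z against preserving information about the target Y (standard formulation, not copied from either paper):

```latex
% Classical information-bottleneck Lagrangian: compress X into Z while
% retaining predictive information about Y; beta controls the trade-off.
\min_{p(z \mid x)} \; I(X;Z) \;-\; \beta \, I(Z;Y)
```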
- From Contextual Data to Newsvendor Decisions: On the Actual Performance of Data-Driven Algorithms [2.9603743540540357]
We study how the relevance and quantity of past data affects the performance of a data-driven policy.
We consider a setting in which past demands observed under "close by" contexts come from "close by" distributions.
arXiv Detail & Related papers (2023-02-16T17:03:39Z)
- Variational Distillation for Multi-View Learning [104.17551354374821]
We design several variational information bottlenecks to exploit two key characteristics for multi-view representation learning.
Under a rigorous theoretical guarantee, our approach enables IB to grasp the intrinsic correlation between observations and semantic labels.
arXiv Detail & Related papers (2022-06-20T03:09:46Z)