Compressing Lengthy Context With UltraGist
- URL: http://arxiv.org/abs/2405.16635v2
- Date: Fri, 11 Oct 2024 02:08:38 GMT
- Title: Compressing Lengthy Context With UltraGist
- Authors: Peitian Zhang, Zheng Liu, Shitao Xiao, Ninglu Shao, Qiwei Ye, Zhicheng Dou
- Abstract summary: We propose a new method called UltraGist, which is distinguished for its high-quality compression of lengthy context.
UltraGist contributes to the flexibility of compression, as it can be effectively learned to support a broad range of context lengths and compression ratios.
It makes the training process sample-efficient and thus maximizes the use of training data.
- Score: 22.054232261437186
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Compressing lengthy context is a critical but technically challenging problem. In this paper, we propose a new method called UltraGist, which is distinguished by its high-quality compression of lengthy context due to the innovative design of the compression and learning algorithm. UltraGist brings forth the following important benefits. Firstly, it notably contributes to the flexibility of compression, as it can be effectively learned to support a broad range of context lengths and compression ratios. Secondly, it helps to produce fine-grained compression for the lengthy context, where each small segment of the context is progressively processed on top of a tailored cross-attention mechanism. Thirdly, it makes the training process sample-efficient and thus maximizes the use of training data. Finally, it facilitates the efficient running of compression for dynamic context, as the compression result can be progressively generated and hence incrementally updated. UltraGist is evaluated on a wide variety of tasks associated with lengthy context, such as document QA and summarization, few-shot learning, and multi-session conversation. Whilst existing methods fail to handle these challenging scenarios, our approach is able to preserve near-lossless compression performance throughout all the evaluations. Our data, model, and code have been released at \url{https://github.com/namespace-Pt/UltraGist}.
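The abstract's second and fourth points describe segment-wise compression through a tailored cross-attention mechanism, with a compression result that can be updated incrementally as new context arrives. The following minimal PyTorch sketch illustrates that general idea; the module names, dimensions, and wiring are illustrative assumptions made for exposition, not the released UltraGist implementation (see the linked repository for the authors' code).

```python
# Minimal sketch of segment-wise, progressive context compression in the spirit of
# the abstract. All class names, sizes, and hyperparameters are assumptions.
import torch
import torch.nn as nn


class SegmentCompressor(nn.Module):
    """Compresses one context segment into a few 'gist' vectors via cross-attention,
    conditioned on the gists accumulated from earlier segments."""

    def __init__(self, d_model: int = 256, n_heads: int = 4, n_gist: int = 8):
        super().__init__()
        # Learnable query vectors that become the compressed representation.
        self.gist_queries = nn.Parameter(torch.randn(n_gist, d_model) * 0.02)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))

    def forward(self, segment: torch.Tensor, prev_gists: torch.Tensor) -> torch.Tensor:
        # segment:    (batch, seg_len, d_model) token embeddings of the new segment
        # prev_gists: (batch, n_prev, d_model) compressed history (may be empty)
        batch = segment.size(0)
        queries = self.gist_queries.unsqueeze(0).expand(batch, -1, -1)
        # Gist queries attend over the accumulated gists plus the raw segment, so each
        # segment is compressed on top of everything compressed before it.
        memory = torch.cat([prev_gists, segment], dim=1)
        gists, _ = self.cross_attn(queries, memory, memory)
        return gists + self.ffn(gists)


def compress_progressively(context: torch.Tensor, compressor: SegmentCompressor,
                           seg_len: int = 64) -> torch.Tensor:
    """Streams over the context segment by segment; the result can be extended
    incrementally for dynamic context instead of recompressing from scratch."""
    batch, total_len, d_model = context.shape
    gists = torch.empty(batch, 0, d_model, device=context.device)
    for start in range(0, total_len, seg_len):
        segment = context[:, start:start + seg_len]
        gists = torch.cat([gists, compressor(segment, gists)], dim=1)
    return gists  # much shorter than the raw context


if __name__ == "__main__":
    ctx = torch.randn(2, 256, 256)                   # 256 "tokens" per example
    comp = SegmentCompressor()
    print(compress_progressively(ctx, comp).shape)   # (2, 32, 256): 8x compression
```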
Related papers
- Arbitrary Ratio Feature Compression via Next Token Prediction [52.10426317889982]
The Arbitrary Ratio Feature Compression (ARFC) framework supports any compression ratio with a single model. ARFC is an auto-regressive model that performs compression via next-token prediction. A MoS module refines the compressed tokens by utilizing multiple compression results. ERGC is integrated into the training process to preserve semantic and structural relationships during compression.
arXiv Detail & Related papers (2026-02-12T02:38:57Z) - Compressing Many-Shots in In-Context Learning [61.231471139896506]
We study an approach to improve the memory and computational efficiency of ICL inference by compressing the many-shot prompts. We first show that existing prompt compression methods are ineffective for many-shot compression. We propose MemCom, a layer-wise compression method.
arXiv Detail & Related papers (2025-10-17T16:57:42Z) - CompLLM: Compression for Long Context Q&A [47.90063873976842]
We introduce CompLLM, a soft compression technique designed for practical deployment. Instead of processing the context holistically, CompLLM divides it into segments and compresses each one independently. Our experiments show that with a 2x compression rate, at high context lengths CompLLM speeds up Time To First Token (TTFT) by up to 4x and reduces the KV cache size by 50%.
arXiv Detail & Related papers (2025-09-23T16:49:43Z) - UniGist: Towards General and Hardware-aligned Sequence-level Long Context Compression [86.33995240043936]
UniGist is a sequence-level long-context compression framework for large language models. It efficiently preserves context information by replacing raw tokens with special compression tokens (gists) in a fine-grained manner. Our scheme also supports flexible inference by allowing the actual removal of compressed tokens, resulting in real-time memory savings.
arXiv Detail & Related papers (2025-09-19T08:47:37Z) - SCOPE: A Generative Approach for LLM Prompt Compression [7.813705327778312]
We present a novel generative prompt compression method. Unlike existing token-removal methods, our method centers on a chunking-and-summarization mechanism. Our method achieves significantly better compression quality and higher stability than state-of-the-art methods.
arXiv Detail & Related papers (2025-08-16T01:41:53Z) - Understanding and Improving Information Preservation in Prompt Compression for LLMs [15.797246416590339]
In information-intensive tasks, the prompt length can grow fast, leading to increased computational requirements, performance degradation, and induced biases from irrelevant or redundant information. We propose a holistic evaluation framework that allows for in-depth analysis of prompt compression methods.
arXiv Detail & Related papers (2025-03-24T20:06:11Z) - A Silver Bullet or a Compromise for Full Attention? A Comprehensive Study of Gist Token-based Context Compression [41.71994217868039]
We show that gist-based compression can achieve near-lossless performance on tasks like retrieval-augmented generation and long-document QA.
We identify three key failure patterns: lost by the boundary, lost if surprise, and lost along the way.
We propose two effective strategies: fine-grained autoencoding, which enhances the reconstruction of original token information, and segment-wise token importance estimation.
arXiv Detail & Related papers (2024-12-23T11:24:04Z) - Style-Compress: An LLM-Based Prompt Compression Framework Considering Task-Specific Styles [49.65811277223873]
Style-Compress is a lightweight framework that adapts a smaller language model to compress prompts for a larger model on a new task without additional training.
Our approach iteratively generates and selects effective compressed prompts as task-specific demonstrations through style variation and in-context learning.
Style-Compress outperforms two baseline compression models in four tasks: original prompt reconstruction, text summarization, multi-hop QA, and CoT reasoning.
arXiv Detail & Related papers (2024-10-17T21:35:49Z) - Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference [16.830389144259584]
We propose context-aware prompt compression (CPC), a sentence-level prompt compression technique.
The key innovation is a novel context-aware sentence encoder that provides a relevance score for each sentence with respect to a given question.
Our method considerably outperforms prior works on prompt compression on benchmark datasets.
arXiv Detail & Related papers (2024-09-02T13:02:51Z) - LanguaShrink: Reducing Token Overhead with Psycholinguistics [8.123272461141815]
LanguaShrink is a prompt compression framework for large language models.
It reduces prompt length while preserving essential information.
Compared to existing prompt compression methods, LanguaShrink improves end-to-end latency by 1.43 times.
arXiv Detail & Related papers (2024-09-01T22:09:20Z) - Characterizing Prompt Compression Methods for Long Context Inference [36.9745587176401]
Long context inference presents challenges at the system level with increased compute and memory requirements.
Several methods have been proposed to compress the prompt to reduce the context length.
We perform a comprehensive characterization and evaluation of different prompt compression methods.
arXiv Detail & Related papers (2024-07-11T23:34:32Z) - Concise and Precise Context Compression for Tool-Using Language Models [60.606281074373136]
We propose two strategies for compressing tool documentation into concise and precise summary sequences for tool-using language models.
Results on API-Bank and APIBench show that our approach reaches performance comparable to the upper-bound baseline at compression ratios of up to 16x.
arXiv Detail & Related papers (2024-07-02T08:17:00Z) - In-Context Former: Lightning-fast Compressing Context for Large Language Model [48.831304302467004]
In this paper, we propose a new approach to compress the long input contexts of Transformer-based large language models (LLMs).
We use the cross-attention mechanism and a small number of learnable digest tokens to condense information from the contextual word embeddings.
Experimental results indicate that our method requires only 1/32 of the floating-point operations of the baseline during compression and improves processing speed by 68 to 112 times.
arXiv Detail & Related papers (2024-06-19T15:14:55Z) - Training LLMs over Neurally Compressed Text [55.11828645767342]
This paper explores the idea of training large language models (LLMs) over highly compressed text.
We propose Equal-Info Windows, a novel compression technique whereby text is segmented into blocks that each compress to the same bit length.
We demonstrate effective learning over neurally compressed text that improves with scale, and outperforms byte-level baselines by a wide margin on perplexity and inference speed benchmarks.
arXiv Detail & Related papers (2024-04-04T17:48:28Z) - Long Context Compression with Activation Beacon [22.054232261437186]
Activation Beacon is a plug-in module for transformer-based LLMs.
It targets effective, efficient, and flexible compression of long contexts.
It achieves a 2x acceleration in inference time and an 8x reduction of memory costs for KV cache.
arXiv Detail & Related papers (2024-01-07T11:57:40Z) - Unrolled Compressed Blind-Deconvolution [77.88847247301682]
Sparse multichannel blind deconvolution (S-MBD) arises frequently in many engineering applications such as radar/sonar/ultrasound imaging.
We propose a compression method that enables blind recovery from far fewer measurements relative to the full received signal in time.
arXiv Detail & Related papers (2022-09-28T15:16:58Z) - Cross Modal Compression: Towards Human-comprehensible Semantic Compression [73.89616626853913]
Cross modal compression is a semantic compression framework for visual data.
We show that our proposed CMC can achieve encouraging reconstructed results with an ultrahigh compression ratio.
arXiv Detail & Related papers (2022-09-06T15:31:11Z) - Espresso: Revisiting Gradient Compression from the System Perspective [8.535644448611928]
Gradient compression (GC) is a promising approach to addressing the communication bottleneck in distributed deep learning (DDL).
However, it is challenging to find the optimal compression strategy for applying GC to DDL because of the intricate interactions among tensors.
Espresso is designed to express all compression strategies and the corresponding interactions among tensors of any DDL training job.
It can improve the training throughput over the state-of-the-art compression-enabled system by up to 77% for representative DDL training jobs.
arXiv Detail & Related papers (2022-05-28T15:47:00Z)