Say More with Less: Understanding Prompt Learning Behaviors through Gist
Compression
- URL: http://arxiv.org/abs/2402.16058v1
- Date: Sun, 25 Feb 2024 11:07:08 GMT
- Title: Say More with Less: Understanding Prompt Learning Behaviors through Gist
Compression
- Authors: Xinze Li, Zhenghao Liu, Chenyan Xiong, Shi Yu, Yukun Yan, Shuo Wang,
Ge Yu
- Abstract summary: Large language models (LLMs) require lengthy prompts as the input context to produce output aligned with user intentions.
We propose a novel method for compressing prompts that can also assist prompt interpretation and engineering.
Gist-COCO employs an encoder-decoder-based language model and incorporates an additional encoder as a plugin module to compress prompts, together with their inputs, into gist tokens.
- Score: 39.233017243612025
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) require lengthy prompts as the input context to
produce output aligned with user intentions, a process that incurs extra costs
during inference. In this paper, we propose the Gist COnditioned deCOding
(Gist-COCO) model, introducing a novel method for compressing prompts that
can also assist prompt interpretation and engineering. Gist-COCO employs an
encoder-decoder-based language model and incorporates an additional encoder
as a plugin module to compress prompts, together with their inputs, into gist tokens.
It finetunes the compression plugin module and uses the representations of gist
tokens to emulate the raw prompts in the vanilla language model. By verbalizing
the representations of gist tokens into gist prompts, the compression ability
of Gist-COCO can be generalized to different LLMs with high compression rates.
Our experiments demonstrate that Gist-COCO outperforms previous prompt
compression models in both passage and instruction compression tasks. Further
analysis on gist verbalization results suggests that our gist prompts serve
different functions in aiding language models. They may directly provide
potential answers, generate the chain-of-thought, or simply repeat the inputs.
All data and code are available at https://github.com/OpenMatch/Gist-COCO.
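As a rough illustration of the compression interface (not the actual Gist-COCO model: the embedding and pooling below are hypothetical placeholders, whereas Gist-COCO learns its gist representations by finetuning a plugin encoder), a compressor that turns a long prompt into a small, fixed number of gist vectors might look like:

```python
import math

def embed(token):
    # Placeholder "embedding": a 2-d vector derived from the token text.
    # A real system would use the language model's learned embeddings.
    return [len(token) / 10.0, ord(token[0]) / 128.0]

def gist_compress(prompt_tokens, num_gist=3):
    """Mean-pool contiguous chunks of token embeddings into gist vectors
    that stand in for the raw prompt when conditioning the decoder."""
    vecs = [embed(t) for t in prompt_tokens]
    chunk = max(1, math.ceil(len(vecs) / num_gist))
    gists = []
    for i in range(0, len(vecs), chunk):
        block = vecs[i:i + chunk]
        gists.append([sum(dim) / len(block) for dim in zip(*block)])
    return gists

prompt = "answer the question using only the passage given below".split()
gists = gist_compress(prompt, num_gist=3)
print(len(prompt), "tokens ->", len(gists), "gist vectors")
# -> 9 tokens -> 3 gist vectors
```

The point of the sketch is the interface: the decoder sees a handful of gist vectors instead of the full prompt, which is where the compression rate comes from.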
Related papers
- Better Prompt Compression Without Multi-Layer Perceptrons [33.53334153279698]
We show that the encoder does not need to keep the original language model's architecture to achieve useful compression.
We introduce a prompt compression encoder after removing the multilayer perceptron (MLP) layers in the Transformer blocks of a language model.
arXiv Detail & Related papers (2025-01-12T06:57:06Z)
- L3TC: Leveraging RWKV for Learned Lossless Low-Complexity Text Compression [23.179381396167084]
We introduce a novel Learned Lossless Low-complexity Text Compression method (L3TC).
RWKV models achieve the fastest decoding speed with a moderate compression ratio.
We propose an outlier-aware tokenizer that uses a limited vocabulary to cover frequent tokens.
arXiv Detail & Related papers (2024-12-21T14:24:32Z)
- Style-Compress: An LLM-Based Prompt Compression Framework Considering Task-Specific Styles [49.65811277223873]
Style-Compress is a lightweight framework that adapts a smaller language model to compress prompts for a larger model on a new task without additional training.
Our approach iteratively generates and selects effective compressed prompts as task-specific demonstrations through style variation and in-context learning.
Style-Compress outperforms two baseline compression models in four tasks: original prompt reconstruction, text summarization, multi-hop QA, and CoT reasoning.
arXiv Detail & Related papers (2024-10-17T21:35:49Z)
- AdaCoder: Adaptive Prompt Compression for Programmatic Visual Question Answering [23.169961738978614]
We propose AdaCoder, an adaptive prompt compression framework for visual question answering models.
AdaCoder operates in two phases: a compression phase and an inference phase.
We demonstrate that it reduces token length by 71.1%, while maintaining or even improving the performance of visual question answering.
arXiv Detail & Related papers (2024-07-28T06:23:06Z)
- Fundamental Limits of Prompt Compression: A Rate-Distortion Framework for Black-Box Language Models [21.025001473355996]
We formalize the problem of prompt compression for large language models (LLMs).
We present a framework to unify token-level prompt compression methods which create hard prompts for black-box models.
We show that there is a large gap between the performance of current prompt compression methods and the optimal strategy.
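The rate-distortion view above can be written in the usual form; the notation here is a generic sketch (mine, not necessarily the paper's). With $x$ the original prompt, $\tilde{x}$ its compressed version, $\ell(\tilde{x})$ its token length, and $d(x,\tilde{x})$ a task-level distortion such as the drop in downstream accuracy:

```latex
R(D) \;=\; \min_{p(\tilde{x}\mid x)\,:\;\mathbb{E}[d(x,\tilde{x})]\le D}\;\mathbb{E}\big[\ell(\tilde{x})\big]
```

That is, $R(D)$ is the minimum expected compressed-prompt length achievable while keeping expected task distortion below $D$; the "large gap" finding says current compressors sit well above this optimal curve.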
arXiv Detail & Related papers (2024-07-22T09:40:13Z)
- Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass [72.07642648108849]
Superposed Decoding is a new decoding algorithm that generates k drafts at the cost of one autoregressive inference pass.
Superposed Decoding can be combined with other decoding strategies, resulting in universal coverage gains when scaling inference time compute.
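To make the one-pass idea concrete, here is a toy sketch. The fixed embeddings and bag-of-vectors "model" below are hypothetical stand-ins, and the real method also tracks per-draft weights and n-gram rescoring; what the sketch keeps is the core trick of serving k drafts with a single forward pass over a superposition (here, a plain average) of their last-token embeddings:

```python
VOCAB = ["the", "cat", "sat", "on", "mat"]

# Hypothetical fixed token embeddings standing in for a real model's.
EMB = {
    "the": [0.9, 0.1, 0.0, 0.0],
    "cat": [0.1, 0.8, 0.1, 0.0],
    "sat": [0.0, 0.1, 0.9, 0.1],
    "on":  [0.0, 0.0, 0.1, 0.9],
    "mat": [0.5, 0.0, 0.0, 0.5],
}

def forward(vec):
    # Toy "LM head": score each vocabulary word against the input vector.
    return {w: sum(a * b for a, b in zip(vec, EMB[w])) for w in VOCAB}

def superposed_decode(prompt, k=2, steps=3):
    drafts = [[prompt] for _ in range(k)]
    for _ in range(steps):
        # Superposition: average the drafts' last-token embeddings, so a
        # single forward pass serves all k drafts at once.
        last = [EMB[d[-1]] for d in drafts]
        mix = [sum(col) / k for col in zip(*last)]
        scores = forward(mix)
        top_k = sorted(scores, key=scores.get, reverse=True)[:k]
        drafts = [d + [t] for d, t in zip(drafts, top_k)]
    return drafts

drafts = superposed_decode("the", k=2, steps=3)
```

Each step extends draft i with the i-th best token from the shared pass, which is how k continuations emerge from the cost of one.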
arXiv Detail & Related papers (2024-05-28T17:40:48Z)
- LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models [22.06402870816756]
Large language models (LLMs) have been applied in various applications due to their astonishing capabilities.
This paper presents LLMLingua, a coarse-to-fine prompt compression method that involves a budget controller to maintain semantic integrity.
We show that the proposed approach yields state-of-the-art performance and allows for up to 20x compression with little performance loss.
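The budget-controller idea can be sketched in a few lines. The scoring heuristic below is made up for illustration; LLMLingua itself scores tokens with a small language model's perplexity and compresses coarse-to-fine (demonstrations first, then sentences, then tokens):

```python
def compress(tokens, scores, budget):
    """Keep the `budget` highest-scoring tokens, preserving their order."""
    if len(tokens) <= budget:
        return list(tokens)
    keep = sorted(range(len(tokens)),
                  key=lambda i: scores[i], reverse=True)[:budget]
    return [tokens[i] for i in sorted(keep)]

prompt = ["Please", "kindly", "summarize", "the", "following", "report", "briefly"]
scores = [0.2, 0.1, 0.9, 0.3, 0.4, 0.8, 0.7]  # hypothetical informativeness
print(compress(prompt, scores, budget=4))
# -> ['summarize', 'following', 'report', 'briefly']
```

Low-information filler ("Please", "kindly") is pruned first, which is why such methods can reach high compression rates with little task-performance loss.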
arXiv Detail & Related papers (2023-10-09T14:10:21Z)
- Improving Zero-Shot Generalization for CLIP with Synthesized Prompts [135.4317555866831]
Most existing methods require labeled data for all classes, which may not hold in real-world applications.
We propose a plug-and-play generative approach called SyntHesIzed Prompts (SHIP) to improve existing fine-tuning methods.
arXiv Detail & Related papers (2023-07-14T15:15:45Z)
- COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining [59.169836983883656]
COCO-LM is a new self-supervised learning framework that pretrains Language Models by COrrecting challenging errors and COntrasting text sequences.
COCO-LM employs an auxiliary language model to mask-and-predict tokens in original text sequences.
Our analyses reveal that COCO-LM's advantages come from its challenging training signals, more contextualized token representations, and regularized sequence representations.
arXiv Detail & Related papers (2021-02-16T22:24:29Z)
- Fast End-to-End Speech Recognition via a Non-Autoregressive Model and Cross-Modal Knowledge Transferring from BERT [72.93855288283059]
We propose a non-autoregressive speech recognition model called LASO (Listen Attentively, and Spell Once).
The model consists of an encoder, a decoder, and a position-dependent summarizer (PDS).
arXiv Detail & Related papers (2021-02-15T15:18:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the accuracy of the listed information and is not responsible for any consequences of its use.