Say More with Less: Understanding Prompt Learning Behaviors through Gist
Compression
- URL: http://arxiv.org/abs/2402.16058v1
- Date: Sun, 25 Feb 2024 11:07:08 GMT
- Title: Say More with Less: Understanding Prompt Learning Behaviors through Gist
Compression
- Authors: Xinze Li, Zhenghao Liu, Chenyan Xiong, Shi Yu, Yukun Yan, Shuo Wang,
Ge Yu
- Abstract summary: Large language models (LLMs) require lengthy prompts as the input context to produce output aligned with user intentions.
We propose a novel method for compressing prompts that can also assist prompt interpretation and engineering.
Gist-COCO employs an encoder-decoder-based language model and incorporates an additional encoder as a plugin module to compress prompts, together with their inputs, into gist tokens.
- Score: 39.233017243612025
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) require lengthy prompts as the input context to
produce output aligned with user intentions, a process that incurs extra costs
during inference. In this paper, we propose the Gist COnditioned deCOding
(Gist-COCO) model, introducing a novel method for compressing prompts that
can also assist prompt interpretation and engineering. Gist-COCO employs an
encoder-decoder-based language model and incorporates an additional encoder
as a plugin module to compress prompts, together with their inputs, into gist tokens.
It finetunes the compression plugin module and uses the representations of gist
tokens to emulate the raw prompts in the vanilla language model. By verbalizing
the representations of gist tokens into gist prompts, the compression ability
of Gist-COCO can be generalized to different LLMs with high compression rates.
Our experiments demonstrate that Gist-COCO outperforms previous prompt
compression models in both passage and instruction compression tasks. Further
analysis on gist verbalization results suggests that our gist prompts serve
different functions in aiding language models. They may directly provide
potential answers, generate the chain-of-thought, or simply repeat the inputs.
All data and code are available at https://github.com/OpenMatch/Gist-COCO.
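As a rough illustration of the compression interface (not the actual Gist-COCO model: the embedding and pooling below are hypothetical placeholders, whereas Gist-COCO learns its gist representations by finetuning a plugin encoder), a compressor that turns a long prompt into a small, fixed number of gist vectors might look like:

```python
import math

def embed(token):
    # Placeholder "embedding": a 2-d vector derived from the token text.
    # A real system would use the language model's learned embeddings.
    return [len(token) / 10.0, ord(token[0]) / 128.0]

def gist_compress(prompt_tokens, num_gist=3):
    """Mean-pool contiguous chunks of token embeddings into gist vectors
    that stand in for the raw prompt when conditioning the decoder."""
    vecs = [embed(t) for t in prompt_tokens]
    chunk = max(1, math.ceil(len(vecs) / num_gist))
    gists = []
    for i in range(0, len(vecs), chunk):
        block = vecs[i:i + chunk]
        gists.append([sum(dim) / len(block) for dim in zip(*block)])
    return gists

prompt = "answer the question using only the passage given below".split()
gists = gist_compress(prompt, num_gist=3)
print(len(prompt), "tokens ->", len(gists), "gist vectors")
# -> 9 tokens -> 3 gist vectors
```

The point of the sketch is the interface: the decoder sees a handful of gist vectors instead of the full prompt, which is where the compression rate comes from.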
Related papers
- Better Prompt Compression Without Multi-Layer Perceptrons [33.53334153279698]
We show that the encoder does not need to keep the original language model's architecture to achieve useful compression.
We introduce a prompt compression encoder after removing the multilayer perceptron (MLP) layers in the Transformer blocks of a language model.
arXiv Detail & Related papers (2025-01-12T06:57:06Z)
- L3TC: Leveraging RWKV for Learned Lossless Low-Complexity Text Compression [23.179381396167084]
We introduce a novel Learned Lossless Low-complexity Text Compression method (L3TC).
RWKV models achieve the fastest decoding speed with a moderate compression ratio.
We propose an outlier-aware tokenizer that uses a limited vocabulary to cover frequent tokens.
arXiv Detail & Related papers (2024-12-21T14:24:32Z)
- Style-Compress: An LLM-Based Prompt Compression Framework Considering Task-Specific Styles [49.65811277223873]
Style-Compress is a lightweight framework that adapts a smaller language model to compress prompts for a larger model on a new task without additional training.
Our approach iteratively generates and selects effective compressed prompts as task-specific demonstrations through style variation and in-context learning.
Style-Compress outperforms two baseline compression models in four tasks: original prompt reconstruction, text summarization, multi-hop QA, and CoT reasoning.
arXiv Detail & Related papers (2024-10-17T21:35:49Z)
- AdaCoder: Adaptive Prompt Compression for Programmatic Visual Question Answering [23.169961738978614]
We propose AdaCoder, an adaptive prompt compression framework for visual question answering models.
AdaCoder operates in two phases: a compression phase and an inference phase.
We demonstrate that it reduces token length by 71.1%, while maintaining or even improving the performance of visual question answering.
arXiv Detail & Related papers (2024-07-28T06:23:06Z)
- Fundamental Limits of Prompt Compression: A Rate-Distortion Framework for Black-Box Language Models [21.025001473355996]
We formalize the problem of prompt compression for large language models (LLMs).
We present a framework to unify token-level prompt compression methods which create hard prompts for black-box models.
We show that there is a large gap between the performance of current prompt compression methods and the optimal strategy.
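The rate-distortion view above can be written in the usual form; the notation here is a generic sketch (mine, not necessarily the paper's). With $x$ the original prompt, $\tilde{x}$ its compressed version, $\ell(\tilde{x})$ its token length, and $d(x,\tilde{x})$ a task-level distortion such as the drop in downstream accuracy:

```latex
R(D) \;=\; \min_{p(\tilde{x}\mid x)\,:\;\mathbb{E}[d(x,\tilde{x})]\le D}\;\mathbb{E}\big[\ell(\tilde{x})\big]
```

That is, $R(D)$ is the minimum expected compressed-prompt length achievable while keeping expected task distortion below $D$; the "large gap" finding says current compressors sit well above this optimal curve.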
arXiv Detail & Related papers (2024-07-22T09:40:13Z)
- Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass [72.07642648108849]
Superposed Decoding is a new decoding algorithm that generates k drafts at the cost of one autoregressive inference pass.
Superposed Decoding can be combined with other decoding strategies, resulting in universal coverage gains when scaling inference time compute.
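To make the one-pass idea concrete, here is a toy sketch. The fixed embeddings and bag-of-vectors "model" below are hypothetical stand-ins, and the real method also tracks per-draft weights and n-gram rescoring; what the sketch keeps is the core trick of serving k drafts with a single forward pass over a superposition (here, a plain average) of their last-token embeddings:

```python
VOCAB = ["the", "cat", "sat", "on", "mat"]

# Hypothetical fixed token embeddings standing in for a real model's.
EMB = {
    "the": [0.9, 0.1, 0.0, 0.0],
    "cat": [0.1, 0.8, 0.1, 0.0],
    "sat": [0.0, 0.1, 0.9, 0.1],
    "on":  [0.0, 0.0, 0.1, 0.9],
    "mat": [0.5, 0.0, 0.0, 0.5],
}

def forward(vec):
    # Toy "LM head": score each vocabulary word against the input vector.
    return {w: sum(a * b for a, b in zip(vec, EMB[w])) for w in VOCAB}

def superposed_decode(prompt, k=2, steps=3):
    drafts = [[prompt] for _ in range(k)]
    for _ in range(steps):
        # Superposition: average the drafts' last-token embeddings, so a
        # single forward pass serves all k drafts at once.
        last = [EMB[d[-1]] for d in drafts]
        mix = [sum(col) / k for col in zip(*last)]
        scores = forward(mix)
        top_k = sorted(scores, key=scores.get, reverse=True)[:k]
        drafts = [d + [t] for d, t in zip(drafts, top_k)]
    return drafts

drafts = superposed_decode("the", k=2, steps=3)
```

Each step extends draft i with the i-th best token from the shared pass, which is how k continuations emerge from the cost of one.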
arXiv Detail & Related papers (2024-05-28T17:40:48Z)
- LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models [22.06402870816756]
Large language models (LLMs) have been applied in various applications due to their astonishing capabilities.
This paper presents LLMLingua, a coarse-to-fine prompt compression method that involves a budget controller to maintain semantic integrity.
We show that the proposed approach yields state-of-the-art performance and allows for up to 20x compression with little performance loss.
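The budget-controller idea can be sketched in a few lines. The scoring heuristic below is made up for illustration; LLMLingua itself scores tokens with a small language model's perplexity and compresses coarse-to-fine (demonstrations first, then sentences, then tokens):

```python
def compress(tokens, scores, budget):
    """Keep the `budget` highest-scoring tokens, preserving their order."""
    if len(tokens) <= budget:
        return list(tokens)
    keep = sorted(range(len(tokens)),
                  key=lambda i: scores[i], reverse=True)[:budget]
    return [tokens[i] for i in sorted(keep)]

prompt = ["Please", "kindly", "summarize", "the", "following", "report", "briefly"]
scores = [0.2, 0.1, 0.9, 0.3, 0.4, 0.8, 0.7]  # hypothetical informativeness
print(compress(prompt, scores, budget=4))
# -> ['summarize', 'following', 'report', 'briefly']
```

Low-information filler ("Please", "kindly") is pruned first, which is why such methods can reach high compression rates with little task-performance loss.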
arXiv Detail & Related papers (2023-10-09T14:10:21Z)
- Improving Zero-Shot Generalization for CLIP with Synthesized Prompts [135.4317555866831]
Most existing methods require labeled data for all classes, which may not hold in real-world applications.
We propose a plug-and-play generative approach called SyntHesIzed Prompts (SHIP) to improve existing fine-tuning methods.
arXiv Detail & Related papers (2023-07-14T15:15:45Z)
- COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining [59.169836983883656]
COCO-LM is a new self-supervised learning framework that pretrains Language Models by COrrecting challenging errors and COntrasting text sequences.
COCO-LM employs an auxiliary language model to mask-and-predict tokens in original text sequences.
Our analyses reveal that COCO-LM's advantages come from its challenging training signals, more contextualized token representations, and regularized sequence representations.
arXiv Detail & Related papers (2021-02-16T22:24:29Z)
- Fast End-to-End Speech Recognition via a Non-Autoregressive Model and Cross-Modal Knowledge Transferring from BERT [72.93855288283059]
We propose a non-autoregressive speech recognition model called LASO (Listen Attentively, and Spell Once).
The model consists of an encoder, a decoder, and a position-dependent summarizer (PDS).
arXiv Detail & Related papers (2021-02-15T15:18:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the accuracy of the listed information and is not responsible for any consequences of its use.