Learning to Compress Prompt in Natural Language Formats
- URL: http://arxiv.org/abs/2402.18700v2
- Date: Tue, 2 Apr 2024 02:38:31 GMT
- Title: Learning to Compress Prompt in Natural Language Formats
- Authors: Yu-Neng Chuang, Tianwei Xing, Chia-Yuan Chang, Zirui Liu, Xun Chen, Xia Hu
- Abstract summary: Large language models (LLMs) excel at a wide range of natural language processing tasks.
However, LLMs are constrained by degraded performance on long contexts, slow inference speed, and high compute costs.
This work compresses lengthy prompts into natural-language form that transfers across LLMs.
- Score: 54.06967020905763
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) excel at a wide range of natural language processing tasks, but their abilities are constrained by inferior performance on long contexts, slow inference speed, and the high cost of computing results. Deploying LLMs with precise and informative context helps users process large-scale datasets more effectively and cost-efficiently. Existing works rely on compressing long prompt contexts into soft prompts. However, soft prompt compression encounters limitations in transferability across different LLMs, especially API-based LLMs. To this end, this work aims to compress lengthy prompts in the form of natural language with LLM transferability. This poses two challenges: (i) Natural Language (NL) prompts are incompatible with back-propagation, and (ii) NL prompts lack flexibility in imposing length constraints. In this work, we propose a Natural Language Prompt Encapsulation (Nano-Capsulator) framework that compresses original prompts into NL-formatted Capsule Prompts while maintaining prompt utility and transferability. To tackle the first challenge, the Nano-Capsulator is optimized by a reward function that interacts with the proposed semantics-preserving loss. To address the second challenge, the reward function additionally features a length constraint. Experimental results demonstrate that the Capsule Prompt can reduce the original length by 81.4%, decrease inference latency by up to 4.5x, and save 80.1% of budget overheads, while providing transferability across diverse LLMs and different datasets.
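The abstract describes a single reward that both preserves semantics and enforces a length budget. A minimal sketch of such a reward, assuming a generic sentence-embedding callable `embed` and a word-count budget (the function names and the zero-reward penalty are illustrative, not the paper's implementation):

```python
# Hypothetical sketch of a compression reward combining semantic
# preservation with a hard length budget, in the spirit of the abstract.
# `embed` stands in for any sentence-embedding model and is an assumption.

import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-8)

def capsule_reward(original: str, capsule: str, embed, max_tokens: int) -> float:
    """Reward = semantic similarity between original prompt and capsule,
    zeroed out when the capsule exceeds the length budget (a simple
    stand-in for the paper's length-constrained reward)."""
    if len(capsule.split()) > max_tokens:
        return 0.0  # budget violated -> no reward
    return cosine(embed(original), embed(capsule))
```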
Related papers
- Efficient and Effective Prompt Tuning via Prompt Decomposition and Compressed Outer Product [8.014705094248589]
LAMP is a low-parameter prompt tuning method built on prompt decomposition and a compressed outer product.
Experiments across six architectures and eight datasets demonstrate that LAMP outperforms state-of-the-art PT-based and LoRA-based methods in both performance and efficiency.
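Taking the title at face value, the parameter saving comes from storing the soft prompt as a sum of low-rank outer products rather than a full matrix. A hedged sketch of that idea (the class name and rank are assumptions, not the paper's code):

```python
import torch

class OuterProductPrompt(torch.nn.Module):
    """Soft prompt of shape (length, dim) parameterized as a sum of
    `rank` outer products, so parameters drop from length*dim to
    rank*(length + dim). A sketch of the idea, not the paper's code."""

    def __init__(self, length: int, dim: int, rank: int = 4):
        super().__init__()
        self.u = torch.nn.Parameter(torch.randn(rank, length) * 0.02)
        self.v = torch.nn.Parameter(torch.randn(rank, dim) * 0.02)

    def forward(self) -> torch.Tensor:
        # Sum of rank-1 outer products: sum_r u[r] (outer) v[r]
        return torch.einsum("rl,rd->ld", self.u, self.v)

prompt = OuterProductPrompt(length=20, dim=768)()
assert prompt.shape == (20, 768)
```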
arXiv Detail & Related papers (2025-02-16T05:50:12Z)
- GReaTer: Gradients over Reasoning Makes Smaller Language Models Strong Prompt Optimizers [52.17222304851524]
We introduce GReaTer, a novel prompt optimization technique that directly incorporates gradient information over task-specific reasoning.
By utilizing task loss gradients, GReaTer enables self-optimization of prompts for open-source, lightweight language models.
GReaTer consistently outperforms previous state-of-the-art prompt optimization methods.
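The summary's key mechanism is using task-loss gradients to steer prompt edits. A generic first-order sketch of gradient-guided token scoring (GReaTer itself routes gradients through generated reasoning chains, which this toy version omits; `loss_fn` and its interface are assumptions):

```python
import torch

def token_gradient_scores(model, embed_matrix, prompt_ids, loss_fn):
    """Rank vocabulary tokens for each prompt position by the estimated
    first-order change in task loss, using gradients w.r.t. the prompt's
    token embeddings. Generic sketch, not the authors' exact procedure."""
    emb = embed_matrix[prompt_ids].detach().requires_grad_(True)
    loss = loss_fn(model, emb)  # task loss on inputs built from emb
    loss.backward()
    grad = emb.grad             # (prompt_len, dim)
    # Estimated loss change when swapping position i to token w:
    # (E[w] - E[old_i]) . grad_i  -> lower is better.
    delta = embed_matrix @ grad.T - (emb.detach() * grad).sum(-1)
    return delta.argmin(dim=0)  # best replacement token per position
```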
arXiv Detail & Related papers (2024-12-12T20:59:43Z)
- Prompt Recursive Search: A Living Framework with Adaptive Growth in LLM Auto-Prompting [22.025533583703126]
We propose a novel Prompt Recursive Search (PRS) framework for large language models (LLMs).
The PRS framework incorporates an assessment of problem complexity and an adjustable structure, reducing the likelihood of errors.
Compared to the Chain of Thought (CoT) method, PRS increases accuracy on the BBH dataset by 8% using the Llama3-7B model, a 22% relative improvement.
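As described, PRS assesses problem complexity and adapts its structure accordingly. A toy recursion in that spirit (all prompts and the routing rule are invented for illustration; `llm` is any text-in, text-out callable):

```python
def prompt_recursive_search(llm, problem: str, depth: int = 0, max_depth: int = 3) -> str:
    """Toy recursion in the spirit of the summary: ask the model to judge
    complexity, answer simple problems directly, and split complex ones
    into subproblems before combining the results."""
    verdict = llm(f"Is this problem simple or complex? Answer one word.\n{problem}")
    if "simple" in verdict.lower() or depth >= max_depth:
        return llm(f"Solve step by step:\n{problem}")
    subproblems = llm(f"Split into independent subproblems, one per line:\n{problem}")
    solved = [prompt_recursive_search(llm, s, depth + 1, max_depth)
              for s in subproblems.splitlines() if s.strip()]
    return llm("Combine the partial solutions into a final answer:\n" + "\n".join(solved))
```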
arXiv Detail & Related papers (2024-08-02T17:59:42Z)
- SelfCP: Compressing Over-Limit Prompt via the Frozen Large Language Model Itself [14.545490629324295]
Long prompts lead to huge hardware costs when using Large Language Models.
This paper proposes a Self-Compressor (SelfCP) that compresses over-limit prompts into dense vectors while keeping the allowed prompt unmodified.
We show that SelfCP effectively substitutes 12x over-limit prompts with dense tokens, reducing memory costs and boosting inference throughput.
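The summary says the frozen LLM itself encodes over-limit text into dense tokens. A sketch of that flow, assuming an HF-style model output and a small trainable projector (the evenly-spaced pooling rule is a guess, not the paper's selection scheme):

```python
import torch

def compress_overflow(frozen_lm, projector, overflow_ids, k: int = 64):
    """Sketch of the SelfCP idea from the summary: the frozen LLM encodes
    the over-limit segment, and a small trainable projector turns k of
    its hidden states into dense 'virtual tokens'."""
    with torch.no_grad():  # the compressor LM stays frozen
        hidden = frozen_lm(overflow_ids).last_hidden_state  # (1, T, d)
    # Evenly sample k summary positions; the paper's selection may differ.
    idx = torch.linspace(0, hidden.shape[1] - 1, k).long()
    return projector(hidden[:, idx, :])  # (1, k, d) dense tokens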
arXiv Detail & Related papers (2024-05-27T11:14:55Z)
- SirLLM: Streaming Infinite Retentive LLM [74.40196814292426]
It is increasingly important for Large Language Models (LLMs) to process inputs of any length and maintain a degree of memory.
Recent efforts have employed streaming inputs to alleviate the pressure of excessively long text inputs.
We introduce Streaming Infinite Retentive LLM (SirLLM), which allows LLMs to maintain longer memory during infinite-length dialogues.
arXiv Detail & Related papers (2024-05-21T06:37:03Z)
- InfLLM: Training-Free Long-Context Extrapolation for LLMs with an Efficient Context Memory [93.20588235940453]
In this paper, we introduce a training-free memory-based method, InfLLM.
InfLLM stores distant contexts into additional memory units and employs an efficient mechanism to lookup token-relevant units for attention.
Even when the sequence length is scaled to 1,024K, InfLLM still effectively captures long-distance dependencies.
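The lookup the summary describes, reduced to its core: each memory unit is summarized by a representative key, and only the top-k units most relevant to the current query enter attention. A minimal sketch (how representative keys are built is assumed, not taken from the paper):

```python
import torch

def lookup_memory(query, unit_keys, unit_values, top_k: int = 4):
    """Retrieve the memory units most relevant to the current query.
    unit_keys: (num_units, d) representative key per stored context unit;
    query: (d,) current attention query; unit_values: list of cached units."""
    scores = unit_keys @ query                        # relevance per unit
    best = torch.topk(scores, k=min(top_k, len(scores))).indices
    return [unit_values[i] for i in best.tolist()]    # units fed to attention
```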
arXiv Detail & Related papers (2024-02-07T06:50:42Z)
- LaFFi: Leveraging Hybrid Natural Language Feedback for Fine-tuning Language Models [14.087415157225715]
Fine-tuning Large Language Models (LLMs) adapts a trained model to specific downstream tasks.
Supervised Fine-Tuning (SFT) is a common approach, where an LLM is trained to produce desired answers.
This paper introduces an alternative to SFT called Natural Language Feedback for Finetuning LLMs (LaFFi).
arXiv Detail & Related papers (2023-12-31T21:18:16Z)
- LM-Infinite: Zero-Shot Extreme Length Generalization for Large Language Models [83.98062659664785]
Large language models (LLMs) typically train on short text segments (e.g., 4K tokens) due to the quadratic complexity of their Transformer architectures.
This work identifies three major factors contributing to this length generalization failure.
We propose LM-Infinite, a simple and effective method for enhancing LLMs' capabilities of handling long contexts.
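LM-Infinite is usually described via a Lambda-shaped attention pattern: every token attends to a few initial tokens plus a recent local window, so attention cost and effective distance stay bounded at any length. A sketch of such a mask (the sizes are illustrative, not the paper's settings):

```python
import torch

def lambda_mask(seq_len: int, n_global: int = 4, window: int = 1024) -> torch.Tensor:
    """Boolean causal mask where each position attends only to the first
    `n_global` tokens and to a local window of recent tokens (the
    Lambda-shaped pattern associated with LM-Infinite)."""
    i = torch.arange(seq_len).unsqueeze(1)   # query positions
    j = torch.arange(seq_len).unsqueeze(0)   # key positions
    causal = j <= i                          # no attending to the future
    keep = (j < n_global) | (i - j < window)
    return causal & keep                     # True = attention allowed
```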
arXiv Detail & Related papers (2023-08-30T16:47:51Z)
- RLPrompt: Optimizing Discrete Text Prompts With Reinforcement Learning [84.75064077323098]
This paper proposes RLPrompt, an efficient discrete prompt optimization approach with reinforcement learning (RL).
RLPrompt is flexibly applicable to different types of LMs, such as masked LMs (e.g., BERT) and left-to-right models (e.g., GPTs).
Experiments on few-shot classification and unsupervised text style transfer show superior performance over a wide range of existing finetuning or prompting methods.
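The RL loop behind discrete prompt optimization can be sketched with plain REINFORCE: sample discrete prompt tokens from a policy, score the prompt with a task reward, and reinforce high-reward prompts. The policy interface below is an assumption, not RLPrompt's actual architecture:

```python
import torch

def reinforce_step(policy, optimizer, reward_fn, prompt_len: int = 5):
    """One REINFORCE update for a discrete-prompt policy: `policy` maps a
    prompt length to per-position vocabulary logits (assumed interface),
    and `reward_fn` scores the sampled prompt on the downstream task."""
    logits = policy(prompt_len)                    # (prompt_len, vocab)
    dist = torch.distributions.Categorical(logits=logits)
    tokens = dist.sample()                         # discrete prompt tokens
    reward = reward_fn(tokens)                     # task reward, no gradients
    loss = -(dist.log_prob(tokens).sum() * reward) # push up high-reward prompts
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return tokens, reward
```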
arXiv Detail & Related papers (2022-05-25T07:50:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.