Related papers: Adapting LLMs for Efficient Context Processing through Soft Prompt Compression

Adapting LLMs for Efficient Context Processing through Soft Prompt Compression

URL: http://arxiv.org/abs/2404.04997v2
Date: Thu, 18 Apr 2024 23:23:53 GMT
Title: Adapting LLMs for Efficient Context Processing through Soft Prompt Compression
Authors: Cangqing Wang, Yutian Yang, Ruisi Li, Dan Sun, Ruicong Cai, Yuzhu Zhang, Chengqian Fu, Lillian Floyd,
Abstract summary: This article presents an innovative framework that strategically tailors Large Language Models for streamlined context processing. Our methodology, dubbed SoftPromptComp, amalgamates natural language prompts with dynamically generated soft prompts to forge a concise yet semantically robust depiction of protracted contexts. We substantiate that our framework markedly diminishes computational overhead and enhances LLMs' efficacy across various benchmarks.
Score: 1.1550486371582305
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: The rapid advancement of Large Language Models (LLMs) has inaugurated a transformative epoch in natural language processing, fostering unprecedented proficiency in text generation, comprehension, and contextual scrutiny. Nevertheless, effectively handling extensive contexts, crucial for myriad applications, poses a formidable obstacle owing to the intrinsic constraints of the models' context window sizes and the computational burdens entailed by their operations. This investigation presents an innovative framework that strategically tailors LLMs for streamlined context processing by harnessing the synergies among natural language summarization, soft prompt compression, and augmented utility preservation mechanisms. Our methodology, dubbed SoftPromptComp, amalgamates natural language prompts extracted from summarization methodologies with dynamically generated soft prompts to forge a concise yet semantically robust depiction of protracted contexts. This depiction undergoes further refinement via a weighting mechanism optimizing information retention and utility for subsequent tasks. We substantiate that our framework markedly diminishes computational overhead and enhances LLMs' efficacy across various benchmarks, while upholding or even augmenting the caliber of the produced content. By amalgamating soft prompt compression with sophisticated summarization, SoftPromptComp confronts the dual challenges of managing lengthy contexts and ensuring model scalability. Our findings point towards a propitious trajectory for augmenting LLMs' applicability and efficiency, rendering them more versatile and pragmatic for real-world applications. This research enriches the ongoing discourse on optimizing language models, providing insights into the potency of soft prompts and summarization techniques as pivotal instruments for the forthcoming generation of NLP solutions.

Related papers

CREM: Compression-Driven Representation Enhancement for Multimodal Retrieval and Comprehension [49.6969505536365]
We propose CREM, with a unified framework that enhances multimodal representations for retrieval while preserving generative ability.<n>CREM achieves state-of-the-art retrieval performance on MMEB while maintaining strong generative performance on multiple comprehension benchmarks.
arXiv Detail & Related papers (2026-02-22T08:09:51Z)
ContextNav: Towards Agentic Multimodal In-Context Learning [85.05420047017513]
ContextNav is an agentic framework that integrates the scalability of automated retrieval with the quality and adaptiveness of human-like curation.<n>It builds a resource-aware multimodal embedding pipeline, maintains a retrievable vector database, and applies agentic retrieval and structural alignment to construct noise-resilient contexts.<n> Experimental results demonstrate that ContextNav achieves state-of-the-art performance across various datasets.
arXiv Detail & Related papers (2025-10-06T07:49:52Z)
Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective [65.12150411762273]
We show that pruning random demonstrations into seemingly incoherent "gibberish" can remarkably improve performance across diverse tasks.<n>We propose a self-discover prompt optimization framework, PromptQuine, that automatically searches for the pruning strategy by itself using only low-data regimes.
arXiv Detail & Related papers (2025-06-22T07:53:07Z)
Generalizing Large Language Model Usability Across Resource-Constrained [0.43512163406552007]
dissertation presents a systematic study toward generalizing Large Language Models under real-world constraints.<n>First, it introduces a robust text-centric alignment framework that enables LLMs to seamlessly integrate diverse modalities.<n>Beyond multimodal setting, the dissertation investigates inference-time optimization strategies for LLMs.
arXiv Detail & Related papers (2025-05-13T01:00:12Z)
PIS: Linking Importance Sampling and Attention Mechanisms for Efficient Prompt Compression [3.6268731121741067]
Large language models (LLMs) have achieved remarkable progress, demonstrating unprecedented capabilities across various natural language processing tasks. Existing prompt compression methods rely on truncation or abstractive summarization techniques. We introduce Prompt Importance Sampling (PIS), a novel compression framework that dynamically compresses prompts by sampling important tokens.
arXiv Detail & Related papers (2025-04-23T09:53:01Z)
InfiniteICL: Breaking the Limit of Context Window Size via Long Short-term Memory Transformation [57.310236384112834]
In-context learning (ICL) is critical for large language models (LLMs) but its effectiveness is constrained by finite context windows. We introduce InfiniteICL, a framework that parallels context and parameters in LLMs with short- and long-term memory. We demonstrate that our method reduces context length by 90% while achieving 103% average performance of full-context prompting.
arXiv Detail & Related papers (2025-04-02T13:15:44Z)
RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training [55.54020926284334]
Multimodal Large Language Models (MLLMs) have recently received substantial interest, which shows their emerging potential as general-purpose models for various vision-language tasks. Retrieval augmentation techniques have proven to be effective plugins for both LLMs and MLLMs. In this study, we propose multimodal adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training (RA-BLIP), a novel retrieval-augmented framework for various MLLMs.
arXiv Detail & Related papers (2024-10-18T03:45:19Z)
EVOLvE: Evaluating and Optimizing LLMs For In-Context Exploration [76.66831821738927]
Large language models (LLMs) remain under-studied in scenarios requiring optimal decision-making under uncertainty.<n>We measure LLMs' (in)ability to make optimal decisions in bandits, a state-less reinforcement learning setting relevant to many applications.<n>Motivated by the existence of optimal exploration algorithms, we propose efficient ways to integrate this algorithmic knowledge into LLMs.
arXiv Detail & Related papers (2024-10-08T17:54:03Z)
In-context Demonstration Matters: On Prompt Optimization for Pseudo-Supervision Refinement [71.60563181678323]
Large language models (LLMs) have achieved great success across diverse tasks, and fine-tuning is sometimes needed to further enhance generation quality.<n>To handle these challenges, a direct solution is to generate high-confidence'' data from unsupervised downstream tasks.<n>We propose a novel approach, pseudo-supervised demonstrations aligned prompt optimization (PAPO) algorithm, which jointly refines both the prompt and the overall pseudo-supervision.
arXiv Detail & Related papers (2024-10-04T03:39:28Z)
Contemporary Model Compression on Large Language Models Inference [7.307436175842646]
Large Language Models (LLMs) have revolutionized natural language processing by achieving state-of-the-art results across a variety of tasks. The computational demands of LLM inference, including high memory consumption and slow processing speeds, pose significant challenges for real-world applications. This survey explores techniques in model compression that address these challenges by reducing the size and computational requirements of LLMs.
arXiv Detail & Related papers (2024-09-03T15:35:01Z)
GANPrompt: Enhancing Robustness in LLM-Based Recommendations with GAN-Enhanced Diversity Prompts [15.920623515602038]
This paper proposes GANPrompt, a multi-dimensional large language model prompt diversity framework based on Generative Adversarial Networks (GANs) GANPrompt first trains a generator capable of producing diverse prompts by analysing multidimensional user behavioural data. These diverse prompts are then used to train the LLM to improve its performance in the face of unseen prompts.
arXiv Detail & Related papers (2024-08-19T03:13:20Z)
Fine-tuning Multimodal Large Language Models for Product Bundling [53.01642741096356]
We introduce Bundle-MLLM, a novel framework that fine-tunes large language models (LLMs) through a hybrid item tokenization approach. Specifically, we integrate textual, media, and relational data into a unified tokenization, introducing a soft separation token to distinguish between textual and non-textual tokens. We propose a progressive optimization strategy that fine-tunes LLMs for disentangled objectives: 1) learning bundle patterns and 2) enhancing multimodal semantic understanding specific to product bundling.
arXiv Detail & Related papers (2024-07-16T13:30:14Z)
Retrieval Meets Reasoning: Dynamic In-Context Editing for Long-Text Understanding [11.5386284281652]
We introduce a novel approach that re-imagines information retrieval through dynamic in-context editing. By treating lengthy contexts as malleable external knowledge, our method interactively gathers and integrates relevant information. Experimental results demonstrate that our method effectively empowers context-limited LLMs to engage in multi-hop reasoning with improved performance.
arXiv Detail & Related papers (2024-06-18T06:54:28Z)
When Emotional Stimuli meet Prompt Designing: An Auto-Prompt Graphical Paradigm [43.2625101868969]
This paper summarizes the prompt words for large language models (LLMs) It then proposes an Auto-Prompt Graphical Paradigm(APGP) that combines both stimulating and framework prompts. The framework involves automated prompt generation and consideration of emotion-stimulus factors.
arXiv Detail & Related papers (2024-04-16T12:19:08Z)
PhaseEvo: Towards Unified In-Context Prompt Optimization for Large Language Models [9.362082187605356]
We present PhaseEvo, an efficient automatic prompt optimization framework that combines the generative capability of LLMs with the global search proficiency of evolution algorithms. PhaseEvo significantly outperforms the state-of-the-art baseline methods by a large margin whilst maintaining good efficiency.
arXiv Detail & Related papers (2024-02-17T17:47:10Z)
Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks. We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level. We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
arXiv Detail & Related papers (2024-02-09T07:45:26Z)
Sparsity-Guided Holistic Explanation for LLMs with Interpretable Inference-Time Intervention [53.896974148579346]
Large Language Models (LLMs) have achieved unprecedented breakthroughs in various natural language processing domains. The enigmatic black-box'' nature of LLMs remains a significant challenge for interpretability, hampering transparent and accountable applications. We propose a novel methodology anchored in sparsity-guided techniques, aiming to provide a holistic interpretation of LLMs.
arXiv Detail & Related papers (2023-12-22T19:55:58Z)
Extending Context Window of Large Language Models via Semantic Compression [21.35020344956721]
Large Language Models (LLMs) often impose limitations on the length of the text input to ensure the generation of fluent and relevant responses. We propose a novel semantic compression method that enables generalization to texts 6-8 times longer, without incurring significant computational costs or requiring fine-tuning.
arXiv Detail & Related papers (2023-12-15T07:04:33Z)
Towards More Unified In-context Visual Understanding [74.55332581979292]
We present a new ICL framework for visual understanding with multi-modal output enabled. First, we quantize and embed both text and visual prompt into a unified representational space. Then a decoder-only sparse transformer architecture is employed to perform generative modeling on them.
arXiv Detail & Related papers (2023-12-05T06:02:21Z)
Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers [70.18534453485849]
EvoPrompt is a framework for discrete prompt optimization. It borrows the idea of evolutionary algorithms (EAs) as they exhibit good performance and fast convergence. It significantly outperforms human-engineered prompts and existing methods for automatic prompt generation.
arXiv Detail & Related papers (2023-09-15T16:50:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.