CODEPROMPTZIP: Code-specific Prompt Compression for Retrieval-Augmented Generation in Coding Tasks with LMs
- URL: http://arxiv.org/abs/2502.14925v1
- Date: Wed, 19 Feb 2025 23:15:23 GMT
- Title: CODEPROMPTZIP: Code-specific Prompt Compression for Retrieval-Augmented Generation in Coding Tasks with LMs
- Authors: Pengfei He, Shaowei Wang, Tse-Hsun Chen
- Abstract summary: Retrieval-Augmented Generation (RAG) enhances coding tasks by incorporating retrieved code examples into prompts. Existing prompt compression techniques focus on natural language, lacking tailored solutions for code. We propose CodePromptZip, a framework that compresses code examples before integrating them into RAG.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Retrieval-Augmented Generation (RAG) enhances coding tasks by incorporating retrieved code examples into prompts. However, lengthy prompts, often exceeding tens of thousands of tokens, introduce challenges related to the limited context windows of language models (LMs) and high computational costs. Existing prompt compression techniques focus on natural language, lacking tailored solutions for code. To address this gap, we propose CodePromptZip, a framework that compresses code examples before integrating them into RAG workflows. Our framework employs a type-aware, priority-driven strategy to construct training samples for a code compression model. Using program analysis, we identify token types (e.g., Identifier) and perform ablation analysis to rank their removal priorities based on their impact on task performance. We then train a small LM as the compressor on these samples, enabling flexible compression conditioned on specified ratios while minimizing performance degradation. Specifically, the compressor is augmented with a copy mechanism, allowing tokens to be copied directly from the original code snippets. Evaluation results show that CodePromptZip surpasses SOTA entropy-based and distillation-based baselines, improving over the best baseline by 23.4%, 28.7%, and 8.7% for Assertion Generation, Bugs2Fix, and Code Suggestion, respectively.
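To make the type-aware, priority-driven idea concrete, here is a minimal Python sketch. The regex tokenizer, the token types, and the priority ordering are illustrative assumptions standing in for the paper's program analysis and ablation-derived ranking; the actual CodePromptZip compressor is a trained LM with a copy mechanism, which this sketch does not reproduce.

```python
import re

# Assumed removal priorities (higher = removed first); the paper derives
# this ranking empirically via ablation analysis over token types.
REMOVAL_PRIORITY = {"comment": 3, "identifier": 2, "symbol": 1, "keyword": 0}
KEYWORDS = {"public", "void", "if", "else", "return", "int", "new", "assert"}

TOKEN_RE = re.compile(
    r"(?P<comment>//[^\n]*|/\*.*?\*/)|(?P<word>[A-Za-z_]\w*)|(?P<symbol>\S)",
    re.DOTALL,
)

def classify(m: re.Match) -> str:
    if m.lastgroup == "comment":
        return "comment"
    if m.lastgroup == "word":
        return "keyword" if m.group() in KEYWORDS else "identifier"
    return "symbol"

def compress(code: str, ratio: float) -> str:
    """Drop whole token types in priority order until the target
    compression ratio is reached."""
    tokens = [(m.group(), classify(m)) for m in TOKEN_RE.finditer(code)]
    budget = int(len(tokens) * (1 - ratio))  # number of tokens to keep
    kept = tokens
    for ttype in sorted(REMOVAL_PRIORITY, key=REMOVAL_PRIORITY.get, reverse=True):
        if len(kept) <= budget:
            break
        kept = [(t, ty) for t, ty in kept if ty != ttype]
    return " ".join(t for t, _ in kept)

print(compress("int count = 0; // running total", ratio=0.3))
```

With the assumed priorities, comments go first and keywords last, mirroring the paper's observation that some token types matter far more to downstream task performance than others.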
Related papers
- NoWag: A Unified Framework for Shape Preserving Compression of Large Language Models [63.271278137295006]
Large language models (LLMs) exhibit remarkable performance across various natural language processing tasks.
LLMs suffer from immense computational and memory demands, limiting their deployment in resource-constrained environments.
We propose NoWag (Normalized Weight and Activation Guided Compression), a unified framework for zero-shot shape-preserving compression algorithms.
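The summary does not spell out NoWag's criterion, so the following is only a hedged sketch of one normalized, shape-preserving scheme in the same spirit: score each weight by its row-normalized magnitude times an input-activation norm, then zero out the lowest-scoring entries so the matrix keeps its shape.

```python
import numpy as np

def prune_shape_preserving(W: np.ndarray, act_norms: np.ndarray, sparsity: float = 0.5):
    """Zero out low-importance weights instead of removing rows/columns, so W
    keeps its original shape. The score below (normalized weight magnitude
    times activation norm) is an assumed proxy, not NoWag's exact rule."""
    W_norm = W / (np.linalg.norm(W, axis=1, keepdims=True) + 1e-8)
    score = np.abs(W_norm) * act_norms[None, :]   # shape (out_dim, in_dim)
    k = int(score.size * sparsity)                # number of entries to zero
    thresh = np.partition(score.ravel(), k)[k]
    return np.where(score >= thresh, W, 0.0)
```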
arXiv Detail & Related papers (2025-04-20T11:00:29Z)
- LightThinker: Thinking Step-by-Step Compression [53.8069487638972]
We propose LightThinker, a method that enables large language models to dynamically compress intermediate thoughts during reasoning. Inspired by human cognitive processes, LightThinker compresses thought steps into compact representations and discards the original reasoning chains. Experiments show that LightThinker reduces peak memory usage and inference time, while maintaining competitive accuracy.
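As a rough illustration of the compress-then-discard loop only (not the paper's trained gist mechanism), a sketch with a hypothetical `llm` text-completion callable:

```python
def reason_with_compression(llm, question: str, max_steps: int = 6) -> str:
    """After each reasoning step, replace the verbose step with a short gist
    so the context stays small. `llm` is a hypothetical callable mapping a
    prompt string to a completion string."""
    gists = []
    for _ in range(max_steps):
        step = llm(f"Q: {question}\nNotes so far: {' | '.join(gists)}\nNext step:")
        if "ANSWER:" in step:
            return step.split("ANSWER:", 1)[1].strip()
        # Keep only a compact gist of the step; discard the full chain.
        gists.append(llm(f"Compress to one short note: {step}"))
    return llm(f"Q: {question}\nNotes: {' | '.join(gists)}\nFinal answer:")
```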
arXiv Detail & Related papers (2025-02-21T16:57:22Z)
- Better Prompt Compression Without Multi-Layer Perceptrons [33.53334153279698]
We show that the encoder does not need to keep the original language model's architecture to achieve useful compression. We introduce a prompt compression encoder after removing the multilayer perceptron (MLP) layers in the Transformer blocks of a language model.
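A minimal PyTorch sketch of the architectural change, with assumed dimensions: a Transformer block whose MLP sublayer has been removed, leaving only attention and normalization.

```python
import torch.nn as nn

class AttentionOnlyBlock(nn.Module):
    """Transformer block with the MLP sublayer removed; sizes are assumptions."""
    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):                      # x: (batch, seq, d_model)
        h = self.norm(x)
        out, _ = self.attn(h, h, h, need_weights=False)
        return x + out                         # residual connection, no MLP
```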
arXiv Detail & Related papers (2025-01-12T06:57:06Z)
- EoRA: Training-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation [84.70637613266835]
We re-formulate the model compression problem as a customized compensation problem.
We propose Training-free Eigenspace Low-Rank Approximation (EoRA).
EoRA directly minimizes compression-induced errors without requiring gradient-based training.
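A hedged sketch of the training-free compensation idea: take the compression error W − W_c and approximate it with a low-rank factorization that is added back at inference. Plain SVD on the raw error is used here; EoRA itself performs the approximation in an eigenspace, which this sketch omits.

```python
import numpy as np

def low_rank_compensation(W: np.ndarray, W_compressed: np.ndarray, rank: int = 8):
    """Approximate the compression error with a rank-`rank` factorization,
    with no gradient-based training. Corrected forward pass (assumption):
    x @ (W_compressed + A @ B).T"""
    U, S, Vt = np.linalg.svd(W - W_compressed, full_matrices=False)
    A = U[:, :rank] * S[:rank]      # (out_dim, rank)
    B = Vt[:rank]                   # (rank, in_dim)
    return A, B
```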
arXiv Detail & Related papers (2024-10-28T17:59:03Z)
- Style-Compress: An LLM-Based Prompt Compression Framework Considering Task-Specific Styles [49.65811277223873]
Style-Compress is a lightweight framework that adapts a smaller language model to compress prompts for a larger model on a new task without additional training.
Our approach iteratively generates and selects effective compressed prompts as task-specific demonstrations through style variation and in-context learning.
Style-Compress outperforms two baseline compression models in four tasks: original prompt reconstruction, text summarization, multi-hop QA, and CoT reasoning.
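A sketch of the iterate-generate-select loop under stated assumptions: `llm` is a hypothetical completion callable and `score(original, compressed)` is an assumed task-specific quality metric; the style list is illustrative.

```python
def style_compress(llm, score, prompts, n_iters: int = 3, pool: int = 4):
    """Compress each prompt in several style variants, keep the best-scoring
    (original, compressed) pair as an in-context demonstration for the next
    round. No model weights are updated."""
    demos = []
    styles = ("terse", "keywords only", "one sentence", "telegraphic")
    for p in prompts[:n_iters]:
        demo_text = "\n".join(f"Original: {o}\nCompressed: {c}" for o, c in demos)
        candidates = [
            llm(f"{demo_text}\nCompress ({style}):\n{p}") for style in styles[:pool]
        ]
        best = max(candidates, key=lambda c: score(p, c))
        demos.append((p, best))
    return demos  # reusable task-specific demonstrations
```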
arXiv Detail & Related papers (2024-10-17T21:35:49Z)
- LanguaShrink: Reducing Token Overhead with Psycholinguistics [8.123272461141815]
LanguaShrink is a prompt compression framework for large language models.
It reduces prompt length while preserving essential information.
Compared to existing prompt compression methods, LanguaShrink reduces end-to-end latency by 1.43 times.
arXiv Detail & Related papers (2024-09-01T22:09:20Z)
- Fundamental Limits of Prompt Compression: A Rate-Distortion Framework for Black-Box Language Models [21.025001473355996]
We formalize the problem of prompt compression for large language models (LLMs). We present a framework to unify token-level prompt compression methods which create hard prompts for black-box models. We show that there is a large gap between the performance of current prompt compression methods and the optimal strategy.
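In such a framework, the objective can be stated in standard rate-distortion form; the notation below is an assumption for illustration, not necessarily the paper's:

```latex
% Choose a compressed prompt \tilde{x} for input x, trading off token
% length (rate) against downstream output degradation (distortion).
\[
  D(R) \;=\; \min_{q(\tilde{x}\mid x)}\;
     \mathbb{E}\big[\, d\!\left(f(x),\, f(\tilde{x})\right) \big]
  \quad \text{s.t.} \quad \mathbb{E}\big[\,|\tilde{x}|\,\big] \le R
\]
```

Here f is the black-box LM, |x̃| the compressed token count, and d a task distortion measure; sweeping R traces the optimal curve against which existing methods can be compared.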
arXiv Detail & Related papers (2024-07-22T09:40:13Z)
- Say More with Less: Understanding Prompt Learning Behaviors through Gist Compression [39.233017243612025]
Large language models (LLMs) require lengthy prompts as the input context to produce output aligned with user intentions.
We propose a novel method for compressing prompts which can also assist prompt interpretation and engineering.
Gist-COCO employs an encoder-decoder based language model and then incorporates an additional encoder as a plugin module to compress prompts with inputs using gist tokens.
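A PyTorch sketch of the gist-token plugin idea, with assumed sizes and a single encoder layer standing in for the paper's encoder-decoder setup:

```python
import torch
import torch.nn as nn

class GistEncoder(nn.Module):
    """Plugin-style sketch: encode prompt embeddings together with k learnable
    gist tokens and keep only the gist states as the compressed prompt."""
    def __init__(self, d_model: int = 512, n_gist: int = 8):
        super().__init__()
        self.gist = nn.Parameter(torch.randn(n_gist, d_model))
        self.encoder = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)

    def forward(self, prompt_emb):                        # (batch, seq, d_model)
        g = self.gist.expand(prompt_emb.size(0), -1, -1)  # broadcast over batch
        h = self.encoder(torch.cat([prompt_emb, g], dim=1))
        return h[:, -self.gist.size(0):]                  # (batch, n_gist, d_model)
```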
arXiv Detail & Related papers (2024-02-25T11:07:08Z)
- Long Context Compression with Activation Beacon [22.054232261437186]
Activation Beacon is a plug-in module for transformer-based LLMs.
It targets effective, efficient, and flexible compression of long contexts.
It achieves a 2x acceleration in inference time and an 8x reduction of memory costs for KV cache.
arXiv Detail & Related papers (2024-01-07T11:57:40Z)
- Hot or Cold? Adaptive Temperature Sampling for Code Generation with Large Language Models [54.72004797421481]
We conduct the first systematic study to explore a decoding strategy specialized in code generation.
Inspired by the above findings, we propose a simple yet effective method: Adaptive Temperature (AdapT) sampling.
Results show that AdapT sampling significantly outperforms state-of-the-art decoding strategies.
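The summary does not give AdapT's exact schedule, so here is a hedged sketch of one plausible instantiation: scale the sampling temperature with the model's per-token uncertainty (normalized entropy), sampling sharply where the model is confident and more diversely where it is not.

```python
import numpy as np

def adaptive_temperature_sample(logits: np.ndarray, t_low=0.2, t_high=1.0) -> int:
    """Entropy-based temperature schedule -- an assumption, not AdapT's rule."""
    p = np.exp(logits - logits.max()); p /= p.sum()
    uncertainty = -(p * np.log(p + 1e-12)).sum() / np.log(len(p))  # in [0, 1]
    t = t_low + (t_high - t_low) * uncertainty
    q = np.exp(logits / t - (logits / t).max()); q /= q.sum()
    return np.random.choice(len(q), p=q)
```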
arXiv Detail & Related papers (2023-09-06T06:27:33Z)
- Quick Dense Retrievers Consume KALE: Post Training Kullback Leibler Alignment of Embeddings for Asymmetrical dual encoders [89.29256833403169]
We introduce Kullback Leibler Alignment of Embeddings (KALE), an efficient and accurate method for increasing the inference efficiency of dense retrieval methods.
KALE extends traditional Knowledge Distillation after bi-encoder training, allowing for effective query encoder compression without full retraining or index generation.
Using KALE and asymmetric training, we can generate models which exceed the performance of DistilBERT despite having 3x faster inference.
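A sketch of the KL-based embedding alignment in PyTorch: make the compressed query encoder's similarity distribution over passages match the original encoder's. The softmax-over-passages form is an assumed simplification of the method.

```python
import torch.nn.functional as F

def kale_style_loss(student_q, teacher_q, passage_emb, tau: float = 1.0):
    """KL divergence between the similarity distributions of the compressed
    (student) and original (teacher) query encoders over a passage batch.
    student_q, teacher_q: (B, d); passage_emb: (N, d)."""
    s = F.log_softmax(student_q @ passage_emb.T / tau, dim=-1)
    t = F.softmax(teacher_q @ passage_emb.T / tau, dim=-1)
    return F.kl_div(s, t, reduction="batchmean")
```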
arXiv Detail & Related papers (2023-03-31T15:44:13Z)
- Implicit Neural Representations for Image Compression [103.78615661013623]
Implicit Neural Representations (INRs) have gained attention as a novel and effective representation for various data types.
We propose the first comprehensive compression pipeline based on INRs including quantization, quantization-aware retraining and entropy coding.
We find that our approach to source compression with INRs vastly outperforms similar prior work.
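A toy PyTorch sketch of the INR approach: overfit a tiny MLP mapping pixel coordinates to intensities, then crudely quantize its weights. Network size, training schedule, and the naive 8-bit rounding are assumptions; the paper's pipeline adds quantization-aware retraining and entropy coding on top.

```python
import torch
import torch.nn as nn

def fit_inr(image: torch.Tensor, hidden: int = 64, steps: int = 500, lr: float = 1e-2):
    """image: float tensor of shape (h, w). Returns an MLP whose (quantized)
    weights stand in for the stored pixels."""
    h, w = image.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing="ij")
    coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)
    target = image.reshape(-1, 1)
    net = nn.Sequential(nn.Linear(2, hidden), nn.Tanh(),
                        nn.Linear(hidden, hidden), nn.Tanh(),
                        nn.Linear(hidden, 1))
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((net(coords) - target) ** 2).mean()
        loss.backward(); opt.step()
    with torch.no_grad():  # naive 8-bit rounding, not the paper's pipeline
        for p in net.parameters():
            scale = p.abs().max() / 127 + 1e-12
            p.copy_(torch.round(p / scale) * scale)
    return net
```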
arXiv Detail & Related papers (2021-12-08T13:02:53Z)