Let the Code LLM Edit Itself When You Edit the Code
- URL: http://arxiv.org/abs/2407.03157v1
- Date: Wed, 3 Jul 2024 14:34:03 GMT
- Title: Let the Code LLM Edit Itself When You Edit the Code
- Authors: Zhenyu He, Jun Zhang, Shengjie Luo, Jingjing Xu, Zhi Zhang, Di He
- Abstract summary: Positional Integrity Encoding (PIE) reduces computational overhead by over 85% compared to the standard full recomputation approach.
- Score: 50.46536185784169
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we investigate a typical scenario in code generation where a developer edits existing code in real time and asks a code assistant, e.g., a large language model, to re-predict the next token or next line on the fly. Naively, the LLM must re-encode the entire KV cache to provide an accurate prediction. However, this process is computationally expensive, especially when the sequence is long. Simply encoding the edited subsequence and integrating it into the original KV cache runs into the temporal confusion problem, leading to significantly worse performance. We address this efficiency-accuracy trade-off by introducing Positional Integrity Encoding (PIE). Building upon rotary positional encoding, PIE first removes the rotary matrices in the Key cache that introduce temporal confusion and then reapplies the correct rotary matrices. This process ensures that positional relationships between tokens are correct and requires only a single round of matrix multiplication. We validate the effectiveness of PIE through extensive experiments on the RepoBench-C-8k dataset, using DeepSeek-Coder models with 1.3B, 6.7B, and 33B parameters. Our evaluation covers three real-world coding tasks: code insertion, code deletion, and multi-place code editing. Results demonstrate that PIE reduces computational overhead by over 85% compared to the standard full recomputation approach across all model sizes and tasks, while closely matching the performance of full recomputation.
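The fix described in the abstract is cheap because rotary matrices compose as R_a R_b = R_{a+b} with R_m^{-1} = R_{-m}, so removing the stale rotation R_{old} and reapplying the correct R_{new} collapses into a single rotation by the position delta. Below is a minimal NumPy sketch of this idea; the function names, the interleaved-pair RoPE convention, and the (seq_len, head_dim) cache layout are illustrative assumptions, not the authors' released implementation.

```python
import numpy as np

def rope_rotate(x, positions, base=10000.0):
    """Apply the rotary matrix R_{positions[t]} to each row x[t].

    x: (seq_len, head_dim), head_dim even. Uses the original
    interleaved-pair RoPE convention (implementations vary).
    """
    d = x.shape[-1]
    inv_freq = 1.0 / (base ** (np.arange(0, d, 2) / d))   # frequencies theta_i
    ang = np.asarray(positions, dtype=float)[:, None] * inv_freq
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin    # 2-D rotation of each
    out[:, 1::2] = x1 * sin + x2 * cos    # (x1, x2) coordinate pair
    return out

def pie_correct_key_cache(k_cache, old_pos, new_pos):
    """Positional Integrity Encoding applied to a stale Key cache.

    Cached keys were rotated by R_{old_pos} when first encoded; since
    R_new @ inv(R_old) = R_{new - old}, "remove then reapply" is one
    rotation by the position delta (a single matrix multiplication).
    """
    delta = np.asarray(new_pos) - np.asarray(old_pos)
    return rope_rotate(k_cache, delta)
```

For example, inserting a line of n tokens at index p leaves cached positions before p unchanged (delta 0) and shifts every later one by +n. Only the Key cache needs this correction: Values carry no rotary encoding, and new Queries are computed fresh at their correct positions.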
Related papers
- Multi-Token Joint Speculative Decoding for Accelerating Large Language Model Inference [41.93955876156331]
Large language models (LLMs) have demonstrated their power in various tasks, but their inference incurs significant time and energy costs.
Speculative decoding uses a smaller draft model to propose a sequence of tokens, which the large target model then validates in a single batched pass.
Compared with autoregressive decoding, speculative decoding generates the same number of tokens with fewer runs of the large model.
An algorithm achieving better output perplexity and still higher efficiency than speculative decoding would be even more useful in practice (the basic propose-and-verify loop is sketched below).
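As a concrete illustration of the propose-then-verify loop described above, here is a minimal greedy-decoding sketch in Python; draft_next and target_preds are hypothetical stand-ins for the two models, not an API from the paper.

```python
def speculative_step(draft_next, target_preds, tokens, k):
    """One round of greedy speculative decoding.

    draft_next(seq)   -> next-token id from the small draft model.
    target_preds(seq) -> list p where p[i] is the large model's greedy
                         choice for the token following seq[:i+1],
                         obtained in one batched forward pass.
    Both callables are illustrative assumptions.
    """
    n = len(tokens)
    proposal = list(tokens)
    for _ in range(k):                        # cheap autoregressive drafting
        proposal.append(draft_next(proposal))

    preds = target_preds(proposal)            # single run of the large model
    out = list(tokens)
    for j in range(k):
        expected = preds[n + j - 1]           # target's choice after current prefix
        out.append(expected)                  # always emit the target's token
        if expected != proposal[n + j]:       # draft diverged: stop accepting
            return out
    out.append(preds[n + k - 1])              # all k drafts accepted: bonus token
    return out
```

Each round emits between 1 and k + 1 tokens while running the large model only once, which is where the speed-up comes from; sampling-based variants accept or reject draft tokens stochastically rather than greedily.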
arXiv Detail & Related papers (2024-07-12T23:29:54Z)
- Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration [54.897493351694195]
We propose a novel parallel decoding approach, namely hidden transfer, which decodes multiple successive tokens simultaneously in a single forward pass.
In terms of acceleration metrics, we outperform all the single-model acceleration techniques, including Medusa and Self-Speculative decoding.
arXiv Detail & Related papers (2024-04-18T09:17:06Z)
- Progressive-Proximity Bit-Flipping for Decoding Surface Codes [9.801253635315636]
Topological quantum codes, such as toric and surface codes, are excellent candidates for hardware implementation.
Existing decoders often fall short of requirements such as low computational complexity.
We propose a novel bit-flipping (BF) decoder tailored for toric and surface codes.
arXiv Detail & Related papers (2024-02-24T22:38:05Z)
- Fast Chain-of-Thought: A Glance of Future from Parallel Decoding Leads to Answers Faster [61.83949316226113]
FastCoT is a model-agnostic framework based on parallel decoding.
We show that FastCoT saves inference time by nearly 20% with only a negligible performance drop compared to the regular approach.
arXiv Detail & Related papers (2023-11-14T15:56:18Z)
- Regress Before Construct: Regress Autoencoder for Point Cloud Self-supervised Learning [18.10704604275133]
Masked Autoencoders (MAE) have demonstrated promising performance in self-supervised learning for 2D and 3D computer vision.
We propose Point Regress AutoEncoder (Point-RAE), a new scheme for regressive autoencoders for point cloud self-supervised learning.
Our approach is efficient during pre-training and generalizes well on various downstream tasks.
arXiv Detail & Related papers (2023-09-25T17:23:33Z)
- Accelerating Transformer Inference for Translation via Parallel Decoding [2.89306442817912]
Autoregressive decoding limits the efficiency of transformers for Machine Translation (MT).
We present three parallel decoding algorithms and test them on different languages and models.
arXiv Detail & Related papers (2023-05-17T17:57:34Z)
- CLAWSAT: Towards Both Robust and Accurate Code Models [74.57590254102311]
We integrate contrastive learning (CL) with adversarial learning to co-optimize the robustness and accuracy of code models.
To the best of our knowledge, this is the first systematic study to explore and exploit the robustness and accuracy benefits of (multi-view) code obfuscations in code models.
arXiv Detail & Related papers (2022-11-21T18:32:50Z)
- Sparsifying Parity-Check Matrices [60.28601275219819]
We consider the problem of minimizing the number of one-entries in parity-check matrices (PCMs).
In the maximum-likelihood (ML) decoding method, the number of ones in PCMs is directly related to the time required to decode messages.
We propose a simple matrix row manipulation which alters the PCM, but not the code itself; a toy illustration of this kind of operation follows.
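One plausible instance of such a manipulation is adding one row into another over GF(2): elementary row operations change the entries of the matrix but not its row space, so the null space, i.e., the code, is unchanged even when the result has fewer ones. The check below illustrates that flavor of operation under this assumption; it is not the paper's exact algorithm.

```python
import numpy as np

def row_add_gf2(H, src, dst):
    """Add row src into row dst of a binary parity-check matrix (mod 2).

    An elementary row operation preserves the row space of H, hence the
    code {x : H x = 0 (mod 2)} is unchanged, while the number of
    one-entries may drop.
    """
    H = H.copy()
    H[dst] = (H[dst] + H[src]) % 2
    return H

# Toy check: both matrices define the same code.
H = np.array([[1, 1, 0, 1],
              [1, 1, 1, 0]])
H2 = row_add_gf2(H, src=0, dst=1)   # row 1 becomes [0, 0, 1, 1]: 5 ones vs 6

words = [np.array(list(np.binary_repr(i, 4)), dtype=int) for i in range(16)]
code1 = {tuple(w) for w in words if not (H @ w % 2).any()}
code2 = {tuple(w) for w in words if not (H2 @ w % 2).any()}
assert code1 == code2               # identical codewords, sparser matrix
```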
arXiv Detail & Related papers (2020-05-08T05:51:40Z)
- Pruning Neural Belief Propagation Decoders [77.237958592189]
We introduce a method to tailor an overcomplete parity-check matrix to (neural) BP decoding using machine learning.
We achieve performance within 0.27 dB and 1.5 dB of the ML performance while reducing the complexity of the decoder.
arXiv Detail & Related papers (2020-01-21T12:05:46Z)