Related papers: Neuron-level LLM Patching for Code Generation

URL: http://arxiv.org/abs/2312.05356v3
Date: Mon, 15 Apr 2024 07:31:00 GMT
Title: Neuron-level LLM Patching for Code Generation
Authors: Jian Gu, Aldeida Aleti, Chunyang Chen, Hongyu Zhang
Abstract summary: Large Language Models (LLMs) have found widespread adoption in software engineering, particularly in code generation tasks. We propose a novel and effective model editing approach, MENT, to patch LLMs in coding tasks.
Score: 32.178931149612644
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) have found widespread adoption in software engineering, particularly in code generation tasks. However, updating these models with new knowledge can be prohibitively expensive, yet it is essential for maximizing their utility. In this paper, we propose a novel and effective model editing approach, MENT, to patch LLMs in coding tasks. MENT is effective, efficient, and reliable. It can correct a neural model by patching 1 or 2 neurons. As the pioneering work on neuron-level model editing of generative models, we formalize the editing process and introduce the involved concepts. We also introduce new measures to evaluate its generalization ability, and build a benchmark for further study. Our approach is evaluated on three coding tasks: API-seq recommendation, line-level code generation, and pseudocode-to-code translation. The experimental results show that the proposed approach outperforms the state of the art by a significant margin in both effectiveness and efficiency measures. In addition, we demonstrate the use of MENT for LLM reasoning in software engineering. By editing LLM knowledge, the directly or indirectly dependent behaviors of API invocation in the chain-of-thought change accordingly, which illustrates the significance of repairing LLMs.

Related papers

Fast Controlled Generation from Language Models with Adaptive Weighted Rejection Sampling [90.86991492288487]
Evaluating the constraint on every token can be prohibitively expensive. Locally constrained decoding (LCD) can distort the global distribution over strings, sampling tokens based only on local information. We show that our approach is superior to state-of-the-art baselines.
arXiv Detail & Related papers (2025-04-07T18:30:18Z)
A Semantic-based Optimization Approach for Repairing LLMs: Case Study on Code Generation [32.178931149612644]
We propose Semantic Targeting for Analytical Repair (STAR), a pioneering and novel semantic-based optimization approach for repairing Language Models (LMs).
arXiv Detail & Related papers (2025-03-17T07:59:42Z)
Learning to Solve and Verify: A Self-Play Framework for Code and Test Generation [69.62857948698436]
Recent advances in large language models (LLMs) have improved their performance on coding benchmarks. However, improvement is plateauing due to the exhaustion of readily available high-quality data. We propose Sol-Ver, a self-play solver-verifier framework that jointly improves a single model's code and test generation capacity.
arXiv Detail & Related papers (2025-02-20T18:32:19Z)
An Empirical Study on Self-correcting Large Language Models for Data Science Code Generation [1.335664823620186]
Large Language Models (LLMs) have recently advanced many applications on software engineering tasks. CoT-SelfEvolve iteratively and automatically refines code through a self-correcting process.
arXiv Detail & Related papers (2024-08-28T09:19:09Z)
Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines [74.42485647685272]
We focus on Generative Masked Language Models (GMLMs). We train a model to fit conditional probabilities of the data distribution via masking, which are subsequently used as inputs to a Markov Chain to draw samples from the model. We adapt the T5 model for iteratively-refined parallel decoding, achieving 2-3x speedup in machine translation with minimal sacrifice in quality.
arXiv Detail & Related papers (2024-07-22T18:00:00Z)
DeepCode AI Fix: Fixing Security Vulnerabilities with Large Language Models [3.1690235522182104]
Large language models (LLMs) are increasingly used to solve various programming tasks. We show that the task is difficult as it requires the model to learn long-range code relationships. We propose a technique to address these challenges with a new approach for querying and fine-tuning LLMs.
arXiv Detail & Related papers (2024-02-19T18:35:40Z)
ProSG: Using Prompt Synthetic Gradients to Alleviate Prompt Forgetting of RNN-like Language Models [0.0]
We propose an architecture that teaches the model to memorize the prompt during generation by means of synthetic gradients. We construct a dataset for experiments, and the results demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2023-11-03T15:34:02Z)
Simultaneous Machine Translation with Large Language Models [51.470478122113356]
We investigate the possibility of applying Large Language Models to SimulMT tasks. We conducted experiments using the Llama2-7b-chat model on nine different languages from the MUST-C dataset. The results show that the LLM outperforms dedicated MT models in terms of BLEU and LAAL metrics.
arXiv Detail & Related papers (2023-09-13T04:06:47Z)
The Good, the Bad, and the Missing: Neural Code Generation for Machine Learning Tasks [11.837851107416588]
This paper investigates the effectiveness of existing neural code generation models on Machine Learning programming tasks. We select six state-of-the-art neural code generation models, and evaluate their performance on four widely used ML libraries. Our empirical study reveals some good, bad, and missing aspects of neural code generation models on ML tasks.
arXiv Detail & Related papers (2023-05-16T00:52:02Z)
Greener yet Powerful: Taming Large Code Generation Models with Quantization [47.734976584580224]
Large pretrained deep learning models have substantially pushed the boundary of code generation. Despite their great power, the huge number of model parameters poses a significant threat to adapting them in a regular software development environment. Model compression is a promising approach to address these challenges.
arXiv Detail & Related papers (2023-03-09T16:25:51Z)
Confident Adaptive Language Modeling [95.45272377648773]
CALM is a framework for dynamically allocating different amounts of compute per input and generation timestep. We demonstrate the efficacy of our framework in reducing compute -- potential speedup of up to $\times 3$ -- while provably maintaining high performance.
arXiv Detail & Related papers (2022-07-14T17:00:19Z)
CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning [92.36705236706678]
"CodeRL" is a new framework for program synthesis tasks through pretrained LMs and deep reinforcement learning. During inference, we introduce a new generation procedure with a critical sampling strategy. For the model backbones, we extended the encoder-decoder architecture of CodeT5 with enhanced learning objectives.
arXiv Detail & Related papers (2022-07-05T02:42:15Z)
NeuroLogic A*esque Decoding: Constrained Text Generation with Lookahead Heuristics [73.96837492216204]
We propose NeuroLogic A*esque, a decoding algorithm that incorporates estimates of future cost. We develop lookahead heuristics that are efficient for large-scale language models. Our approach outperforms competitive baselines on five generation tasks, and achieves new state-of-the-art performance on table-to-text generation, constrained machine translation, and keyword-constrained generation.
arXiv Detail & Related papers (2021-12-16T09:22:54Z)
Learning to Encode Position for Transformer with Continuous Dynamical Model [88.69870971415591]
We introduce a new way of learning to encode position information for non-recurrent models, such as Transformer models. We model the evolution of encoded results along the position index with a continuous dynamical system.
arXiv Detail & Related papers (2020-03-13T00:41:41Z)

This list is automatically generated from the titles and abstracts of the papers on this site.