Neuron Patching: Semantic-based Neuron-level Language Model Repair for Code Generation
- URL: http://arxiv.org/abs/2312.05356v4
- Date: Tue, 6 Aug 2024 03:57:33 GMT
- Title: Neuron Patching: Semantic-based Neuron-level Language Model Repair for Code Generation
- Authors: Jian Gu, Aldeida Aleti, Chunyang Chen, Hongyu Zhang
- Abstract summary: Large Language Models (LLMs) have already gained widespread adoption in software engineering, particularly in code generation tasks.
We propose MENT, a novel and effective model editing approach to repair LLMs in coding tasks.
MENT is effective, efficient, and reliable, capable of correcting a neural model by patching just one or two neurons.
- Score: 32.178931149612644
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have already gained widespread adoption in software engineering, particularly in code generation tasks. However, updating these models with new knowledge can be prohibitively expensive, yet it is essential to maximize their utility, such as implementing a hotfix technique to address urgent or critical LLM errors. In this paper, we propose \textsc{MENT}, a novel and effective model editing approach to repair LLMs in coding tasks. \textsc{MENT} is effective, efficient, and reliable, capable of correcting a neural model by patching just one or two neurons. As pioneering work on neuron-level model editing of generative models, we formalize the editing process and introduce the involved concepts. We also introduce new measures to evaluate its generalization ability and establish a benchmark for further study. Our approach is evaluated on three coding tasks: line-level code generation, shellcode generation, and intent-to-bash translation. The experimental results demonstrate that the proposed approach significantly outperforms the state-of-the-art in both effectiveness and efficiency measures. Furthermore, we showcase the applications of \textsc{MENT} for LLM reasoning in software engineering. By editing LLM knowledge, the directly or indirectly dependent behaviors of API invocation in the chain-of-thought change accordingly. This illustrates the significance of repairing LLMs in the context of software engineering.
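To make the core idea concrete, below is a minimal, hypothetical sketch of what a one-neuron patch can look like mechanically. It is not the authors' MENT algorithm (which locates target neurons and derives patch values semantically); it merely restricts a gradient step to the outgoing weights of a single FFN neuron in a small stand-in model (GPT-2 via Hugging Face transformers), so that the next-token prediction for one prompt shifts toward a desired token. The model name, layer index, neuron index, prompt, and learning rate are illustrative assumptions.

```python
# Hypothetical sketch of a one-neuron patch (not the paper's MENT procedure):
# confine a gradient update to the outgoing weights of one FFN neuron so the
# model's next-token prediction for a single prompt moves toward a target token.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"            # small stand-in model; MENT targets code LLMs
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

prompt = "import numpy as"     # illustrative prompt
target = " np"                 # token the patched model should prefer
layer_idx, neuron_idx = 6, 1234  # assumed location of the neuron to patch

inputs = tok(prompt, return_tensors="pt")
target_id = tok(target, add_special_tokens=False)["input_ids"][0]

# c_proj maps FFN activations back to the hidden size; in GPT-2's Conv1D layout,
# row `neuron_idx` of its weight holds that single neuron's outgoing weights.
c_proj = model.transformer.h[layer_idx].mlp.c_proj

for step in range(20):
    out = model(**inputs)
    loss = torch.nn.functional.cross_entropy(
        out.logits[0, -1].unsqueeze(0), torch.tensor([target_id])
    )
    model.zero_grad()
    loss.backward()
    with torch.no_grad():
        # zero every gradient except the chosen neuron's outgoing weights,
        # so the "patch" touches exactly one neuron
        grad = c_proj.weight.grad
        mask = torch.zeros_like(grad)
        mask[neuron_idx] = 1.0
        c_proj.weight -= 0.5 * grad * mask

with torch.no_grad():
    patched = model(**inputs).logits[0, -1].argmax().item()
print("patched next token:", tok.decode(patched))
```

The sketch only illustrates how narrowly scoped such an edit is: every parameter outside one neuron's outgoing weights is left untouched, which is what keeps neuron-level repair cheap compared with fine-tuning.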
Related papers
- An Empirical Study on Self-correcting Large Language Models for Data Science Code Generation [1.335664823620186]
Large Language Models (LLMs) have recently advanced many applications on software engineering tasks.
CoT-SelfEvolve iteratively and automatically refines code through a self-correcting process.
arXiv Detail & Related papers (2024-08-28T09:19:09Z) - Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines [74.42485647685272]
We focus on Generative Masked Language Models (GMLMs).
We train a model to fit conditional probabilities of the data distribution via masking, which are subsequently used as inputs to a Markov Chain to draw samples from the model.
We adapt the T5 model for iteratively-refined parallel decoding, achieving 2-3x speedup in machine translation with minimal sacrifice in quality.
arXiv Detail & Related papers (2024-07-22T18:00:00Z) - DeepCode AI Fix: Fixing Security Vulnerabilities with Large Language Models [3.1690235522182104]
Large language models (LLMs) are increasingly used to solve various programming tasks.
We show that the task is difficult as it requires the model to learn long-range code relationships.
We propose a technique to address these challenges with a new approach for querying and fine-tuning LLMs.
arXiv Detail & Related papers (2024-02-19T18:35:40Z) - ProSG: Using Prompt Synthetic Gradients to Alleviate Prompt Forgetting of RNN-like Language Models [0.0]
We propose an architecture that teaches the model to memorize the prompt during generation via synthetic gradients.
We construct a dataset for experiments, and the results have demonstrated the effectiveness of our method.
arXiv Detail & Related papers (2023-11-03T15:34:02Z) - Simultaneous Machine Translation with Large Language Models [51.470478122113356]
We investigate the possibility of applying Large Language Models to SimulMT tasks.
We conducted experiments using the Llama2-7b-chat model on nine different languages from the MUST-C dataset.
The results show that LLM outperforms dedicated MT models in terms of BLEU and LAAL metrics.
arXiv Detail & Related papers (2023-09-13T04:06:47Z) - The Good, the Bad, and the Missing: Neural Code Generation for Machine Learning Tasks [11.837851107416588]
This paper investigates the effectiveness of existing neural code generation models on Machine Learning programming tasks.
We select six state-of-the-art neural code generation models, and evaluate their performance on four widely used ML libraries.
Our empirical study reveals some good, bad, and missing aspects of neural code generation models on ML tasks.
arXiv Detail & Related papers (2023-05-16T00:52:02Z) - Greener yet Powerful: Taming Large Code Generation Models with Quantization [47.734976584580224]
Large pretrained deep learning models have substantially pushed the boundary of code generation.
Despite their great power, the huge number of model parameters poses a significant threat to adapting them in a regular software development environment.
Model compression is a promising approach to address these challenges.
arXiv Detail & Related papers (2023-03-09T16:25:51Z) - Confident Adaptive Language Modeling [95.45272377648773]
CALM is a framework for dynamically allocating different amounts of compute per input and generation timestep.
We demonstrate the efficacy of our framework in reducing compute -- potential speedup of up to $\times 3$ -- while provably maintaining high performance.
arXiv Detail & Related papers (2022-07-14T17:00:19Z) - CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning [92.36705236706678]
"CodeRL" is a new framework for program synthesis tasks through pretrained LMs and deep reinforcement learning.
During inference, we introduce a new generation procedure with a critical sampling strategy.
For the model backbones, we extend the encoder-decoder architecture of CodeT5 with enhanced learning objectives.
arXiv Detail & Related papers (2022-07-05T02:42:15Z) - NeuroLogic A*esque Decoding: Constrained Text Generation with Lookahead Heuristics [73.96837492216204]
We propose NeuroLogic A*esque, a decoding algorithm that incorporates estimates of future cost.
We develop lookahead heuristics that are efficient for large-scale language models.
Our approach outperforms competitive baselines on five generation tasks, and achieves new state-of-the-art performance on table-to-text generation, constrained machine translation, and keyword-constrained generation.
arXiv Detail & Related papers (2021-12-16T09:22:54Z) - Learning to Encode Position for Transformer with Continuous Dynamical Model [88.69870971415591]
We introduce a new way of learning to encode position information for non-recurrent models, such as Transformer models.
We model the evolution of the encoded results along the position index with a continuous dynamical system.
arXiv Detail & Related papers (2020-03-13T00:41:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.