Neuron Patching: Semantic-based Neuron-level Language Model Repair for Code Generation
- URL: http://arxiv.org/abs/2312.05356v4
- Date: Tue, 6 Aug 2024 03:57:33 GMT
- Title: Neuron Patching: Semantic-based Neuron-level Language Model Repair for Code Generation
- Authors: Jian Gu, Aldeida Aleti, Chunyang Chen, Hongyu Zhang
- Abstract summary: Large Language Models (LLMs) have already gained widespread adoption in software engineering, particularly in code generation tasks.
We propose \textsc{MENT}, a novel and effective model editing approach to repair LLMs in coding tasks.
\textsc{MENT} is effective, efficient, and reliable, capable of correcting a neural model by patching just one or two neurons.
- Score: 32.178931149612644
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have already gained widespread adoption in software engineering, particularly in code generation tasks. However, updating these models with new knowledge can be prohibitively expensive, yet it is essential to maximize their utility, such as implementing a hotfix technique to address urgent or critical LLM errors. In this paper, we propose \textsc{MENT}, a novel and effective model editing approach to repair LLMs in coding tasks. \textsc{MENT} is effective, efficient, and reliable, capable of correcting a neural model by patching just one or two neurons. As pioneering work on neuron-level model editing of generative models, we formalize the editing process and introduce the involved concepts. We also introduce new measures to evaluate its generalization ability and establish a benchmark for further study. Our approach is evaluated on three coding tasks: line-level code generation, shellcode generation, and intent-to-bash translation. The experimental results demonstrate that the proposed approach significantly outperforms the state-of-the-art in both effectiveness and efficiency measures. Furthermore, we showcase the applications of \textsc{MENT} for LLM reasoning in software engineering. By editing LLM knowledge, the directly or indirectly dependent behaviors of API invocation in the chain-of-thought change accordingly. This illustrates the significance of repairing LLMs in the context of software engineering.
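To make the neuron-patching idea concrete, the minimal sketch below locates one strongly activated FFN neuron and nudges its output weights toward a desired token. The layer choice, the localization heuristic, and the update rule are assumptions for illustration only, not the actual \textsc{MENT} procedure.
```python
# Minimal, hypothetical sketch of neuron-level patching (not the actual MENT algorithm).
# Assumptions: a GPT-2-style model from Hugging Face, patching the output projection of a
# single FFN neuron in one block so that a desired next token becomes more likely.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

prompt = "import numpy as"   # context on which the model should be corrected (illustrative)
target = " np"               # desired continuation
layer_idx = 6                # which transformer block to patch (an assumption)

inputs = tok(prompt, return_tensors="pt")
acts = {}

def hook(module, inp, out):  # record the FFN hidden activations of the chosen block
    acts["h"] = out.detach()

block = model.transformer.h[layer_idx]
handle = block.mlp.act.register_forward_hook(hook)
with torch.no_grad():
    model(**inputs)
handle.remove()

# 1) Localize: pick the FFN neuron with the strongest activation at the last position
#    (a crude proxy; MENT's localization is more principled).
neuron = int(acts["h"][0, -1].abs().argmax())

# 2) Patch: nudge that neuron's output weights toward the unembedding of the target
#    token, so its activation pushes the residual stream toward the desired prediction.
target_id = tok.encode(target)[0]
direction = model.lm_head.weight[target_id].detach()
with torch.no_grad():
    block.mlp.c_proj.weight[neuron] += 0.5 * direction / direction.norm()

# 3) Verify the effect on the next-token prediction for this context.
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]
print("top prediction after patch:", tok.decode(int(logits.argmax())))
```
A real repair pipeline would also check that the patch does not disturb unrelated predictions, which is what the paper's reliability and generalization measures are meant to assess.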
Related papers
- Verbalized Machine Learning: Revisiting Machine Learning with Language Models [63.10391314749408]
We introduce the framework of verbalized machine learning (VML).
VML constrains the parameter space to be human-interpretable natural language.
We empirically verify the effectiveness of VML, and hope that VML can serve as a stepping stone to stronger interpretability.
arXiv Detail & Related papers (2024-06-06T17:59:56Z)
- Multi-Objective Fine-Tuning for Enhanced Program Repair with LLMs [16.890411067079885]
Large language models (LLMs) have demonstrated remarkable capabilities on a broad spectrum of downstream tasks.
We propose a novel perspective on the learning focus of LLM fine-tuning for program repair.
We apply MORepair to fine-tune four open-source LLMs with different sizes and architectures.
arXiv Detail & Related papers (2024-04-19T05:36:21Z)
- Editing Conceptual Knowledge for Large Language Models [65.38231526537476]
This paper pioneers the investigation of editing conceptual knowledge for Large Language Models (LLMs).
We construct a novel benchmark dataset ConceptEdit and establish a suite of new metrics for evaluation.
Experimental results reveal that, although existing editing methods can efficiently modify concept-level definitions to some extent, they also have the potential to distort the related instantial knowledge.
arXiv Detail & Related papers (2024-03-10T16:57:10Z)
- CLOMO: Counterfactual Logical Modification with Large Language Models [109.60793869938534]
We introduce a novel task, Counterfactual Logical Modification (CLOMO), and a high-quality human-annotated benchmark.
In this task, LLMs must adeptly alter a given argumentative text to uphold a predetermined logical relationship.
We propose an innovative evaluation metric, the Self-Evaluation Score (SES), to directly evaluate the natural language output of LLMs.
arXiv Detail & Related papers (2023-11-29T08:29:54Z)
- Test-Case-Driven Programming Understanding in Large Language Models for Better Code Generation [15.166827643436346]
muFiX is a novel prompting technique to improve the code generation performance of large language models (LLMs).
It first exploits test case analysis to obtain specification understanding and enables a self-improvement process.
muFiX further fixes the specification understanding in a direction that reduces the gap between the provided understanding and the model's actual understanding.
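As a rough illustration of this kind of test-case-driven prompting, the sketch below derives a specification understanding from the test cases, asks the model to improve it, and only then generates code. The prompt wording and the hypothetical `ask_llm` helper are assumptions, not the muFiX implementation.
```python
# Illustrative sketch of test-case-driven prompting (not the muFiX implementation).
# `ask_llm` is a hypothetical helper that sends a prompt to some LLM and returns its reply.
def ask_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def generate_with_test_analysis(problem: str, test_cases: list[str]) -> str:
    tests = "\n".join(test_cases)
    # Step 1: derive a specification understanding from the description and the tests.
    understanding = ask_llm(
        f"Problem:\n{problem}\n\nTest cases:\n{tests}\n\n"
        "Explain, step by step, what the required function must do."
    )
    # Step 2: self-improvement -- ask the model to fix mismatches between its stated
    # understanding and the behavior implied by the test cases.
    understanding = ask_llm(
        f"Test cases:\n{tests}\n\nCurrent understanding:\n{understanding}\n\n"
        "Revise the understanding so it is consistent with every test case."
    )
    # Step 3: generate code conditioned on the refined understanding.
    return ask_llm(
        f"Problem:\n{problem}\n\nSpecification understanding:\n{understanding}\n\n"
        "Write a Python function that satisfies the specification. Return only code."
    )
```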
arXiv Detail & Related papers (2023-09-28T02:58:07Z)
- Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies [104.32199881187607]
Large language models (LLMs) have demonstrated remarkable performance across a wide array of NLP tasks.
A promising approach to rectify these flaws is self-correction, where the LLM itself is prompted or guided to fix problems in its own output.
This paper presents a comprehensive review of this emerging class of techniques.
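A minimal sketch of the generic critique-and-revise loop behind many of these techniques is shown below; the prompts and the hypothetical `ask_llm` helper are illustrative assumptions and do not reproduce any specific method from the survey.
```python
# Minimal sketch of a self-correction loop (illustrative; not a specific surveyed method).
# `ask_llm` is a hypothetical helper that queries some LLM and returns its text reply.
def ask_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def self_correct(task: str, max_rounds: int = 3) -> str:
    answer = ask_llm(task)
    for _ in range(max_rounds):
        # The model critiques its own previous output...
        critique = ask_llm(f"Task:\n{task}\n\nAnswer:\n{answer}\n\n"
                           "List any mistakes in the answer, or reply 'OK' if it is correct.")
        if critique.strip() == "OK":
            break
        # ...and then revises the output based on that critique.
        answer = ask_llm(f"Task:\n{task}\n\nPrevious answer:\n{answer}\n\n"
                         f"Critique:\n{critique}\n\nWrite a corrected answer.")
    return answer
```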
arXiv Detail & Related papers (2023-08-06T18:38:52Z)
- Editing Large Language Models: Problems, Methods, and Opportunities [51.903537096207]
This paper embarks on a deep exploration of the problems, methods, and opportunities related to model editing for LLMs.
We provide an exhaustive overview of the task definition and challenges associated with model editing, along with an in-depth empirical analysis of the most progressive methods currently at our disposal.
Our objective is to provide valuable insights into the effectiveness and feasibility of each editing technique, thereby assisting the community in making informed decisions on the selection of the most appropriate method for a specific task or context.
arXiv Detail & Related papers (2023-05-22T16:00:00Z)
- Fully Autonomous Programming with Large Language Models [0.9558392439655015]
Current approaches to program synthesis with Large Language Models (LLMs) exhibit a "near miss syndrome".
We use OpenAI Codex as the LLM and Program Synthesis Benchmark 2 as a database of problem descriptions and tests for evaluation.
The resulting framework outperforms both conventional usage of Codex without the repair phase and traditional genetic programming approaches.
arXiv Detail & Related papers (2023-04-20T16:12:05Z)
- CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning [92.36705236706678]
"CodeRL" is a new framework for program synthesis tasks through pretrained LMs and deep reinforcement learning.
During inference, we introduce a new generation procedure with a critical sampling strategy.
For the model backbones, we extend the encoder-decoder architecture of CodeT5 with enhanced learning objectives.
arXiv Detail & Related papers (2022-07-05T02:42:15Z)
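As a loose illustration of test-driven sampling for program synthesis, the sketch below draws candidate programs and keeps the first one that passes the example unit tests; the hypothetical `sample_program` helper and the plain pass/fail filter are assumptions, whereas CodeRL's own generation procedure relies on its critical sampling strategy.
```python
# Loose illustration of test-driven sampling for program synthesis; `sample_program`
# is a hypothetical helper standing in for a code-generation LLM.
import os
import subprocess
import tempfile

def sample_program(prompt: str, temperature: float = 0.8) -> str:
    """Hypothetical helper: draw one candidate program from some code LLM."""
    raise NotImplementedError("plug in your code-generation model here")

def passes_tests(program: str, test_code: str) -> bool:
    """Run a candidate program together with the example unit tests in a subprocess."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(program + "\n\n" + test_code)
        path = f.name
    try:
        result = subprocess.run(["python", path], capture_output=True, timeout=10)
        ok = result.returncode == 0
    except subprocess.TimeoutExpired:
        ok = False
    finally:
        os.unlink(path)
    return ok

def synthesize(prompt: str, test_code: str, budget: int = 16) -> str | None:
    for _ in range(budget):              # keep sampling until a candidate passes
        candidate = sample_program(prompt)
        if passes_tests(candidate, test_code):
            return candidate
    return None                          # no passing program within the budget
```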