A Semantic-based Optimization Approach for Repairing LLMs: Case Study on Code Generation
- URL: http://arxiv.org/abs/2503.12899v2
- Date: Mon, 14 Apr 2025 13:57:28 GMT
- Title: A Semantic-based Optimization Approach for Repairing LLMs: Case Study on Code Generation
- Authors: Jian Gu, Aldeida Aleti, Chunyang Chen, Hongyu Zhang
- Abstract summary: We propose Semantic Targeting for Analytical Repair (STAR), a pioneering and novel semantic-based optimization approach for repairing Language Models (LMs).
- Score: 32.178931149612644
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Language Models (LMs) are widely used in software engineering for code generation, but they may produce code with errors. Rather than repairing the generated code, an alternative is to address the underlying failures of the models. LM repair offers a lightweight solution to this challenge: it requires minimal data, reduces computational costs, and limits side effects. Unlike retraining, LM repair focuses on applying tailored updates to targeted neurons, making it ideal for scenarios with limited resources, high-performance demands, or strict safety requirements. In this paper, we propose Semantic Targeting for Analytical Repair (STAR), a pioneering and novel semantic-based optimization approach for repairing LLMs. STAR realizes the main operations of LM repair methods in an optimization process: locating "buggy neurons", solving "neuron patches", and patching "buggy neurons". Correspondingly, it computes the deltas of the weight matrix as prior information to guide optimization, and attributes the targeted layers and neurons using statistical insights. The neuron patches are computed with a solid semantic-based analytical formula, which directly bridges the changes to logits with the deltas of neurons by steering latent representations. Compared to prior work on LM repair (MINT) and optimization methods (SGD), STAR integrates their strengths while mitigating their limitations. STAR supports solving multiple failures together, significantly improving its usefulness. Evaluated on three code generation tasks using popular code LMs, STAR demonstrates superior effectiveness. Additionally, STAR exhibits better efficiency. In terms of side effects, namely the balance between generalization and specificity, STAR outperforms prior work by a significant margin.
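To make the repair pipeline in the abstract concrete, the sketch below walks through the three operations on a single linear layer: locating buggy neurons, solving neuron patches, and applying them. It is a minimal illustration under assumed interfaces; the attribution score, the least-squares patch, and every function and variable name are placeholders, not STAR's actual analytical formula.

```python
import torch

@torch.no_grad()
def repair_linear_layer(layer, hidden, logit_steer, top_k=8, lr=1e-2):
    """Illustrative neuron-level repair of one linear layer.

    layer:       a torch.nn.Linear whose weight is (d_out, d_in)
    hidden:      (batch, d_in) latent representations feeding the layer
    logit_steer: (batch, d_out) direction the layer's output should move
                 so that the correct tokens' logits increase
    Everything below is a stand-in heuristic, not STAR's actual method.
    """
    W = layer.weight  # (d_out, d_in)

    # 1) Prior information: the delta a plain gradient-style update
    #    would apply to the weight matrix for these failing samples.
    delta_prior = lr * logit_steer.T @ hidden / hidden.shape[0]

    # 2) Locate "buggy neurons": rank output neurons by the magnitude
    #    of their prior delta (a stand-in statistical criterion).
    buggy = torch.topk(delta_prior.abs().sum(dim=1), top_k).indices

    # 3) Solve "neuron patches": least-squares rows that move only the
    #    selected outputs by the desired amount given these hiddens.
    desired = logit_steer[:, buggy]                         # (batch, k)
    patch = torch.linalg.lstsq(hidden, desired).solution.T  # (k, d_in)

    # 4) Patch the buggy neurons; all other weights stay untouched.
    W[buggy] += patch
    return buggy
```

In STAR itself, the delta prior, the neuron attribution, and the patch come from the paper's semantic, analytical derivation rather than the generic heuristics above; the sketch only shows how the three operations fit into one optimization step.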
Related papers
- LESA: Learnable LLM Layer Scaling-Up [57.0510934286449]
Training Large Language Models (LLMs) from scratch requires immense computational resources, making it prohibitively expensive.
Model scaling-up offers a promising solution by leveraging the parameters of smaller models to create larger ones.
We propose LESA, a novel learnable method for depth scaling-up.
arXiv Detail & Related papers (2025-02-19T14:58:48Z)
- Zeroth-Order Adaptive Neuron Alignment Based Pruning without Re-Training [3.195234044113248]
We propose NeuroAL, a top-up algorithm for network pruning.
It modifies the block-wise and row-wise sparsity exploiting information from both the dense model and its sparse version.
It consistently outperforms the latest state-of-the-art methods in terms of performance-runtime trade-off.
arXiv Detail & Related papers (2024-11-11T15:30:16Z)
- OptiBench Meets ReSocratic: Measure and Improve LLMs for Optimization Modeling [62.19438812624467]
Large language models (LLMs) have exhibited their problem-solving abilities in mathematical reasoning.
We propose OptiBench, a benchmark for end-to-end optimization problem-solving with human-readable inputs and outputs.
arXiv Detail & Related papers (2024-07-13T13:27:57Z)
- Not Everything is All You Need: Toward Low-Redundant Optimization for Large Language Model Alignment [126.34547428473968]
Large language models (LLMs) still struggle to align with human preferences in complex tasks and scenarios.
We propose a low-redundant alignment method named ALLO, focusing on optimizing the most related neurons with the most useful supervised signals.
Experimental results on 10 datasets have shown the effectiveness of ALLO.
arXiv Detail & Related papers (2024-06-18T13:34:40Z)
- Revisiting Unnaturalness for Automated Program Repair in the Era of Large Language Models [9.454475517867817]
We propose a patch-naturalness measurement, entropy-delta, to improve the efficiency of template-based repair techniques.
Our proposed method can rank correct patches more effectively than state-of-the-art machine learning tools.
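As one plausible reading of the entropy-delta idea summarized above, the sketch below scores a candidate patch by how much it lowers a code LM's average cross-entropy relative to the buggy line. The checkpoint, the per-token averaging, and the toy snippets are assumptions for illustration, not the paper's exact measurement.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any causal (code) LM works; GPT-2 is used here only to keep the sketch small.
tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

@torch.no_grad()
def mean_entropy(snippet: str) -> float:
    """Average cross-entropy (nats per token) of the snippet under the LM."""
    ids = tok(snippet, return_tensors="pt").input_ids
    return lm(ids, labels=ids).loss.item()

def entropy_delta(buggy: str, patched: str) -> float:
    """Positive values mean the patch reads as more 'natural' than the bug."""
    return mean_entropy(buggy) - mean_entropy(patched)

# Rank candidate patches: the largest entropy drop comes first.
buggy_line = "return a +* b"
candidates = ["return a + b", "return a ** b"]
ranked = sorted(candidates, key=lambda p: entropy_delta(buggy_line, p), reverse=True)
```

Candidates can then be validated in ranked order, which is where the efficiency gain for template-based repair comes from.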
arXiv Detail & Related papers (2024-04-23T17:12:45Z)
- Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs [61.40047491337793]
We present Hierarchical cOntext MERging (HOMER), a new training-free scheme designed to overcome the context-length limitations of large language models.
HOMER uses a divide-and-conquer algorithm, dividing long inputs into manageable chunks.
A token reduction technique precedes each merging, ensuring memory usage efficiency.
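The divide-and-conquer pattern described above can be sketched schematically: chunks are pruned by a token-reduction step and then merged pairwise until one context remains. The list-of-tokens representation and the placeholder importance score below are illustrative assumptions; HOMER itself merges hidden states inside the transformer, which this sketch does not reproduce.

```python
from typing import Callable, List

def hierarchical_merge(chunks: List[list],
                       importance: Callable[[object], float],
                       keep_ratio: float = 0.5) -> list:
    """Schematic divide-and-conquer merging: reduce tokens, then merge pairwise."""

    def reduce_tokens(chunk: list) -> list:
        # Keep the highest-scoring tokens, preserving their original order.
        k = max(1, int(len(chunk) * keep_ratio))
        keep = sorted(range(len(chunk)),
                      key=lambda i: importance(chunk[i]), reverse=True)[:k]
        return [chunk[i] for i in sorted(keep)]

    while len(chunks) > 1:
        reduced = [reduce_tokens(c) for c in chunks]   # token reduction precedes merging
        merged = [reduced[i] + reduced[i + 1]
                  for i in range(0, len(reduced) - 1, 2)]
        if len(reduced) % 2:                           # carry over an odd trailing chunk
            merged.append(reduced[-1])
        chunks = merged
    return chunks[0]

# Toy usage: score tokens by length as a stand-in importance measure.
context = ["alpha beta gamma".split(), "delta epsilon".split(), "zeta eta theta".split()]
compressed = hierarchical_merge(context, importance=len)
```

Because each merge is preceded by a reduction, the working context stays roughly bounded in size, which is the memory benefit the summary points to.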
arXiv Detail & Related papers (2024-04-16T06:34:08Z)
- Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark [166.40879020706151]
This paper proposes a shift towards BP-free, zeroth-order (ZO) optimization as a solution for reducing memory costs during fine-tuning.
Unlike traditional ZO-SGD methods, our work expands the exploration to a wider array of ZO optimization techniques.
Our study unveils previously overlooked optimization principles, highlighting the importance of task alignment, the role of the forward gradient method, and the balance between algorithm complexity and fine-tuning performance.
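For readers unfamiliar with the ZO-SGD baseline that the benchmark above expands on, the sketch below shows the classic two-point gradient estimate, which needs only forward passes. The hyperparameters and the in-place update scheme are illustrative choices, not the benchmark's exact configuration.

```python
import torch

@torch.no_grad()
def zo_sgd_step(params, loss_fn, lr=1e-4, mu=1e-3):
    """One backprop-free update using a two-point zeroth-order estimate.

    params:  list of tensors, updated in place
    loss_fn: () -> scalar loss evaluated with the current parameters
             (e.g., a forward pass over one mini-batch)
    """
    # One Gaussian perturbation direction per parameter tensor.
    noise = [torch.randn_like(p) for p in params]

    # Evaluate the loss at theta + mu*u and theta - mu*u (forward passes only).
    for p, u in zip(params, noise):
        p.add_(mu * u)
    loss_plus = loss_fn()
    for p, u in zip(params, noise):
        p.sub_(2 * mu * u)
    loss_minus = loss_fn()
    for p, u in zip(params, noise):
        p.add_(mu * u)  # restore the original parameters

    # Projected gradient estimate: ((f(x+mu*u) - f(x-mu*u)) / (2*mu)) * u
    scale = (loss_plus - loss_minus) / (2 * mu)
    for p, u in zip(params, noise):
        p.sub_(lr * scale * u)
```

Because only two extra forward evaluations are needed per step, no activations are stored for backpropagation, which is the memory saving such methods target during fine-tuning.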
arXiv Detail & Related papers (2024-02-18T14:08:48Z)
- Neuron Patching: Semantic-based Neuron-level Language Model Repair for Code Generation [32.178931149612644]
Model Improvement via Neuron Targeting (MINT) is a novel approach for repairing code Language Models (LMs).
MINT is effective, efficient, and reliable, capable of correcting a neural model by patching a minimum number of neurons.
arXiv Detail & Related papers (2023-12-08T20:28:08Z)
- Global Optimization of Gaussian processes [52.77024349608834]
We propose a reduced-space formulation with Gaussian processes trained on few data points.
The approach also leads to significantly smaller and computationally cheaper subproblems for lower bounding.
In total, we reduce the time to convergence by orders of magnitude with the proposed method.
arXiv Detail & Related papers (2020-05-21T20:59:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.