Related papers: Hotfixing Large Language Models for Code

Hotfixing Large Language Models for Code

URL: http://arxiv.org/abs/2408.05727v4
Date: Wed, 6 Nov 2024 14:18:39 GMT
Title: Hotfixing Large Language Models for Code
Authors: Zhou Yang, David Lo,
Abstract summary: Large Language Models for Code (LLM4Code) have become an integral part of developers', assisting with tasks such as code completion and generation. These models are found to exhibit undesired behaviors after their release, like generating buggy code. This paper mainly focuses on hotfixing LLM4Code to make them generate less buggy code and more fixed code.
Score: 8.243596444097506
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models for Code (LLM4Code) have become an integral part of developers' workflows, assisting with tasks such as code completion and generation. However, these models are found to exhibit undesired behaviors after their release, like generating buggy code, due to their extensive training on vast amounts of source code that contain such buggy code. The training data (usually coming from open-source software) keeps evolving, e.g., developers fix the buggy code. However, adapting such evolution to mitigate LLM4Code's undesired behaviors is non-trivial, as retraining models on the updated dataset usually takes much time and resources. This motivates us to propose the concept of hotfixing LLM4Code, mitigating LLM4Code's undesired behaviors effectively and efficiently with minimal negative effects. This paper mainly focuses on hotfixing LLM4Code to make them generate less buggy code and more fixed code. We begin by demonstrating that models from the popular CodeGen family frequently generate buggy code. Then, we define three learning objectives in hotfixing and design multiple loss functions for each objective: (1) learn the desired behaviors, (2) unlearn the undesired behaviors, and (3) retain knowledge of other code. We evaluate four different fine-tuning techniques for hotfixing the models and gain the following insights. Optimizing these three learning goals together, using LoRA (low-rank adaptation), effectively influences the model's behavior. Specifically, it increases the generation of fixed code by up to 108.42% and decreases the generation of buggy code by up to 50.47%. Statistical tests confirm that hotfixing does not significantly affect the models' functional correctness on the HumanEval benchmark. Additionally, to evaluate the generalizability of hotfixing by reducing the exposure of email addresses by 99.30%.

Related papers

Unseen Horizons: Unveiling the Real Capability of LLM Code Generation Beyond the Familiar [15.421030528350212]
We build a code-obfuscation based benchmark OBFUSEVAL to evaluate large language models. We use three-level strategy to obfuscate descriptions, code and context dependencies. The results show that after obfuscation, the average decrease ratio of test pass rate can up to 62.5%.
arXiv Detail & Related papers (2024-12-11T05:31:39Z)
CodeLutra: Boosting LLM Code Generation via Preference-Guided Refinement [32.46078765471136]
We introduce CodeLutra, a novel framework that enhances low-performing large language models. Unlike conventional fine-tuning, CodeLutra employs an iterative preference learning mechanism to compare correct and incorrect solutions. On a challenging data analysis task, using just 500 samples improved Llama-3-8B's accuracy from 28.2% to 48.6%, approaching GPT-4's performance.
arXiv Detail & Related papers (2024-11-07T21:51:07Z)
Understanding Code Understandability Improvements in Code Reviews [79.16476505761582]
We analyzed 2,401 code review comments from Java open-source projects on GitHub. 83.9% of suggestions for improvement were accepted and integrated, with fewer than 1% later reverted.
arXiv Detail & Related papers (2024-10-29T12:21:23Z)
To Code, or Not To Code? Exploring Impact of Code in Pre-training [13.336902036852115]
We systematically investigate the impact of code data on general performance. We find that code is a critical building block for generalization far beyond coding tasks. Our work suggests investments in code quality and preserving code during pre-training have positive impacts.
arXiv Detail & Related papers (2024-08-20T14:58:13Z)
VersiCode: Towards Version-controllable Code Generation [58.82709231906735]
Large Language Models (LLMs) have made tremendous strides in code generation, but existing research fails to account for the dynamic nature of software development. We propose two novel tasks aimed at bridging this gap: version-specific code completion (VSCC) and version-aware code migration (VACM) We conduct an extensive evaluation on VersiCode, which reveals that version-controllable code generation is indeed a significant challenge.
arXiv Detail & Related papers (2024-06-11T16:15:06Z)
Does Your Neural Code Completion Model Use My Code? A Membership Inference Approach [66.51005288743153]
We investigate the legal and ethical issues of current neural code completion models. We tailor a membership inference approach (termed CodeMI) that was originally crafted for classification tasks. We evaluate the effectiveness of this adapted approach across a diverse array of neural code completion models.
arXiv Detail & Related papers (2024-04-22T15:54:53Z)
CYCLE: Learning to Self-Refine the Code Generation [19.71833229434497]
We propose CYCLE framework, learning to self-refine the faulty generation according to the available feedback. We implement four variants of CYCLE with varied numbers of parameters across 350M, 1B, 2B, and 3B benchmarks. The results reveal that CYCLE successfully maintains, sometimes improves, the quality of one-time code generation, while significantly improving the self-refinement capability of code LMs.
arXiv Detail & Related papers (2024-03-27T16:45:02Z)
GenCode: A Generic Data Augmentation Framework for Boosting Deep Learning-Based Code Understanding [28.02426812004216]
We introduce a generic data augmentation framework, GenCode, to enhance the training of code understanding models. To evaluate the effectiveness of GenCode, we conduct experiments on four code understanding tasks and three pre-trained code models. Compared to the state-of-the-art (SOTA) code augmentation method, MixCode, GenCode produces code models with 2.92% higher accuracy and 4.90% robustness on average.
arXiv Detail & Related papers (2024-02-24T08:57:12Z)
LLM-Assisted Code Cleaning For Training Accurate Code Generators [53.087019724256606]
We investigate data quality for code and find that making the code more structured and readable leads to improved code generation performance of the system. We build a novel data-cleaning pipeline that uses these principles to transform existing programs. We evaluate our approach on two challenging algorithmic code generation benchmarks and find that fine-tuning CodeLLaMa-7B improves the performance by up to 30% compared to fine-tuning on the original dataset.
arXiv Detail & Related papers (2023-11-25T02:45:50Z)
CONCORD: Clone-aware Contrastive Learning for Source Code [64.51161487524436]
Self-supervised pre-training has gained traction for learning generic code representations valuable for many downstream SE tasks. We argue that it is also essential to factor in how developers code day-to-day for general-purpose representation learning. In particular, we propose CONCORD, a self-supervised, contrastive learning strategy to place benign clones closer in the representation space while moving deviants further apart.
arXiv Detail & Related papers (2023-06-05T20:39:08Z)
CCT5: A Code-Change-Oriented Pre-Trained Model [14.225942520238936]
We propose to pre-train a model specially designed for code changes to better support developers in software maintenance. We first collect a large-scale dataset containing 1.5M+ pairwise data of code changes and commit messages. We fine-tune the pre-trained model, CCT5, on three widely-labelled tasks incurred by code changes and two tasks specific to the code review process.
arXiv Detail & Related papers (2023-05-18T07:55:37Z)
CodeT5+: Open Code Large Language Models for Code Understanding and Generation [72.1638273937025]
Large language models (LLMs) pretrained on vast source code have achieved prominent progress in code intelligence. CodeT5+ is a family of encoder-decoder LLMs for code in which component modules can be flexibly combined to suit a wide range of downstream code tasks. We extensively evaluate CodeT5+ on over 20 code-related benchmarks in different settings, including zero-shot, finetuning, and instruction-tuning.
arXiv Detail & Related papers (2023-05-13T14:23:07Z)

This list is automatically generated from the titles and abstracts of the papers in this site.