DeepCode AI Fix: Fixing Security Vulnerabilities with Large Language
Models
- URL: http://arxiv.org/abs/2402.13291v2
- Date: Fri, 23 Feb 2024 17:26:06 GMT
- Title: DeepCode AI Fix: Fixing Security Vulnerabilities with Large Language
Models
- Authors: Berkay Berabi, Alexey Gronskiy, Veselin Raychev, Gishor Sivanrupan,
Victor Chibotaru, Martin Vechev
- Abstract summary: Large language models (LLMs) are increasingly used to solve various programming tasks.
We show that the task is difficult as it requires the model to learn long-range code relationships.
We propose a technique to address these challenges with a new approach for querying and fine-tuning LLMs.
- Score: 3.1690235522182104
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The automated program repair field has attracted substantial interest over
the years, but despite significant research efforts, creating a system that
works well for complex semantic bugs such as security vulnerabilities has
proven difficult. A promising direction to solve this challenge is by
leveraging large language models (LLMs), which are increasingly used to solve
various programming tasks. In this paper, we investigate the effectiveness of
LLMs for solving the code-repair task. We show that the task is difficult as it
requires the model to learn long-range code relationships, a task that
inherently relies on extensive amounts of training data. At the same time,
creating a large, clean dataset for complex program bugs and their
corresponding fixes is non-trivial. We propose a technique to address these
challenges with a new approach for querying and fine-tuning LLMs. The idea is
to use program analysis to limit the LLM's attention to the portions of code
needed to perform the fix, drastically reducing the amount of required
training data. Concretely, for training and inference, rather than feeding the
entire program to the LLM, we reduce its code to a much shorter snippet that
contains the reported defect together with the necessary context, and use that
instead. Our evaluation shows that this code reduction approach substantially
improves available models such as GPT-4 queried with few-shot learning, as
well as fine-tuned models. To train and evaluate our system, we created a
comprehensive code-fixing dataset by extensively labeling 156 bug patterns
(including 40 security rules) whose detection requires complex interprocedural
dataflow analysis. Our best system, built on Mixtral-8x7B, can remove more
than 80% of the reported defects while exactly matching the human fix in 10%
to 50% of cases, outperforming baselines based on GPT-3.5 and GPT-4 as well as
window-based models like TFix.
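The core querying idea above lends itself to a short sketch. The Python below
is a minimal illustration under stated assumptions, not the authors'
implementation: Python's ast module stands in for the paper's interprocedural
dataflow analysis, and the names (reduce_to_snippet, build_fix_prompt), the
defect report format, and the keep-imports-plus-enclosing-function heuristic
are all hypothetical.

```python
# Sketch of code reduction before querying an LLM (illustrative only).
# A real system would select context via interprocedural dataflow analysis;
# here we keep just module imports plus the function enclosing the defect.
import ast
import textwrap


def reduce_to_snippet(source: str, defect_line: int) -> str:
    """Reduce a program to its imports plus the function enclosing the defect."""
    tree = ast.parse(source)
    lines = source.splitlines()
    # Keep import statements as minimal surrounding context.
    parts = [ln for ln in lines if ln.lstrip().startswith(("import ", "from "))]
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            end = node.end_lineno or node.lineno
            if node.lineno <= defect_line <= end:
                parts.append("\n".join(lines[node.lineno - 1 : end]))
                break
    return "\n\n".join(parts)


def build_fix_prompt(snippet: str, rule: str) -> str:
    """Wrap the reduced snippet in a repair prompt for the LLM."""
    return (
        f"The following code triggers the static-analysis rule '{rule}'.\n"
        "Rewrite it so the defect is fixed, changing as little as possible.\n\n"
        + snippet
    )


if __name__ == "__main__":
    program = textwrap.dedent("""\
        import subprocess

        def run(cmd):
            # reported defect: command injection via shell=True
            subprocess.call(cmd, shell=True)

        def unrelated():
            return 42
    """)
    # Line number and rule id would come from the static analyzer's report.
    snippet = reduce_to_snippet(program, defect_line=5)
    print(build_fix_prompt(snippet, rule="CommandInjection"))
```

Only the reduced snippet, not the whole file, is sent to the model, which is
what cuts the required context length and training data in the paper's setup.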
Related papers
- Where's the Bug? Attention Probing for Scalable Fault Localization [18.699014321422023]
We present Bug Attention Probe (BAP), a method which learns state-of-the-art fault localization without any direct localization labels.
BAP is significantly more efficient than prompting, outperforming large open-weight models at a small fraction of the computational cost.
arXiv Detail & Related papers (2025-02-19T18:59:32Z)
- SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution [56.9361004704428]
Large Language Models (LLMs) have demonstrated remarkable proficiency across a variety of complex tasks.
SWE-Fixer is a novel open-source framework designed to effectively and efficiently resolve GitHub issues.
We assess our approach on the SWE-Bench Lite and Verified benchmarks, achieving state-of-the-art performance among open-source models.
arXiv Detail & Related papers (2025-01-09T07:54:24Z)
- LLM4CVE: Enabling Iterative Automated Vulnerability Repair with Large Language Models [9.946058168276744]
Large Language Models (LLMs) have opened up the possibility for many software defects to be patched automatically.
We propose an iterative pipeline that robustly fixes vulnerable functions in real-world code with high accuracy.
We achieve a human-verified quality score of 8.51/10 and a 20% increase in ground-truth code similarity with Llama 3 70B.
arXiv Detail & Related papers (2025-01-07T00:21:42Z)
- Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification [52.095460362197336]
Large language models (LLMs) struggle with consistent and accurate reasoning.
LLMs are trained primarily on correct solutions, reducing their ability to detect and learn from errors.
We propose a novel collaborative method integrating Chain-of-Thought (CoT) and Program-of-Thought (PoT) solutions for verification.
arXiv Detail & Related papers (2024-10-05T05:21:48Z)
- Enhancing Fault Localization Through Ordered Code Analysis with LLM Agents and Self-Reflection [8.22737389683156]
Large Language Models (LLMs) offer promising improvements in fault localization by enhancing code comprehension and reasoning.
We introduce LLM4FL, a novel LLM-agent-based fault localization approach that integrates SBFL rankings with a divide-and-conquer strategy.
Our results demonstrate that LLM4FL outperforms AutoFL by 19.27% in Top-1 accuracy and surpasses state-of-the-art supervised techniques such as DeepFL and Grace.
arXiv Detail & Related papers (2024-09-20T16:47:34Z)
- An Empirical Study on Self-correcting Large Language Models for Data Science Code Generation [1.335664823620186]
Large Language Models (LLMs) have recently advanced many applications on software engineering tasks.
CoT-SelfEvolve iteratively and automatically refines code through a self-correcting process.
arXiv Detail & Related papers (2024-08-28T09:19:09Z)
- SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models [54.78329741186446]
We propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation.
Experiments across both in-domain and out-of-domain benchmarks in English and Chinese demonstrate the effectiveness of the proposed paradigm.
arXiv Detail & Related papers (2024-08-28T06:33:03Z)
- What's Wrong with Your Code Generated by Large Language Models? An Extensive Study [80.18342600996601]
Large language models (LLMs) produce code that is shorter yet more complicated than canonical solutions.
We develop a taxonomy of bugs for incorrect codes that includes three categories and 12 sub-categories, and analyze the root cause for common bug types.
We propose a novel training-free iterative method that introduces self-critique, enabling LLMs to critique and correct their generated code based on bug types and compiler feedback.
arXiv Detail & Related papers (2024-07-08T17:27:17Z)
- BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions [72.56339136017759]
We introduce BigCodeBench, a benchmark that challenges Large Language Models (LLMs) to invoke multiple function calls as tools from 139 libraries and 7 domains for 1,140 fine-grained tasks.
Our evaluation shows that LLMs are not yet capable of following complex instructions to use function calls precisely, with scores up to 60%, significantly lower than the human performance of 97%.
We propose a natural-language-oriented variant of BigCodeBench, BigCodeBench-Instruct, that automatically transforms the original docstrings into short instructions containing only essential information.
arXiv Detail & Related papers (2024-06-22T15:52:04Z)
- To Err is Machine: Vulnerability Detection Challenges LLM Reasoning [8.602355712876815]
We present a challenging code reasoning task: vulnerability detection.
State-of-the-art (SOTA) models reported only 54.5% Balanced Accuracy in our vulnerability detection evaluation.
New models, new training methods, or more execution-specific pretraining data may be needed to conquer vulnerability detection.
arXiv Detail & Related papers (2024-03-25T21:47:36Z)
- CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning [92.36705236706678]
"CodeRL" is a new framework for program synthesis tasks through pretrained LMs and deep reinforcement learning.
During inference, we introduce a new generation procedure with a critical sampling strategy.
For the model backbones, we extend the encoder-decoder architecture of CodeT5 with enhanced learning objectives.
arXiv Detail & Related papers (2022-07-05T02:42:15Z)