Enabling Automatic Repair of Source Code Vulnerabilities Using
Data-Driven Methods
- URL: http://arxiv.org/abs/2202.03055v1
- Date: Mon, 7 Feb 2022 10:47:37 GMT
- Title: Enabling Automatic Repair of Source Code Vulnerabilities Using
Data-Driven Methods
- Authors: Anastasiia Grishina
- Abstract summary: We propose ways to improve code representations for vulnerability repair from three perspectives.
Data-driven models of automatic program repair use pairs of buggy and fixed code to learn transformations that fix errors in code.
The expected results of this work are improved code representations for automatic program repair and, specifically, fixing security vulnerabilities.
- Score: 0.4568777157687961
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Users around the world rely on software-intensive systems in their day-to-day
activities. These systems regularly contain bugs and security vulnerabilities.
To facilitate bug fixing, data-driven models of automatic program repair use
pairs of buggy and fixed code to learn transformations that fix errors in code.
However, automatic repair of security vulnerabilities remains under-explored.
In this work, we propose ways to improve code representations for vulnerability
repair from three perspectives: input data type, data-driven models, and
downstream tasks. The expected results of this work are improved code
representations for automatic program repair and, specifically, fixing security
vulnerabilities.
Related papers
- Agent-Driven Automatic Software Improvement [55.2480439325792]
This research proposal aims to explore innovative solutions by focusing on the deployment of agents powered by Large Language Models (LLMs)
The iterative nature of agents, which allows for continuous learning and adaptation, can help surpass common challenges in code generation.
We aim to use the iterative feedback in these systems to further fine-tune the LLMs underlying the agents, becoming better aligned to the task of automated software improvement.
arXiv Detail & Related papers (2024-06-24T15:45:22Z) - Automatic Programming: Large Language Models and Beyond [48.34544922560503]
We study concerns around code quality, security and related issues of programmer responsibility.
We discuss how advances in software engineering can enable automatic programming.
We conclude with a forward looking view, focusing on the programming environment of the near future.
arXiv Detail & Related papers (2024-05-03T16:19:24Z) - Causative Insights into Open Source Software Security using Large
Language Code Embeddings and Semantic Vulnerability Graph [3.623199159688412]
Open Source Software (OSS) vulnerabilities can cause unauthorized access, data breaches, network disruptions, and privacy violations.
Recent deep-learning techniques have shown great promise in identifying and localizing vulnerabilities in source code.
Our study shows a 24% improvement in code repair capabilities compared to previous methods.
arXiv Detail & Related papers (2024-01-13T10:33:22Z) - Enhanced Automated Code Vulnerability Repair using Large Language Models [0.0]
This research addresses the complex challenge of automated repair of code vulnerabilities.
It introduces a novel format for the representation of code modification, using advanced Large Language Models (LLMs)
LLMs, fine-tuned on datasets featuring C code vulnerabilities, significantly improve the accuracy and adaptability of automated code repair techniques.
arXiv Detail & Related papers (2024-01-08T09:01:29Z) - LLM-Powered Code Vulnerability Repair with Reinforcement Learning and
Semantic Reward [3.729516018513228]
We introduce a multipurpose code vulnerability analysis system textttSecRepair, powered by a large language model, CodeGen2.
Inspired by how humans fix code issues, we propose an instruction-based dataset suitable for vulnerability analysis with LLMs.
We identify zero-day and N-day vulnerabilities in 6 Open Source IoT Operating Systems on GitHub.
arXiv Detail & Related papers (2024-01-07T02:46:39Z) - REEF: A Framework for Collecting Real-World Vulnerabilities and Fixes [40.401211102969356]
We propose an automated collecting framework REEF to collect REal-world vulnErabilities and Fixes from open-source repositories.
We develop a multi-language crawler to collect vulnerabilities and their fixes, and design metrics to filter for high-quality vulnerability-fix pairs.
Through extensive experiments, we demonstrate that our approach can collect high-quality vulnerability-fix pairs and generate strong explanations.
arXiv Detail & Related papers (2023-09-15T02:50:08Z) - CodeLMSec Benchmark: Systematically Evaluating and Finding Security
Vulnerabilities in Black-Box Code Language Models [58.27254444280376]
Large language models (LLMs) for automatic code generation have achieved breakthroughs in several programming tasks.
Training data for these models is usually collected from the Internet (e.g., from open-source repositories) and is likely to contain faults and security vulnerabilities.
This unsanitized training data can cause the language models to learn these vulnerabilities and propagate them during the code generation procedure.
arXiv Detail & Related papers (2023-02-08T11:54:07Z) - VELVET: a noVel Ensemble Learning approach to automatically locate
VulnErable sTatements [62.93814803258067]
This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements in source code.
Our model combines graph-based and sequence-based neural networks to successfully capture the local and global context of a program graph.
VELVET achieves 99.6% and 43.6% top-1 accuracy over synthetic data and real-world data, respectively.
arXiv Detail & Related papers (2021-12-20T22:45:27Z) - Neural Transfer Learning for Repairing Security Vulnerabilities in C
Code [14.664825927959644]
We propose an approach for repairing security vulnerabilities named VRepair which is based on transfer learning.
VRepair is first trained on a large bug fix corpus, and is then tuned on a vulnerability fix dataset, which is an order of magnitudes smaller.
In our experiments, we show that a model trained only on a bug fix corpus can already fix some vulnerabilities.
arXiv Detail & Related papers (2021-04-16T18:32:51Z) - Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks,
and Defenses [150.64470864162556]
This work systematically categorizes and discusses a wide range of dataset vulnerabilities and exploits.
In addition to describing various poisoning and backdoor threat models and the relationships among them, we develop their unified taxonomy.
arXiv Detail & Related papers (2020-12-18T22:38:47Z) - Dos and Don'ts of Machine Learning in Computer Security [74.1816306998445]
Despite great potential, machine learning in security is prone to subtle pitfalls that undermine its performance.
We identify common pitfalls in the design, implementation, and evaluation of learning-based security systems.
We propose actionable recommendations to support researchers in avoiding or mitigating the pitfalls where possible.
arXiv Detail & Related papers (2020-10-19T13:09:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.