Neural Transfer Learning for Repairing Security Vulnerabilities in C Code
- URL: http://arxiv.org/abs/2104.08308v1
- Date: Fri, 16 Apr 2021 18:32:51 GMT
- Title: Neural Transfer Learning for Repairing Security Vulnerabilities in C Code
- Authors: Zimin Chen, Steve Kommrusch and Martin Monperrus
- Abstract summary: We propose an approach for repairing security vulnerabilities named VRepair which is based on transfer learning.
VRepair is first trained on a large bug fix corpus, and is then tuned on a vulnerability fix dataset, which is an order of magnitude smaller.
In our experiments, we show that a model trained only on a bug fix corpus can already fix some vulnerabilities.
- Score: 14.664825927959644
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we address the problem of automatic repair of software
vulnerabilities with deep learning. The major problem with data-driven
vulnerability repair is that the few existing datasets of known confirmed
vulnerabilities consist of only a few thousand examples. However, training a
deep learning model often requires hundreds of thousands of examples. In this
work, we leverage the intuition that the bug fixing task and the vulnerability
fixing task are related, and the knowledge learned from bug fixes can be
transferred to fixing vulnerabilities. In the machine learning community, this
technique is called transfer learning. In this paper, we propose an approach
for repairing security vulnerabilities, named VRepair, which is based on transfer
learning. VRepair is first trained on a large bug fix corpus, and is then tuned
on a vulnerability fix dataset, which is an order of magnitude smaller. In our
experiments, we show that a model trained only on a bug fix corpus can already
fix some vulnerabilities. Then, we demonstrate that transfer learning improves
the ability to repair vulnerable C functions. Finally, we present evidence
that transfer learning produces more stable and superior neural models for
vulnerability repair.
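The two-stage pipeline the abstract describes (pre-train on a large bug-fix corpus, then fine-tune on a small vulnerability-fix dataset) can be illustrated with a minimal PyTorch sketch. Everything below is a toy built on assumptions: the vocabulary size, model dimensions, and random tensors stand in for the paper's actual tokenization, architecture, and data.

```python
import torch
import torch.nn as nn

VOCAB, DIM, MAXLEN = 1000, 128, 64  # toy sizes, not the paper's

class Seq2Seq(nn.Module):
    """Tiny encoder-decoder standing in for VRepair's seq2seq model."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.transformer = nn.Transformer(
            d_model=DIM, nhead=4, num_encoder_layers=2,
            num_decoder_layers=2, batch_first=True)
        self.out = nn.Linear(DIM, VOCAB)

    def forward(self, src, tgt):
        return self.out(self.transformer(self.embed(src), self.embed(tgt)))

def train_phase(model, batches, lr):
    """One training phase over (buggy, fixed) token-id batches."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for src, tgt in batches:
        logits = model(src, tgt[:, :-1])  # teacher forcing
        loss = loss_fn(logits.reshape(-1, VOCAB), tgt[:, 1:].reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()

def fake_batches(n):  # random ids standing in for a tokenized corpus
    return [(torch.randint(0, VOCAB, (8, MAXLEN)),
             torch.randint(0, VOCAB, (8, MAXLEN))) for _ in range(n)]

model = Seq2Seq()
# Phase 1: train on the large, generic bug-fix corpus.
train_phase(model, fake_batches(20), lr=1e-4)
# Phase 2: fine-tune on the much smaller vulnerability-fix dataset,
# typically with a lower learning rate so phase-1 knowledge transfers.
train_phase(model, fake_batches(2), lr=1e-5)
```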
Related papers
- RESTOR: Knowledge Recovery through Machine Unlearning [71.75834077528305]
Large language models trained on web-scale corpora can memorize undesirable datapoints.
Many machine unlearning methods have been proposed that aim to 'erase' these datapoints from trained models.
We propose the RESTOR framework for machine unlearning, which evaluates unlearning along several dimensions.
arXiv Detail & Related papers (2024-10-31T20:54:35Z)
- Code Vulnerability Repair with Large Language Model using Context-Aware Prompt Tuning [5.1071146597039245]
Large Language Models (LLMs) face significant challenges in detecting and repairing vulnerable code.
In this study, we utilize GitHub Copilot as the LLM and focus on buffer overflow vulnerabilities.
Our experiments reveal a notable gap in Copilot's abilities when dealing with buffer overflow vulnerabilities, with a 76% vulnerability detection rate but only a 15% vulnerability repair rate.
arXiv Detail & Related papers (2024-09-27T02:25:29Z)
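Since that entry centers on how repair prompts are built, here is a minimal, model-agnostic sketch of context-aware prompt construction. The prompt template, the example snippet, and the `query_llm` placeholder are all illustrative assumptions, not the paper's actual setup or Copilot's API.

```python
# Sketch of context-aware prompting for vulnerability repair.
# `query_llm` is a hypothetical stand-in for any chat-completion API.

VULNERABLE_SNIPPET = """\
void copy_name(char *input) {
    char buf[16];
    strcpy(buf, input);   /* potential buffer overflow */
}
"""

def build_repair_prompt(snippet: str, context: str, cwe: str) -> str:
    """Embed the flagged function, its surrounding context, and the
    suspected weakness class into one repair prompt."""
    return (
        f"The following C function may contain a {cwe} vulnerability.\n"
        f"Surrounding file context:\n{context}\n"
        f"Function to repair:\n{snippet}\n"
        "Rewrite only this function so the vulnerability is fixed."
    )

prompt = build_repair_prompt(
    VULNERABLE_SNIPPET,
    context="#include <string.h>  /* callers pass untrusted input */",
    cwe="CWE-121 (stack-based buffer overflow)",
)
print(prompt)  # would be sent to the LLM: response = query_llm(prompt)
```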
- Verification of Machine Unlearning is Fragile [48.71651033308842]
We introduce two novel adversarial unlearning processes capable of circumventing both types of verification strategies.
This study highlights the vulnerabilities and limitations in machine unlearning verification, paving the way for further research into the safety of machine unlearning.
arXiv Detail & Related papers (2024-08-01T21:37:10Z)
- REEF: A Framework for Collecting Real-World Vulnerabilities and Fixes [40.401211102969356]
We propose REEF, an automated framework to collect REal-world vulnErabilities and Fixes from open-source repositories.
We develop a multi-language crawler to collect vulnerabilities and their fixes, and design metrics to filter for high-quality vulnerability-fix pairs.
Through extensive experiments, we demonstrate that our approach can collect high-quality vulnerability-fix pairs and generate strong explanations.
arXiv Detail & Related papers (2023-09-15T02:50:08Z)
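As a rough sketch of the collection step such a framework needs, the following queries GitHub's public commit-search API for commits that mention a CVE identifier. The query term and the filtering hinted at in the comments are simplifications under assumption; this is not REEF's actual crawler.

```python
import requests

def find_cve_fix_commits(keyword="CVE-", per_page=5):
    """Query GitHub's commit-search API for candidate fix commits."""
    resp = requests.get(
        "https://api.github.com/search/commits",
        params={"q": keyword, "per_page": per_page},
        headers={"Accept": "application/vnd.github+json"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("items", [])

for item in find_cve_fix_commits():
    sha = item["sha"]
    msg = item["commit"]["message"].splitlines()[0]
    repo = item["repository"]["full_name"]
    # A real pipeline would now fetch the commit diff and apply
    # quality filters (single-function change, compilable, etc.).
    print(f"{repo} {sha[:10]} {msg}")
```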
- Pre-trained Model-based Automated Software Vulnerability Repair: How Far are We? [14.741742268621403]
We show that studied pre-trained models consistently outperform the state-of-the-art technique VRepair with a prediction accuracy of 32.94%-44.96%.
Surprisingly, a simplistic approach adopting transfer learning improves the prediction accuracy of pre-trained models by 9.40% on average.
Our study highlights the promising future of adopting pre-trained models to patch real-world vulnerabilities.
arXiv Detail & Related papers (2023-08-24T03:43:10Z)
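A minimal sketch of the fine-tuning step this line of work relies on, using the public Salesforce/codet5-small checkpoint as a stand-in for the studied pre-trained models. The single training pair and the hyperparameters are placeholders, not the paper's configuration.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tok = AutoTokenizer.from_pretrained("Salesforce/codet5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("Salesforce/codet5-small")

# Placeholder vulnerability-fix pair; a real dataset has thousands.
pairs = [("strcpy(buf, input);", "strncpy(buf, input, sizeof(buf) - 1);")]

opt = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for vuln, fix in pairs:
    batch = tok(vuln, return_tensors="pt")
    labels = tok(fix, return_tensors="pt").input_ids
    loss = model(**batch, labels=labels).loss  # seq2seq LM loss
    opt.zero_grad()
    loss.backward()
    opt.step()
```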
- Enabling Automatic Repair of Source Code Vulnerabilities Using Data-Driven Methods [0.4568777157687961]
Data-driven models of automatic program repair use pairs of buggy and fixed code to learn transformations that fix errors in code.
We propose ways to improve code representations for vulnerability repair from three perspectives.
The expected results of this work are improved code representations for automatic program repair and, specifically, for fixing security vulnerabilities.
arXiv Detail & Related papers (2022-02-07T10:47:37Z)
- Machine Unlearning of Features and Labels [72.81914952849334]
We propose the first strategies for unlearning features and labels in machine learning models.
Our approach builds on the concept of influence functions and realizes unlearning through closed-form updates of model parameters.
arXiv Detail & Related papers (2021-08-26T04:42:24Z)
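The closed-form update idea can be shown on a toy model: for a strongly convex loss, removing a training point z is approximated by theta' = theta + H^{-1} grad_loss(z, theta), where H is the Hessian of the full training loss at theta. The sketch below applies this to L2-regularized logistic regression; it illustrates the concept, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 200, 5, 0.1
X = rng.normal(size=(n, d))
y = (X @ rng.normal(size=d) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_point(theta, x, yi):
    """Gradient of the logistic loss at one training point."""
    return (sigmoid(x @ theta) - yi) * x

def hessian(theta, X, lam):
    """Hessian of the full (sum) loss plus L2 regularization."""
    p = sigmoid(X @ theta)
    return X.T @ (X * (p * (1 - p))[:, None]) + lam * np.eye(d)

# A few gradient steps stand in for training to convergence.
theta = np.zeros(d)
for _ in range(300):
    g = sum(grad_point(theta, X[i], y[i]) for i in range(n)) + lam * theta
    theta -= 0.5 * g / n

# Closed-form unlearning of point 0: no retraining loop required.
H = hessian(theta, X, lam)
theta_unlearned = theta + np.linalg.solve(H, grad_point(theta, X[0], y[0]))
print(np.linalg.norm(theta_unlearned - theta))  # size of the correction
```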
- RoFL: Attestable Robustness for Secure Federated Learning [59.63865074749391]
Federated Learning allows a large number of clients to train a joint model without the need to share their private data.
To ensure the confidentiality of the client updates, Federated Learning systems employ secure aggregation.
We present RoFL, a secure Federated Learning system that improves robustness against malicious clients.
arXiv Detail & Related papers (2021-07-07T15:42:49Z)
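The masking idea behind secure aggregation fits in a few lines: each pairwise random mask is added by one client and subtracted by the other, so the masks cancel in the server-side sum and only the aggregate update is revealed. This toy omits RoFL's actual contributions (commitments and range proofs over the masked updates); it illustrates just the baseline mechanism.

```python
import numpy as np

rng = np.random.default_rng(42)
num_clients, dim = 4, 3
updates = [rng.normal(size=dim) for _ in range(num_clients)]

# Pairwise masks agreed between clients i < j (e.g. via key exchange).
masks = {(i, j): rng.normal(size=dim)
         for i in range(num_clients) for j in range(i + 1, num_clients)}

def masked_update(i):
    """What client i actually sends to the server."""
    m = updates[i].copy()
    for (a, b), mask in masks.items():
        if a == i:
            m += mask  # lower-index partner adds the mask
        if b == i:
            m -= mask  # higher-index partner subtracts it
    return m

server_sum = sum(masked_update(i) for i in range(num_clients))
assert np.allclose(server_sum, sum(updates))  # masks cancel exactly
print(server_sum)
```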
- V2W-BERT: A Framework for Effective Hierarchical Multiclass Classification of Software Vulnerabilities [7.906207218788341]
We present a novel Transformer-based learning framework (V2W-BERT) in this paper.
By using ideas from natural language processing, link prediction and transfer learning, our method outperforms previous approaches.
We achieve up to 97% prediction accuracy for randomly partitioned data and up to 94% for temporally partitioned data.
arXiv Detail & Related papers (2021-02-23T05:16:57Z)
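As a generic illustration of the underlying subtask (mapping a CVE description to a CWE class), the sketch below runs a plain BERT sequence classifier. V2W-BERT's hierarchical link-prediction formulation and transfer learning are not reproduced here, and the class list is an invented subset.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

CWE_CLASSES = ["CWE-79", "CWE-119", "CWE-89"]  # illustrative subset

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(CWE_CLASSES))

cve_text = ("Buffer overflow in the parser allows remote attackers "
            "to execute arbitrary code via a crafted file.")
with torch.no_grad():
    logits = model(**tok(cve_text, return_tensors="pt")).logits
print(CWE_CLASSES[int(logits.argmax())])  # untrained head: random output
```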
- Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses [150.64470864162556]
This work systematically categorizes and discusses a wide range of dataset vulnerabilities and exploits.
In addition to describing various poisoning and backdoor threat models and the relationships among them, we develop their unified taxonomy.
arXiv Detail & Related papers (2020-12-18T22:38:47Z)
- Dos and Don'ts of Machine Learning in Computer Security [74.1816306998445]
Despite great potential, machine learning in security is prone to subtle pitfalls that undermine its performance.
We identify common pitfalls in the design, implementation, and evaluation of learning-based security systems.
We propose actionable recommendations to support researchers in avoiding or mitigating the pitfalls where possible.
arXiv Detail & Related papers (2020-10-19T13:09:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.