Neural Transfer Learning for Repairing Security Vulnerabilities in C Code
- URL: http://arxiv.org/abs/2104.08308v1
- Date: Fri, 16 Apr 2021 18:32:51 GMT
- Title: Neural Transfer Learning for Repairing Security Vulnerabilities in C Code
- Authors: Zimin Chen, Steve Kommrusch and Martin Monperrus
- Abstract summary: We propose an approach for repairing security vulnerabilities named VRepair which is based on transfer learning.
VRepair is first trained on a large bug fix corpus, and is then tuned on a vulnerability fix dataset, which is an order of magnitude smaller.
In our experiments, we show that a model trained only on a bug fix corpus can already fix some vulnerabilities.
- Score: 14.664825927959644
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we address the problem of automatic repair of software
vulnerabilities with deep learning. The major problem with data-driven
vulnerability repair is that the few existing datasets of known confirmed
vulnerabilities consist of only a few thousand examples. However, training a
deep learning model often requires hundreds of thousands of examples. In this
work, we leverage the intuition that the bug fixing task and the vulnerability
fixing task are related, and the knowledge learned from bug fixes can be
transferred to fixing vulnerabilities. In the machine learning community, this
technique is called transfer learning. In this paper, we propose an approach
for repairing security vulnerabilities named VRepair which is based on transfer
learning. VRepair is first trained on a large bug fix corpus, and is then tuned
on a vulnerability fix dataset, which is an order of magnitude smaller. In our
experiments, we show that a model trained only on a bug fix corpus can already
fix some vulnerabilities. Then, we demonstrate that transfer learning improves
the ability to repair vulnerable C functions. Finally, we present evidence
that transfer learning produces superior and more stable neural models for
vulnerability repair.
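The pretrain-then-fine-tune recipe that the abstract describes can be sketched in miniature. Everything below is an illustrative stand-in (a tiny linear regressor, not the paper's Transformer; synthetic data, not the bug fix corpus); only the training schedule mirrors the approach:

```python
# Illustrative transfer-learning sketch: pretrain on a large source task
# (stand-in for the bug fix corpus), then fine-tune the same weights on a
# much smaller, related target task (stand-in for the vulnerability fixes).
import numpy as np

rng = np.random.default_rng(0)

def train(w, X, y, lr, steps):
    """Plain gradient descent on mean squared error."""
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

# Large source task.
X_src = rng.normal(size=(1000, 5))
w_true = np.array([1.0, -2.0, 0.5, 3.0, -1.0])
y_src = X_src @ w_true

# Small target task: related, but slightly shifted and far fewer examples.
X_tgt = rng.normal(size=(50, 5))
y_tgt = X_tgt @ (w_true + 0.1)

w0 = np.zeros(5)
w_pre = train(w0, X_src, y_src, lr=0.05, steps=200)     # "bug fix" pretraining
w_ft = train(w_pre, X_tgt, y_tgt, lr=0.01, steps=50)    # fine-tune on target
w_scratch = train(w0, X_tgt, y_tgt, lr=0.01, steps=50)  # same budget, no transfer

def mse(w):
    """Target-task error."""
    return float(np.mean((X_tgt @ w - y_tgt) ** 2))
```

With the same small fine-tuning budget, starting from the pretrained weights reaches a much lower target-task error than training from scratch, which is the intuition behind transferring bug fix knowledge to vulnerability fixing.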
Related papers
- REEF: A Framework for Collecting Real-World Vulnerabilities and Fixes [40.401211102969356]
We propose REEF, an automated framework to collect REal-world vulnErabilities and Fixes from open-source repositories.
We develop a multi-language crawler to collect vulnerabilities and their fixes, and design metrics to filter for high-quality vulnerability-fix pairs.
Through extensive experiments, we demonstrate that our approach can collect high-quality vulnerability-fix pairs and generate strong explanations.
arXiv Detail & Related papers (2023-09-15T02:50:08Z)
- Pre-trained Model-based Automated Software Vulnerability Repair: How Far are We? [14.741742268621403]
We show that the studied pre-trained models consistently outperform the state-of-the-art technique VRepair with a prediction accuracy of 32.94%-44.96%.
Surprisingly, a simplistic approach adopting transfer learning improves the prediction accuracy of pre-trained models by 9.40% on average.
Our study highlights the promising future of adopting pre-trained models to patch real-world vulnerabilities.
arXiv Detail & Related papers (2023-08-24T03:43:10Z)
- Queried Unlabeled Data Improves and Robustifies Class-Incremental Learning [133.39254981496146]
Class-incremental learning (CIL) suffers from the notorious dilemma between learning newly added classes and preserving previously learned class knowledge.
We propose to leverage "free" external unlabeled data querying in continual learning.
We show that queried unlabeled data can continue to provide benefits, and we seamlessly extend CIL-QUD into robustified versions.
arXiv Detail & Related papers (2022-06-15T22:53:23Z)
- Enabling Automatic Repair of Source Code Vulnerabilities Using Data-Driven Methods [0.4568777157687961]
We propose ways to improve code representations for vulnerability repair from three perspectives.
Data-driven models of automatic program repair use pairs of buggy and fixed code to learn transformations that fix errors in code.
The expected results of this work are improved code representations for automatic program repair and, specifically, fixing security vulnerabilities.
arXiv Detail & Related papers (2022-02-07T10:47:37Z)
- Machine Unlearning of Features and Labels [72.81914952849334]
We propose the first method for unlearning features and labels in machine learning models.
Our approach builds on the concept of influence functions and realizes unlearning through closed-form updates of model parameters.
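The closed-form update idea can be made concrete for a model with a quadratic loss, where a single influence-function (Newton-style) parameter update exactly removes one training point. This is an illustrative ridge-regression sketch, not the paper's implementation:

```python
# Illustrative influence-style unlearning for ridge regression, where the
# closed-form update is exact: removing one training point's gradient and
# Hessian contribution recovers the model retrained without that point.
import numpy as np

rng = np.random.default_rng(1)
lam = 1.0  # ridge regularization strength

def fit(X, y):
    """Exact ridge solution: (X^T X + lam*I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

X = rng.normal(size=(200, 4))
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=200)
w = fit(X, y)

# Unlearn sample i with one closed-form parameter update.
i = 0
xi, yi = X[i], y[i]
H = X.T @ X + lam * np.eye(4)       # Hessian of the full ridge loss
g = xi * (xi @ w - yi)              # sample i's gradient at the optimum
H_minus = H - np.outer(xi, xi)      # Hessian with sample i removed
w_unlearned = w + np.linalg.solve(H_minus, g)

# Reference: exact retraining without sample i.
w_retrain = fit(np.delete(X, i, axis=0), np.delete(y, i))
```

For quadratic losses the update matches retraining exactly; for deep models such influence-based updates are only approximations.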
arXiv Detail & Related papers (2021-08-26T04:42:24Z)
- RoFL: Attestable Robustness for Secure Federated Learning [59.63865074749391]
Federated Learning allows a large number of clients to train a joint model without the need to share their private data.
To ensure the confidentiality of the client updates, Federated Learning systems employ secure aggregation.
We present RoFL, a secure Federated Learning system that improves robustness against malicious clients.
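The secure aggregation that such systems rely on can be illustrated with a toy pairwise-masking scheme: each pair of clients agrees on a random mask that one adds and the other subtracts, so individual updates stay hidden while their sum is preserved. This sketch omits key agreement and dropout recovery, which real protocols must handle:

```python
# Toy secure-aggregation sketch: every client pair shares a random mask that
# one adds and the other subtracts, hiding individual updates while keeping
# the server-side sum exact.
import numpy as np

rng = np.random.default_rng(2)
n_clients, dim = 4, 3
updates = [rng.normal(size=dim) for _ in range(n_clients)]

masked = [u.copy() for u in updates]
for i in range(n_clients):
    for j in range(i + 1, n_clients):
        m = rng.normal(size=dim)  # pairwise mask shared by clients i and j
        masked[i] += m            # client i adds the mask
        masked[j] -= m            # client j subtracts the same mask

server_sum = sum(masked)   # what the server aggregates
true_sum = sum(updates)    # masks cancel pairwise, so the sums match
```

The server sees only masked updates, yet obtains the exact aggregate; this is the confidentiality property the aggregation step provides.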
arXiv Detail & Related papers (2021-07-07T15:42:49Z)
- Federated Learning with Unreliable Clients: Performance Analysis and Mechanism Design [76.29738151117583]
Federated Learning (FL) has become a promising tool for training effective machine learning models among distributed clients.
However, low quality models could be uploaded to the aggregator server by unreliable clients, leading to a degradation or even a collapse of training.
We model these unreliable behaviors of clients and propose a defensive mechanism to mitigate such a security risk.
arXiv Detail & Related papers (2021-05-10T08:02:27Z)
- V2W-BERT: A Framework for Effective Hierarchical Multiclass Classification of Software Vulnerabilities [7.906207218788341]
We present a novel Transformer-based learning framework (V2W-BERT) in this paper.
By using ideas from natural language processing, link prediction and transfer learning, our method outperforms previous approaches.
We achieve up to 97% prediction accuracy for randomly partitioned data and up to 94% for temporally partitioned data.
arXiv Detail & Related papers (2021-02-23T05:16:57Z)
- Adversarial Targeted Forgetting in Regularization and Generative Based Continual Learning Models [2.8021833233819486]
Continual (or "incremental") learning approaches are employed when additional knowledge or tasks need to be learned from subsequent batches or from streaming data.
We show that an intelligent adversary can take advantage of a continual learning algorithm's capabilities of retaining existing knowledge over time.
We show that the adversary can create a "false memory" about any task by inserting carefully-designed backdoor samples to the test instances of that task.
arXiv Detail & Related papers (2021-02-16T18:45:01Z) - Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks,
and Defenses [150.64470864162556]
This work systematically categorizes and discusses a wide range of dataset vulnerabilities and exploits.
In addition to describing various poisoning and backdoor threat models and the relationships among them, we develop their unified taxonomy.
arXiv Detail & Related papers (2020-12-18T22:38:47Z) - Dos and Don'ts of Machine Learning in Computer Security [74.1816306998445]
Despite great potential, machine learning in security is prone to subtle pitfalls that undermine its performance.
We identify common pitfalls in the design, implementation, and evaluation of learning-based security systems.
We propose actionable recommendations to support researchers in avoiding or mitigating the pitfalls where possible.
arXiv Detail & Related papers (2020-10-19T13:09:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.