Pre-trained Model-based Automated Software Vulnerability Repair: How Far are We?
- URL: http://arxiv.org/abs/2308.12533v2
- Date: Fri, 25 Aug 2023 01:34:33 GMT
- Title: Pre-trained Model-based Automated Software Vulnerability Repair: How Far are We?
- Authors: Quanjun Zhang, Chunrong Fang, Bowen Yu, Weisong Sun, Tongke Zhang,
Zhenyu Chen
- Abstract summary: We show that the studied pre-trained models consistently outperform the state-of-the-art technique VRepair with a prediction accuracy of 32.94%~44.96%.
Surprisingly, a simplistic approach adopting transfer learning improves the prediction accuracy of pre-trained models by 9.40% on average.
Our study highlights the promising future of adopting pre-trained models to patch real-world vulnerabilities.
- Score: 14.741742268621403
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Various approaches have been proposed to help under-resourced
security researchers detect and analyze software vulnerabilities, yet fixing
vulnerabilities remains incredibly time-consuming and labor-intensive. The
time lag between reporting and fixing a vulnerability leaves software systems
exposed to potential attacks. Recently, several techniques have applied
pre-trained models to fix security vulnerabilities and have proven successful
at improving repair accuracy. However, the effectiveness of existing
pre-trained models has not been systematically analyzed, and little is known
about their advantages and disadvantages.
To bridge this gap, we perform the first extensive study on applying various
pre-trained models to vulnerability repair. The results show that the studied
pre-trained models consistently outperform the state-of-the-art technique
VRepair with a prediction accuracy of 32.94%~44.96%. We also investigate the
impact of the major phases in the vulnerability repair workflow. Surprisingly,
a simplistic transfer-learning approach improves the prediction accuracy of
pre-trained models by 9.40% on average. In addition, we discuss the
capabilities and limitations of pre-trained models. Finally, we pinpoint
practical guidelines for advancing pre-trained model-based vulnerability
repair. Our study highlights the
promising future of adopting pre-trained models to patch real-world
vulnerabilities.
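
To make the transfer-learning finding concrete, the sketch below shows the
general two-stage recipe: fine-tune a pre-trained code model on plentiful
generic bug fixes first, then on the scarcer vulnerability fixes. The
checkpoint, toy data, and hyperparameters are illustrative assumptions, not
the paper's exact configuration.

```python
# A minimal sketch of two-stage transfer learning for vulnerability repair.
# Stage 1 adapts the model to code repair in general; stage 2 specializes it
# on the small vulnerability-fix corpus (the target task).
import torch
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("Salesforce/codet5-base")
model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5-base")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Toy stand-ins for the mined (buggy code, fixed code) corpora.
bug_fix_pairs = [("int f() { return x }", "int f() { return x; }")]
vuln_fix_pairs = [("strcpy(dst, src);", "strncpy(dst, src, sizeof(dst) - 1);")]

def fine_tune(pairs, epochs):
    """Seq2seq fine-tuning on (vulnerable/buggy code, fixed code) pairs."""
    loader = DataLoader(pairs, batch_size=8, shuffle=True)
    model.train()
    for _ in range(epochs):
        for src, tgt in loader:  # each is a list of strings per batch
            enc = tokenizer(list(src), return_tensors="pt",
                            padding=True, truncation=True, max_length=512)
            labels = tokenizer(list(tgt), return_tensors="pt", padding=True,
                               truncation=True, max_length=512).input_ids
            labels[labels == tokenizer.pad_token_id] = -100  # mask pad in loss
            loss = model(**enc, labels=labels).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()

fine_tune(bug_fix_pairs, epochs=1)    # stage 1: generic bug fixes
fine_tune(vuln_fix_pairs, epochs=10)  # stage 2: vulnerability fixes
```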
Related papers
- Evaluating of Machine Unlearning: Robustness Verification Without Prior Modifications [15.257558809246524]
Unlearning is a process enabling pre-trained models to remove the influence of specific training samples.
Existing verification methods rely on machine learning attack techniques, such as membership inference attacks (MIAs) or backdoor attacks.
We propose a novel verification scheme that requires no prior modifications and supports verification on a much larger set.
arXiv Detail & Related papers (2024-10-14T03:19:14Z)
- Enhancing Pre-Trained Language Models for Vulnerability Detection via Semantic-Preserving Data Augmentation [4.374800396968465]
We propose a data augmentation technique aimed at enhancing the performance of pre-trained language models for vulnerability detection.
Fine-tuning a series of representative code pre-trained models on our augmented dataset yields up to a 10.1% increase in accuracy and a 23.6% increase in F1.
arXiv Detail & Related papers (2024-09-30T21:44:05Z)
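
To illustrate the kind of transformation such augmentation can apply, here is
a minimal sketch of one classic semantic-preserving transform, identifier
renaming. The paper's actual transform set is not reproduced here; this
example is our own illustration.

```python
# Naive identifier renaming: program behavior is preserved because only
# names change, not control or data flow. A real implementation would use a
# parser to respect scoping and library identifiers; this regex version is
# only a sketch.
import re

KEYWORDS = {"int", "char", "if", "else", "for", "while", "return",
            "void", "sizeof", "struct"}

def rename_identifiers(code: str, prefix: str = "var") -> str:
    mapping = {}
    def repl(match):
        name = match.group(0)
        if name in KEYWORDS:
            return name
        if name not in mapping:
            mapping[name] = f"{prefix}{len(mapping)}"
        return mapping[name]
    return re.sub(r"\b[A-Za-z_][A-Za-z0-9_]*\b", repl, code)

print(rename_identifiers("int add(int a, int b) { return a + b; }"))
# -> int var0(int var1, int var2) { return var1 + var2; }
```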
- How the Training Procedure Impacts the Performance of Deep Learning-based Vulnerability Patching [14.794452134569475]
This paper compares existing self-supervised and supervised pre-training solutions for vulnerability patching.
We found that supervised pre-training focused on bug-fixing, while expensive in terms of data collection, substantially improves DL-based vulnerability patching.
When applying prompt-tuning on top of this supervised pre-trained model, there is no significant gain in performance.
arXiv Detail & Related papers (2024-04-27T13:08:42Z)
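
For contrast with full fine-tuning, below is a minimal sketch of prompt-tuning
in general: the backbone stays frozen and only a small matrix of soft-prompt
embeddings prepended to the input is trained. The checkpoint, prompt length,
and example pair are assumptions for illustration.

```python
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

tok = AutoTokenizer.from_pretrained("Salesforce/codet5-base")
model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5-base")
for p in model.parameters():
    p.requires_grad = False  # freeze the entire backbone

n_prompt = 20
soft_prompt = torch.nn.Parameter(
    torch.randn(n_prompt, model.config.d_model) * 0.02)
optimizer = torch.optim.AdamW([soft_prompt], lr=1e-3)  # prompts only

enc = tok("strcpy(dst, src);", return_tensors="pt")
labels = tok("strncpy(dst, src, sizeof(dst) - 1);",
             return_tensors="pt").input_ids

embeds = model.get_input_embeddings()(enc.input_ids)           # [1, seq, d]
embeds = torch.cat([soft_prompt.unsqueeze(0), embeds], dim=1)  # prepend
mask = torch.cat([torch.ones(1, n_prompt, dtype=enc.attention_mask.dtype),
                  enc.attention_mask], dim=1)

loss = model(inputs_embeds=embeds, attention_mask=mask, labels=labels).loss
loss.backward()   # gradients flow only into soft_prompt
optimizer.step()
```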
- FaultGuard: A Generative Approach to Resilient Fault Prediction in Smart Electrical Grids [53.2306792009435]
FaultGuard is the first framework for fault type and zone classification resilient to adversarial attacks.
We propose a low-complexity fault prediction model and an online adversarial training technique to enhance robustness.
Our model outperforms the state of the art on resilient fault prediction benchmarks, with an accuracy of up to 0.958.
arXiv Detail & Related papers (2024-03-26T08:51:23Z)
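
As a generic illustration of online adversarial training (not FaultGuard's
specific architecture or attack budget), one FGSM-style training step looks
like this:

```python
# Perturb the batch toward higher loss, then train on the perturbed batch.
import torch
import torch.nn.functional as F

def adversarial_step(model, x, y, optimizer, eps=0.05):
    x_adv = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad, = torch.autograd.grad(loss, x_adv)
    x_adv = (x_adv + eps * grad.sign()).detach()  # one-step FGSM perturbation
    optimizer.zero_grad()
    adv_loss = F.cross_entropy(model(x_adv), y)   # learn on perturbed inputs
    adv_loss.backward()
    optimizer.step()
    return adv_loss.item()

model = torch.nn.Linear(16, 4)  # toy stand-in for a fault-type classifier
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(32, 16), torch.randint(0, 4, (32,))
print(adversarial_step(model, x, y, opt))
```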
- Robustness-Congruent Adversarial Training for Secure Machine Learning Model Updates [13.911586916369108]
We show that updating machine-learning models can introduce new misclassifications, including regressions in robustness to adversarial examples.
We propose a technique, named robustness-congruent adversarial training, to address this issue.
We show that our algorithm and, more generally, learning with non-regression constraints, provides a theoretically-grounded framework to train consistent estimators.
arXiv Detail & Related papers (2024-02-27T10:37:13Z)
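
One way to realize a non-regression constraint is to up-weight the loss on
samples the previous model already classified correctly, so the updated model
is penalized more for regressing on them. The sketch below is a generic
illustration of that idea, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def non_regression_loss(new_logits, old_logits, y, beta=2.0):
    """Cross-entropy that applies extra pressure where the old model was
    already right, discouraging regressions during the model update."""
    per_sample = F.cross_entropy(new_logits, y, reduction="none")
    old_correct = (old_logits.argmax(dim=1) == y).float()
    weights = 1.0 + beta * old_correct
    return (weights * per_sample).mean()

y = torch.randint(0, 3, (8,))
print(non_regression_loss(torch.randn(8, 3), torch.randn(8, 3), y))
```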
- Learn from the Past: A Proxy Guided Adversarial Defense Framework with Self Distillation Regularization [53.04697800214848]
Adversarial Training (AT) is pivotal in fortifying the robustness of deep learning models.
AT methods, which rely on direct iterative updates for the target model's defense, frequently encounter obstacles such as unstable training and catastrophic overfitting.
We present a general proxy-guided defense framework, LAST (Learn from the Past).
arXiv Detail & Related papers (2023-10-19T13:13:41Z)
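
One common way to instantiate proxy guidance with self-distillation, sketched
below under our own assumptions rather than as LAST's exact mechanism, is to
keep an exponential-moving-average proxy of the target model and regularize
the target's predictions toward it, damping the unstable direct updates
mentioned above.

```python
import copy
import torch
import torch.nn.functional as F

model = torch.nn.Linear(16, 4)
proxy = copy.deepcopy(model)  # slowly-moving proxy of the target model
opt = torch.optim.SGD(model.parameters(), lr=0.1)

def proxy_guided_step(x, y, decay=0.99, lam=0.5):
    logits = model(x)
    with torch.no_grad():
        proxy_logits = proxy(x)
    # Self-distillation: pull the target's distribution toward the proxy's.
    distill = F.kl_div(F.log_softmax(logits, dim=1),
                       F.softmax(proxy_logits, dim=1), reduction="batchmean")
    loss = F.cross_entropy(logits, y) + lam * distill
    opt.zero_grad()
    loss.backward()
    opt.step()
    with torch.no_grad():  # EMA update keeps the proxy trailing the target
        for p_proxy, p in zip(proxy.parameters(), model.parameters()):
            p_proxy.mul_(decay).add_(p, alpha=1 - decay)
    return loss.item()

print(proxy_guided_step(torch.randn(32, 16), torch.randint(0, 4, (32,))))
```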
- Towards Certified Probabilistic Robustness with High Accuracy [3.957941698534126]
Adversarial examples pose a security threat to many critical systems built on neural networks.
How to build certifiably robust yet accurate neural network models remains an open problem.
We propose a novel approach that aims to achieve both high accuracy and certified probabilistic robustness.
arXiv Detail & Related papers (2023-09-02T09:39:47Z)
- Re-thinking Data Availability Attacks Against Deep Neural Networks [53.64624167867274]
In this paper, we re-examine the concept of unlearnable examples and discern that the existing robust error-minimizing noise presents an inaccurate optimization objective.
We introduce a novel optimization paradigm that yields improved protection results with reduced computational time requirements.
arXiv Detail & Related papers (2023-05-18T04:03:51Z)
- Learning Sample Difficulty from Pre-trained Models for Reliable Prediction [55.77136037458667]
We propose to utilize large-scale pre-trained models to guide downstream model training with sample difficulty-aware entropy regularization.
We simultaneously improve accuracy and uncertainty calibration across challenging benchmarks.
arXiv Detail & Related papers (2023-04-20T07:29:23Z)
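
One plausible instantiation of difficulty-aware entropy regularization,
sketched below as an assumption rather than the paper's exact formulation,
scores each sample's difficulty by a frozen pre-trained model's loss and
scales an entropy bonus accordingly, discouraging over-confident predictions
exactly where they are least reliable.

```python
import torch
import torch.nn.functional as F

def difficulty_weighted_loss(student_logits, frozen_logits, y, lam=0.1):
    ce = F.cross_entropy(student_logits, y, reduction="none")
    with torch.no_grad():
        # Harder samples have higher loss under the frozen pre-trained model.
        difficulty = F.cross_entropy(frozen_logits, y, reduction="none")
        difficulty = difficulty / (difficulty.mean() + 1e-8)  # normalize
    probs = F.softmax(student_logits, dim=1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1)
    # Reward (rather than penalize) predictive entropy on difficult samples.
    return (ce - lam * difficulty * entropy).mean()

y = torch.randint(0, 10, (16,))
print(difficulty_weighted_loss(torch.randn(16, 10), torch.randn(16, 10), y))
```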
- Federated Learning with Unreliable Clients: Performance Analysis and Mechanism Design [76.29738151117583]
Federated Learning (FL) has become a promising tool for training effective machine learning models among distributed clients.
However, low-quality models could be uploaded to the aggregator server by unreliable clients, leading to a degradation or even a collapse of training.
We model these unreliable client behaviors and propose a defensive mechanism to mitigate the security risk.
arXiv Detail & Related papers (2021-05-10T08:02:27Z)
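
As a generic illustration of such a defensive mechanism (not the paper's
specific design), the sketch below filters out client updates that deviate
strongly from the median update before averaging.

```python
import torch

def robust_aggregate(client_updates, tau=2.0):
    """client_updates: list of 1-D tensors (flattened model deltas)."""
    stacked = torch.stack(client_updates)       # [n_clients, n_params]
    median = stacked.median(dim=0).values
    dists = (stacked - median).norm(dim=1)
    keep = dists <= tau * dists.median()        # drop far-out clients
    return stacked[keep].mean(dim=0)

# Eight well-behaved clients plus one unreliable outlier.
updates = [torch.randn(10) for _ in range(8)] + [100 * torch.randn(10)]
print(robust_aggregate(updates))  # the outlier is filtered before averaging
```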
- Adversarial Robustness: From Self-Supervised Pre-Training to Fine-Tuning [134.15174177472807]
We introduce adversarial training into self-supervision to provide general-purpose robust pre-trained models for the first time.
We conduct extensive experiments to demonstrate that the proposed framework achieves large performance margins.
arXiv Detail & Related papers (2020-03-28T18:28:33Z)