An Empirical Study of Vulnerability Detection using Federated Learning
- URL: http://arxiv.org/abs/2411.16099v1
- Date: Mon, 25 Nov 2024 05:21:12 GMT
- Title: An Empirical Study of Vulnerability Detection using Federated Learning
- Authors: Peiheng Zhou, Ming Hu, Xingrun Quan, Yawen Peng, Xiaofei Xie, Yanxin Yang, Chengwei Liu, Yueming Wu, Mingsong Chen,
- Abstract summary: Federated Learning (FL) has been investigated as a promising means of addressing the data silo problem in vulnerability detection.
This paper first proposes VulFL, an effective evaluation framework for FL-based vulnerability detection.
Our study sheds light on the potential of FL in vulnerability detection, which can be used to guide the design of FL-based solutions for vulnerability detection.
- Score: 19.14259520825083
- License:
- Abstract: Although Deep Learning (DL) methods becoming increasingly popular in vulnerability detection, their performance is seriously limited by insufficient training data. This is mainly because few existing software organizations can maintain a complete set of high-quality samples for DL-based vulnerability detection. Due to the concerns about privacy leakage, most of them are reluctant to share data, resulting in the data silo problem. Since enables collaboratively model training without data sharing, Federated Learning (FL) has been investigated as a promising means of addressing the data silo problem in DL-based vulnerability detection. However, since existing FL-based vulnerability detection methods focus on specific applications, it is still far unclear i) how well FL adapts to common vulnerability detection tasks and ii) how to design a high-performance FL solution for a specific vulnerability detection task. To answer these two questions, this paper first proposes VulFL, an effective evaluation framework for FL-based vulnerability detection. Then, based on VulFL, this paper conducts a comprehensive study to reveal the underlying capabilities of FL in dealing with different types of CWEs, especially when facing various data heterogeneity scenarios. Our experimental results show that, compared to independent training, FL can significantly improve the detection performance of common AI models on all investigated CWEs, though the performance of FL-based vulnerability detection is limited by heterogeneous data. To highlight the performance differences between different FL solutions for vulnerability detection, we extensively investigate the impacts of different configuration strategies for each framework component of VulFL. Our study sheds light on the potential of FL in vulnerability detection, which can be used to guide the design of FL-based solutions for vulnerability detection.
Related papers
- Can We Theoretically Quantify the Impacts of Local Updates on the Generalization Performance of Federated Learning? [50.03434441234569]
Federated Learning (FL) has gained significant popularity due to its effectiveness in training machine learning models across diverse sites without requiring direct data sharing.
While various algorithms have shown that FL with local updates is a communication-efficient distributed learning framework, the generalization performance of FL with local updates has received comparatively less attention.
arXiv Detail & Related papers (2024-09-05T19:00:18Z) - Federated Learning with Anomaly Detection via Gradient and Reconstruction Analysis [2.28438857884398]
We introduce a novel framework that synergizes gradient-based analysis with autoencoder-driven data reconstruction to detect and mitigate poisoned data with unprecedented precision.
Our method outperforms existing solutions by 15% in anomaly detection accuracy while maintaining a minimal false positive rate.
Our work paves the way for future advancements in distributed learning security.
arXiv Detail & Related papers (2024-03-15T03:54:45Z) - How Far Have We Gone in Vulnerability Detection Using Large Language
Models [15.09461331135668]
We introduce a comprehensive vulnerability benchmark VulBench.
This benchmark aggregates high-quality data from a wide range of CTF challenges and real-world applications.
We find that several LLMs outperform traditional deep learning approaches in vulnerability detection.
arXiv Detail & Related papers (2023-11-21T08:20:39Z) - Privacy-preserving Federated Primal-dual Learning for Non-convex and Non-smooth Problems with Model Sparsification [51.04894019092156]
Federated learning (FL) has been recognized as a rapidly growing area, where the model is trained over clients under the FL orchestration (PS)
In this paper, we propose a novel primal sparification algorithm for and guarantee non-smooth FL problems.
Its unique insightful properties and its analyses are also presented.
arXiv Detail & Related papers (2023-10-30T14:15:47Z) - Enabling Quartile-based Estimated-Mean Gradient Aggregation As Baseline
for Federated Image Classifications [5.5099914877576985]
Federated Learning (FL) has revolutionized how we train deep neural networks by enabling decentralized collaboration while safeguarding sensitive data and improving model performance.
This paper introduces an innovative solution named Estimated Mean Aggregation (EMA) that not only addresses these challenges but also provides a fundamental reference point as a $mathsfbaseline$ for advanced aggregation techniques in FL systems.
arXiv Detail & Related papers (2023-09-21T17:17:28Z) - Improving robustness of jet tagging algorithms with adversarial training [56.79800815519762]
We investigate the vulnerability of flavor tagging algorithms via application of adversarial attacks.
We present an adversarial training strategy that mitigates the impact of such simulated attacks.
arXiv Detail & Related papers (2022-03-25T19:57:19Z) - OLIVE: Oblivious Federated Learning on Trusted Execution Environment
against the risk of sparsification [22.579050671255846]
This study focuses on the analysis of the vulnerabilities of server-side TEEs in Federated Learning and the defense.
First, we theoretically analyze the leakage of memory access patterns, revealing the risk of sparsified gradients.
Second, we devise an inference attack to link memory access patterns to sensitive information in the training dataset.
arXiv Detail & Related papers (2022-02-15T03:23:57Z) - Do Gradient Inversion Attacks Make Federated Learning Unsafe? [70.0231254112197]
Federated learning (FL) allows the collaborative training of AI models without needing to share raw data.
Recent works on the inversion of deep neural networks from model gradients raised concerns about the security of FL in preventing the leakage of training data.
In this work, we show that these attacks presented in the literature are impractical in real FL use-cases and provide a new baseline attack.
arXiv Detail & Related papers (2022-02-14T18:33:12Z) - Towards Understanding Quality Challenges of the Federated Learning: A
First Look from the Lens of Robustness [4.822471415125479]
Federated learning (FL) aims to preserve users' data privacy while leveraging the entire dataset of all participants for training.
FL still tends to suffer from quality issues such as attacks or byzantine faults.
This paper investigates the effectiveness of state-of-the-art (SOTA) robust FL techniques in the presence of attacks and faults.
arXiv Detail & Related papers (2022-01-05T02:06:39Z) - VELVET: a noVel Ensemble Learning approach to automatically locate
VulnErable sTatements [62.93814803258067]
This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements in source code.
Our model combines graph-based and sequence-based neural networks to successfully capture the local and global context of a program graph.
VELVET achieves 99.6% and 43.6% top-1 accuracy over synthetic data and real-world data, respectively.
arXiv Detail & Related papers (2021-12-20T22:45:27Z) - Local Learning Matters: Rethinking Data Heterogeneity in Federated
Learning [61.488646649045215]
Federated learning (FL) is a promising strategy for performing privacy-preserving, distributed learning with a network of clients (i.e., edge devices)
arXiv Detail & Related papers (2021-11-28T19:03:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.