Related papers: An Empirical Study of Vulnerability Detection using Federated Learning

An Empirical Study of Vulnerability Detection using Federated Learning

URL: http://arxiv.org/abs/2411.16099v1
Date: Mon, 25 Nov 2024 05:21:12 GMT
Title: An Empirical Study of Vulnerability Detection using Federated Learning
Authors: Peiheng Zhou, Ming Hu, Xingrun Quan, Yawen Peng, Xiaofei Xie, Yanxin Yang, Chengwei Liu, Yueming Wu, Mingsong Chen,
Abstract summary: Federated Learning (FL) has been investigated as a promising means of addressing the data silo problem in vulnerability detection. This paper first proposes VulFL, an effective evaluation framework for FL-based vulnerability detection. Our study sheds light on the potential of FL in vulnerability detection, which can be used to guide the design of FL-based solutions for vulnerability detection.
Score: 19.14259520825083
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Although Deep Learning (DL) methods becoming increasingly popular in vulnerability detection, their performance is seriously limited by insufficient training data. This is mainly because few existing software organizations can maintain a complete set of high-quality samples for DL-based vulnerability detection. Due to the concerns about privacy leakage, most of them are reluctant to share data, resulting in the data silo problem. Since enables collaboratively model training without data sharing, Federated Learning (FL) has been investigated as a promising means of addressing the data silo problem in DL-based vulnerability detection. However, since existing FL-based vulnerability detection methods focus on specific applications, it is still far unclear i) how well FL adapts to common vulnerability detection tasks and ii) how to design a high-performance FL solution for a specific vulnerability detection task. To answer these two questions, this paper first proposes VulFL, an effective evaluation framework for FL-based vulnerability detection. Then, based on VulFL, this paper conducts a comprehensive study to reveal the underlying capabilities of FL in dealing with different types of CWEs, especially when facing various data heterogeneity scenarios. Our experimental results show that, compared to independent training, FL can significantly improve the detection performance of common AI models on all investigated CWEs, though the performance of FL-based vulnerability detection is limited by heterogeneous data. To highlight the performance differences between different FL solutions for vulnerability detection, we extensively investigate the impacts of different configuration strategies for each framework component of VulFL. Our study sheds light on the potential of FL in vulnerability detection, which can be used to guide the design of FL-based solutions for vulnerability detection.

Related papers

A Thorough Assessment of the Non-IID Data Impact in Federated Learning [1.89791888476598]
Decentralized machine learning (FL) allows collaborative machine learning (ML) training among decentralized clients' information, ensuring data privacy. The decentralized nature of FL deals with non-independent and identically distributed (non-IID) data. Despite its importance, experimental studies systematically addressing all types of data heterogeneity (ak.a. non-IIDness) remain scarce. We aim to fill this gap by assessing and quantifying the non-IID effect through a thorough empirical analysis.
arXiv Detail & Related papers (2025-03-21T11:53:36Z)
Can We Theoretically Quantify the Impacts of Local Updates on the Generalization Performance of Federated Learning? [50.03434441234569]
Federated Learning (FL) has gained significant popularity due to its effectiveness in training machine learning models across diverse sites without requiring direct data sharing. While various algorithms have shown that FL with local updates is a communication-efficient distributed learning framework, the generalization performance of FL with local updates has received comparatively less attention.
arXiv Detail & Related papers (2024-09-05T19:00:18Z)
Federated Learning with Anomaly Detection via Gradient and Reconstruction Analysis [2.28438857884398]
We introduce a novel framework that synergizes gradient-based analysis with autoencoder-driven data reconstruction to detect and mitigate poisoned data with unprecedented precision. Our method outperforms existing solutions by 15% in anomaly detection accuracy while maintaining a minimal false positive rate. Our work paves the way for future advancements in distributed learning security.
arXiv Detail & Related papers (2024-03-15T03:54:45Z)
How Far Have We Gone in Vulnerability Detection Using Large Language Models [15.09461331135668]
We introduce a comprehensive vulnerability benchmark VulBench. This benchmark aggregates high-quality data from a wide range of CTF challenges and real-world applications. We find that several LLMs outperform traditional deep learning approaches in vulnerability detection.
arXiv Detail & Related papers (2023-11-21T08:20:39Z)
Privacy-preserving Federated Primal-dual Learning for Non-convex and Non-smooth Problems with Model Sparsification [51.04894019092156]
Federated learning (FL) has been recognized as a rapidly growing area, where the model is trained over clients under the FL orchestration (PS) In this paper, we propose a novel primal sparification algorithm for and guarantee non-smooth FL problems. Its unique insightful properties and its analyses are also presented.
arXiv Detail & Related papers (2023-10-30T14:15:47Z)
Enabling Quartile-based Estimated-Mean Gradient Aggregation As Baseline for Federated Image Classifications [5.5099914877576985]
Federated Learning (FL) has revolutionized how we train deep neural networks by enabling decentralized collaboration while safeguarding sensitive data and improving model performance. This paper introduces an innovative solution named Estimated Mean Aggregation (EMA) that not only addresses these challenges but also provides a fundamental reference point as a $mathsfbaseline$ for advanced aggregation techniques in FL systems.
arXiv Detail & Related papers (2023-09-21T17:17:28Z)
Improving robustness of jet tagging algorithms with adversarial training [56.79800815519762]
We investigate the vulnerability of flavor tagging algorithms via application of adversarial attacks. We present an adversarial training strategy that mitigates the impact of such simulated attacks.
arXiv Detail & Related papers (2022-03-25T19:57:19Z)
OLIVE: Oblivious Federated Learning on Trusted Execution Environment against the risk of sparsification [22.579050671255846]
This study focuses on the analysis of the vulnerabilities of server-side TEEs in Federated Learning and the defense. First, we theoretically analyze the leakage of memory access patterns, revealing the risk of sparsified gradients. Second, we devise an inference attack to link memory access patterns to sensitive information in the training dataset.
arXiv Detail & Related papers (2022-02-15T03:23:57Z)
Do Gradient Inversion Attacks Make Federated Learning Unsafe? [70.0231254112197]
Federated learning (FL) allows the collaborative training of AI models without needing to share raw data. Recent works on the inversion of deep neural networks from model gradients raised concerns about the security of FL in preventing the leakage of training data. In this work, we show that these attacks presented in the literature are impractical in real FL use-cases and provide a new baseline attack.
arXiv Detail & Related papers (2022-02-14T18:33:12Z)
Towards Understanding Quality Challenges of the Federated Learning: A First Look from the Lens of Robustness [4.822471415125479]
Federated learning (FL) aims to preserve users' data privacy while leveraging the entire dataset of all participants for training. FL still tends to suffer from quality issues such as attacks or byzantine faults. This paper investigates the effectiveness of state-of-the-art (SOTA) robust FL techniques in the presence of attacks and faults.
arXiv Detail & Related papers (2022-01-05T02:06:39Z)
VELVET: a noVel Ensemble Learning approach to automatically locate VulnErable sTatements [62.93814803258067]
This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements in source code. Our model combines graph-based and sequence-based neural networks to successfully capture the local and global context of a program graph. VELVET achieves 99.6% and 43.6% top-1 accuracy over synthetic data and real-world data, respectively.
arXiv Detail & Related papers (2021-12-20T22:45:27Z)
Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning [61.488646649045215]
Federated learning (FL) is a promising strategy for performing privacy-preserving, distributed learning with a network of clients (i.e., edge devices)
arXiv Detail & Related papers (2021-11-28T19:03:39Z)

This list is automatically generated from the titles and abstracts of the papers in this site.