When Machine Learning Meets Vulnerability Discovery: Challenges and Lessons Learned
- URL: http://arxiv.org/abs/2508.15042v1
- Date: Wed, 20 Aug 2025 20:09:49 GMT
- Title: When Machine Learning Meets Vulnerability Discovery: Challenges and Lessons Learned
- Authors: Sima Arasteh, Christophe Hauser,
- Abstract summary: In this paper, we explore the challenges of applying machine learning to vulnerability discovery.<n>First, researchers often fail to provide concrete statistics about their training datasets.<n> Secondly, the choice of a model and the level of granularity at which models are trained also affect the effectiveness of such vulnerability discovery approaches.
- Score: 3.000275719116454
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, machine learning has demonstrated impressive results in various fields, including software vulnerability detection. Nonetheless, using machine learning to identify software vulnerabilities presents new challenges, especially regarding the scale of data involved, which was not a factor in traditional methods. Consequently, in spite of the rise of new machine-learning-based approaches in that space, important shortcomings persist regarding their evaluation. First, researchers often fail to provide concrete statistics about their training datasets, such as the number of samples for each type of vulnerability. Moreover, many methods rely on training with semantically similar functions rather than directly on vulnerable programs. This leads to uncertainty about the suitability of the datasets currently used for training. Secondly, the choice of a model and the level of granularity at which models are trained also affect the effectiveness of such vulnerability discovery approaches. In this paper, we explore the challenges of applying machine learning to vulnerability discovery. We also share insights from our two previous research papers, Bin2vec and BinHunter, which could enhance future research in this field.
Related papers
- RESTOR: Knowledge Recovery in Machine Unlearning [71.75834077528305]
Large language models trained on web-scale corpora can contain private or sensitive information.<n>Several machine unlearning algorithms have been proposed to eliminate the effect of such datapoints.<n>We propose the RESTOR framework for machine unlearning evaluation.
arXiv Detail & Related papers (2024-10-31T20:54:35Z) - Instance-Level Difficulty: A Missing Perspective in Machine Unlearning [13.052520843129363]
We study the cruxes that make machine unlearning difficult through a thorough instance-level unlearning performance analysis.<n>In particular, we summarize four factors that make unlearning a data point difficult.<n>We argue that machine unlearning research should pay attention to the instance-level difficulty of unlearning.
arXiv Detail & Related papers (2024-10-03T23:41:42Z) - Verification of Machine Unlearning is Fragile [48.71651033308842]
We introduce two novel adversarial unlearning processes capable of circumventing both types of verification strategies.
This study highlights the vulnerabilities and limitations in machine unlearning verification, paving the way for further research into the safety of machine unlearning.
arXiv Detail & Related papers (2024-08-01T21:37:10Z) - Learn What You Want to Unlearn: Unlearning Inversion Attacks against Machine Unlearning [16.809644622465086]
We conduct the first investigation to understand the extent to which machine unlearning can leak the confidential content of unlearned data.
Under the Machine Learning as a Service setting, we propose unlearning inversion attacks that can reveal the feature and label information of an unlearned sample.
The experimental results indicate that the proposed attack can reveal the sensitive information of the unlearned data.
arXiv Detail & Related papers (2024-04-04T06:37:46Z) - Interpretable Machine Learning for Discovery: Statistical Challenges \&
Opportunities [1.2891210250935146]
We discuss and review the field of interpretable machine learning.
We outline the types of discoveries that can be made using Interpretable Machine Learning.
We focus on the grand challenge of how to validate these discoveries in a data-driven manner.
arXiv Detail & Related papers (2023-08-02T23:57:31Z) - Class-wise Federated Unlearning: Harnessing Active Forgetting with Teacher-Student Memory Generation [11.638683787598817]
We propose a neuro-inspired federated unlearning framework based on active forgetting.<n>Our framework distinguishes itself from existing methods by utilizing new memories to overwrite old ones.<n>Our method achieves satisfactory unlearning completeness against backdoor attacks.
arXiv Detail & Related papers (2023-07-07T03:07:26Z) - Re-thinking Data Availablity Attacks Against Deep Neural Networks [53.64624167867274]
In this paper, we re-examine the concept of unlearnable examples and discern that the existing robust error-minimizing noise presents an inaccurate optimization objective.
We introduce a novel optimization paradigm that yields improved protection results with reduced computational time requirements.
arXiv Detail & Related papers (2023-05-18T04:03:51Z) - Few-shot Weakly-supervised Cybersecurity Anomaly Detection [1.179179628317559]
We propose an enhancement to an existing few-shot weakly-supervised deep learning anomaly detection framework.
This framework incorporates data augmentation, representation learning and ordinal regression.
We then evaluated and showed the performance of our implemented framework on three benchmark datasets.
arXiv Detail & Related papers (2023-04-15T04:37:54Z) - Learnware: Small Models Do Big [69.88234743773113]
The prevailing big model paradigm, which has achieved impressive results in natural language processing and computer vision applications, has not yet addressed those issues, whereas becoming a serious source of carbon emissions.
This article offers an overview of the learnware paradigm, which attempts to enable users not need to build machine learning models from scratch, with the hope of reusing small models to do things even beyond their original purposes.
arXiv Detail & Related papers (2022-10-07T15:55:52Z) - Machine Learning Security against Data Poisoning: Are We There Yet? [23.809841593870757]
This article reviews data poisoning attacks that compromise the training data used to learn machine learning models.
We discuss how to mitigate these attacks using basic security principles, or by deploying ML-oriented defensive mechanisms.
arXiv Detail & Related papers (2022-04-12T17:52:09Z) - Knowledge as Invariance -- History and Perspectives of
Knowledge-augmented Machine Learning [69.99522650448213]
Research in machine learning is at a turning point.
Research interests are shifting away from increasing the performance of highly parameterized models to exceedingly specific tasks.
This white paper provides an introduction and discussion of this emerging field in machine learning research.
arXiv Detail & Related papers (2020-12-21T15:07:19Z) - Dos and Don'ts of Machine Learning in Computer Security [74.1816306998445]
Despite great potential, machine learning in security is prone to subtle pitfalls that undermine its performance.
We identify common pitfalls in the design, implementation, and evaluation of learning-based security systems.
We propose actionable recommendations to support researchers in avoiding or mitigating the pitfalls where possible.
arXiv Detail & Related papers (2020-10-19T13:09:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.