Cross-Domain Evaluation of Transformer-Based Vulnerability Detection on Open & Industry Data
- URL: http://arxiv.org/abs/2509.09313v1
- Date: Thu, 11 Sep 2025 09:58:43 GMT
- Title: Cross-Domain Evaluation of Transformer-Based Vulnerability Detection on Open & Industry Data
- Authors: Moritz Mock, Thomas Forrer, Barbara Russo
- Abstract summary: Transferring such technologies from academia to industry presents challenges related to trustworthiness, legacy systems, limited digital literacy, and the gap between academic and industrial expertise. We first evaluate the performance of CodeBERT for detecting vulnerable functions in industrial and open-source software. We then develop AI-DO (Automating vulnerability detection Integration for Developers' Operations), a Continuous Integration-Continuous Deployment (CI/CD)-integrated recommender system.
- Score: 0.42752082548275155
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Deep learning solutions for vulnerability detection proposed in academic research are not always accessible to developers, and their applicability in industrial settings is rarely addressed. Transferring such technologies from academia to industry presents challenges related to trustworthiness, legacy systems, limited digital literacy, and the gap between academic and industrial expertise. For deep learning in particular, performance and integration into existing workflows are additional concerns. In this work, we first evaluate the performance of CodeBERT for detecting vulnerable functions in industrial and open-source software. We analyse its cross-domain generalisation when fine-tuned on open-source data and tested on industrial data, and vice versa, also exploring strategies for handling class imbalance. Based on these results, we develop AI-DO (Automating vulnerability detection Integration for Developers' Operations), a Continuous Integration-Continuous Deployment (CI/CD)-integrated recommender system that uses fine-tuned CodeBERT to detect and localise vulnerabilities during code review without disrupting workflows. Finally, we assess the tool's perceived usefulness through a survey with the company's IT professionals. Our results show that models trained on industrial data detect vulnerabilities accurately within the same domain but lose performance on open-source code, while a deep learner fine-tuned on open data, with appropriate undersampling techniques, improves the detection of vulnerabilities.
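The class-imbalance strategy mentioned in the abstract, undersampling before fine-tuning, can be sketched as random undersampling of the majority (typically non-vulnerable) class down to a 1:1 ratio. This is an illustrative sketch only: the function name, label convention (1 = vulnerable), and the balancing ratio are assumptions, not the paper's actual code.

```python
import random


def undersample(samples, labels, seed=42):
    """Randomly undersample the majority class to a 1:1 ratio.

    samples: list of code snippets (e.g. function bodies)
    labels:  parallel list of ints, 1 = vulnerable, 0 = non-vulnerable
    seed:    fixed for reproducible sampling
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    majority, minority = (neg, pos) if len(neg) >= len(pos) else (pos, neg)
    # Keep all minority examples, sample an equal number from the majority.
    kept = rng.sample(majority, len(minority))
    idx = sorted(minority + kept)
    return [samples[i] for i in idx], [labels[i] for i in idx]
```

The balanced set would then be tokenized and passed to CodeBERT fine-tuning as a binary function-level classification task.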
Related papers
- Automated Vulnerability Validation and Verification: A Large Language Model Approach [7.482522010482827]
This paper introduces an end-to-end multi-step pipeline leveraging generative AI, specifically large language models (LLMs). Our approach extracts information from CVE disclosures in the National Vulnerability Database and augments it with external public knowledge (e.g., threat advisories, code snippets) using Retrieval-Augmented Generation (RAG). The pipeline iteratively refines generated artifacts, validates attack success with test cases, and supports complex multi-container setups.
arXiv Detail & Related papers (2025-09-28T19:16:12Z) - Does Machine Unlearning Truly Remove Model Knowledge? A Framework for Auditing Unlearning in LLMs [58.24692529185971]
We introduce a comprehensive auditing framework for unlearning evaluation comprising three benchmark datasets, six unlearning algorithms, and five prompt-based auditing methods. We evaluate the effectiveness and robustness of different unlearning strategies.
arXiv Detail & Related papers (2025-05-29T09:19:07Z) - AssistedDS: Benchmarking How External Domain Knowledge Assists LLMs in Automated Data Science [44.18533574465929]
We introduce AssistedDS, a benchmark designed to evaluate how large language models handle domain knowledge. We assess state-of-the-art LLMs on their ability to discern and apply beneficial versus harmful domain knowledge. Our results demonstrate a substantial gap in current models' ability to critically evaluate and leverage expert knowledge.
arXiv Detail & Related papers (2025-05-25T05:50:21Z) - Outside the Comfort Zone: Analysing LLM Capabilities in Software Vulnerability Detection [9.652886240532741]
This paper thoroughly analyses large language models' capabilities in detecting vulnerabilities within source code.
We evaluate the performance of six open-source models that are specifically trained for vulnerability detection against six general-purpose LLMs.
arXiv Detail & Related papers (2024-08-29T10:00:57Z) - Systematic review and characterisation of malicious industrial network traffic datasets [0.0]
This paper systematically reviews publicly available network traffic capture-based datasets. It includes categorisation of contained attack types, review of metadata, and statistical as well as complexity analysis. It provides researchers with metadata that can be used to select the best dataset for their research question.
arXiv Detail & Related papers (2024-05-08T07:48:40Z) - Bridging the Gap: A Study of AI-based Vulnerability Management between Industry and Academia [4.4037442949276455]
Recent research advances in Artificial Intelligence (AI) have yielded promising results for automated software vulnerability management.
The industry remains very cautious and selective about integrating AI-based techniques into their security vulnerability management workflow.
We propose a set of future directions to help better understand industry expectations, improve the practical usability of AI-based security vulnerability research, and drive a synergistic relationship between industry and academia.
arXiv Detail & Related papers (2024-05-03T19:00:50Z) - Effective Intrusion Detection in Heterogeneous Internet-of-Things Networks via Ensemble Knowledge Distillation-based Federated Learning [52.6706505729803]
We apply Federated Learning (FL) to collaboratively train a decentralised shared model for Intrusion Detection Systems (IDS).
FLEKD enables a more flexible aggregation method than conventional model fusion techniques.
Experiment results show that the proposed approach outperforms local training and traditional FL in terms of both speed and performance.
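The "conventional model fusion techniques" that FLEKD improves on typically mean weighted parameter averaging (FedAvg). A minimal sketch of that baseline, with illustrative parameter layout and names (FLEKD itself replaces this with ensemble knowledge distillation, which is not shown here):

```python
def fedavg(client_weights, client_sizes):
    """Weighted average of client model parameters, the conventional
    fusion baseline that knowledge-distillation-based aggregation replaces.

    client_weights: list of dicts mapping parameter name -> list of floats
    client_sizes:   list of local dataset sizes, one per client
    """
    total = sum(client_sizes)
    avg = {}
    for key in client_weights[0]:
        n = len(client_weights[0][key])
        # Each client's contribution is weighted by its local dataset size.
        avg[key] = [
            sum(w[key][j] * s for w, s in zip(client_weights, client_sizes)) / total
            for j in range(n)
        ]
    return avg
```

A more flexible aggregator, as the summary describes, can combine heterogeneous client models through their predictions rather than their parameters, which plain averaging cannot do.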
arXiv Detail & Related papers (2024-01-22T14:16:37Z) - Causative Insights into Open Source Software Security using Large Language Code Embeddings and Semantic Vulnerability Graph [3.623199159688412]
Open Source Software (OSS) vulnerabilities can cause unauthorized access, data breaches, network disruptions, and privacy violations.
Recent deep-learning techniques have shown great promise in identifying and localizing vulnerabilities in source code.
Our study shows a 24% improvement in code repair capabilities compared to previous methods.
arXiv Detail & Related papers (2024-01-13T10:33:22Z) - Integration of Domain Expert-Centric Ontology Design into the CRISP-DM for Cyber-Physical Production Systems [45.05372822216111]
Methods from Machine Learning (ML) and Data Mining (DM) have proven to be promising in extracting complex and hidden patterns from the data collected.
However, such data-driven projects, usually performed with the Cross-Industry Standard Process for Data Mining (CRISP-DM), often fail due to the disproportionate amount of time needed for understanding and preparing the data.
This contribution presents an integrated approach so that data scientists are able to gain insights into CPPS challenges more quickly and reliably.
arXiv Detail & Related papers (2023-07-21T15:04:00Z) - VELVET: a noVel Ensemble Learning approach to automatically locate VulnErable sTatements [62.93814803258067]
This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements in source code.
Our model combines graph-based and sequence-based neural networks to successfully capture the local and global context of a program graph.
VELVET achieves 99.6% and 43.6% top-1 accuracy over synthetic data and real-world data, respectively.
arXiv Detail & Related papers (2021-12-20T22:45:27Z) - Anomaly Detection Based on Selection and Weighting in Latent Space [73.01328671569759]
We propose a novel selection-and-weighting-based anomaly detection framework called SWAD.
Experiments on both benchmark and real-world datasets have shown the effectiveness and superiority of SWAD.
arXiv Detail & Related papers (2021-03-08T10:56:38Z) - Towards AIOps in Edge Computing Environments [60.27785717687999]
This paper describes the system design of an AIOps platform which is applicable in heterogeneous, distributed environments.
It is feasible to collect metrics with a high frequency and simultaneously run specific anomaly detection algorithms directly on edge devices.
arXiv Detail & Related papers (2021-02-12T09:33:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.