A Match Made in Heaven? Matching Test Cases and Vulnerabilities With the VUTECO Approach
- URL: http://arxiv.org/abs/2502.03365v1
- Date: Wed, 05 Feb 2025 17:02:42 GMT
- Title: A Match Made in Heaven? Matching Test Cases and Vulnerabilities With the VUTECO Approach
- Authors: Emanuele Iannone, Quang-Cuong Bui, Riccardo Scandariato,
- Abstract summary: This paper introduces VUTECO, a deep learning-based approach for collecting instances of vulnerability-witnessing tests from Java repositories. VUTECO successfully addresses the Finding task, achieving perfect precision and 0.83 F0.5 score on validated test cases in VUL4J. Despite showing sufficiently good performance for the Matching task, VUTECO failed to retrieve any valid match in the wild.
- Score: 4.8556535196652195
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Software vulnerabilities are commonly detected via static analysis, penetration testing, and fuzzing. They can also be found by running unit tests - so-called vulnerability-witnessing tests - that stimulate the security-sensitive behavior with crafted inputs. Developing such tests is difficult and time-consuming; thus, automated data-driven approaches could help developers intercept vulnerabilities earlier. However, training and validating such approaches require a lot of data, which is currently scarce. This paper introduces VUTECO, a deep learning-based approach for collecting instances of vulnerability-witnessing tests from Java repositories. VUTECO carries out two tasks: (1) the "Finding" task to determine whether a test case is security-related, and (2) the "Matching" task to relate a test case to the exact vulnerability it is witnessing. VUTECO successfully addresses the Finding task, achieving perfect precision and 0.83 F0.5 score on validated test cases in VUL4J and returning 102 out of 145 (70%) correct security-related test cases from 244 open-source Java projects. Despite showing sufficiently good performance for the Matching task - i.e., 0.86 precision and 0.68 F0.5 score - VUTECO failed to retrieve any valid match in the wild. Nevertheless, we observed that in almost all of the matches, the test case was still security-related despite being matched to the wrong vulnerability. In the end, VUTECO can help find vulnerability-witnessing tests, though the matching with the right vulnerability is yet to be solved; the findings obtained lay the stepping stone for future research on the matter.
Related papers
- An Automated Blackbox Noncompliance Checker for QUIC Server Implementations [2.9248916859490173] (2025-05-19T04:28:49Z)
  QUICtester is an automated approach for uncovering non-compliant behaviors in the ratified QUIC protocol implementations (RFC 9000/). We used QUICtester to analyze 186 learned models from 19 QUIC implementations under the five security settings and discovered 55 implementation errors.
- Are Autonomous Web Agents Good Testers? [41.56233403862961] (2025-04-02T08:48:01Z)
  Large Language Models (LLMs) offer a potential alternative by powering Autonomous Web Agents (AWAs). AWAs may serve as Autonomous Test Agents (ATAs). This paper investigates the feasibility of adapting AWAs for natural language test case execution.
- Static Application Security Testing (SAST) Tools for Smart Contracts: How Far Are We? [14.974832502863526] (2024-04-28T13:40:18Z)
  In recent years, the importance of smart contract security has been heightened by the increasing number of attacks against them. To address this issue, a multitude of static application security testing (SAST) tools have been proposed for detecting vulnerabilities in smart contracts. In this paper, we propose an up-to-date and fine-grained taxonomy that includes 45 unique vulnerability types for smart contracts.
- Bounding Box Stability against Feature Dropout Reflects Detector Generalization across Environments [52.65013932553849] (2024-03-20T17:59:16Z)
  Good detectors tend to output bounding boxes whose locations do not change much, while bounding boxes of poor detectors will undergo noticeable position changes. We compute the box stability score (BoS score) to reflect this stability. We find that the BoS score has a strong, positive correlation with detection accuracy measured by mean average precision (mAP) under various test environments.
- Efficiently Detecting Reentrancy Vulnerabilities in Complex Smart Contracts [35.26195628798847] (2024-03-17T16:08:30Z)
  Existing vulnerability detection tools perform poorly in terms of efficiency and successful detection rates for vulnerabilities in complex contracts. SliSE provides a robust and efficient method for detecting reentrancy vulnerabilities in complex contracts.
- AIM: Automated Input Set Minimization for Metamorphic Security Testing [9.232277700524786] (2024-02-16T15:54:58Z)
  We propose AIM, an approach that automatically selects inputs to reduce testing costs while preserving vulnerability detection capabilities. AIM includes a clustering-based black-box approach to identify similar inputs based on their security properties. It also relies on a novel genetic algorithm to efficiently select diverse inputs while minimizing their total cost.
- Automated Test Case Repair Using Language Models [0.5708902722746041] (2024-01-12T18:56:57Z)
  Unrepaired broken test cases can degrade test suite quality and disrupt the software development process. We present TaRGET, a novel approach leveraging pre-trained code language models for automated test case repair. TaRGET treats test repair as a language translation task, employing a two-step process to fine-tune a language model.
- Enriching Automatic Test Case Generation by Extracting Relevant Test Inputs from Bug Reports [8.85274953789614] (2023-12-22T18:19:33Z)
  name is a technique for exploring bug reports to identify input values that can be fed to automatic test generation tools. For Defects4J projects, our study has shown that name successfully extracted 68.68% of relevant inputs when using regular expressions in its approach.
- Towards single integrated spoofing-aware speaker verification embeddings [63.42889348690095] (2023-05-30T14:15:39Z)
  This study aims to develop a single integrated spoofing-aware speaker verification (SASV) embedding. We find that the inferior performance of single SASV embeddings comes from an insufficient amount of training data. Experiments show dramatic improvements, achieving a SASV-EER of 1.06% on the evaluation protocol of the SASV2022 challenge.
- AUTO: Adaptive Outlier Optimization for Online Test-Time OOD Detection [81.49353397201887] (2023-03-22T02:28:54Z)
  Out-of-distribution (OOD) detection is crucial to deploying machine learning models in open-world applications. We introduce a novel paradigm called test-time OOD detection, which utilizes unlabeled online data directly at test time to improve OOD detection performance. We propose adaptive outlier optimization (AUTO), which consists of an in-out-aware filter, an ID memory bank, and a semantically-consistent objective.
- SUPERNOVA: Automating Test Selection and Defect Prevention in AAA Video Games Using Risk Based Testing and Machine Learning [62.997667081978825] (2022-03-10T00:47:46Z)
  Testing video games is an increasingly difficult task as traditional methods fail to scale with growing software systems. We present SUPERNOVA, a system responsible for test selection and defect prevention while also functioning as an automation hub. The direct impact of this has been observed to be a reduction of 55% or more in testing hours for an undisclosed sports game title.
- Detecting Errors and Estimating Accuracy on Unlabeled Data with Self-training Ensembles [38.23896575179384] (2021-06-29T21:32:51Z)
  We propose a principled and practically effective framework that simultaneously addresses the two tasks. On iWildCam, one instantiation reduces the estimation error for unsupervised accuracy estimation by at least 70% and improves the F1 score for error detection by at least 4.7%.
- Autosploit: A Fully Automated Framework for Evaluating the Exploitability of Security Vulnerabilities [47.748732208602355] (2020-06-30T18:49:18Z)
  Autosploit is an automated framework for evaluating the exploitability of vulnerabilities. It automatically tests the exploits on different configurations of the environment. It is able to identify the system properties that affect the ability to exploit a vulnerability in both noiseless and noisy environments.
- Detection of Coincidentally Correct Test Cases through Random Forests [1.2891210250935143] (2020-06-14T15:01:53Z)
  We propose a hybrid approach of ensemble learning combined with a supervised learning algorithm, namely Random Forests (RF), for the purpose of correctly identifying test cases that are mislabeled as passing test cases. A cost-effective analysis of flipping the test status or trimming (i.e., eliminating from the computation) the coincidentally correct test cases is also reported.
This list is automatically generated from the titles and abstracts of the papers in this site.