VFArchē: A Dual-Mode Framework for Locating Vulnerable Functions in Open-Source Software
- URL: http://arxiv.org/abs/2506.18050v2
- Date: Tue, 24 Jun 2025 07:11:45 GMT
- Title: VFArchē: A Dual-Mode Framework for Locating Vulnerable Functions in Open-Source Software
- Authors: Lyuye Zhang, Jian Zhang, Kaixuan Li, Chong Wang, Chengwei Liu, Jiahui Wu, Sen Chen, Yaowen Zheng, Yang Liu
- Abstract summary: Software Composition Analysis (SCA) has become pivotal in addressing vulnerabilities inherent in software project dependencies. We present VFArchē, a dual-mode approach designed for disclosed vulnerabilities, applicable in scenarios with or without available patch links. We successfully locate VF for 43 out of 50 latest vulnerabilities with reasonable effort and significantly reduce the false positives of SCA tools by 78-89%.
- Score: 15.634212183627236
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Software Composition Analysis (SCA) has become pivotal in addressing vulnerabilities inherent in software project dependencies. In particular, reachability analysis is increasingly used in Open-Source Software (OSS) projects to identify reachable vulnerabilities (e.g., CVEs) through call graphs, enabling a focus on exploitable risks. Performing reachability analysis typically requires the vulnerable function (VF) to track the call chains from downstream applications. However, such crucial information is usually unavailable in modern vulnerability databases like NVD. While directly extracting VF from modified functions in vulnerability patches is intuitive, patches are not always available. Moreover, our preliminary study shows that over 26% of VF do not exist in the modified functions. Meanwhile, simply ignoring patches to search for vulnerable functions suffers from overwhelming noise and lexical gaps between descriptions and source code. Given that almost half of the vulnerabilities are equipped with patches, a holistic solution that handles both scenarios with and without patches is required. To meet real-world needs and automatically localize VF, we present VFArchē, a dual-mode approach designed for disclosed vulnerabilities, applicable in scenarios with or without available patch links. The experimental results of VFArchē on our constructed benchmark dataset demonstrate significant efficacy regarding three metrics, achieving 1.3x and 1.9x Mean Reciprocal Rank over the best baselines for Patch-present and Patch-absent modes, respectively. Moreover, VFArchē has proven its applicability in real-world scenarios by successfully locating VF for 43 out of 50 latest vulnerabilities with reasonable effort and significantly reducing the false positives of SCA tools by 78-89%.
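The abstract notes that reachability analysis needs the vulnerable function (VF) in order to track call chains from downstream applications. The following is a minimal sketch of that idea, not the paper's implementation: it assumes the call graph is already available as a plain dictionary, and the function names (`app.main`, `lib.decode`, and so on) and the helper `find_call_chain` are hypothetical, chosen only for illustration.

```python
from collections import deque


def find_call_chain(call_graph, entry_points, vulnerable_function):
    """Breadth-first search for one call chain from any entry point
    to the vulnerable function. Returns the chain as a list of
    function names, or None if the function is unreachable."""
    queue = deque((entry, [entry]) for entry in entry_points)
    visited = set(entry_points)
    while queue:
        function, chain = queue.popleft()
        if function == vulnerable_function:
            return chain
        for callee in call_graph.get(function, ()):
            if callee not in visited:
                visited.add(callee)
                queue.append((callee, chain + [callee]))
    return None


# Hypothetical call graph: caller -> list of callees.
call_graph = {
    "app.main": ["app.load_config", "lib.parse"],
    "app.load_config": ["lib.read_file"],
    "lib.parse": ["lib.decode"],
    "lib.decode": [],
    "lib.read_file": [],
    "lib.unsafe_eval": [],
}

# Reachable VF: the dependency's vulnerability is potentially exploitable.
print(find_call_chain(call_graph, ["app.main"], "lib.decode"))
# Unreachable VF: an SCA alert for it would be a likely false positive.
print(find_call_chain(call_graph, ["app.main"], "lib.unsafe_eval"))
```

Without a known VF there is no target node to search for, which is why a dependency-level SCA alert alone cannot separate reachable from unreachable vulnerabilities.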
Related papers
- CyberGym: Evaluating AI Agents' Cybersecurity Capabilities with Real-World Vulnerabilities at Scale [46.76144797837242]
Large language model (LLM) agents are becoming increasingly skilled at handling cybersecurity tasks autonomously. Existing benchmarks fall short, often failing to capture real-world scenarios or being limited in scope. We introduce CyberGym, a large-scale and high-quality cybersecurity evaluation framework featuring 1,507 real-world vulnerabilities.
arXiv Detail & Related papers (2025-06-03T07:35:14Z)
- BadVLA: Towards Backdoor Attacks on Vision-Language-Action Models via Objective-Decoupled Optimization [45.97834622654751]
BadVLA is a backdoor attack method based on Objective-Decoupled Optimization. We show that BadVLA consistently achieves near-100% attack success rates with minimal impact on clean task accuracy. Our work offers the first systematic investigation of backdoor vulnerabilities in VLA models.
arXiv Detail & Related papers (2025-05-22T13:12:46Z)
- Fixseeker: An Empirical Driven Graph-based Approach for Detecting Silent Vulnerability Fixes in Open Source Software [12.706661324384319]
Open source software vulnerabilities pose significant security risks to downstream applications. Many security patches are released silently in new commits of OSS repositories without explicit indications of their security impact. We propose Fixseeker, a graph-based approach that extracts the various correlations between code changes at the hunk level to detect silent vulnerability fixes.
arXiv Detail & Related papers (2025-03-26T06:16:58Z)
- EXPLICATE: Enhancing Phishing Detection through Explainable AI and LLM-Powered Interpretability [44.2907457629342]
EXPLICATE is a framework that enhances phishing detection through a three-component architecture. It is on par with existing deep learning techniques but has better explainability. It addresses the critical divide between automated AI and user trust in phishing detection systems.
arXiv Detail & Related papers (2025-03-22T23:37:35Z)
- Fine-Grained 1-Day Vulnerability Detection in Binaries via Patch Code Localization [12.73365645156957]
1-day vulnerabilities in binaries have become a major threat to software security. The patch presence test is one of the effective ways to detect such vulnerabilities. We propose a novel approach named PLocator, which leverages stable values from both the patch code and its context.
arXiv Detail & Related papers (2025-01-29T04:35:37Z)
- Vulnerability Detection with Code Language Models: How Far Are We? [40.455600722638906]
PrimeVul is a new dataset for training and evaluating code LMs for vulnerability detection.
It incorporates a novel set of data labeling techniques that achieve comparable label accuracy to human-verified benchmarks.
It also implements a rigorous data de-duplication and chronological data splitting strategy to mitigate data leakage issues.
arXiv Detail & Related papers (2024-03-27T14:34:29Z)
- ReposVul: A Repository-Level High-Quality Vulnerability Dataset [13.90550557801464]
We propose an automated data collection framework and construct the first repository-level high-quality vulnerability dataset named ReposVul.
The proposed framework mainly contains three modules: (1) A vulnerability untangling module, aiming at distinguishing vulnerability-fixing related code changes from tangled patches, in which the Large Language Models (LLMs) and static analysis tools are jointly employed, (2) A multi-granularity dependency extraction module, aiming at capturing the inter-procedural call relationships of vulnerabilities, in which we construct multiple-granularity information for each vulnerability patch, including repository-level, file-level, function-level
arXiv Detail & Related papers (2024-01-24T01:27:48Z)
- SliceLocator: Locating Vulnerable Statements with Graph-based Detectors [33.395068754566935]
SliceLocator identifies the most relevant taint flow by selecting the highest-weighted flow path from all potential vulnerability-triggering statements. We demonstrate that SliceLocator consistently performs well on four state-of-the-art GNN-based vulnerability detectors.
arXiv Detail & Related papers (2024-01-05T10:15:04Z)
- Just-in-Time Detection of Silent Security Patches [7.840762542485285]
Security patches can be silent, i.e., they do not always come with comprehensive advisories such as CVEs.
This lack of transparency leaves users oblivious to available security updates, providing ample opportunity for attackers to exploit unpatched vulnerabilities.
We propose to leverage large language models (LLMs) to augment patch information with generated code change explanations.
arXiv Detail & Related papers (2023-12-02T22:53:26Z)
- VFFINDER: A Graph-based Approach for Automated Silent Vulnerability-Fix Identification [4.837912059099674]
VFFINDER is a graph-based approach for automated silent vulnerability fix identification.
It distinguishes vulnerability-fixing commits from non-fixing ones using attention-based graph neural network models.
Our results show that VFFINDER significantly improves the state-of-the-art methods by 39-83% in Precision, 19-148% in Recall, and 30-109% in F1.
arXiv Detail & Related papers (2023-09-05T05:55:18Z)
- VELVET: a noVel Ensemble Learning approach to automatically locate VulnErable sTatements [62.93814803258067]
This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements in source code.
Our model combines graph-based and sequence-based neural networks to successfully capture the local and global context of a program graph.
VELVET achieves 99.6% and 43.6% top-1 accuracy over synthetic data and real-world data, respectively.
arXiv Detail & Related papers (2021-12-20T22:45:27Z)
- Autosploit: A Fully Automated Framework for Evaluating the Exploitability of Security Vulnerabilities [47.748732208602355]
Autosploit is an automated framework for evaluating the exploitability of vulnerabilities.
It automatically tests the exploits on different configurations of the environment.
It is able to identify the system properties that affect the ability to exploit a vulnerability in both noiseless and noisy environments.
arXiv Detail & Related papers (2020-06-30T18:49:18Z)