Related papers: Favia: Forensic Agent for Vulnerability-fix Identification and Analysis

URL: http://arxiv.org/abs/2602.12500v1
Date: Fri, 13 Feb 2026 00:51:22 GMT
Title: Favia: Forensic Agent for Vulnerability-fix Identification and Analysis
Authors: André Storhaug, Jiamou Sun, Jingyue Li
Abstract summary: We propose Favia, a forensic, agent-based framework for vulnerability-fix identification. Favia combines scalable candidate ranking with deep and iterative semantic reasoning. We evaluate Favia on CVEVC, a large-scale dataset we made that comprises over 8 million commits from 3,708 real-world repositories.
Score: 5.43098755190303
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Identifying vulnerability-fixing commits corresponding to disclosed CVEs is essential for secure software maintenance but remains challenging at scale, as large repositories contain millions of commits of which only a small fraction address security issues. Existing automated approaches, including traditional machine learning techniques and recent large language model (LLM)-based methods, often suffer from poor precision-recall trade-offs. Because these approaches are frequently evaluated on randomly sampled commits, we find that reported results substantially underestimate real-world difficulty, where candidate commits are already security-relevant and highly similar. We propose Favia, a forensic, agent-based framework for vulnerability-fix identification that combines scalable candidate ranking with deep and iterative semantic reasoning. Favia first employs an efficient ranking stage to narrow the search space of commits. Each commit is then rigorously evaluated using a ReAct-based LLM agent. Given a pre-commit repository as its environment, along with specialized tools, the agent tries to localize vulnerable components, navigates the codebase, and establishes causal alignment between code changes and vulnerability root causes. This evidence-driven process enables robust identification of indirect, multi-file, and non-trivial fixes that elude single-pass or similarity-based methods. We evaluate Favia on CVEVC, a large-scale dataset we made that comprises over 8 million commits from 3,708 real-world repositories, and show that it consistently outperforms state-of-the-art traditional and LLM-based baselines under realistic candidate selection, achieving the strongest precision-recall trade-offs and highest F1-scores.
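
The rank-then-verify flow described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not say how Favia ranks candidates, so a Jaccard token-overlap score stands in for the ranking stage, and a plain `verify` callable stands in for the ReAct-based LLM agent. The names `Commit`, `rank_candidates`, `identify_fixes`, and `precision_recall_f1` are hypothetical and are not taken from the paper.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple


@dataclass(frozen=True)
class Commit:
    """Hypothetical commit record; not the paper's data model."""
    sha: str
    message: str
    diff: str


def tokens(text: str) -> Set[str]:
    # Lowercased word-like tokens; identifiers such as decode_chunk stay whole.
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def rank_candidates(cve_description: str, commits: List[Commit], top_k: int) -> List[Commit]:
    """Stage 1 stand-in: keep the top_k commits by Jaccard token overlap."""
    query = tokens(cve_description)

    def score(commit: Commit) -> float:
        doc = tokens(commit.message + " " + commit.diff)
        union = query | doc
        return len(query & doc) / len(union) if union else 0.0

    return sorted(commits, key=score, reverse=True)[:top_k]


def identify_fixes(
    cve_description: str,
    commits: List[Commit],
    verify: Callable[[str, Commit], bool],
    top_k: int = 2,
) -> List[str]:
    """Stage 2 stand-in: run the (assumed) verifier on each shortlisted commit."""
    shortlist = rank_candidates(cve_description, commits, top_k)
    return [c.sha for c in shortlist if verify(cve_description, c)]


def precision_recall_f1(predicted: List[str], actual: List[str]) -> Tuple[float, float, float]:
    """Standard precision, recall, and F1 over sets of commit hashes."""
    pred, gold = set(predicted), set(actual)
    true_positives = len(pred & gold)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

In this sketch the verifier only sees commits that survive ranking, so a true fix dropped in the first stage can never be recovered; the choice of `top_k` therefore trades recall against the cost of the second stage.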

Related papers

RealSec-bench: A Benchmark for Evaluating Secure Code Generation in Real-World Repositories [58.32028251925354]
Large Language Models (LLMs) have demonstrated remarkable capabilities in code generation, but their proficiency in producing secure code remains a critical, under-explored area. We introduce RealSec-bench, a new benchmark for secure code generation meticulously constructed from real-world, high-risk Java repositories.
arXiv Detail & Related papers (2026-01-30T08:29:01Z)
ReasAlign: Reasoning Enhanced Safety Alignment against Prompt Injection Attack [52.17935054046577]
We present ReasAlign, a model-level solution to improve safety alignment against indirect prompt injection attacks. ReasAlign incorporates structured reasoning steps to analyze user queries, detect conflicting instructions, and preserve the continuity of the user's intended tasks.
arXiv Detail & Related papers (2026-01-15T08:23:38Z)
Sift or Get Off the PoC: Applying Information Retrieval to Vulnerability Research with SiftRank [0.0]
We present SiftRank, a ranking algorithm achieving O(n) complexity through three key mechanisms. SiftRank operates directly on thousands of items, with each document evaluated across multiple randomized batches to mitigate inconsistent judgments. We demonstrate practical effectiveness on N-day vulnerability analysis, successfully identifying a vulnerability-fixing function among 2,197 changed functions in a stripped binary firmware patch within 99 seconds at an inference cost of $0.82.
arXiv Detail & Related papers (2025-12-05T21:09:32Z)
The Trojan Knowledge: Bypassing Commercial LLM Guardrails via Harmless Prompt Weaving and Adaptive Tree Search [58.8834056209347]
Large language models (LLMs) remain vulnerable to jailbreak attacks that bypass safety guardrails to elicit harmful outputs. We introduce the Correlated Knowledge Attack Agent (CKA-Agent), a dynamic framework that reframes jailbreaking as an adaptive, tree-structured exploration of the target model's knowledge base.
arXiv Detail & Related papers (2025-12-01T07:05:23Z)
ReliabilityRAG: Effective and Provably Robust Defense for RAG-based Web-Search [69.60882125603133]
We present ReliabilityRAG, a framework for adversarial robustness that explicitly leverages reliability information of retrieved documents. Our work is a significant step towards more effective, provably robust defenses against retrieved corpus corruption in RAG.
arXiv Detail & Related papers (2025-09-27T22:36:42Z)
VulAgent: Hypothesis-Validation based Multi-Agent Vulnerability Detection [55.957275374847484]
VulAgent is a multi-agent vulnerability detection framework based on hypothesis validation. It implements a semantics-sensitive, multi-view detection pipeline, each aligned to a specific analysis perspective. On average, VulAgent improves overall accuracy by 6.6%, increases the correct identification rate of vulnerable-fixed code pairs by up to 450%, and reduces the false positive rate by about 36%.
arXiv Detail & Related papers (2025-09-15T02:25:38Z)
Decompiling Smart Contracts with a Large Language Model [51.49197239479266]
Although Etherscan lists 78,047,845 smart contracts deployed on Ethereum (as of May 26, 2025), a mere 767,520 (about 1%) are open source. This opacity necessitates the automated semantic analysis of on-chain smart contract bytecode. We introduce a pioneering decompilation pipeline that transforms bytecode into human-readable and semantically faithful Solidity code.
arXiv Detail & Related papers (2025-06-24T13:42:59Z)
CyberGym: Evaluating AI Agents' Real-World Cybersecurity Capabilities at Scale [45.97598662617568]
We introduce CyberGym, a large-scale benchmark featuring 1,507 real-world vulnerabilities across 188 software projects. We show that CyberGym leads to the discovery of 35 zero-day vulnerabilities and 17 historically incomplete patches. These results underscore that CyberGym is not only a robust benchmark for measuring AI's progress in cybersecurity but also a platform for creating direct, real-world security impact.
arXiv Detail & Related papers (2025-06-03T07:35:14Z)
Fast and Accurate Silent Vulnerability Fix Retrieval [7.512949497610182]
Existing approaches to trace/retrieve the patching commit for fixing a CVE suffer from two major challenges. We propose SITPatchTracer, a scalable and effective retrieval system for tracing known vulnerability patches. Using SITPatchTracer, we have successfully traced and merged the patch links for 35 new CVEs in the GitHub Advisory database.
arXiv Detail & Related papers (2025-03-29T01:53:07Z)
Reasoning with LLMs for Zero-Shot Vulnerability Detection [0.9208007322096533]
We present VulnSage, a comprehensive evaluation framework and a curated dataset from diverse, large-scale open-source system software projects. The framework supports multi-granular analysis across function, file, and inter-function levels. It employs four diverse zero-shot prompt strategies: Baseline, Chain-of-context, Think, and Think & verify.
arXiv Detail & Related papers (2025-03-22T23:59:17Z)
Detecting Security Fixes in Open-Source Repositories using Static Code Analyzers [8.716427214870459]
We study the extent to which the output of off-the-shelf static code analyzers can be used as a source of features to represent commits in Machine Learning (ML) applications. We investigate how such features can be used to construct embeddings and train ML models to automatically identify source code commits that contain vulnerability fixes. We find that the combination of our method with commit2vec represents a tangible improvement over the state of the art in the automatic identification of commits that fix vulnerabilities.
arXiv Detail & Related papers (2021-05-07T15:57:17Z)
Automated Mapping of Vulnerability Advisories onto their Fix Commits in Open Source Repositories [7.629717457706326]
We present an approach that combines practical experience and machine learning (ML). An advisory record containing key information about a vulnerability is extracted from an advisory. A subset of candidate fix commits is obtained from the source code repository of the affected project.
arXiv Detail & Related papers (2021-03-24T17:50:35Z)

This list is automatically generated from the titles and abstracts of the papers on this site.