SAGA: Detecting Security Vulnerabilities Using Static Aspect Analysis
- URL: http://arxiv.org/abs/2601.15154v1
- Date: Wed, 21 Jan 2026 16:26:26 GMT
- Title: SAGA: Detecting Security Vulnerabilities Using Static Aspect Analysis
- Authors: Yoann Marquer, Domenico Bianculli, Lionel C. Briand,
- Abstract summary: SAGA is an approach to detect and locate vulnerabilities in Python source code in a versatile way. We have evaluated SAGA on a dataset of 108 vulnerabilities, obtaining 100% sensitivity and 99.15% specificity, with only one false positive, while outperforming four common security analysis tools.
- Score: 5.971445533193919
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Python is one of the most popular programming languages; as such, projects written in Python involve an increasing number of diverse security vulnerabilities. However, existing state-of-the-art analysis tools for Python only support a few vulnerability types. Hence, there is a need to detect a large variety of vulnerabilities in Python projects. In this paper, we propose the SAGA approach to detect and locate vulnerabilities in Python source code in a versatile way. SAGA includes a source code parser able to extract control- and data-flow information and to represent it as a symbolic control-flow graph, as well as a domain-specific language defining static aspects of the source code and their evolution during graph traversals. We have leveraged this language to define a library of static aspects for integrity, confidentiality, and other security-related properties. We have evaluated SAGA on a dataset of 108 vulnerabilities, obtaining 100% sensitivity and 99.15% specificity, with only one false positive, while outperforming four common security analysis tools. This analysis was performed in less than 31 seconds, i.e., between 2.5 and 512.1 times faster than the baseline tools.
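To make the graph-traversal idea concrete, here is a minimal sketch, not SAGA's actual parser or aspect DSL: it uses Python's stdlib `ast` module with a flow-insensitive approximation of data flow, and treats `input` and `eval` as hypothetical source and sink functions for illustration.

```python
# Illustrative sketch (NOT SAGA's implementation): approximate taint-style
# data-flow over a Python AST. Variables assigned from a hypothetical
# source (`input`) are marked tainted; calls to a hypothetical sink
# (`eval`) on tainted variables are flagged with their line numbers.
import ast

SOURCE_FN, SINK_FN = "input", "eval"  # hypothetical source/sink names

def find_tainted_sinks(code: str) -> list[int]:
    """Return line numbers where tainted data reaches the sink.

    Flow-insensitive: ast.walk visits nodes breadth-first, not in
    execution order, so this is only a crude approximation of the
    control-/data-flow tracking a real tool performs.
    """
    tree = ast.parse(code)
    tainted: set[str] = set()
    findings: list[int] = []
    for node in ast.walk(tree):
        # An assignment whose right-hand side calls the source function
        # marks every plain-name target as tainted.
        if isinstance(node, ast.Assign):
            value = node.value
            if (isinstance(value, ast.Call)
                    and isinstance(value.func, ast.Name)
                    and value.func.id == SOURCE_FN):
                for target in node.targets:
                    if isinstance(target, ast.Name):
                        tainted.add(target.id)
        # A sink call receiving a tainted variable is reported.
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == SINK_FN):
            for arg in node.args:
                if isinstance(arg, ast.Name) and arg.id in tainted:
                    findings.append(node.lineno)
    return findings

vulnerable = "user = input()\neval(user)\n"
safe = "x = 1\neval('x + 1')\n"
print(find_tainted_sinks(vulnerable))  # [2]
print(find_tainted_sinks(safe))        # []
```

A production analyzer such as SAGA builds a symbolic control-flow graph and evaluates aspects along traversal paths; this sketch only shows the general source-to-sink idea on the raw AST.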
Related papers
- Multi-Agent Taint Specification Extraction for Vulnerability Detection [49.27772068704498]
Static Application Security Testing (SAST) tools using taint analysis are widely viewed as providing higher-quality vulnerability detection results. We present SemTaint, a multi-agent system that strategically combines the semantic understanding of Large Language Models (LLMs) with traditional static program analysis. We integrate SemTaint with CodeQL, a state-of-the-art SAST tool, and demonstrate its effectiveness by detecting 106 of 162 vulnerabilities previously undetectable by CodeQL.
arXiv Detail & Related papers (2026-01-15T21:31:51Z) - Security Vulnerabilities in AI-Generated Code: A Large-Scale Analysis of Public GitHub Repositories [0.0]
This paper presents a comprehensive empirical analysis of security vulnerabilities in AI-generated code across public GitHub repositories. We collected and analyzed 7,703 files explicitly attributed to four major AI tools: ChatGPT, GitHub Copilot, Amazon CodeWhisperer, and Tabnine. Using CodeQL static analysis, we identified 4,241 Common Weakness Enumeration (CWE) instances across 77 distinct vulnerability types.
arXiv Detail & Related papers (2025-10-30T03:29:06Z) - What Do They Fix? LLM-Aided Categorization of Security Patches for Critical Memory Bugs [46.325755802511026]
We develop LM, a dual-method pipeline that integrates two approaches based on a Large Language Model (LLM) and a fine-tuned small language model. LM successfully identified 111 of 5,140 recent Linux kernel patches addressing OOB or UAF vulnerabilities, with 90 true positives confirmed by manual verification.
arXiv Detail & Related papers (2025-09-26T18:06:36Z) - An Empirical Study of Vulnerabilities in Python Packages and Their Detection [12.629138654621983]
This paper introduces PyVul, the first comprehensive benchmark suite of Python-package vulnerabilities. PyVul includes 1,157 publicly reported, developer-verified vulnerabilities, each linked to its affected packages. An LLM-assisted data cleansing method is incorporated to improve label accuracy, achieving 100% commit-level and 94% function-level accuracy.
arXiv Detail & Related papers (2025-09-04T14:38:28Z) - Decompiling Smart Contracts with a Large Language Model [51.49197239479266]
Of the 78,047,845 smart contracts deployed on Ethereum (per Etherscan, as of May 26, 2025), a mere 767,520 (about 1%) are open source. This opacity necessitates the automated semantic analysis of on-chain smart contract bytecode. We introduce a pioneering decompilation pipeline that transforms bytecode into human-readable and semantically faithful Solidity code.
arXiv Detail & Related papers (2025-06-24T13:42:59Z) - An Empirical Study of Vulnerability Handling Times in CPython [0.2538209532048867]
The paper examines the handling times of software vulnerabilities in CPython, contributing to the recent effort to better understand the security of the Python ecosystem.
arXiv Detail & Related papers (2024-11-01T08:46:14Z) - CRUXEval-X: A Benchmark for Multilingual Code Reasoning, Understanding and Execution [50.1875460416205]
The CRUXEVAL-X code reasoning benchmark contains 19 programming languages. It comprises at least 600 subjects for each language, along with 19K content-consistent tests in total. Even a model trained solely on Python can achieve at most 34.4% Pass@1 in other languages.
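Pass@1 figures like the 34.4% above are conventionally computed with the unbiased pass@k estimator (popularized by the HumanEval benchmark); a minimal sketch of that standard formula, assuming n generations per problem of which c pass the tests:

```python
# Sketch of the standard unbiased pass@k estimator: the probability
# that at least one of k samples drawn (without replacement) from
# n generations, of which c are correct, passes.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """pass@k = 1 - C(n - c, k) / C(n, k)."""
    if n - c < k:
        # Fewer than k incorrect samples: a correct one is guaranteed.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# For k = 1 the estimator reduces to the fraction of correct samples c/n.
print(round(pass_at_k(10, 3, 1), 6))  # 0.3
```

For k = 1, as in the benchmark's headline numbers, pass@1 is simply the fraction of generations that pass all tests, averaged over problems.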
arXiv Detail & Related papers (2024-08-23T11:43:00Z) - Machine Learning Techniques for Python Source Code Vulnerability Detection [0.0]
We apply and compare different machine learning algorithms for source code vulnerability detection specifically for Python programming language.
Our Bidirectional Long Short-Term Memory (BiLSTM) model achieves remarkable performance.
arXiv Detail & Related papers (2024-04-15T08:01:02Z) - Exploring Security Commits in Python [11.533638656389137]
Most security issues in Python have not been indexed by CVE and may only be fixed by 'silent' security commits.
It is critical to identify these hidden security commits, yet doing so is hampered by limited data variety, non-comprehensive code semantics, and uninterpretable learned features.
We construct the first security commit dataset in Python, PySecDB, which consists of three subsets including a base dataset, a pilot dataset, and an augmented dataset.
arXiv Detail & Related papers (2023-07-21T18:46:45Z) - VELVET: a noVel Ensemble Learning approach to automatically locate VulnErable sTatements [62.93814803258067]
This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements in source code.
Our model combines graph-based and sequence-based neural networks to successfully capture the local and global context of a program graph.
VELVET achieves 99.6% and 43.6% top-1 accuracy over synthetic data and real-world data, respectively.
arXiv Detail & Related papers (2021-12-20T22:45:27Z) - Software Vulnerability Detection via Deep Learning over Disaggregated Code Graph Representation [57.92972327649165]
This work explores a deep learning approach to automatically learn the insecure patterns from code corpora.
Because code naturally admits graph structures with parsing, we develop a novel graph neural network (GNN) to exploit both the semantic context and structural regularity of a program.
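As a minimal illustration of "code admits graph structures with parsing", here is a sketch that turns Python source into a plain parent-child AST graph using the stdlib `ast` module; this is a crude stand-in for the richer, disaggregated representation the paper actually feeds to its GNN.

```python
# Sketch: build a simple graph from a Python AST. Nodes are AST node
# type names; edges connect each parent to its direct children. Real
# GNN-based detectors enrich this with data-flow and control-flow edges.
import ast

def code_to_graph(code: str) -> tuple[list[str], list[tuple[int, int]]]:
    tree = ast.parse(code)
    ids: dict[ast.AST, int] = {}
    nodes: list[str] = []
    # First pass: assign an integer id and a label to every AST node.
    for i, node in enumerate(ast.walk(tree)):
        ids[node] = i
        nodes.append(type(node).__name__)
    # Second pass: record parent -> child edges.
    edges: list[tuple[int, int]] = []
    for node in ast.walk(tree):
        for child in ast.iter_child_nodes(node):
            edges.append((ids[node], ids[child]))
    return nodes, edges

nodes, edges = code_to_graph("x = 1\ny = x + 2\n")
print(nodes[0])  # Module
```

Since the AST is a tree, the resulting graph has exactly one fewer edge than it has nodes; GNN approaches typically add extra edge types (e.g. next-token, def-use) on top of this skeleton.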
arXiv Detail & Related papers (2021-09-07T21:24:36Z) - D2A: A Dataset Built for AI-Based Vulnerability Detection Methods Using Differential Analysis [55.15995704119158]
We propose D2A, a differential analysis based approach to label issues reported by static analysis tools.
We use D2A to generate a large labeled dataset to train models for vulnerability identification.
arXiv Detail & Related papers (2021-02-16T07:46:53Z)