Patch2QL: Discover Cognate Defects in Open Source Software Supply Chain
With Auto-generated Static Analysis Rules
- URL: http://arxiv.org/abs/2401.12443v2
- Date: Tue, 30 Jan 2024 02:23:12 GMT
- Title: Patch2QL: Discover Cognate Defects in Open Source Software Supply Chain
With Auto-generated Static Analysis Rules
- Authors: Fuwei Wang, Yongzhi Liu, Zhiqiang Dong
- Abstract summary: We propose a novel technique for detecting cognate defects in OSS through the automatic generation of SAST rules.
Specifically, it extracts key syntax and semantic information from pre- and post-patch versions of code.
We have implemented a prototype tool called Patch2QL and applied it to fundamental OSS in C/C++.
- Score: 1.9591497166224197
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the open source software (OSS) ecosystem, there exists a complex software
supply chain, where developers upstream and downstream widely borrow and reuse
code. This results in the widespread occurrence of recurring defects, missing
fixes, and propagation issues. These are collectively referred to as cognate
defects, and their scale and threats have not received extensive attention and
systematic research. Software composition analysis and code clone detection
methods are unable to cover the various variant issues in the supply chain
scenario, while code static analysis, or static application security testing
(SAST) techniques struggle to target specific defects. In this paper, we
propose a novel technique for detecting cognate defects in OSS through the
automatic generation of SAST rules. Specifically, it extracts key syntax and
semantic information from pre- and post-patch versions of code through
structural comparison and control flow to data flow analysis, and generates
rules that matches these key elements. We have implemented a prototype tool
called Patch2QL and applied it to fundamental OSS in C/C++. In experiments, we
discovered 7 new vulnerabilities with medium to critical severity in the most
popular upstream software, as well as numerous potential security issues. When
analyzing downstream projects in the supply chain, we found a significant
number of representative cognate defects, clarifying the threat posed by this
issue. Additionally, compared to general-purpose SAST and signature-based
mechanisms, the generated rules perform better at discover all variants of
cognate defects.
Related papers
- A Combined Feature Embedding Tools for Multi-Class Software Defect and Identification [2.2020053359163305]
We present CodeGraphNet, an experimental method that combines GraphCodeBERT and Graph Convolutional Network approaches.
This method captures intricate relation- ships between features, providing for more exact identification and separation of vulnerabilities.
The DeepTree model, which is a hybrid of a Decision Tree and a Neural Network, outperforms state-of-the-art approaches.
arXiv Detail & Related papers (2024-11-26T17:33:02Z) - LLMs as Continuous Learners: Improving the Reproduction of Defective Code in Software Issues [62.12404317786005]
EvoCoder is a continuous learning framework for issue code reproduction.
Our results show a 20% improvement in issue reproduction rates over existing SOTA methods.
arXiv Detail & Related papers (2024-11-21T08:49:23Z) - Fixing Security Vulnerabilities with AI in OSS-Fuzz [9.730566646484304]
OSS-Fuzz is the most significant and widely used infrastructure for continuous validation of open source systems.
We customise the well-known AutoCodeRover agent for fixing security vulnerabilities.
Our experience with OSS-Fuzz vulnerability data shows that LLM agent autonomy is useful for successful security patching.
arXiv Detail & Related papers (2024-11-03T16:20:32Z) - Divide and Conquer based Symbolic Vulnerability Detection [0.16385815610837165]
This paper presents a vulnerability detection approach based on symbolic execution and control flow graph analysis.
Our approach employs a divide-and-conquer algorithm to eliminate irrelevant program information.
arXiv Detail & Related papers (2024-09-20T13:09:07Z) - LLM-Enhanced Static Analysis for Precise Identification of Vulnerable OSS Versions [12.706661324384319]
Open-source software (OSS) has experienced a surge in popularity, attributed to its collaborative development model and cost-effective nature.
The adoption of specific software versions in development projects may introduce security risks when these versions bring along vulnerabilities.
Current methods of identifying vulnerable versions typically analyze and trace the code involved in vulnerability patches using static analysis with pre-defined rules.
This paper presents Vercation, an approach designed to identify vulnerable versions of OSS written in C/C++.
arXiv Detail & Related papers (2024-08-14T06:43:06Z) - Profile of Vulnerability Remediations in Dependencies Using Graph
Analysis [40.35284812745255]
This research introduces graph analysis methods and a modified Graph Attention Convolutional Neural Network (GAT) model.
We analyze control flow graphs to profile breaking changes in applications occurring from dependency upgrades intended to remediate vulnerabilities.
Results demonstrate the effectiveness of the enhanced GAT model in offering nuanced insights into the relational dynamics of code vulnerabilities.
arXiv Detail & Related papers (2024-03-08T02:01:47Z) - On the Security Blind Spots of Software Composition Analysis [46.1389163921338]
We present a novel approach to detect vulnerable clones in the Maven repository.
We retrieve over 53k potential vulnerable clones from Maven Central.
We detect 727 confirmed vulnerable clones and synthesize a testable proof-of-vulnerability project for each of those.
arXiv Detail & Related papers (2023-06-08T20:14:46Z) - VELVET: a noVel Ensemble Learning approach to automatically locate
VulnErable sTatements [62.93814803258067]
This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements in source code.
Our model combines graph-based and sequence-based neural networks to successfully capture the local and global context of a program graph.
VELVET achieves 99.6% and 43.6% top-1 accuracy over synthetic data and real-world data, respectively.
arXiv Detail & Related papers (2021-12-20T22:45:27Z) - Learn then Test: Calibrating Predictive Algorithms to Achieve Risk
Control [67.52000805944924]
Learn then Test (LTT) is a framework for calibrating machine learning models.
Our main insight is to reframe the risk-control problem as multiple hypothesis testing.
We use our framework to provide new calibration methods for several core machine learning tasks with detailed worked examples in computer vision.
arXiv Detail & Related papers (2021-10-03T17:42:03Z) - Software Vulnerability Detection via Deep Learning over Disaggregated
Code Graph Representation [57.92972327649165]
This work explores a deep learning approach to automatically learn the insecure patterns from code corpora.
Because code naturally admits graph structures with parsing, we develop a novel graph neural network (GNN) to exploit both the semantic context and structural regularity of a program.
arXiv Detail & Related papers (2021-09-07T21:24:36Z) - Multi-context Attention Fusion Neural Network for Software Vulnerability
Identification [4.05739885420409]
We propose a deep learning model that learns to detect some of the common categories of security vulnerabilities in source code efficiently.
The model builds an accurate understanding of code semantics with a lot less learnable parameters.
The proposed AI achieves 98.40% F1-score on specific CWEs from the benchmarked NIST SARD dataset.
arXiv Detail & Related papers (2021-04-19T11:50:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.