A Hierarchical Deep Neural Network for Detecting Lines of Codes with Vulnerabilities
- URL: http://arxiv.org/abs/2211.08517v1
- Date: Tue, 15 Nov 2022 21:21:27 GMT
- Title: A Hierarchical Deep Neural Network for Detecting Lines of Codes with Vulnerabilities
- Authors: Arash Mahyari
- Abstract summary: Software vulnerabilities, caused by unintentional flaws in source code, are a primary root cause of cyberattacks.
We propose a deep learning approach that detects vulnerabilities from LLVM IR representations of source code, using techniques drawn from natural language processing.
- Score: 6.09170287691728
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Software vulnerabilities, caused by unintentional flaws in source code, are a primary root cause of cyberattacks. Source code static analysis has been used extensively to detect the unintentional defects, i.e., vulnerabilities, that software developers introduce into source code. In this paper, we propose a deep learning approach that detects vulnerabilities from their LLVM IR representations, building on techniques used in natural language processing. The proposed approach uses a hierarchical process: it first identifies source code that contains vulnerabilities, and then identifies the lines of code that contribute to the vulnerability within the detected code. This two-step approach reduces false alarms when detecting vulnerable lines. Our extensive experiments on real-world and synthetic code collected from NVD and SARD show high accuracy (about 98%) in detecting source code vulnerabilities.
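The hierarchical, two-step idea in the abstract can be sketched in code: a snippet-level classifier first decides whether an LLVM IR sample is vulnerable at all, and only then a line-level head scores the individual IR lines, which is what keeps line-level false alarms down. The sketch below is a minimal illustration under assumptions of my own (PyTorch, GRU encoders, a toy tokenization); it is not the authors' architecture or hyperparameters.

```python
# Minimal sketch (assumption: PyTorch; tokenization, encoders, and sizes are
# hypothetical, not taken from the paper) of the hierarchical two-step idea:
# step 1 classifies a whole LLVM IR snippet as vulnerable or not, and step 2
# scores individual IR lines only for snippets flagged in step 1.
import torch
import torch.nn as nn


class HierarchicalVulnDetector(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=128, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # Encode each IR line (a sequence of token IDs) into one vector.
        self.line_encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        # Encode the sequence of line vectors into a snippet-level vector.
        self.snippet_encoder = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.snippet_head = nn.Linear(hidden_dim, 1)  # step 1: snippet level
        self.line_head = nn.Linear(hidden_dim, 1)     # step 2: line level

    def forward(self, tokens):
        # tokens: (num_lines, tokens_per_line) integer IDs for one IR snippet
        line_states, _ = self.line_encoder(self.embed(tokens))
        line_vecs = line_states[:, -1, :]                       # (num_lines, hidden)
        snippet_states, _ = self.snippet_encoder(line_vecs.unsqueeze(0))
        snippet_vec = snippet_states[:, -1, :]                   # (1, hidden)
        snippet_logit = self.snippet_head(snippet_vec).squeeze(-1)  # step 1 score
        line_logits = self.line_head(line_vecs).squeeze(-1)         # step 2 scores
        return snippet_logit, line_logits


model = HierarchicalVulnDetector()
fake_snippet = torch.randint(1, 5000, (12, 20))  # 12 IR lines, 20 tokens each
snippet_logit, line_logits = model(fake_snippet)
# Line-level scores are consulted only when the snippet-level classifier fires,
# mirroring the two-step filtering that reduces line-level false alarms.
if torch.sigmoid(snippet_logit) > 0.5:
    vulnerable_lines = (torch.sigmoid(line_logits) > 0.5).nonzero().flatten()
```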
Related papers
- RedCode: Risky Code Execution and Generation Benchmark for Code Agents [50.81206098588923]
RedCode is a benchmark for risky code execution and generation.
RedCode-Exec provides challenging prompts that could lead to risky code execution.
RedCode-Gen provides 160 prompts with function signatures and docstrings as input to assess whether code agents will follow instructions to generate harmful code.
arXiv Detail & Related papers (2024-11-12T13:30:06Z)
- Harnessing the Power of LLMs in Source Code Vulnerability Detection [0.0]
Software vulnerabilities, caused by unintentional flaws in source code, are a primary root cause of cyberattacks.
We harness Large Language Models' capabilities to analyze source code and detect known vulnerabilities.
arXiv Detail & Related papers (2024-08-07T00:48:49Z)
- CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion [117.178835165855]
This paper introduces CodeAttack, a framework that transforms natural language inputs into code inputs.
Our studies reveal a new and universal safety vulnerability of these models against code input.
We find that a larger distribution gap between CodeAttack and natural language leads to weaker safety generalization.
arXiv Detail & Related papers (2024-03-12T17:55:38Z)
- The Vulnerability Is in the Details: Locating Fine-grained Information of Vulnerable Code Identified by Graph-based Detectors [33.395068754566935]
VULEXPLAINER is a tool for locating vulnerability-critical code lines from coarse-level vulnerable code snippets.
It can flag the vulnerability-triggering code statements with an accuracy of around 90% against eight common C/C++ vulnerabilities.
arXiv Detail & Related papers (2024-01-05T10:15:04Z)
- VMCDL: Vulnerability Mining Based on Cascaded Deep Learning Under Source Control Flow [2.561778620560749]
This paper mainly uses the C/C++ source code data of the SARD dataset, processing source code of the CWE476, CWE469, CWE516, and CWE570 vulnerability types.
We propose a new cascaded deep learning model, VMCDL, based on source code control flow to effectively detect vulnerabilities.
arXiv Detail & Related papers (2023-03-13T13:58:39Z)
- CodeLMSec Benchmark: Systematically Evaluating and Finding Security Vulnerabilities in Black-Box Code Language Models [58.27254444280376]
Large language models (LLMs) for automatic code generation have achieved breakthroughs in several programming tasks.
Training data for these models is usually collected from the Internet (e.g., from open-source repositories) and is likely to contain faults and security vulnerabilities.
This unsanitized training data can cause the language models to learn these vulnerabilities and propagate them during the code generation procedure.
arXiv Detail & Related papers (2023-02-08T11:54:07Z)
- DCDetector: An IoT terminal vulnerability mining system based on distributed deep ensemble learning under source code representation [2.561778620560749]
The goal of the research is to intelligently detect vulnerabilities in the source code of high-level languages such as C/C++.
The paper proposes a code representation based on sensitive-sentence-related slices of source code and detects vulnerabilities with a distributed deep ensemble learning model.
Experiments show that this method can reduce the false positive rate of traditional static analysis and improve the performance and accuracy of machine learning-based detection.
arXiv Detail & Related papers (2022-11-29T14:19:14Z)
- VELVET: a noVel Ensemble Learning approach to automatically locate VulnErable sTatements [62.93814803258067]
This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements in source code.
Our model combines graph-based and sequence-based neural networks to successfully capture the local and global context of a program graph.
VELVET achieves 99.6% and 43.6% top-1 accuracy over synthetic data and real-world data, respectively.
arXiv Detail & Related papers (2021-12-20T22:45:27Z)
- Software Vulnerability Detection via Deep Learning over Disaggregated Code Graph Representation [57.92972327649165]
This work explores a deep learning approach to automatically learn the insecure patterns from code corpora.
Because code naturally admits graph structures with parsing, we develop a novel graph neural network (GNN) to exploit both the semantic context and structural regularity of a program.
arXiv Detail & Related papers (2021-09-07T21:24:36Z)
- Multi-context Attention Fusion Neural Network for Software Vulnerability Identification [4.05739885420409]
We propose a deep learning model that learns to detect some of the common categories of security vulnerabilities in source code efficiently.
The model builds an accurate understanding of code semantics with far fewer learnable parameters.
The proposed model achieves a 98.40% F1-score on specific CWEs from the benchmarked NIST SARD dataset.
arXiv Detail & Related papers (2021-04-19T11:50:36Z)
- Increasing the Confidence of Deep Neural Networks by Coverage Analysis [71.57324258813674]
This paper presents a lightweight monitoring architecture based on coverage paradigms to enhance the model's robustness against different unsafe inputs.
Experimental results show that the proposed approach is effective in detecting both powerful adversarial examples and out-of-distribution inputs.
arXiv Detail & Related papers (2021-01-28T16:38:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.