Code2Image: Intelligent Code Analysis by Computer Vision Techniques and
Application to Vulnerability Prediction
- URL: http://arxiv.org/abs/2105.03131v1
- Date: Fri, 7 May 2021 09:10:20 GMT
- Authors: Zeki Bilgin
- Abstract summary: We present a novel method to represent source code as an image while preserving its semantic and syntactic properties.
The method makes it possible to feed the resulting image representation of source code directly into deep learning (DL) algorithms as input.
We demonstrate feasibility and effectiveness of our method by realizing a vulnerability prediction use case over a public dataset.
- Score: 0.6091702876917281
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Intelligent code analysis has received increasing attention in parallel with
the remarkable advances in the field of machine learning (ML) in recent years.
A major challenge in leveraging ML for this purpose is to represent source code
in a useful form that ML algorithms can accept as input. In this study, we
present a novel method to represent source code as an image while preserving
its semantic and syntactic properties, which paves the way for applying
computer vision techniques to code analysis. Indeed, the method makes it
possible to feed the resulting image representation of source code directly
into deep learning (DL) algorithms as input, without requiring any further
data pre-processing or feature-extraction step. We demonstrate the feasibility
and effectiveness of our method by realizing a vulnerability prediction use
case over a public dataset containing a large number of real-world source code
samples, with performance evaluation in comparison to state-of-the-art
solutions. Our implementation is publicly available.
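The abstract describes mapping source code to an image that a DL model can consume directly. A minimal sketch of the general idea, assuming a naive character-to-pixel mapping (the paper's actual Code2Image encoding preserves AST structure and is not reproduced here):

```python
import numpy as np

def code_to_image(source: str, size: int = 64) -> np.ndarray:
    """Map source code to a size x size grayscale image.

    Naive sketch: each UTF-8 byte of the source becomes one pixel
    intensity, zero-padded (or truncated) to a fixed size. The paper's
    Code2Image method instead encodes the AST so that syntactic and
    semantic structure survives; this is only an illustration of the
    code-as-image input format.
    """
    data = np.frombuffer(source.encode("utf-8"), dtype=np.uint8)
    pixels = np.zeros(size * size, dtype=np.uint8)
    n = min(len(data), size * size)
    pixels[:n] = data[:n]
    return pixels.reshape(size, size)

img = code_to_image("int main() { return 0; }")
print(img.shape, img.dtype)  # (64, 64) uint8
```

A fixed-size 2D array like this can be passed to a standard CNN with no extra feature-extraction step, which is the property the abstract highlights.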
Related papers
- SURGE: On the Potential of Large Language Models as General-Purpose Surrogate Code Executors [0.0]
Large language models (LLMs) have demonstrated remarkable capabilities in code-related tasks, such as code understanding and code generation.
However, an equally important yet underexplored question is whether LLMs can serve as general-purpose surrogate code executors.
This study provides empirical insights into the feasibility of using LLMs as surrogate code executors.
arXiv Detail & Related papers (2025-02-16T15:38:19Z)
- CodeVision: Detecting LLM-Generated Code Using 2D Token Probability Maps and Vision Models [28.711745671275477]
The rise of large language models (LLMs) has significantly improved automated code generation, enhancing software development efficiency.
Existing detection methods, such as pre-trained models and watermarking, face limitations in adaptability and computational efficiency.
We propose a novel detection method using 2D token probability maps combined with vision models, preserving spatial code structures.
arXiv Detail & Related papers (2025-01-06T06:15:10Z)
- Case2Code: Scalable Synthetic Data for Code Generation [105.89741089673575]
Large Language Models (LLMs) have shown outstanding breakthroughs in code generation.
Recent work improves code LLMs by training on synthetic data generated by some powerful LLMs.
We propose a Case2Code task that exploits the expressiveness and correctness of programs.
arXiv Detail & Related papers (2024-07-17T11:35:00Z)
- Enhancing Source Code Representations for Deep Learning with Static Analysis [10.222207222039048]
This paper explores the integration of static analysis and additional context such as bug reports and design patterns into source code representations for deep learning models.
We use the Abstract Syntax Tree-based Neural Network (ASTNN) method and augment it with additional context information obtained from bug reports and design patterns.
Our approach enhances the representation and processing of source code, thereby improving task performance.
arXiv Detail & Related papers (2024-02-14T20:17:04Z)
- Exploring Representation-Level Augmentation for Code Search [50.94201167562845]
We explore augmentation methods that augment data (both code and query) at the representation level, which does not require additional data processing or training.
We experimentally evaluate the proposed representation-level augmentation methods with state-of-the-art code search models on a large-scale public dataset.
arXiv Detail & Related papers (2022-10-21T22:47:37Z)
- Enhancing Semantic Code Search with Multimodal Contrastive Learning and Soft Data Augmentation [50.14232079160476]
We propose a new approach with multimodal contrastive learning and soft data augmentation for code search.
We conduct extensive experiments to evaluate the effectiveness of our approach on a large-scale dataset with six programming languages.
arXiv Detail & Related papers (2022-04-07T08:49:27Z)
- Multi-View Graph Representation for Programming Language Processing: An Investigation into Algorithm Detection [35.81014952109471]
This paper proposes a multi-view graph (MVG) program representation method.
MVG pays more attention to code semantics and simultaneously includes both data flow and control flow as multiple views.
In experiments, MVG outperforms previous methods significantly.
arXiv Detail & Related papers (2022-02-25T03:35:45Z)
- Software Vulnerability Detection via Deep Learning over Disaggregated Code Graph Representation [57.92972327649165]
This work explores a deep learning approach to automatically learn the insecure patterns from code corpora.
Because code naturally admits a graph structure once parsed, we develop a novel graph neural network (GNN) to exploit both the semantic context and the structural regularity of a program.
arXiv Detail & Related papers (2021-09-07T21:24:36Z)
- A Survey on Large-scale Machine Learning [67.6997613600942]
Machine learning can provide deep insights into data, allowing machines to make high-quality predictions.
Most sophisticated machine learning approaches suffer from huge time costs when operating on large-scale data.
Large-scale Machine Learning aims to learn patterns from big data with comparable performance efficiently.
arXiv Detail & Related papers (2020-08-10T06:07:52Z)
- A Transformer-based Approach for Source Code Summarization [86.08359401867577]
We learn code representation for summarization by modeling the pairwise relationship between code tokens.
We show that, despite its simplicity, the approach outperforms the state-of-the-art techniques by a significant margin.
arXiv Detail & Related papers (2020-05-01T23:29:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.