Code2Image: Intelligent Code Analysis by Computer Vision Techniques and
Application to Vulnerability Prediction
- URL: http://arxiv.org/abs/2105.03131v1
- Date: Fri, 7 May 2021 09:10:20 GMT
- Title: Code2Image: Intelligent Code Analysis by Computer Vision Techniques and
Application to Vulnerability Prediction
- Authors: Zeki Bilgin
- Abstract summary: We present a novel method to represent source code as an image while preserving semantic and syntactic properties.
The method makes it possible to feed the resulting image representation of source code directly into deep learning (DL) algorithms as input.
We demonstrate the feasibility and effectiveness of our method by realizing a vulnerability prediction use case over a public dataset.
- Score: 0.6091702876917281
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Intelligent code analysis has received increasing attention in parallel with
the remarkable advances in the field of machine learning (ML) in recent years.
A major challenge in leveraging ML for this purpose is to represent source code
in a useful form that ML algorithms can accept as input. In this study, we
present a novel method to represent source code as an image while preserving
its semantic and syntactic properties, which paves the way for applying
computer vision techniques to code analysis. Indeed, the method makes it
possible to feed the resulting image representation of source code directly
into deep learning (DL) algorithms as input, without requiring any further
data pre-processing or feature extraction step. We demonstrate the feasibility
and effectiveness of our method by realizing a vulnerability prediction use
case over a public dataset containing a large number of real-world source code
samples, with a performance evaluation against state-of-the-art solutions. Our
implementation is publicly available.
Related papers
- Source Code Vulnerability Detection: Combining Code Language Models and Code Property Graphs [5.953617559607503]
Vul-LMGNN is a unified model that combines pre-trained code language models with code property graphs.
Vul-LMGNN constructs a code property graph that integrates various code attributes into a unified graph structure.
To effectively retain dependency information among various attributes, we introduce a gated code Graph Neural Network.
arXiv Detail & Related papers (2024-04-23T03:48:18Z)
- Enhancing Source Code Representations for Deep Learning with Static Analysis [10.222207222039048]
This paper explores the integration of static analysis and additional context such as bug reports and design patterns into source code representations for deep learning models.
We use the Abstract Syntax Tree-based Neural Network (ASTNN) method and augment it with additional context information obtained from bug reports and design patterns.
Our approach improves the representation and processing of source code, thereby boosting task performance.
arXiv Detail & Related papers (2024-02-14T20:17:04Z)
- Benchmarking and Explaining Large Language Model-based Code Generation: A Causality-Centric Approach [12.214585409361126]
Code generation based on large language models (LLMs) relies on complex and powerful black-box models.
We propose a novel causal graph-based representation of the prompt and the generated code.
We illustrate the insights that our framework can provide by studying over 3 popular LLMs with over 12 prompt adjustment strategies.
arXiv Detail & Related papers (2023-10-10T14:56:26Z)
- Exploring Representation-Level Augmentation for Code Search [50.94201167562845]
We explore augmentation methods that augment data (both code and query) at the representation level, which does not require additional data processing or training.
We experimentally evaluate the proposed representation-level augmentation methods with state-of-the-art code search models on a large-scale public dataset.
arXiv Detail & Related papers (2022-10-21T22:47:37Z)
- Enhancing Semantic Code Search with Multimodal Contrastive Learning and Soft Data Augmentation [50.14232079160476]
We propose a new approach with multimodal contrastive learning and soft data augmentation for code search.
We conduct extensive experiments to evaluate the effectiveness of our approach on a large-scale dataset with six programming languages.
arXiv Detail & Related papers (2022-04-07T08:49:27Z)
- Multi-View Graph Representation for Programming Language Processing: An Investigation into Algorithm Detection [35.81014952109471]
This paper proposes a multi-view graph (MVG) program representation method.
MVG pays more attention to code semantics and simultaneously includes both data flow and control flow as multiple views.
In experiments, MVG outperforms previous methods significantly.
arXiv Detail & Related papers (2022-02-25T03:35:45Z)
- Software Vulnerability Detection via Deep Learning over Disaggregated Code Graph Representation [57.92972327649165]
This work explores a deep learning approach to automatically learn the insecure patterns from code corpora.
Because code naturally admits graph structures once parsed, we develop a novel graph neural network (GNN) to exploit both the semantic context and structural regularity of a program.
arXiv Detail & Related papers (2021-09-07T21:24:36Z)
- A Survey on Large-scale Machine Learning [67.6997613600942]
Machine learning can provide deep insights into data, allowing machines to make high-quality predictions.
Most sophisticated machine learning approaches suffer from huge time costs when operating on large-scale data.
Large-scale Machine Learning aims to efficiently learn patterns from big data with comparable performance.
arXiv Detail & Related papers (2020-08-10T06:07:52Z)
- A Transformer-based Approach for Source Code Summarization [86.08359401867577]
We learn code representation for summarization by modeling the pairwise relationship between code tokens.
We show that although the approach is simple, it outperforms state-of-the-art techniques by a significant margin.
arXiv Detail & Related papers (2020-05-01T23:29:36Z)
- Improved Code Summarization via a Graph Neural Network [96.03715569092523]
In general, source code summarization techniques take source code as input and output a natural language description.
We present an approach that uses a graph-based neural architecture that better matches the default structure of the AST to generate these summaries.
arXiv Detail & Related papers (2020-04-06T17:36:42Z)
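Several of the related papers above operate on graph views of code (ASTs, code property graphs, data- and control-flow views). As a rough, hypothetical illustration of what such an input looks like, the sketch below extracts node labels and parent-child edges from a Python AST; none of the cited papers use exactly this representation, and the helper name is invented for the example.

```python
import ast

def ast_to_graph(source: str):
    """Turn Python source into (node_labels, edges) -- the kind of graph a
    graph neural network over code could consume. Illustrative only; the
    papers above define their own node features and edge types."""
    tree = ast.parse(source)
    labels, edges, index = [], [], {}
    for node in ast.walk(tree):
        index[id(node)] = len(labels)
        labels.append(type(node).__name__)            # e.g. 'FunctionDef', 'Call'
    for node in ast.walk(tree):
        for child in ast.iter_child_nodes(node):
            edges.append((index[id(node)], index[id(child)]))  # parent -> child edge
    return labels, edges

if __name__ == "__main__":
    labels, edges = ast_to_graph("def add(a, b):\n    return a + b\n")
    print(len(labels), "nodes,", len(edges), "edges")
```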