Code2Image: Intelligent Code Analysis by Computer Vision Techniques and
Application to Vulnerability Prediction
- URL: http://arxiv.org/abs/2105.03131v1
- Date: Fri, 7 May 2021 09:10:20 GMT
- Authors: Zeki Bilgin
- Abstract summary: We present a novel method to represent source code as an image while preserving its semantic and syntactic properties.
The method makes it possible to feed the resulting image representation of source code directly into deep learning (DL) algorithms as input.
We demonstrate feasibility and effectiveness of our method by realizing a vulnerability prediction use case over a public dataset.
- Score: 0.6091702876917281
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Intelligent code analysis has received increasing attention in parallel with
the remarkable advances in the field of machine learning (ML) in recent years.
A major challenge in leveraging ML for this purpose is to represent source code
in a useful form that ML algorithms can accept as input. In this study, we
present a novel method to represent source code as an image while preserving
its semantic and syntactic properties, which paves the way for applying
computer vision techniques to code analysis. Indeed, the method makes it
possible to feed the resulting image representation of source code directly
into deep learning (DL) algorithms as input, without requiring any further
data pre-processing or feature-extraction step. We demonstrate the feasibility
and effectiveness of our method by realizing a vulnerability prediction use
case over a public dataset containing a large number of real-world source code
samples, with performance evaluation in comparison to state-of-the-art
solutions. Our implementation is publicly available.
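The abstract describes mapping source code to an image that a DL model can consume directly. A minimal sketch of the general idea, assuming a naive character-to-pixel mapping (the paper's actual Code2Image encoding preserves AST structure and is not reproduced here):

```python
import numpy as np

def code_to_image(source: str, size: int = 64) -> np.ndarray:
    """Map source code to a size x size grayscale image.

    Naive sketch: each UTF-8 byte of the source becomes one pixel
    intensity, zero-padded (or truncated) to a fixed size. The paper's
    Code2Image method instead encodes the AST so that syntactic and
    semantic structure survives; this is only an illustration of the
    code-as-image input format.
    """
    data = np.frombuffer(source.encode("utf-8"), dtype=np.uint8)
    pixels = np.zeros(size * size, dtype=np.uint8)
    n = min(len(data), size * size)
    pixels[:n] = data[:n]
    return pixels.reshape(size, size)

img = code_to_image("int main() { return 0; }")
print(img.shape, img.dtype)  # (64, 64) uint8
```

A fixed-size 2D array like this can be passed to a standard CNN with no extra feature-extraction step, which is the property the abstract highlights.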
Related papers
- SURGE: On the Potential of Large Language Models as General-Purpose Surrogate Code Executors [0.0]
Large language models (LLMs) have demonstrated remarkable capabilities in code-related tasks, such as code understanding and code generation.
However, an equally important yet underexplored question is whether LLMs can serve as general-purpose surrogate code executors.
This study provides empirical insights into the feasibility of using LLMs as surrogate code executors.
arXiv Detail & Related papers (2025-02-16T15:38:19Z)
- CodeVision: Detecting LLM-Generated Code Using 2D Token Probability Maps and Vision Models [28.711745671275477]
The rise of large language models (LLMs) has significantly improved automated code generation, enhancing software development efficiency.
Existing detection methods, such as pre-trained models and watermarking, face limitations in adaptability and computational efficiency.
We propose a novel detection method using 2D token probability maps combined with vision models, preserving spatial code structures.
arXiv Detail & Related papers (2025-01-06T06:15:10Z)
- Case2Code: Scalable Synthetic Data for Code Generation [105.89741089673575]
Large Language Models (LLMs) have shown outstanding breakthroughs in code generation.
Recent work improves code LLMs by training on synthetic data generated by some powerful LLMs.
We propose a Case2Code task that exploits the expressiveness and correctness of programs.
arXiv Detail & Related papers (2024-07-17T11:35:00Z)
- Enhancing Source Code Representations for Deep Learning with Static Analysis [10.222207222039048]
This paper explores the integration of static analysis and additional context such as bug reports and design patterns into source code representations for deep learning models.
We use the Abstract Syntax Tree-based Neural Network (ASTNN) method and augment it with additional context information obtained from bug reports and design patterns.
Our approach enhances the representation and processing of source code, thereby improving task performance.
arXiv Detail & Related papers (2024-02-14T20:17:04Z)
- Exploring Representation-Level Augmentation for Code Search [50.94201167562845]
We explore augmentation methods that augment data (both code and query) at the representation level, which does not require additional data processing or training.
We experimentally evaluate the proposed representation-level augmentation methods with state-of-the-art code search models on a large-scale public dataset.
arXiv Detail & Related papers (2022-10-21T22:47:37Z)
- Enhancing Semantic Code Search with Multimodal Contrastive Learning and Soft Data Augmentation [50.14232079160476]
We propose a new approach with multimodal contrastive learning and soft data augmentation for code search.
We conduct extensive experiments to evaluate the effectiveness of our approach on a large-scale dataset with six programming languages.
arXiv Detail & Related papers (2022-04-07T08:49:27Z)
- Multi-View Graph Representation for Programming Language Processing: An Investigation into Algorithm Detection [35.81014952109471]
This paper proposes a multi-view graph (MVG) program representation method.
MVG pays more attention to code semantics and simultaneously includes both data flow and control flow as multiple views.
In experiments, MVG outperforms previous methods significantly.
arXiv Detail & Related papers (2022-02-25T03:35:45Z)
- Software Vulnerability Detection via Deep Learning over Disaggregated Code Graph Representation [57.92972327649165]
This work explores a deep learning approach to automatically learn the insecure patterns from code corpora.
Because code naturally admits a graph structure once parsed, we develop a novel graph neural network (GNN) to exploit both the semantic context and the structural regularity of a program.
arXiv Detail & Related papers (2021-09-07T21:24:36Z)
- A Survey on Large-scale Machine Learning [67.6997613600942]
Machine learning can provide deep insights into data, allowing machines to make high-quality predictions.
Most sophisticated machine learning approaches suffer from huge time costs when operating on large-scale data.
Large-scale Machine Learning aims to learn patterns from big data with comparable performance efficiently.
arXiv Detail & Related papers (2020-08-10T06:07:52Z)
- A Transformer-based Approach for Source Code Summarization [86.08359401867577]
We learn code representation for summarization by modeling the pairwise relationship between code tokens.
We show that, despite its simplicity, the approach outperforms the state-of-the-art techniques by a significant margin.
arXiv Detail & Related papers (2020-05-01T23:29:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.