deGraphCS: Embedding Variable-based Flow Graph for Neural Code Search
- URL: http://arxiv.org/abs/2103.13020v1
- Date: Wed, 24 Mar 2021 06:57:44 GMT
- Title: deGraphCS: Embedding Variable-based Flow Graph for Neural Code Search
- Authors: Chen Zeng, Yue Yu, Shanshan Li, Xin Xia, Zhiming Wang, Mingyang Geng,
Bailin Xiao, Wei Dong, Xiangke Liao
- Abstract summary: We propose a learnable deep Graph for Code Search (called deGraphCS) to transfer source code into variable-based flow graphs.
We collect a large-scale dataset from GitHub containing 41,152 code snippets written in C.
- Score: 15.19181807445119
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the rapid growth in the number of public code repositories,
developers have a strong desire to retrieve precise code snippets using natural
language. Although existing deep-learning-based approaches (e.g., DeepCS and
MMAN) provide end-to-end solutions (i.e., accepting natural-language queries
and returning related code fragments retrieved directly from a code corpus),
the accuracy of code search over large-scale repositories is still limited by
the code representation (e.g., the AST) and the modeling (e.g., directly fusing
features in the attention stage). In this paper, we propose a novel learnable
deep Graph for Code Search (called deGraphCS) that transfers source code into
variable-based flow graphs based on the intermediate representation technique,
which models code semantics more precisely than processing the code directly as
text or using a syntactic tree representation. Furthermore, we propose
a well-designed graph optimization mechanism to refine the code representation,
and apply an improved gated graph neural network to model variable-based flow
graphs. To evaluate the effectiveness of deGraphCS, we collect a large-scale
dataset from GitHub containing 41,152 code snippets written in C, and
reproduce several typical deep code search methods for comparison. In addition,
we design a qualitative user study to verify the practical value of our
approach. The experimental results show that deGraphCS achieves
state-of-the-art performance and accurately retrieves code snippets that
satisfy users' needs.
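The abstract does not spell out how a variable-based flow graph is built; as a hypothetical illustration only, the sketch below constructs a toy graph from three-address-style IR statements, with variables and operation instances as nodes and the flow of values as edges. The statement format, node naming, and edge direction are all assumptions for illustration, not the paper's actual construction.

```python
def build_variable_flow_graph(ir_statements):
    """Build a toy variable-based flow graph from three-address IR.

    Each statement is a tuple (target, op, operands). Nodes are variables
    and per-instance operation nodes; edges trace how values flow into
    each operation and on to the variable that receives its result.
    """
    edges = []
    for i, (target, op, operands) in enumerate(ir_statements):
        op_node = f"{op}#{i}"              # unique node per operation instance
        for src in operands:
            edges.append((src, op_node))   # value flows into the operation
        edges.append((op_node, target))    # result flows to the target variable
    return edges

# IR for: t1 = a + b; c = t1 * a
ir = [("t1", "add", ["a", "b"]),
      ("c", "mul", ["t1", "a"])]
for edge in build_variable_flow_graph(ir):
    print(edge)
```

A graph built this way could then be fed to a gated graph neural network, with the edge list defining the message-passing structure.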
Related papers
- Source Code Vulnerability Detection: Combining Code Language Models and Code Property Graphs [5.953617559607503]
Vul-LMGNN is a unified model that combines pre-trained code language models with code property graphs.
Vul-LMGNN constructs a code property graph that integrates various code attributes into a unified graph structure.
To effectively retain dependency information among various attributes, we introduce a gated code Graph Neural Network.
arXiv Detail & Related papers (2024-04-23T03:48:18Z)
- CONCORD: Towards a DSL for Configurable Graph Code Representation [3.756550107432323]
We introduce CONCORD, a domain-specific language to build customizable graph representations.
We demonstrate its effectiveness in code smell detection as an illustrative use case.
CONCORD will help researchers create and experiment with customizable graph-based code representations.
arXiv Detail & Related papers (2024-01-31T16:16:48Z)
- CodeExp: Explanatory Code Document Generation [94.43677536210465]
Existing code-to-text generation models produce only high-level summaries of code.
We conduct a human study to identify the criteria for high-quality explanatory docstrings for code.
We present a multi-stage fine-tuning strategy and baseline models for the task.
arXiv Detail & Related papers (2022-11-25T18:05:44Z)
- Enhancing Semantic Code Search with Multimodal Contrastive Learning and Soft Data Augmentation [50.14232079160476]
We propose a new approach with multimodal contrastive learning and soft data augmentation for code search.
We conduct extensive experiments to evaluate the effectiveness of our approach on a large-scale dataset with six programming languages.
arXiv Detail & Related papers (2022-04-07T08:49:27Z)
- GN-Transformer: Fusing Sequence and Graph Representation for Improved Code Summarization [0.0]
We propose a novel method, GN-Transformer, to learn end-to-end on a fused sequence and graph modality.
The proposed methods achieve state-of-the-art performance in two code summarization datasets and across three automatic code summarization metrics.
arXiv Detail & Related papers (2021-11-17T02:51:37Z)
- Deep Graph Matching and Searching for Semantic Code Retrieval [76.51445515611469]
We propose an end-to-end deep graph matching and searching model based on graph neural networks.
We first represent both natural language query texts and programming language code snippets with the unified graph-structured data.
In particular, DGMS not only captures more structural information for individual query texts or code snippets but also learns the fine-grained similarity between them.
arXiv Detail & Related papers (2020-10-24T14:16:50Z)
- GraphCodeBERT: Pre-training Code Representations with Data Flow [97.00641522327699]
We present GraphCodeBERT, a pre-trained model for programming language that considers the inherent structure of code.
We use data flow in the pre-training stage, which is a semantic-level structure of code that encodes the relation of "where-the-value-comes-from" between variables.
We evaluate our model on four tasks, including code search, clone detection, code translation, and code refinement.
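The "where-the-value-comes-from" relation can be illustrated with a small sketch: each use of a variable is linked back to the statement that most recently defined it. The statement format below is a simplification invented for illustration; GraphCodeBERT extracts these data-flow edges from parsed source code, not from a pre-digested list like this.

```python
def value_comes_from(statements):
    """Toy extraction of where-the-value-comes-from edges.

    statements: list of (defined_var, used_vars) in execution order.
    For each used variable, emit a tuple (var, def_site, use_site)
    linking the use back to the most recent definition of that variable.
    """
    last_def = {}   # variable name -> index of its most recent definition
    edges = []
    for idx, (target, uses) in enumerate(statements):
        for var in uses:
            if var in last_def:
                edges.append((var, last_def[var], idx))
        last_def[target] = idx
    return edges

# x = 1; y = x + 1; x = y
stmts = [("x", []), ("y", ["x"]), ("x", ["y"])]
print(value_comes_from(stmts))  # → [('x', 0, 1), ('y', 1, 2)]
```

Edges of this kind give the model a semantic-level structure that is stable across superficial rewrites of the same code, unlike token order alone.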
arXiv Detail & Related papers (2020-09-17T15:25:56Z)
- Learning to map source code to software vulnerability using code-as-a-graph [67.62847721118142]
We explore the applicability of Graph Neural Networks in learning the nuances of source code from a security perspective.
We show that a code-as-graph encoding is more meaningful for vulnerability detection than existing code-as-photo and linear sequence encoding approaches.
arXiv Detail & Related papers (2020-06-15T16:05:27Z)
- Auto-Encoding Twin-Bottleneck Hashing [141.5378966676885]
This paper proposes an efficient and adaptive code-driven graph.
It is updated by decoding in the context of an auto-encoder.
Experiments on benchmarked datasets clearly show the superiority of our framework over the state-of-the-art hashing methods.
arXiv Detail & Related papers (2020-02-27T05:58:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.