deGraphCS: Embedding Variable-based Flow Graph for Neural Code Search
- URL: http://arxiv.org/abs/2103.13020v1
- Date: Wed, 24 Mar 2021 06:57:44 GMT
- Title: deGraphCS: Embedding Variable-based Flow Graph for Neural Code Search
- Authors: Chen Zeng, Yue Yu, Shanshan Li, Xin Xia, Zhiming Wang, Mingyang Geng,
Bailin Xiao, Wei Dong, Xiangke Liao
- Abstract summary: We propose a learnable deep Graph for Code Search (called deGraphCS) to transfer source code into variable-based flow graphs.
We collect a large-scale dataset from GitHub containing 41,152 code snippets written in C.
- Score: 15.19181807445119
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the rapid growth in the number of public code repositories,
developers have a strong desire to retrieve precise code snippets using natural
language. Although existing deep-learning-based approaches (e.g., DeepCS and
MMAN) provide end-to-end solutions (i.e., accepting natural-language queries
and returning related code fragments retrieved directly from a code corpus),
the accuracy of code search over large-scale repositories is still limited by
the code representation (e.g., the AST) and the modeling (e.g., directly fusing
features in the attention stage). In this paper, we propose a novel learnable
deep Graph for Code Search (called deGraphCS) that transfers source code into
variable-based flow graphs based on the intermediate representation technique,
which models code semantics more precisely than processing the code directly as
text or using a syntactic tree representation. Furthermore, we propose
a well-designed graph optimization mechanism to refine the code representation,
and apply an improved gated graph neural network to model variable-based flow
graphs. To evaluate the effectiveness of deGraphCS, we collect a large-scale
dataset from GitHub containing 41,152 code snippets written in C, and
reproduce several typical deep code search methods for comparison. In addition,
we design a qualitative user study to verify the practical value of our
approach. The experimental results show that deGraphCS achieves
state-of-the-art performance and accurately retrieves code snippets that
satisfy users' needs.
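The abstract does not spell out how a variable-based flow graph is built; as a hypothetical illustration only, the sketch below constructs a toy graph from three-address-style IR statements, with variables and operation instances as nodes and the flow of values as edges. The statement format, node naming, and edge direction are all assumptions for illustration, not the paper's actual construction.

```python
def build_variable_flow_graph(ir_statements):
    """Build a toy variable-based flow graph from three-address IR.

    Each statement is a tuple (target, op, operands). Nodes are variables
    and per-instance operation nodes; edges trace how values flow into
    each operation and on to the variable that receives its result.
    """
    edges = []
    for i, (target, op, operands) in enumerate(ir_statements):
        op_node = f"{op}#{i}"              # unique node per operation instance
        for src in operands:
            edges.append((src, op_node))   # value flows into the operation
        edges.append((op_node, target))    # result flows to the target variable
    return edges

# IR for: t1 = a + b; c = t1 * a
ir = [("t1", "add", ["a", "b"]),
      ("c", "mul", ["t1", "a"])]
for edge in build_variable_flow_graph(ir):
    print(edge)
```

A graph built this way could then be fed to a gated graph neural network, with the edge list defining the message-passing structure.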
Related papers
- Source Code Vulnerability Detection: Combining Code Language Models and Code Property Graphs [5.953617559607503]
Vul-LMGNN is a unified model that combines pre-trained code language models with code property graphs.
Vul-LMGNN constructs a code property graph that integrates various code attributes into a unified graph structure.
To effectively retain dependency information among various attributes, we introduce a gated code Graph Neural Network.
arXiv Detail & Related papers (2024-04-23T03:48:18Z)
- CONCORD: Towards a DSL for Configurable Graph Code Representation [3.756550107432323]
We introduce CONCORD, a domain-specific language to build customizable graph representations.
We demonstrate its effectiveness in code smell detection as an illustrative use case.
CONCORD will help researchers create and experiment with customizable graph-based code representations.
arXiv Detail & Related papers (2024-01-31T16:16:48Z)
- CodeExp: Explanatory Code Document Generation [94.43677536210465]
Existing code-to-text generation models produce only high-level summaries of code.
We conduct a human study to identify the criteria for high-quality explanatory docstrings for code.
We present a multi-stage fine-tuning strategy and baseline models for the task.
arXiv Detail & Related papers (2022-11-25T18:05:44Z)
- Enhancing Semantic Code Search with Multimodal Contrastive Learning and Soft Data Augmentation [50.14232079160476]
We propose a new approach with multimodal contrastive learning and soft data augmentation for code search.
We conduct extensive experiments to evaluate the effectiveness of our approach on a large-scale dataset with six programming languages.
arXiv Detail & Related papers (2022-04-07T08:49:27Z)
- GN-Transformer: Fusing Sequence and Graph Representation for Improved Code Summarization [0.0]
We propose a novel method, GN-Transformer, to learn end-to-end on a fused sequence and graph modality.
The proposed methods achieve state-of-the-art performance in two code summarization datasets and across three automatic code summarization metrics.
arXiv Detail & Related papers (2021-11-17T02:51:37Z)
- Deep Graph Matching and Searching for Semantic Code Retrieval [76.51445515611469]
We propose an end-to-end deep graph matching and searching model based on graph neural networks.
We first represent both natural language query texts and programming language code snippets with the unified graph-structured data.
In particular, DGMS not only captures more structural information for individual query texts or code snippets but also learns the fine-grained similarity between them.
arXiv Detail & Related papers (2020-10-24T14:16:50Z)
- GraphCodeBERT: Pre-training Code Representations with Data Flow [97.00641522327699]
We present GraphCodeBERT, a pre-trained model for programming language that considers the inherent structure of code.
We use data flow in the pre-training stage, which is a semantic-level structure of code that encodes the relation of "where-the-value-comes-from" between variables.
We evaluate our model on four tasks, including code search, clone detection, code translation, and code refinement.
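The "where-the-value-comes-from" relation can be illustrated with a small sketch: each use of a variable is linked back to the statement that most recently defined it. The statement format below is a simplification invented for illustration; GraphCodeBERT extracts these data-flow edges from parsed source code, not from a pre-digested list like this.

```python
def value_comes_from(statements):
    """Toy extraction of where-the-value-comes-from edges.

    statements: list of (defined_var, used_vars) in execution order.
    For each used variable, emit a tuple (var, def_site, use_site)
    linking the use back to the most recent definition of that variable.
    """
    last_def = {}   # variable name -> index of its most recent definition
    edges = []
    for idx, (target, uses) in enumerate(statements):
        for var in uses:
            if var in last_def:
                edges.append((var, last_def[var], idx))
        last_def[target] = idx
    return edges

# x = 1; y = x + 1; x = y
stmts = [("x", []), ("y", ["x"]), ("x", ["y"])]
print(value_comes_from(stmts))  # → [('x', 0, 1), ('y', 1, 2)]
```

Edges of this kind give the model a semantic-level structure that is stable across superficial rewrites of the same code, unlike token order alone.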
arXiv Detail & Related papers (2020-09-17T15:25:56Z)
- Learning to map source code to software vulnerability using code-as-a-graph [67.62847721118142]
We explore the applicability of Graph Neural Networks in learning the nuances of source code from a security perspective.
We show that a code-as-graph encoding is more meaningful for vulnerability detection than existing code-as-photo and linear sequence encoding approaches.
arXiv Detail & Related papers (2020-06-15T16:05:27Z)
- Auto-Encoding Twin-Bottleneck Hashing [141.5378966676885]
This paper proposes an efficient and adaptive code-driven graph.
It is updated by decoding in the context of an auto-encoder.
Experiments on benchmarked datasets clearly show the superiority of our framework over the state-of-the-art hashing methods.
arXiv Detail & Related papers (2020-02-27T05:58:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.