A Survey of Deep Learning Models for Structural Code Understanding
- URL: http://arxiv.org/abs/2205.01293v1
- Date: Tue, 3 May 2022 03:56:17 GMT
- Title: A Survey of Deep Learning Models for Structural Code Understanding
- Authors: Ruoting Wu, Yuxin Zhang, Qibiao Peng, Liang Chen and Zibin Zheng
- Abstract summary: We present a comprehensive overview of the structures formed from code data.
We categorize the models for understanding code in recent years into two groups: sequence-based and graph-based models.
We also introduce metrics, datasets and the downstream tasks.
- Score: 21.66270320648155
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, the rise of deep learning and automation requirements in the
software industry has elevated Intelligent Software Engineering to new heights.
The number of approaches and applications in code understanding is growing,
with deep learning techniques being used in many of them to better capture the
information in code data. In this survey, we present a comprehensive overview
of the structures formed from code data. We categorize the models for
understanding code in recent years into two groups: sequence-based and
graph-based models, and further summarize and compare them. We also
introduce metrics, datasets, and downstream tasks. Finally, we offer
suggestions for future research in the field of structural code understanding.
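The survey's core distinction between sequence-based and graph-based models can be illustrated with a minimal sketch (not taken from the paper): the same snippet of code viewed as the flat token sequence a sequence model would consume, and as the AST parent-child edges a graph model would consume, using only Python's standard library.

```python
import ast
import io
import tokenize

src = "def add(a, b):\n    return a + b\n"

# Sequence view: the raw token stream a sequence-based model would consume.
# Whitespace-only tokens (NEWLINE, INDENT, ENDMARKER) are filtered out.
tokens = [tok.string for tok in tokenize.generate_tokens(io.StringIO(src).readline)
          if tok.string.strip()]

# Graph view: parent->child AST edges a graph-based model would consume.
tree = ast.parse(src)
edges = [(type(parent).__name__, type(child).__name__)
         for parent in ast.walk(tree)
         for child in ast.iter_child_nodes(parent)]

print(tokens)  # ['def', 'add', '(', 'a', ',', 'b', ')', ':', 'return', 'a', '+', 'b']
print(edges)   # includes ('Module', 'FunctionDef'), ('BinOp', 'Name'), ...
```

Real models would map these tokens or edges to learned embeddings; the sketch only shows the two structural views of the same program.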
Related papers
- Does Your Neural Code Completion Model Use My Code? A Membership Inference Approach [66.51005288743153]
We investigate the legal and ethical issues of current neural code completion models.
We tailor a membership inference approach (termed CodeMI) that was originally crafted for classification tasks.
We evaluate the effectiveness of this adapted approach across a diverse array of neural code completion models.
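As background for the membership inference idea, here is a minimal loss-threshold sketch (an assumption for illustration, not the paper's CodeMI classifier): examples on which the model's loss is unusually low are guessed to have been members of the training set.

```python
# Toy membership inference by loss thresholding: training members tend to
# receive lower loss from the model than unseen examples.
def infer_membership(losses, threshold):
    """Return True (guessed member) where the loss falls below the threshold."""
    return [loss < threshold for loss in losses]

# Hypothetical per-example losses from some code completion model.
train_losses = [0.12, 0.08, 0.30]   # seen during training -> lower loss
unseen_losses = [1.40, 0.95, 2.10]  # held out -> higher loss

guesses = infer_membership(train_losses + unseen_losses, threshold=0.5)
print(guesses)  # [True, True, True, False, False, False]
```

In practice the threshold (or a learned attack classifier, as in the paper) is calibrated on shadow models rather than chosen by hand.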
arXiv Detail & Related papers (2024-04-22T15:54:53Z)
- Semantically Aligned Question and Code Generation for Automated Insight Generation [20.795381712667034]
We leverage the semantic knowledge of large language models to generate targeted and insightful questions about data.
We show that embeddings can be effectively used for filtering out semantically unaligned pairs of question and code.
arXiv Detail & Related papers (2024-03-21T10:01:05Z)
- Survey of Code Search Based on Deep Learning [11.94599964179766]
This survey focuses on code search, that is, to retrieve code that matches a given query.
Deep learning, with its ability to extract complex semantic information, has achieved great success in this field.
We propose a new taxonomy to illustrate the state-of-the-art deep learning-based code search.
arXiv Detail & Related papers (2023-05-10T08:07:04Z)
- CodeExp: Explanatory Code Document Generation [94.43677536210465]
Existing code-to-text generation models produce only high-level summaries of code.
We conduct a human study to identify the criteria for high-quality explanatory docstrings for code.
We present a multi-stage fine-tuning strategy and baseline models for the task.
arXiv Detail & Related papers (2022-11-25T18:05:44Z)
- Deep learning for table detection and structure recognition: A survey [49.09628624903334]
The goal of this survey is to provide a thorough understanding of the major developments in the field of table detection.
We provide an analysis of both classic and new applications in the field.
The datasets and source code of the existing models are organized to provide the reader with a compass on this vast literature.
arXiv Detail & Related papers (2022-11-15T19:42:27Z)
- Towards Top-Down Automated Development in Limited Scopes: A Neuro-Symbolic Framework from Expressibles to Executables [4.844958528198992]
We build a taxonomy of code data, namely a code taxonomy, based on the categorization of code information.
We introduce a three-layer semantic pyramid (SP) to associate text data and code data.
We propose a semantic pyramid framework (SPF) as the approach, focusing on software of high modularity and low complexity.
arXiv Detail & Related papers (2022-09-04T08:35:16Z)
- Adding Context to Source Code Representations for Deep Learning [13.676416860721877]
We argue that it is beneficial for deep learning models to have access to additional contextual information about the code being analysed.
We present preliminary evidence that encoding context from the call hierarchy along with information from the code itself can improve the performance of a state-of-the-art deep learning model.
arXiv Detail & Related papers (2022-07-30T12:47:32Z)
- Enhancing Semantic Code Search with Multimodal Contrastive Learning and Soft Data Augmentation [50.14232079160476]
We propose a new approach with multimodal contrastive learning and soft data augmentation for code search.
We conduct extensive experiments to evaluate the effectiveness of our approach on a large-scale dataset with six programming languages.
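A common building block behind multimodal contrastive learning for code search is an in-batch InfoNCE objective: each query embedding is pulled toward the embedding of its paired code and pushed away from the other codes in the batch. The sketch below (a generic formulation, not the paper's exact loss) uses plain Python lists so it stays self-contained.

```python
import math

def info_nce(query_vecs, code_vecs, temperature=0.07):
    """In-batch InfoNCE: each query's positive is the code at the same index;
    all other codes in the batch serve as negatives."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    losses = []
    for i, q in enumerate(query_vecs):
        # Similarity of this query to every code in the batch.
        logits = [dot(q, c) / temperature for c in code_vecs]
        # Cross-entropy against the matching (index-i) code.
        log_denom = math.log(sum(math.exp(l) for l in logits))
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)

# Toy batch: each query embedding already matches its paired code embedding,
# so the contrastive loss should be close to zero.
queries = [[1.0, 0.0], [0.0, 1.0]]
codes = [[1.0, 0.0], [0.0, 1.0]]
loss = info_nce(queries, codes)
print(loss)
```

In a real system the vectors would come from trained query and code encoders, and the loss would be minimized over large batches.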
arXiv Detail & Related papers (2022-04-07T08:49:27Z)
- COSEA: Convolutional Code Search with Layer-wise Attention [90.35777733464354]
We propose a new deep learning architecture, COSEA, which leverages convolutional neural networks with layer-wise attention to capture the code's intrinsic structural logic.
COSEA can achieve significant improvements over state-of-the-art methods on code search tasks.
arXiv Detail & Related papers (2020-10-19T13:53:38Z)
- GraphCodeBERT: Pre-training Code Representations with Data Flow [97.00641522327699]
We present GraphCodeBERT, a pre-trained model for programming languages that considers the inherent structure of code.
We use data flow in the pre-training stage, which is a semantic-level structure of code that encodes the relation of "where-the-value-comes-from" between variables.
We evaluate our model on four tasks, including code search, clone detection, code translation, and code refinement.
arXiv Detail & Related papers (2020-09-17T15:25:56Z)
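The "where-the-value-comes-from" relation can be made concrete with a toy sketch (an assumption for illustration, far simpler than GraphCodeBERT's actual data-flow graph): for straight-line assignments, each variable read is linked back to the line where that variable was last assigned.

```python
import ast

def data_flow_edges(src):
    """Return (name, def_line, use_line) edges for straight-line assignments.
    A toy version of the where-the-value-comes-from relation: each read of a
    variable points back to its most recent assignment."""
    edges = []
    last_def = {}  # variable name -> line of its most recent assignment
    for stmt in ast.parse(src).body:
        if isinstance(stmt, ast.Assign):
            # First, the right-hand-side reads point back to defining lines.
            for node in ast.walk(stmt.value):
                if isinstance(node, ast.Name) and node.id in last_def:
                    edges.append((node.id, last_def[node.id], node.lineno))
            # Then record the assignment targets on this line.
            for target in stmt.targets:
                if isinstance(target, ast.Name):
                    last_def[target.id] = stmt.lineno
    return edges

src = "x = 1\ny = x + 2\nz = x + y\n"
flows = data_flow_edges(src)
print(flows)  # [('x', 1, 2), ('x', 1, 3), ('y', 2, 3)]
```

GraphCodeBERT builds a richer graph over code tokens (handling control flow, expressions, and reassignment) and feeds these edges into pre-training; this sketch only shows the kind of semantic-level structure involved.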
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.