Embedding API Dependency Graph for Neural Code Generation
- URL: http://arxiv.org/abs/2103.15361v1
- Date: Mon, 29 Mar 2021 06:26:38 GMT
- Title: Embedding API Dependency Graph for Neural Code Generation
- Authors: Chen Lyu, Ruyun Wang, Hongyu Zhang, Hanwen Zhang, Songlin Hu
- Abstract summary: We propose to model the dependencies among API methods as an API dependency graph (ADG) and incorporate the graph embedding into a sequence-to-sequence model.
In this way, the decoder can utilize both global structural dependencies and textual program description to predict the target code.
Our proposed approach, called ADG-Seq2Seq, yields significant improvements over existing state-of-the-art methods.
- Score: 14.246659920310003
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The problem of code generation from textual program descriptions has long
been viewed as a grand challenge in software engineering. In recent years, many
deep learning based approaches have been proposed, which can generate a
sequence of code from a sequence of textual program description. However, the
existing approaches ignore the global relationships among API methods, which
are important for understanding the usage of APIs. In this paper, we propose to
model the dependencies among API methods as an API dependency graph (ADG) and
incorporate the graph embedding into a sequence-to-sequence (Seq2Seq) model. In
addition to the existing encoder-decoder structure, a new module named
"embedder" is introduced. In this way, the decoder can utilize both global
structural dependencies and textual program description to predict the target
code. We conduct extensive code generation experiments on three public datasets
and in two programming languages (Python and Java). Our proposed approach,
called ADG-Seq2Seq, yields significant improvements over existing
state-of-the-art methods and maintains its performance as the length of the
target code increases. Extensive ablation tests show that the proposed ADG
embedding is effective and outperforms the baselines.
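As a rough illustration of the ADG idea, the Python sketch below builds a small API dependency graph from method signatures, adding an edge when one method's return type can feed another method's parameter. The `ApiMethod` structure and the type-matching rule are simplifying assumptions for illustration, not the paper's exact construction; in ADG-Seq2Seq the resulting graph is then embedded and consumed by the decoder together with the encoded description.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class ApiMethod:
    """A simplified view of an API method signature (illustrative only)."""
    name: str
    param_types: Tuple[str, ...]
    return_type: str


def build_adg(methods: List[ApiMethod]) -> Dict[str, Set[str]]:
    """Build an API dependency graph: add an edge a -> b when the value
    returned by method a can be consumed as a parameter of method b.
    This type-matching rule is an assumption used for illustration."""
    adg: Dict[str, Set[str]] = {m.name: set() for m in methods}
    for a in methods:
        for b in methods:
            if a.name != b.name and a.return_type in b.param_types:
                adg[a.name].add(b.name)
    return adg


if __name__ == "__main__":
    methods = [
        ApiMethod("FileReader.new", ("String",), "FileReader"),
        ApiMethod("BufferedReader.new", ("FileReader",), "BufferedReader"),
        ApiMethod("BufferedReader.readLine", ("BufferedReader",), "String"),
    ]
    print(build_adg(methods))
    # e.g. {'FileReader.new': {'BufferedReader.new'}, ...}
```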
Related papers
- RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph [63.87660059104077]
We present RepoGraph, a plug-in module that manages a repository-level structure for modern AI software engineering solutions.
RepoGraph substantially boosts the performance of all systems, leading to a new state-of-the-art among open-source frameworks.
arXiv Detail & Related papers (2024-10-03T05:45:26Z)
- A Comprehensive Framework for Evaluating API-oriented Code Generation in Large Language Models [14.665460257371164]
Large language models (LLMs) like GitHub Copilot and ChatGPT have emerged as powerful tools for code generation.
We propose AutoAPIEval, a framework designed to evaluate the capabilities of LLMs in API-oriented code generation.
arXiv Detail & Related papers (2024-09-23T17:22:09Z)
- Are Human Rules Necessary? Generating Reusable APIs with CoT Reasoning and In-Context Learning [14.351476383642016]
We propose a novel approach, named Code2API, to automatically perform APIzation for Stack Overflow code snippets.
Code2API does not require additional model training or any manual crafting rules.
It can be easily deployed on personal computers without relying on other external tools.
arXiv Detail & Related papers (2024-05-06T14:22:17Z)
- Comments as Natural Logic Pivots: Improve Code Generation via Comment Perspective [85.48043537327258]
We propose MANGO (comMents As Natural loGic pivOts), including a comment contrastive training strategy and a corresponding logical comment decoding strategy.
Results indicate that MANGO significantly improves the code pass rate based on the strong baselines.
The robustness of the logical comment decoding strategy is notably higher than that of Chain-of-Thought prompting.
arXiv Detail & Related papers (2024-04-11T08:30:46Z)
- SparseCoder: Identifier-Aware Sparse Transformer for File-Level Code Summarization [51.67317895094664]
This paper studies file-level code summarization, which can assist programmers in understanding and maintaining large source code projects.
We propose SparseCoder, an identifier-aware sparse transformer for effectively handling long code sequences.
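A minimal sketch of one way identifier-aware sparse attention could be masked: every token attends to a local window, while identifier tokens additionally send and receive global attention. This is an illustrative approximation, not SparseCoder's actual design; the `window` size and the identifier positions are made-up parameters.

```python
import numpy as np


def sparse_attention_mask(num_tokens: int,
                          identifier_positions: set,
                          window: int = 4) -> np.ndarray:
    """Boolean attention mask (True = attend). Each token attends to a local
    window; identifier tokens additionally attend (and are attended to)
    globally. Illustrative approximation of identifier-aware sparsity."""
    mask = np.zeros((num_tokens, num_tokens), dtype=bool)
    for i in range(num_tokens):
        lo, hi = max(0, i - window), min(num_tokens, i + window + 1)
        mask[i, lo:hi] = True          # local sliding-window attention
    for p in identifier_positions:     # global attention for identifiers
        mask[p, :] = True
        mask[:, p] = True
    return mask


# Example: 12 tokens, identifiers at positions 2 and 7.
print(sparse_attention_mask(12, {2, 7}).sum(), "attended pairs out of", 12 * 12)
```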
arXiv Detail & Related papers (2024-01-26T09:23:27Z)
- APIContext2Com: Code Comment Generation by Incorporating Pre-Defined API Documentation [0.0]
We introduce a seq-2-seq encoder-decoder neural network model with different sets of multiple encoders to transform distinct inputs into target comments.
A ranking mechanism is also developed to exclude non-informative APIs. We evaluate our approach using the Java dataset from CodeSearchNet.
arXiv Detail & Related papers (2023-03-03T00:38:01Z)
- On the Effectiveness of Pretrained Models for API Learning [8.788509467038743]
Developers frequently use APIs to implement certain functionalities, such as parsing Excel files, reading and writing text files line by line, etc.
Developers can greatly benefit from automatic API usage sequence generation based on natural language queries for building applications in a faster and cleaner manner.
Existing approaches utilize information retrieval models to search for matching API sequences given a query or use RNN-based encoder-decoder to generate API sequences.
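To make the retrieval-style baseline concrete, here is a bare-bones sketch that matches a natural-language query against annotated API sequences with bag-of-words cosine similarity. The corpus entries and API names are invented for illustration and are not taken from any of the cited datasets.

```python
from collections import Counter
import math

# Toy "annotated" corpus: natural-language description -> API usage sequence.
# The entries are illustrative, not from any real dataset.
CORPUS = {
    "read a text file line by line": ["FileReader.new", "BufferedReader.new", "BufferedReader.readLine"],
    "parse an excel file": ["Workbook.open", "Sheet.getRow", "Row.getCell"],
    "write text to a file": ["FileWriter.new", "BufferedWriter.write", "BufferedWriter.close"],
}


def cosine_bow(a: str, b: str) -> float:
    """Cosine similarity over bag-of-words counts (a bare-bones IR model)."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def retrieve_api_sequence(query: str):
    """Return the API sequence whose description best matches the query."""
    best = max(CORPUS, key=lambda desc: cosine_bow(query, desc))
    return CORPUS[best]


print(retrieve_api_sequence("how to read a file line by line"))
# ['FileReader.new', 'BufferedReader.new', 'BufferedReader.readLine']
```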
arXiv Detail & Related papers (2022-04-05T20:33:24Z)
- GraphSearchNet: Enhancing GNNs via Capturing Global Dependency for Semantic Code Search [15.687959123626003]
We design a novel neural network framework, named GraphSearchNet, to enable an effective and accurate source code search.
Specifically, we propose to encode both source code and queries into two graphs with BiGGNN to capture the local structure information of the graphs.
The experiments on both Java and Python datasets illustrate that GraphSearchNet outperforms current state-of-the-art works by a significant margin.
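The following toy sketch conveys the bidirectional message-passing intuition: node features are aggregated along forward and backward edges, pooled into a graph embedding, and code/query embeddings are compared by cosine similarity. It is a rough approximation; the actual BiGGNN uses gated (GRU-style) updates with learned parameters.

```python
import numpy as np


def message_passing(node_feats: np.ndarray, adj: np.ndarray, steps: int = 2) -> np.ndarray:
    """Aggregate neighbor features along forward and backward edges, then
    mean-pool to a single graph embedding. A rough sketch of bidirectional
    message passing, not the real gated BiGGNN update."""
    h = node_feats
    for _ in range(steps):
        fwd = adj @ h          # messages along edge direction
        bwd = adj.T @ h        # messages against edge direction
        h = np.tanh(h + fwd + bwd)
    return h.mean(axis=0)      # graph-level embedding


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))


# Toy graphs: a 3-node code graph and a 2-node query graph with random features.
rng = np.random.default_rng(0)
code_emb = message_passing(rng.normal(size=(3, 8)),
                           np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]], dtype=float))
query_emb = message_passing(rng.normal(size=(2, 8)),
                            np.array([[0, 1], [0, 0]], dtype=float))
print("similarity:", cosine(code_emb, query_emb))
```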
arXiv Detail & Related papers (2021-11-04T07:38:35Z)
- GraphCodeBERT: Pre-training Code Representations with Data Flow [97.00641522327699]
We present GraphCodeBERT, a pre-trained model for programming language that considers the inherent structure of code.
We use data flow in the pre-training stage, which is a semantic-level structure of code that encodes the relation of "where-the-value-comes-from" between variables.
We evaluate our model on four tasks, including code search, clone detection, code translation, and code refinement.
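A simplified sketch of the "where-the-value-comes-from" relation: each variable use is linked back to the most recent assignment of that name. This flat, Python-only extraction is only illustrative; GraphCodeBERT builds its data-flow graph differently and supports multiple languages.

```python
import ast


def dataflow_edges(source: str):
    """Extract simple 'where-the-value-comes-from' edges: each variable use in
    a statement points back to the line of the most recent assignment of that
    name. A simplified, flat sketch of the data-flow idea; the real extraction
    handles nesting, control flow, and multiple languages."""
    tree = ast.parse(source)
    edges, last_def = [], {}
    for stmt in tree.body:                      # walk top-level statements in order
        for node in ast.walk(stmt):             # record uses before this statement's definitions
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
                if node.id in last_def:
                    edges.append((node.id, last_def[node.id], node.lineno))
        for node in ast.walk(stmt):             # then record assignments
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
                last_def[node.id] = node.lineno
    return edges


print(dataflow_edges("x = 1\ny = x + 2\nz = y * x\n"))
# [('x', 1, 2), ('y', 2, 3), ('x', 1, 3)]
```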
arXiv Detail & Related papers (2020-09-17T15:25:56Z)
- A Transformer-based Approach for Source Code Summarization [86.08359401867577]
We learn code representation for summarization by modeling the pairwise relationship between code tokens.
We show that, although the approach is simple, it outperforms state-of-the-art techniques by a significant margin.
arXiv Detail & Related papers (2020-05-01T23:29:36Z)
- Improved Code Summarization via a Graph Neural Network [96.03715569092523]
In general, source code summarization techniques take source code as input and output a natural language description.
We present an approach that uses a graph-based neural architecture that better matches the default structure of the AST to generate these summaries.
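As a minimal illustration of feeding an AST to a graph-based encoder, the sketch below converts a Python AST into node labels and parent-child edges. The paper itself targets Java ASTs and a specific graph-based architecture; this conversion is an assumed, simplified stand-in for that input representation.

```python
import ast


def ast_to_graph(source: str):
    """Convert a program's AST into (node_labels, edges) suitable as GNN input:
    nodes are AST node types, edges connect parents to children. A minimal
    illustration; the cited work uses Java ASTs and a richer encoder."""
    tree = ast.parse(source)
    labels, edges, index = [], [], {}

    def visit(node, parent=None):
        idx = len(labels)
        index[id(node)] = idx
        labels.append(type(node).__name__)
        if parent is not None:
            edges.append((index[id(parent)], idx))
        for child in ast.iter_child_nodes(node):
            visit(child, node)

    visit(tree)
    return labels, edges


labels, edges = ast_to_graph("def add(a, b):\n    return a + b\n")
print(labels)   # e.g. ['Module', 'FunctionDef', 'arguments', ...]
print(edges)    # parent -> child index pairs
```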
arXiv Detail & Related papers (2020-04-06T17:36:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.