Retrieval-Augmented Generation for Code Summarization via Hybrid GNN
- URL: http://arxiv.org/abs/2006.05405v5
- Date: Thu, 13 May 2021 03:41:22 GMT
- Title: Retrieval-Augmented Generation for Code Summarization via Hybrid GNN
- Authors: Shangqing Liu, Yu Chen, Xiaofei Xie, Jingkai Siow, Yang Liu
- Abstract summary: We propose a novel retrieval-augmented mechanism that combines the benefits of retrieval-based and generation-based methods.
To mitigate the limitation of Graph Neural Networks (GNNs) in capturing global graph structure information of source code, we propose a novel attention-based dynamic graph.
Our method achieves state-of-the-art performance, improving over existing methods by 1.42, 2.44, and 1.29 points in BLEU-4, ROUGE-L, and METEOR, respectively.
- Score: 23.445231228940738
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Source code summarization aims to generate natural language summaries from
structured code snippets for better understanding code functionalities.
However, automatic code summarization is challenging due to the complexity of
the source code and the language gap between the source code and natural
language summaries. Most previous approaches rely on either retrieval-based
methods (which can take advantage of similar examples seen in the retrieval
database, but generalize poorly) or generation-based methods (which generalize
better, but cannot take advantage of similar examples). This paper proposes a
novel retrieval-augmented mechanism to combine
the benefits of both worlds. Furthermore, to mitigate the limitation of Graph
Neural Networks (GNNs) in capturing global graph structure information of
source code, we propose a novel attention-based dynamic graph to complement the
static graph representation of the source code, and design a hybrid message
passing GNN for capturing both the local and global structural information. To
evaluate the proposed approach, we release a new challenging benchmark, crawled
from diversified, large-scale open-source C projects (95k+ unique functions in
total). Our method achieves state-of-the-art performance, improving over
existing methods by 1.42, 2.44, and 1.29 points in BLEU-4, ROUGE-L, and
METEOR, respectively.
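To make the hybrid idea concrete, here is a minimal sketch (PyTorch; the class name, dimensions, and fusion scheme are illustrative assumptions, not the authors' implementation): one layer aggregates messages over the fixed static edges of the code graph for local structure, a second aggregates over soft, attention-derived edges for global structure, and the two streams are fused per node.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HybridMessagePassing(nn.Module):
    """Toy hybrid layer: static-graph messages + attention-based dynamic messages.

    Illustrative sketch only, not the paper's exact architecture.
    """
    def __init__(self, dim):
        super().__init__()
        self.w_static = nn.Linear(dim, dim)   # transform for static-edge messages
        self.w_dynamic = nn.Linear(dim, dim)  # transform for dynamic-edge messages
        self.fuse = nn.Linear(2 * dim, dim)   # fuse local and global information

    def forward(self, h, adj_static):
        # h: (n, dim) node states; adj_static: (n, n) 0/1 adjacency (e.g. AST/CFG edges)
        deg = adj_static.sum(-1, keepdim=True).clamp(min=1)
        m_local = adj_static @ self.w_static(h) / deg   # mean over static neighbors

        # Dynamic graph: dense attention over all node pairs captures global structure.
        scores = h @ h.t() / h.size(-1) ** 0.5          # (n, n) similarity
        attn = F.softmax(scores, dim=-1)
        m_global = attn @ self.w_dynamic(h)

        return torch.relu(self.fuse(torch.cat([m_local, m_global], dim=-1)))

# Toy usage: 4 nodes with 8-dim states and a small static graph.
h = torch.randn(4, 8)
adj = torch.tensor([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]],
                   dtype=torch.float)
layer = HybridMessagePassing(8)
print(layer(h, adj).shape)  # torch.Size([4, 8])
```

In the paper's actual design the static graph comes from program analysis while the dynamic graph is learned; the sketch only shows how the two message streams can coexist in one layer.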
Related papers
- Language Models are Graph Learners [70.14063765424012]
Language Models (LMs) are challenging the dominance of domain-specific models, including Graph Neural Networks (GNNs) and Graph Transformers (GTs).
We propose a novel approach that empowers off-the-shelf LMs to achieve performance comparable to state-of-the-art GNNs on node classification tasks.
arXiv Detail & Related papers (2024-10-03T08:27:54Z) - Source Code Vulnerability Detection: Combining Code Language Models and Code Property Graphs [5.953617559607503]
- Source Code Vulnerability Detection: Combining Code Language Models and Code Property Graphs [5.953617559607503]
Vul-LMGNN is a unified model that combines pre-trained code language models with code property graphs.
Vul-LMGNN constructs a code property graph that integrates various code attributes into a unified graph structure.
To effectively retain dependency information among various attributes, we introduce a gated code Graph Neural Network.
arXiv Detail & Related papers (2024-04-23T03:48:18Z) - Enhancing Source Code Representations for Deep Learning with Static
- Enhancing Source Code Representations for Deep Learning with Static Analysis [10.222207222039048]
This paper explores the integration of static analysis and additional context such as bug reports and design patterns into source code representations for deep learning models.
We use the Abstract Syntax Tree-based Neural Network (ASTNN) method and augment it with additional context information obtained from bug reports and design patterns.
Our approach enriches the representation and processing of source code, thereby improving downstream task performance.
arXiv Detail & Related papers (2024-02-14T20:17:04Z) - Momentum Decoding: Open-ended Text Generation As Graph Exploration [49.812280360794894]
- Momentum Decoding: Open-ended Text Generation As Graph Exploration [49.812280360794894]
Open-ended text generation with autoregressive language models (LMs) is one of the core tasks in natural language processing.
We formulate open-ended text generation from a new perspective, i.e., we view it as an exploration process within a directed graph.
We propose a novel decoding method -- momentum decoding -- which encourages the LM to explore new nodes outside the current graph.
arXiv Detail & Related papers (2022-12-05T11:16:47Z) - Autoregressive Search Engines: Generating Substrings as Document
- Autoregressive Search Engines: Generating Substrings as Document Identifiers [53.0729058170278]
Autoregressive language models are emerging as the de facto standard for generating answers.
Previous work has explored ways to partition the search space into hierarchical structures.
In this work we propose an alternative that doesn't force any structure in the search space: using all n-grams in a passage as its possible identifiers.
arXiv Detail & Related papers (2022-04-22T10:45:01Z) - GraphSearchNet: Enhancing GNNs via Capturing Global Dependency for
- GraphSearchNet: Enhancing GNNs via Capturing Global Dependency for Semantic Code Search [15.687959123626003]
We design a novel neural network framework, named GraphSearchNet, to enable effective and accurate source code search.
Specifically, we propose to encode both source code and queries into two graphs with BiGGNN to capture the local structure information of the graphs.
The experiments on both Java and Python datasets illustrate that GraphSearchNet outperforms current state-of-the-art works by a significant margin.
arXiv Detail & Related papers (2021-11-04T07:38:35Z) - deGraphCS: Embedding Variable-based Flow Graph for Neural Code Search [15.19181807445119]
- deGraphCS: Embedding Variable-based Flow Graph for Neural Code Search [15.19181807445119]
We propose a learnable deep Graph for Code Search (called deGraphCS) to transform source code into variable-based flow graphs.
We collect a large-scale dataset from GitHub containing 41,152 code snippets written in the C language.
arXiv Detail & Related papers (2021-03-24T06:57:44Z) - Deep Graph Matching and Searching for Semantic Code Retrieval [76.51445515611469]
- Deep Graph Matching and Searching for Semantic Code Retrieval [76.51445515611469]
We propose an end-to-end deep graph matching and searching model (DGMS) based on graph neural networks.
We first represent both natural language query texts and programming language code snippets with unified graph-structured data.
In particular, DGMS not only captures more structural information for individual query texts or code snippets but also learns the fine-grained similarity between them.
arXiv Detail & Related papers (2020-10-24T14:16:50Z) - A Transformer-based Approach for Source Code Summarization [86.08359401867577]
- A Transformer-based Approach for Source Code Summarization [86.08359401867577]
We learn code representation for summarization by modeling the pairwise relationship between code tokens.
We show that although the approach is simple, it outperforms state-of-the-art techniques by a significant margin.
arXiv Detail & Related papers (2020-05-01T23:29:36Z) - Improved Code Summarization via a Graph Neural Network [96.03715569092523]
- Improved Code Summarization via a Graph Neural Network [96.03715569092523]
In general, source code summarization techniques take source code as input and output a natural language description.
We present an approach that uses a graph-based neural architecture that better matches the default structure of the AST to generate these summaries.
arXiv Detail & Related papers (2020-04-06T17:36:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.