Retrieval-Augmented Generation for Code Summarization via Hybrid GNN
- URL: http://arxiv.org/abs/2006.05405v5
- Date: Thu, 13 May 2021 03:41:22 GMT
- Title: Retrieval-Augmented Generation for Code Summarization via Hybrid GNN
- Authors: Shangqing Liu, Yu Chen, Xiaofei Xie, Jingkai Siow, Yang Liu
- Abstract summary: We propose a novel retrieval-augmented mechanism that combines the benefits of retrieval-based and generation-based methods.
To mitigate the limitation of Graph Neural Networks (GNNs) in capturing global graph structure information of source code, we propose a novel attention-based dynamic graph.
Our method achieves state-of-the-art performance, improving over existing methods by 1.42, 2.44, and 1.29 points in BLEU-4, ROUGE-L, and METEOR, respectively.
- Score: 23.445231228940738
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Source code summarization aims to generate natural language summaries from
structured code snippets for better understanding code functionalities.
However, automatic code summarization is challenging due to the complexity of
the source code and the language gap between the source code and natural
language summaries. Most previous approaches rely on either retrieval-based
methods (which can take advantage of similar examples seen in the retrieval
database, but generalize poorly) or generation-based methods (which generalize
better, but cannot take advantage of similar examples). This paper proposes a
novel retrieval-augmented mechanism to combine
the benefits of both worlds. Furthermore, to mitigate the limitation of Graph
Neural Networks (GNNs) in capturing global graph structure information of
source code, we propose a novel attention-based dynamic graph to complement the
static graph representation of the source code, and design a hybrid message
passing GNN for capturing both the local and global structural information. To
evaluate the proposed approach, we release a new challenging benchmark, crawled
from diversified, large-scale open-source C projects (95k+ unique functions in
total). Our method achieves state-of-the-art performance, improving over
existing methods by 1.42, 2.44, and 1.29 points in BLEU-4, ROUGE-L, and
METEOR, respectively.
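To make the hybrid idea concrete, here is a minimal sketch (PyTorch; the class name, dimensions, and fusion scheme are illustrative assumptions, not the authors' implementation): one layer aggregates messages over the fixed static edges of the code graph for local structure, a second aggregates over soft, attention-derived edges for global structure, and the two streams are fused per node.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HybridMessagePassing(nn.Module):
    """Toy hybrid layer: static-graph messages + attention-based dynamic messages.

    Illustrative sketch only, not the paper's exact architecture.
    """
    def __init__(self, dim):
        super().__init__()
        self.w_static = nn.Linear(dim, dim)   # transform for static-edge messages
        self.w_dynamic = nn.Linear(dim, dim)  # transform for dynamic-edge messages
        self.fuse = nn.Linear(2 * dim, dim)   # fuse local and global information

    def forward(self, h, adj_static):
        # h: (n, dim) node states; adj_static: (n, n) 0/1 adjacency (e.g. AST/CFG edges)
        deg = adj_static.sum(-1, keepdim=True).clamp(min=1)
        m_local = adj_static @ self.w_static(h) / deg   # mean over static neighbors

        # Dynamic graph: dense attention over all node pairs captures global structure.
        scores = h @ h.t() / h.size(-1) ** 0.5          # (n, n) similarity
        attn = F.softmax(scores, dim=-1)
        m_global = attn @ self.w_dynamic(h)

        return torch.relu(self.fuse(torch.cat([m_local, m_global], dim=-1)))

# Toy usage: 4 nodes with 8-dim states and a small static graph.
h = torch.randn(4, 8)
adj = torch.tensor([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]],
                   dtype=torch.float)
layer = HybridMessagePassing(8)
print(layer(h, adj).shape)  # torch.Size([4, 8])
```

In the paper's actual design the static graph comes from program analysis while the dynamic graph is learned; the sketch only shows how the two message streams can coexist in one layer.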
Related papers
- Language Models are Graph Learners [70.14063765424012]
Language Models (LMs) are challenging the dominance of domain-specific models, including Graph Neural Networks (GNNs) and Graph Transformers (GTs).
We propose a novel approach that empowers off-the-shelf LMs to achieve performance comparable to state-of-the-art GNNs on node classification tasks.
arXiv Detail & Related papers (2024-10-03T08:27:54Z) - Source Code Vulnerability Detection: Combining Code Language Models and Code Property Graphs [5.953617559607503]
- Source Code Vulnerability Detection: Combining Code Language Models and Code Property Graphs [5.953617559607503]
Vul-LMGNN is a unified model that combines pre-trained code language models with code property graphs.
Vul-LMGNN constructs a code property graph that integrates various code attributes into a unified graph structure.
To effectively retain dependency information among various attributes, we introduce a gated code Graph Neural Network.
arXiv Detail & Related papers (2024-04-23T03:48:18Z) - Enhancing Source Code Representations for Deep Learning with Static
- Enhancing Source Code Representations for Deep Learning with Static Analysis [10.222207222039048]
This paper explores the integration of static analysis and additional context such as bug reports and design patterns into source code representations for deep learning models.
We use the Abstract Syntax Tree-based Neural Network (ASTNN) method and augment it with additional context information obtained from bug reports and design patterns.
Our approach enriches the representation and processing of source code, thereby improving downstream task performance.
arXiv Detail & Related papers (2024-02-14T20:17:04Z) - Momentum Decoding: Open-ended Text Generation As Graph Exploration [49.812280360794894]
- Momentum Decoding: Open-ended Text Generation As Graph Exploration [49.812280360794894]
Open-ended text generation with autoregressive language models (LMs) is one of the core tasks in natural language processing.
We formulate open-ended text generation from a new perspective, i.e., we view it as an exploration process within a directed graph.
We propose a novel decoding method -- momentum decoding -- which encourages the LM to explore new nodes outside the current graph.
arXiv Detail & Related papers (2022-12-05T11:16:47Z) - Autoregressive Search Engines: Generating Substrings as Document
- Autoregressive Search Engines: Generating Substrings as Document Identifiers [53.0729058170278]
Autoregressive language models are emerging as the de facto standard for generating answers.
Previous work has explored ways to partition the search space into hierarchical structures.
In this work we propose an alternative that doesn't force any structure in the search space: using all n-grams in a passage as its possible identifiers.
arXiv Detail & Related papers (2022-04-22T10:45:01Z) - GraphSearchNet: Enhancing GNNs via Capturing Global Dependency for
- GraphSearchNet: Enhancing GNNs via Capturing Global Dependency for Semantic Code Search [15.687959123626003]
We design a novel neural network framework, named GraphSearchNet, to enable effective and accurate source code search.
Specifically, we propose to encode both source code and queries into two graphs with BiGGNN to capture the local structure information of the graphs.
The experiments on both Java and Python datasets illustrate that GraphSearchNet outperforms current state-of-the-art works by a significant margin.
arXiv Detail & Related papers (2021-11-04T07:38:35Z) - deGraphCS: Embedding Variable-based Flow Graph for Neural Code Search [15.19181807445119]
- deGraphCS: Embedding Variable-based Flow Graph for Neural Code Search [15.19181807445119]
We propose a learnable deep Graph for Code Search (called deGraphCS) to transform source code into variable-based flow graphs.
We collect a large-scale dataset from GitHub containing 41,152 code snippets written in the C language.
arXiv Detail & Related papers (2021-03-24T06:57:44Z) - Deep Graph Matching and Searching for Semantic Code Retrieval [76.51445515611469]
- Deep Graph Matching and Searching for Semantic Code Retrieval [76.51445515611469]
We propose an end-to-end deep graph matching and searching model (DGMS) based on graph neural networks.
We first represent both natural language query texts and programming language code snippets with unified graph-structured data.
In particular, DGMS not only captures more structural information for individual query texts or code snippets but also learns the fine-grained similarity between them.
arXiv Detail & Related papers (2020-10-24T14:16:50Z) - A Transformer-based Approach for Source Code Summarization [86.08359401867577]
- A Transformer-based Approach for Source Code Summarization [86.08359401867577]
We learn code representation for summarization by modeling the pairwise relationship between code tokens.
We show that although the approach is simple, it outperforms state-of-the-art techniques by a significant margin.
arXiv Detail & Related papers (2020-05-01T23:29:36Z) - Improved Code Summarization via a Graph Neural Network [96.03715569092523]
- Improved Code Summarization via a Graph Neural Network [96.03715569092523]
In general, source code summarization techniques take source code as input and output a natural language description.
We present an approach that uses a graph-based neural architecture that better matches the default structure of the AST to generate these summaries.
arXiv Detail & Related papers (2020-04-06T17:36:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.