Graph Conditioned Sparse-Attention for Improved Source Code
Understanding
- URL: http://arxiv.org/abs/2112.00663v2
- Date: Fri, 3 Dec 2021 17:28:40 GMT
- Title: Graph Conditioned Sparse-Attention for Improved Source Code
Understanding
- Authors: Junyan Cheng, Iordanis Fostiropoulos and Barry Boehm
- Abstract summary: We propose the conditioning of a source code snippet with its graph modality by using the graph adjacency matrix as an attention mask for a sparse self-attention mechanism.
Our model reaches state-of-the-art results in BLEU, METEOR, and ROUGE-L metrics for the code summarization task and near state-of-the-art accuracy in the variable misuse task.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformer architectures have been successfully used in learning source code
representations. The fusion between a graph representation like Abstract Syntax
Tree (AST) and a source code sequence makes the use of current approaches
computationally intractable for large input sequence lengths. Source code can
have long-range dependencies that require larger sequence lengths to model
effectively. Current approaches have a quadratic growth in computational and
memory costs with respect to the sequence length. Using such models in
practical scenarios is difficult. In this work, we propose the conditioning of
a source code snippet with its graph modality by using the graph adjacency
matrix as an attention mask for a sparse self-attention mechanism and the use
of a graph diffusion mechanism to model longer-range token dependencies. Our
model reaches state-of-the-art results in BLEU, METEOR, and ROUGE-L metrics for
the code summarization task and near state-of-the-art accuracy in the variable
misuse task. The memory use and inference time of our model have linear growth
with respect to the input sequence length as compared to the quadratic growth
of previous works.
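The mechanism described in the abstract can be illustrated with a short, self-contained sketch: the (diffused) AST adjacency matrix is used as a mask over the self-attention logits, so a token can only attend to tokens it is connected to in the graph. This is a minimal illustration, not the paper's implementation; the names `graph_diffusion`, `graph_masked_attention`, and `num_hops` are assumptions, the power-series diffusion is a stand-in for the paper's diffusion operator, and the dense mask shown here does not by itself yield the linear memory growth reported above, which would require an actual sparse-attention kernel.

```python
import torch
import torch.nn.functional as F

def graph_diffusion(adj: torch.Tensor, num_hops: int = 2) -> torch.Tensor:
    """Widen a 0/1 adjacency mask so tokens reachable within `num_hops`
    edges can attend to each other (a simple power-series stand-in for
    the paper's graph diffusion mechanism)."""
    reach = adj.clone()
    power = adj.clone()
    for _ in range(num_hops - 1):
        power = (power @ adj).clamp(max=1.0)   # nodes reachable in one more hop
        reach = (reach + power).clamp(max=1.0)
    return reach

def graph_masked_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor,
                           adj: torch.Tensor) -> torch.Tensor:
    """Scaled dot-product attention with the adjacency matrix as a mask:
    position i may only attend to position j when adj[i, j] == 1."""
    scores = (q @ k.transpose(-2, -1)) / q.size(-1) ** 0.5
    scores = scores.masked_fill(adj == 0, float("-inf"))  # block non-adjacent pairs
    return F.softmax(scores, dim=-1) @ v

# Toy usage: 5 tokens with a chain-shaped, AST-like adjacency (plus self-loops).
n, d = 5, 8
q = k = v = torch.randn(n, d)
adj = torch.eye(n)
for i in range(n - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
out = graph_masked_attention(q, k, v, graph_diffusion(adj, num_hops=2))
print(out.shape)  # torch.Size([5, 8])
```

In a full model the mask would presumably be applied per attention head inside each Transformer layer, with the diffusion step controlling how far beyond immediate AST neighbours each token may attend.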
Related papers
- Learning Long Range Dependencies on Graphs via Random Walks [6.7864586321550595]
Message-passing graph neural networks (GNNs) excel at capturing local relationships but struggle with long-range dependencies in graphs.
Graph transformers (GTs) enable global information exchange but often oversimplify the graph structure by representing graphs as sets of fixed-length vectors.
This work introduces a novel architecture that overcomes the shortcomings of both approaches by combining the long-range information of random walks with local message passing.
arXiv Detail & Related papers (2024-06-05T15:36:57Z) - Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration [54.897493351694195]
We propose a novel parallel decoding approach, namely hidden transfer, which decodes multiple successive tokens simultaneously in a single forward pass.
In terms of acceleration metrics, we outperform all the single-model acceleration techniques, including Medusa and Self-Speculative decoding.
arXiv Detail & Related papers (2024-04-18T09:17:06Z) - LongVQ: Long Sequence Modeling with Vector Quantization on Structured Memory [63.41820940103348]
The self-attention mechanism's computational cost limits its practicality for long sequences.
We propose a new method called LongVQ to compress the global abstraction into a fixed-length codebook.
LongVQ effectively maintains dynamic global and local patterns, which helps to address the lack of long-range dependency modeling.
arXiv Detail & Related papers (2024-04-17T08:26:34Z) - Graph-Mamba: Towards Long-Range Graph Sequence Modeling with Selective
State Spaces [4.928791850200171]
We introduce Graph-Mamba, the first attempt to enhance long-range context modeling in graph networks.
We formulate graph-centric node prioritization and permutation strategies to enhance context-aware reasoning.
Experiments on ten benchmark datasets demonstrate that Graph-Mamba outperforms state-of-the-art methods in long-range graph prediction tasks.
arXiv Detail & Related papers (2024-02-01T17:21:53Z) - DORE: Document Ordered Relation Extraction based on Generative Framework [56.537386636819626]
This paper investigates the root cause of the underwhelming performance of the existing generative DocRE models.
We propose to generate a symbolic and ordered sequence from the relation matrix, which is deterministic and easier for the model to learn.
Experimental results on four datasets show that our proposed method can improve the performance of the generative DocRE models.
arXiv Detail & Related papers (2022-10-28T11:18:10Z) - Dynamic Graph Message Passing Networks for Visual Recognition [112.49513303433606]
Modelling long-range dependencies is critical for scene understanding tasks in computer vision.
A fully-connected graph is beneficial for such modelling, but its computational overhead is prohibitive.
We propose a dynamic graph message passing network that significantly reduces the computational complexity.
arXiv Detail & Related papers (2022-09-20T14:41:37Z) - Gransformer: Transformer-based Graph Generation [14.161975556325796]
Gransformer is a Transformer-based algorithm for generating graphs.
We modify the Transformer encoder to exploit the structural information of the given graph.
We also introduce a graph-based familiarity measure between node pairs.
arXiv Detail & Related papers (2022-03-25T14:05:12Z) - GN-Transformer: Fusing Sequence and Graph Representation for Improved
Code Summarization [0.0]
We propose a novel method, GN-Transformer, to learn end-to-end on a fused sequence and graph modality.
The proposed method achieves state-of-the-art performance on two code summarization datasets and across three automatic code summarization metrics.
arXiv Detail & Related papers (2021-11-17T02:51:37Z) - Auto-decoding Graphs [91.3755431537592]
The generative model is an auto-decoder that learns to synthesize graphs from latent codes.
Graphs are synthesized using self-attention modules that are trained to identify likely connectivity patterns.
arXiv Detail & Related papers (2020-06-04T14:23:01Z) - A Transformer-based Approach for Source Code Summarization [86.08359401867577]
We learn code representation for summarization by modeling the pairwise relationship between code tokens.
We show that, despite its simplicity, the approach outperforms state-of-the-art techniques by a significant margin.
arXiv Detail & Related papers (2020-05-01T23:29:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.