Understanding Long Programming Languages with Structure-Aware Sparse
Attention
- URL: http://arxiv.org/abs/2205.13730v1
- Date: Fri, 27 May 2022 02:50:57 GMT
- Title: Understanding Long Programming Languages with Structure-Aware Sparse
Attention
- Authors: Tingting Liu, Chengyu Wang, Cen Chen, Ming Gao, Aoying Zhou
- Abstract summary: We present SASA, a Structure-Aware Sparse Attention mechanism, which reduces the complexity and improves performance for long code understanding tasks.
The key components in SASA are top-$k$ sparse attention and Abstract Syntax Tree (AST)-based structure-aware attention.
Experiments on CodeXGLUE tasks show that SASA achieves better performance than the competing baselines.
- Score: 32.21325784213584
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Programming-based Pre-trained Language Models (PPLMs) such as CodeBERT have
achieved great success in many downstream code-related tasks. Since the memory
and computational complexity of self-attention in the Transformer grow
quadratically with the sequence length, PPLMs typically limit the code length
to 512. However, codes in real-world applications are generally long, such as
code searches, which cannot be processed efficiently by existing PPLMs. To
solve this problem, in this paper, we present SASA, a Structure-Aware Sparse
Attention mechanism, which reduces the complexity and improves performance for
long code understanding tasks. The key components in SASA are top-$k$ sparse
attention and Abstract Syntax Tree (AST)-based structure-aware attention. With
top-$k$ sparse attention, the most crucial attention relation can be obtained
with a lower computational cost. As the code structure represents the logic of
the code statements, which is a complement to the code sequence
characteristics, we further introduce AST structures into attention. Extensive
experiments on CodeXGLUE tasks show that SASA achieves better performance than
the competing baselines.
Related papers
- Enhancing LLM's Cognition via Structurization [41.13997892843677]
Large language models (LLMs) process input contexts through a causal and sequential perspective.
This paper presents a novel concept of context structurization.
Specifically, we transform the plain, unordered contextual sentences into well-ordered and hierarchically structurized elements.
arXiv Detail & Related papers (2024-07-23T12:33:58Z) - Comments as Natural Logic Pivots: Improve Code Generation via Comment Perspective [85.48043537327258]
We propose MANGO (comMents As Natural loGic pivOts), including a comment contrastive training strategy and a corresponding logical comment decoding strategy.
Results indicate that MANGO significantly improves the code pass rate based on the strong baselines.
The robustness of the logical comment decoding strategy is notably higher than the Chain-of-thoughts prompting.
arXiv Detail & Related papers (2024-04-11T08:30:46Z) - StepCoder: Improve Code Generation with Reinforcement Learning from
Compiler Feedback [58.20547418182074]
We introduce StepCoder, a novel framework for code generation, consisting of two main components.
CCCS addresses the exploration challenge by breaking the long sequences code generation task into a Curriculum of Code Completion Subtasks.
FGO only optimize the model by masking the unexecuted code segments to provide Fine-Grained Optimization.
Our method improves the ability to explore the output space and outperforms state-of-the-art approaches in corresponding benchmarks.
arXiv Detail & Related papers (2024-02-02T13:14:31Z) - SparseCoder: Identifier-Aware Sparse Transformer for File-Level Code
Summarization [51.67317895094664]
This paper studies file-level code summarization, which can assist programmers in understanding and maintaining large source code projects.
We propose SparseCoder, an identifier-aware sparse transformer for effectively handling long code sequences.
arXiv Detail & Related papers (2024-01-26T09:23:27Z) - When Do Program-of-Thoughts Work for Reasoning? [51.2699797837818]
We propose complexity-impacted reasoning score (CIRS) to measure correlation between code and reasoning abilities.
Specifically, we use the abstract syntax tree to encode the structural information and calculate logical complexity.
Code will be integrated into the EasyInstruct framework at https://github.com/zjunlp/EasyInstruct.
arXiv Detail & Related papers (2023-08-29T17:22:39Z) - AST-MHSA : Code Summarization using Multi-Head Self-Attention [1.588193964339148]
We present a model, AST-MHSA, that uses multi-head attention to extract semantic information from the abstract syntax tree (AST) of the code.
The model is trained on a dataset of code and summaries, and the parameters are optimized to minimize the loss between the generated summaries and the ground-truth summaries.
arXiv Detail & Related papers (2023-08-10T15:43:46Z) - LongCoder: A Long-Range Pre-trained Language Model for Code Completion [56.813974784131624]
LongCoder employs a sliding window mechanism for self-attention and introduces two types of globally accessible tokens.
Bridge tokens are inserted throughout the input sequence to aggregate local information and facilitate global interaction.
memory tokens are included to highlight important statements that may be invoked later and need to be memorized.
arXiv Detail & Related papers (2023-06-26T17:59:24Z) - Improving Code Summarization with Block-wise Abstract Syntax Tree
Splitting [15.28941592388958]
Abstract Syntax Tree (AST), which depicts the source code's syntactic structure, has been incorporated to guide the generation of code summaries.
Existing AST based methods suffer from the difficulty of training and generate inadequate code summaries.
We present the Block-wise Abstract Syntax Tree Splitting method (BASTS), which fully utilizes the rich tree-form syntax structure in ASTs.
arXiv Detail & Related papers (2021-03-14T05:04:06Z) - GraphCodeBERT: Pre-training Code Representations with Data Flow [97.00641522327699]
We present GraphCodeBERT, a pre-trained model for programming language that considers the inherent structure of code.
We use data flow in the pre-training stage, which is a semantic-level structure of code that encodes the relation of "where-the-value-comes-from" between variables.
We evaluate our model on four tasks, including code search, clone detection, code translation, and code refinement.
arXiv Detail & Related papers (2020-09-17T15:25:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.