AST-Transformer: Encoding Abstract Syntax Trees Efficiently for Code Summarization
- URL: http://arxiv.org/abs/2112.01184v1
- Date: Thu, 2 Dec 2021 12:57:22 GMT
- Title: AST-Transformer: Encoding Abstract Syntax Trees Efficiently for Code Summarization
- Authors: Ze Tang, Chuanyi Li, Jidong Ge, Xiaoyu Shen, Zheling Zhu and Bin Luo
- Abstract summary: We propose AST-Transformer to efficiently encode tree-structured ASTs.
Experiments show that AST-Transformer outperforms the state of the art by a substantial margin.
- Score: 14.225206904493627
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Code summarization aims to generate brief natural language descriptions for
source code. As source code is highly structured and follows strict programming
language grammars, its Abstract Syntax Tree (AST) is often leveraged to inform
the encoder about the structural information. However, ASTs are usually much
longer than the source code. Current approaches ignore the size limit and
simply feed the whole linearized AST into the encoder. To address this problem,
we propose AST-Transformer to efficiently encode tree-structured ASTs.
Experiments show that AST-Transformer outperforms the state of the art by a
substantial margin while being able to reduce $90\sim95\%$ of the computational
complexity in the encoding process.
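The length blow-up the abstract refers to is easy to see in practice. The snippet below is only an illustration, not the paper's code: it uses Python's built-in ast module and an SBT-style bracketed pre-order traversal, one common way of linearizing an AST, to compare the length of the code token sequence with the length of the flattened AST.

```python
# Minimal sketch (not the paper's implementation): compare the token sequence
# a plain sequence encoder would see with a bracketed linearization of the AST.
import ast
import io
import tokenize

code = "def add(a, b):\n    return a + b\n"

# Source-level tokens, dropping pure-whitespace tokens.
code_tokens = [t.string for t in tokenize.generate_tokens(io.StringIO(code).readline)
               if t.string.strip()]

def sbt(node):
    """Bracketed pre-order (SBT-style) linearization of a Python AST node.
    Every node contributes an opening and a closing symbol, so the sequence
    grows well beyond the raw token count."""
    name = type(node).__name__
    seq = ["(", name]
    for child in ast.iter_child_nodes(node):
        seq.extend(sbt(child))
    seq.extend([")", name])
    return seq

ast_sequence = sbt(ast.parse(code))
print(len(code_tokens), len(ast_sequence))  # 12 code tokens vs. 48 AST symbols on this toy snippet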
Related papers
- Decoding at the Speed of Thought: Harnessing Parallel Decoding of Lexical Units for LLMs [57.27982780697922]
Large language models have demonstrated exceptional capability in natural language understanding and generation.
However, their generation speed is limited by the inherently sequential nature of their decoding process.
This paper introduces Lexical Unit Decoding, a novel decoding methodology implemented in a data-driven manner.
arXiv Detail & Related papers (2024-05-24T04:35:13Z)
- SparseCoder: Identifier-Aware Sparse Transformer for File-Level Code Summarization [51.67317895094664]
This paper studies file-level code summarization, which can assist programmers in understanding and maintaining large source code projects.
We propose SparseCoder, an identifier-aware sparse transformer for effectively handling long code sequences.
arXiv Detail & Related papers (2024-01-26T09:23:27Z)
- AST-T5: Structure-Aware Pretraining for Code Generation and Understanding [12.929578932351298]
Large language models (LLMs) have made significant advancements in code-related tasks, yet many LLMs treat code as simple sequences.
We introduce AST-T5, a novel pretraining paradigm that leverages the Abstract Syntax Tree (AST) for enhanced code generation, transpilation, and understanding.
arXiv Detail & Related papers (2024-01-05T06:51:08Z)
- AST-MHSA: Code Summarization using Multi-Head Self-Attention [1.588193964339148]
We present a model, AST-MHSA, that uses multi-head attention to extract semantic information from the abstract syntax tree (AST) of the code.
The model is trained on a dataset of code and summaries, and the parameters are optimized to minimize the loss between the generated summaries and the ground-truth summaries.
arXiv Detail & Related papers (2023-08-10T15:43:46Z)
- Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation [61.50286000143233]
ChainCoder is a program synthesis language model that generates Python code progressively.
A tailored transformer architecture is leveraged to jointly encode the natural language descriptions and syntactically aligned I/O data samples.
arXiv Detail & Related papers (2023-04-28T01:47:09Z)
- Understanding Long Programming Languages with Structure-Aware Sparse Attention [32.21325784213584]
We present SASA, a Structure-Aware Sparse Attention mechanism, which reduces the complexity and improves performance for long code understanding tasks.
The key components in SASA are top-$k$ sparse attention and Abstract Syntax Tree (AST)-based structure-aware attention.
Experiments on CodeXGLUE tasks show that SASA achieves better performance than the competing baselines.
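For intuition only, here is a minimal sketch of the top-$k$ sparse attention component mentioned above. It is not the authors' implementation; the tensor shapes and the value of k are illustrative, and the AST-based structure-aware attention component is not shown.

```python
# Minimal sketch of top-k sparse attention: each query keeps only its k
# highest-scoring keys and masks the rest before the softmax.
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, top_k=8):
    """q, k, v: (batch, seq_len, dim). Illustrative shapes only."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5   # (B, L, L)
    top_k = min(top_k, scores.shape[-1])
    kth = scores.topk(top_k, dim=-1).values[..., -1:]       # k-th largest score per query
    scores = scores.masked_fill(scores < kth, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

# Toy usage
q = k = v = torch.randn(2, 16, 32)
out = topk_sparse_attention(q, k, v, top_k=4)
print(out.shape)  # torch.Size([2, 16, 32])
```

Note that this sketch only reproduces the sparsity pattern: it still materializes the full score matrix, whereas real efficiency gains come from never computing the masked entries.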
arXiv Detail & Related papers (2022-05-27T02:50:57Z)
- M2TS: Multi-Scale Multi-Modal Approach Based on Transformer for Source Code Summarization [0.4061135251278187]
Source code summarization aims to generate natural language descriptions of code snippets.
We propose M2TS, a Multi-scale Multi-modal approach based on Transformer for source code Summarization.
We conduct experiments on two Java datasets and one Python dataset, and the results demonstrate that M2TS outperforms current state-of-the-art methods.
arXiv Detail & Related papers (2022-03-18T02:54:06Z)
- UniXcoder: Unified Cross-Modal Pre-training for Code Representation [65.6846553962117]
We present UniXcoder, a unified cross-modal pre-trained model for programming languages.
We propose a one-to-one mapping method to transform the AST into a sequence structure that retains all structural information from the tree.
We evaluate UniXcoder on five code-related tasks over nine datasets.
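For illustration, the sketch below shows what a one-to-one, information-preserving flattening of a tree can look like: explicit brackets make the sequence invertible, so the original tree is recoverable. It is only a toy version of the general idea with made-up node labels, not UniXcoder's actual mapping.

```python
# Minimal sketch of an invertible tree-to-sequence mapping.
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)

def to_sequence(node):
    """Flatten a tree into a token sequence with explicit brackets."""
    seq = [node.label, "("]
    for child in node.children:
        seq += to_sequence(child)
    seq.append(")")
    return seq

def from_sequence(seq):
    """Inverse mapping: rebuilds the tree, showing no structure is lost."""
    def parse(i):
        node = Node(seq[i])
        i += 2                               # skip label and "("
        while seq[i] != ")":
            child, i = parse(i)
            node.children.append(child)
        return node, i + 1                   # skip ")"
    return parse(0)[0]

tree = Node("MethodDecl", [Node("Params", [Node("a"), Node("b")]), Node("Return")])
assert from_sequence(to_sequence(tree)) == tree   # round-trip succeeds
```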
arXiv Detail & Related papers (2022-03-08T04:48:07Z)
- Improving Code Summarization with Block-wise Abstract Syntax Tree Splitting [15.28941592388958]
The Abstract Syntax Tree (AST), which depicts the source code's syntactic structure, has been incorporated to guide the generation of code summaries.
Existing AST-based methods are difficult to train and generate inadequate code summaries.
We present the Block-wise Abstract Syntax Tree Splitting method (BASTS), which fully utilizes the rich tree-form syntax structure in ASTs.
arXiv Detail & Related papers (2021-03-14T05:04:06Z)
- Glushkov's construction for functional subsequential transducers [91.3755431537592]
Glushkov's construction has many interesting properties, and they become even more evident when applied to transducers.
A special flavour of regular expressions is introduced, which can be efficiently converted to $\epsilon$-free functional subsequential weighted finite-state transducers.
arXiv Detail & Related papers (2020-08-05T17:09:58Z)
- On Sparsifying Encoder Outputs in Sequence-to-Sequence Models [90.58793284654692]
We take the Transformer as the testbed and introduce a layer of gates between the encoder and the decoder.
The gates are regularized using the expected value of the sparsity-inducing $L_0$ penalty.
We investigate the effects of this sparsification on two machine translation and two summarization tasks.
arXiv Detail & Related papers (2020-04-24T16:57:52Z)
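For intuition, the following sketch places L0-regularized gates on encoder outputs in the spirit of the hard-concrete relaxation of Louizos et al. (2018), the standard way to make an expected $L_0$ penalty differentiable. It is a sketch under that assumption, not the paper's implementation; the per-position scorer, hyperparameters, and loss weighting are made up.

```python
# Minimal sketch of L0-regularized gates over encoder outputs
# (hard-concrete relaxation; illustrative hyperparameters).
import math
import torch
import torch.nn as nn

class L0Gate(nn.Module):
    """One stochastic gate per encoder position, predicted from the hidden state."""
    def __init__(self, dim, beta=2/3, gamma=-0.1, zeta=1.1):
        super().__init__()
        self.scorer = nn.Linear(dim, 1)
        self.beta, self.gamma, self.zeta = beta, gamma, zeta

    def forward(self, enc_out):
        # enc_out: (batch, seq_len, dim)
        log_alpha = self.scorer(enc_out).squeeze(-1)             # (batch, seq_len)
        if self.training:
            u = torch.rand_like(log_alpha).clamp(1e-6, 1 - 1e-6)
            s = torch.sigmoid((u.log() - (1 - u).log() + log_alpha) / self.beta)
        else:
            s = torch.sigmoid(log_alpha)
        # Stretch to (gamma, zeta) and clip to [0, 1]: gates can be exactly 0 or 1.
        z = (s * (self.zeta - self.gamma) + self.gamma).clamp(0.0, 1.0)
        # Expected L0: probability that each gate is open, summed over positions.
        l0 = torch.sigmoid(log_alpha - self.beta * math.log(-self.gamma / self.zeta)).sum()
        return enc_out * z.unsqueeze(-1), l0

# Toy usage: gate pretend encoder states and add the penalty to the training loss.
gate = L0Gate(dim=512)
enc_out = torch.randn(4, 30, 512)
gated, l0 = gate(enc_out)
loss = gated.pow(2).mean() + 1e-2 * l0   # placeholder task loss + sparsity term
```

At inference time the gate is deterministic, and positions whose gate collapses to zero can simply be dropped before the decoder attends to the encoder outputs.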