Improving Code Summarization with Block-wise Abstract Syntax Tree
Splitting
- URL: http://arxiv.org/abs/2103.07845v2
- Date: Thu, 18 Mar 2021 11:15:11 GMT
- Title: Improving Code Summarization with Block-wise Abstract Syntax Tree
Splitting
- Authors: Chen Lin, Zhichao Ouyang, Junqing Zhuang, Jianqiang Chen, Hui Li,
Rongxin Wu
- Abstract summary: Abstract Syntax Tree (AST), which depicts the source code's syntactic structure, has been incorporated to guide the generation of code summaries.
Existing AST-based methods are difficult to train and generate inadequate code summaries.
We present the Block-wise Abstract Syntax Tree Splitting method (BASTS), which fully utilizes the rich tree-form syntax structure in ASTs.
- Score: 15.28941592388958
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic code summarization frees software developers from the heavy burden
of manual commenting and benefits software development and maintenance.
Abstract Syntax Tree (AST), which depicts the source code's syntactic
structure, has been incorporated to guide the generation of code summaries.
However, existing AST-based methods are difficult to train and generate
inadequate code summaries. In this paper, we present the Block-wise
Abstract Syntax Tree Splitting method (BASTS for short), which fully utilizes
the rich tree-form syntax structure in ASTs, for improving code summarization.
BASTS splits the code of a method based on the blocks in the dominator tree of
the Control Flow Graph, and generates a split AST for each code split. Each
split AST is then modeled by a Tree-LSTM using a pre-training strategy to
capture local non-linear syntax encoding. The learned syntax encoding is
combined with the code encoding and fed into a Transformer to generate high-quality
code summaries. Comprehensive experiments on benchmarks have demonstrated that
BASTS significantly outperforms state-of-the-art approaches in terms of various
evaluation metrics. To facilitate reproducibility, our implementation is
available at https://github.com/XMUDM/BASTS.
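The splitting step can be pictured with a short sketch. Below is a minimal, hypothetical Python illustration: BASTS targets Java and derives blocks from the dominator tree of the Control Flow Graph, whereas this toy version lets the top-level statements of a Python function stand in for blocks, and `split_method_ast` is an invented helper, not code from the BASTS repository.

```python
# A minimal, hypothetical sketch of block-wise AST splitting. BASTS derives
# blocks from the dominator tree of the Control Flow Graph of a Java method;
# this toy version approximates a "block" with a top-level statement of a
# Python function, using only the standard ast module.
import ast

def split_method_ast(source: str) -> list[ast.stmt]:
    """Return one sub-AST per top-level block of the first function in source."""
    module = ast.parse(source)
    func = next(n for n in ast.walk(module) if isinstance(n, ast.FunctionDef))
    # Each top-level statement (assignment, loop, conditional, return, ...)
    # stands in for one code split; BASTS would compute these from the CFG.
    return list(func.body)

code = """
def total(xs):
    acc = 0
    for x in xs:
        acc += x
    return acc
"""
for split in split_method_ast(code):
    print(type(split).__name__, ast.dump(split)[:60])
```

In the full method, each such split AST would then be encoded by a pre-trained Tree-LSTM, and the resulting syntax encodings combined with the token-level code encoding inside a Transformer.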
Related papers
- SparseCoder: Identifier-Aware Sparse Transformer for File-Level Code Summarization [51.67317895094664]
This paper studies file-level code summarization, which can assist programmers in understanding and maintaining large source code projects.
We propose SparseCoder, an identifier-aware sparse transformer for effectively handling long code sequences.
arXiv Detail & Related papers (2024-01-26T09:23:27Z)
- LILO: Learning Interpretable Libraries by Compressing and Documenting Code [71.55208585024198]
We introduce LILO, a neurosymbolic framework that iteratively synthesizes, compresses, and documents code.
LILO combines LLM-guided program synthesis with recent algorithmic advances in automated refactoring from Stitch.
We find that AutoDoc, LILO's LLM-based auto-documentation procedure, boosts performance by helping the synthesizer interpret and deploy learned abstractions.
arXiv Detail & Related papers (2023-10-30T17:55:02Z)
- AST-MHSA : Code Summarization using Multi-Head Self-Attention [1.588193964339148]
We present a model, AST-MHSA, that uses multi-head attention to extract semantic information from the abstract syntax tree (AST) of the code.
The model is trained on a dataset of code and summaries, and the parameters are optimized to minimize the loss between the generated summaries and the ground-truth summaries.
arXiv Detail & Related papers (2023-08-10T15:43:46Z)
- Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation [61.50286000143233]
ChainCoder is a program synthesis language model that generates Python code progressively.
A tailored transformer architecture is leveraged to jointly encode the natural language descriptions and syntactically aligned I/O data samples.
arXiv Detail & Related papers (2023-04-28T01:47:09Z)
- M2TS: Multi-Scale Multi-Modal Approach Based on Transformer for Source Code Summarization [0.4061135251278187]
Source code summarization aims to generate natural language descriptions of code snippets.
We propose M2TS, a Multi-scale Multi-modal approach based on Transformer for source code Summarization.
We conduct experiments on two Java datasets and one Python dataset, and the results demonstrate that M2TS outperforms current state-of-the-art methods.
arXiv Detail & Related papers (2022-03-18T02:54:06Z)
- UniXcoder: Unified Cross-Modal Pre-training for Code Representation [65.6846553962117]
We present UniXcoder, a unified cross-modal pre-trained model for programming language.
We propose a one-to-one mapping method to transform the AST into a sequence structure that retains all structural information from the tree.
We evaluate UniXcoder on five code-related tasks over nine datasets.
arXiv Detail & Related papers (2022-03-08T04:48:07Z)
- AST-Transformer: Encoding Abstract Syntax Trees Efficiently for Code Summarization [14.225206904493627]
We propose AST-Transformer to efficiently encode tree-structured ASTs.
Experiments show that AST-Transformer outperforms the state of the art by a substantial margin.
arXiv Detail & Related papers (2021-12-02T12:57:22Z)
- CLSEBERT: Contrastive Learning for Syntax Enhanced Code Pre-Trained Model [23.947178895479464]
We propose CLSEBERT, a Contrastive Learning Framework for Syntax Enhanced Code Pre-Trained Model.
In the pre-training stage, we consider the code syntax and hierarchy contained in the Abstract Syntax Tree (AST).
We also introduce two novel pre-training objectives. One is to predict the edges between nodes in the abstract syntax tree, and the other is to predict the types of code tokens.
arXiv Detail & Related papers (2021-08-10T10:08:21Z)
- GraphCodeBERT: Pre-training Code Representations with Data Flow [97.00641522327699]
We present GraphCodeBERT, a pre-trained model for programming language that considers the inherent structure of code.
We use data flow in the pre-training stage, which is a semantic-level structure of code that encodes the relation of "where-the-value-comes-from" between variables (a toy sketch of such edges follows this list).
We evaluate our model on four tasks, including code search, clone detection, code translation, and code refinement.
arXiv Detail & Related papers (2020-09-17T15:25:56Z)
- Improved Code Summarization via a Graph Neural Network [96.03715569092523]
In general, source code summarization techniques use the source code as input and output a natural language description.
We present an approach that uses a graph-based neural architecture that better matches the default structure of the AST to generate these summaries.
arXiv Detail & Related papers (2020-04-06T17:36:42Z)
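As a rough illustration of the data-flow structure mentioned in the GraphCodeBERT entry above, the toy sketch below links each variable use to its most recent definition in straight-line Python code; `data_flow_edges` is an invented helper and not GraphCodeBERT's actual extraction pipeline.

```python
# A toy sketch of "where-the-value-comes-from" data-flow edges of the kind
# GraphCodeBERT pre-trains on; this invented helper only handles
# straight-line code, with no control flow or scoping.
import ast

def data_flow_edges(source: str) -> list[tuple[str, int, int]]:
    """Return (variable, use_line, def_line) edges for straight-line code."""
    last_def: dict[str, int] = {}  # variable name -> line of latest definition
    edges: list[tuple[str, int, int]] = []
    for stmt in ast.parse(source).body:  # statements in source order
        # Uses in this statement read the latest definition made earlier.
        for node in ast.walk(stmt):
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
                if node.id in last_def:
                    edges.append((node.id, node.lineno, last_def[node.id]))
        # Then record the definitions this statement makes.
        for node in ast.walk(stmt):
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
                last_def[node.id] = node.lineno
    return edges

print(data_flow_edges("a = 1\nb = a + 2\nc = a + b\n"))
# [('a', 2, 1), ('a', 3, 1), ('b', 3, 2)]
```

In GraphCodeBERT, variable-level edges of this kind guide the model's attention during pre-training.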
This list is automatically generated from the titles and abstracts of the papers on this site.