SIT3: Code Summarization with Structure-Induced Transformer
- URL: http://arxiv.org/abs/2012.14710v1
- Date: Tue, 29 Dec 2020 11:37:43 GMT
- Title: SIT3: Code Summarization with Structure-Induced Transformer
- Authors: Hongqiu Wu and Hai Zhao and Min Zhang
- Abstract summary: We propose a novel model based on structure-induced self-attention, which encodes sequential inputs with highly-effective structure modeling.
Our newly-proposed model achieves new state-of-the-art results on popular benchmarks.
- Score: 48.000063280183376
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Code summarization (CS) is a promising area of natural language
understanding that aims to automatically generate sensible annotations for
source code, primarily to aid programmers. Previous works apply
structure-based traversal (SBT) or non-sequential models such as Tree-LSTM and
GNN to learn structural program semantics. These approaches share the
following drawbacks: 1) incorporating SBT into the Transformer has proven
ineffective; 2) GNNs are limited in capturing global information; 3) the
Transformer's own ability to capture structural semantics has been
underestimated. In this paper, we propose a novel model based on
structure-induced self-attention, which encodes sequential inputs with highly
effective structure modeling. Extensive experiments show that our newly
proposed model achieves new state-of-the-art results on popular benchmarks. To
the best of our knowledge, it is the first work on code summarization that
uses the Transformer to model structural information with high efficiency and
no extra parameters. We also provide a tutorial on our pre-processing.
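As a rough illustration of how structure-induced self-attention can model structure without extra parameters, the sketch below restricts standard scaled dot-product attention with a fixed boolean adjacency mask derived from the code's structure (e.g., AST edges). This is a minimal sketch of the general idea under assumed tensor shapes; the function name, the self-loop handling, and the edge list in the toy usage are illustrative assumptions, not the authors' implementation.

```python
import torch

def structure_induced_attention(q, k, v, adj_mask):
    """Scaled dot-product attention restricted to structurally adjacent tokens.

    q, k, v:  (batch, heads, seq_len, d_head) projections from an ordinary
              multi-head self-attention layer.
    adj_mask: (batch, seq_len, seq_len) boolean matrix, True where tokens i and
              j are connected in the code structure (e.g., AST edges).
              The mask is fixed, so no learnable parameters are introduced.
    """
    seq_len = adj_mask.size(-1)
    # Add self-loops so every query attends to at least itself
    # (avoids rows that are entirely -inf before the softmax).
    eye = torch.eye(seq_len, dtype=torch.bool, device=adj_mask.device)
    mask = adj_mask | eye
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5      # (B, H, L, L)
    scores = scores.masked_fill(~mask.unsqueeze(1), float("-inf"))
    return scores.softmax(dim=-1) @ v                          # (B, H, L, d_head)

# Toy usage with hypothetical AST edges over four tokens.
B, H, L, D = 1, 2, 4, 8
q = k = v = torch.randn(B, H, L, D)
adj = torch.zeros(B, L, L, dtype=torch.bool)
for i, j in [(0, 1), (1, 2), (2, 3)]:
    adj[0, i, j] = adj[0, j, i] = True
out = structure_induced_attention(q, k, v, adj)                # shape (1, 2, 4, 8)
```

In practice, the structure mask would be produced by pre-processing the source code (e.g., parsing it into an AST), which is the kind of step the paper's pre-processing tutorial addresses.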
Related papers
- Efficient Point Transformer with Dynamic Token Aggregating for Point Cloud Processing [19.73918716354272]
We propose an efficient point TransFormer with Dynamic Token Aggregating (DTA-Former) for point cloud representation and processing.
It achieves SOTA performance while being up to 30x faster than prior point Transformers on the ModelNet40, ShapeNet, and airborne MultiSpectral LiDAR (MS-LiDAR) datasets.
arXiv Detail & Related papers (2024-05-23T20:50:50Z) - Pushdown Layers: Encoding Recursive Structure in Transformer Language Models [86.75729087623259]
Recursion is a prominent feature of human language, and fundamentally challenging for self-attention.
This work introduces Pushdown Layers, a new self-attention layer.
Transformers equipped with Pushdown Layers achieve dramatically better and 3-5x more sample-efficient syntactic generalization.
arXiv Detail & Related papers (2023-10-29T17:27:18Z) - Structural Biases for Improving Transformers on Translation into Morphologically Rich Languages [120.74406230847904]
TP-Transformer augments the traditional Transformer architecture to include an additional component to represent structure.
The second method imbues structure at the data level by segmenting the data with morphological tokenization.
We find that each of these two approaches allows the network to achieve better performance, but this improvement is dependent on the size of the dataset.
arXiv Detail & Related papers (2022-08-11T22:42:24Z) - Source Code Summarization with Structural Relative Position Guided Transformer [19.828300746504148]
Source code summarization aims at generating concise and clear natural language descriptions for programming languages.
Recent efforts focus on incorporating the syntax structure of code into neural networks such as Transformer.
We propose a Structural Relative Position guided Transformer, named SCRIPT.
arXiv Detail & Related papers (2022-02-14T07:34:33Z) - AutoBERT-Zero: Evolving BERT Backbone from Scratch [94.89102524181986]
We propose an Operation-Priority Neural Architecture Search (OP-NAS) algorithm to automatically search for promising hybrid backbone architectures.
We optimize both the search algorithm and evaluation of candidate models to boost the efficiency of our proposed OP-NAS.
Experiments show that the searched architecture (named AutoBERT-Zero) significantly outperforms BERT and its variants of different model capacities in various downstream tasks.
arXiv Detail & Related papers (2021-07-15T16:46:01Z) - GroupBERT: Enhanced Transformer Architecture with Efficient Grouped Structures [57.46093180685175]
We demonstrate a set of modifications to the structure of a Transformer layer, producing a more efficient architecture.
We add a convolutional module to complement the self-attention module, decoupling the learning of local and global interactions.
We apply the resulting architecture to language representation learning and demonstrate its superior performance compared to BERT models of different scales.
arXiv Detail & Related papers (2021-06-10T15:41:53Z) - Improving Transformer-Kernel Ranking Model Using Conformer and Query Term Independence [29.442579683405913]
The Transformer-Kernel (TK) model has demonstrated strong reranking performance on the TREC Deep Learning benchmark.
A variant of the TK model -- called TKL -- has been developed that incorporates local self-attention to efficiently process longer input sequences.
In this work, we propose a novel Conformer layer as an alternative approach to scale TK to longer input sequences.
arXiv Detail & Related papers (2021-04-19T15:32:34Z) - Tree-structured Attention with Hierarchical Accumulation [103.47584968330325]
"Hierarchical Accumulation" encodes parse tree structures into self-attention at constant time complexity.
Our approach outperforms SOTA methods in four IWSLT translation tasks and the WMT'14 English-German translation task.
arXiv Detail & Related papers (2020-02-19T08:17:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.