xASTNN: Improved Code Representations for Industrial Practice
- URL: http://arxiv.org/abs/2303.07104v3
- Date: Mon, 6 Nov 2023 03:06:37 GMT
- Title: xASTNN: Improved Code Representations for Industrial Practice
- Authors: Zhiwei Xu, Min Zhou, Xibin Zhao, Yang Chen, Xi Cheng, Hongyu Zhang
- Abstract summary: We present xASTNN, an eXtreme Abstract Syntax Tree (AST)-based Neural Network for source code representation.
First, xASTNN is completely based on widely-used ASTs and does not require complicated data pre-processing.
Second, three closely-related designs are proposed to guarantee the effectiveness of xASTNN.
Third, a dynamic batching algorithm is introduced to significantly reduce the time complexity of xASTNN.
- Score: 30.45577773085939
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The application of deep learning techniques in software engineering has
become increasingly popular. One key problem is developing high-quality and
easy-to-use source code representations for code-related tasks. The research
community has achieved impressive results in recent years. However, due to
deployment difficulties and performance bottlenecks, these approaches are
seldom applied in industry. In this paper, we present xASTNN, an eXtreme
Abstract Syntax Tree (AST)-based Neural Network for source code representation,
aiming to push this technique to industrial practice. The proposed xASTNN has
three advantages. First, xASTNN is completely based on widely-used ASTs and
does not require complicated data pre-processing, making it applicable to
various programming languages and practical scenarios. Second, three
closely-related designs are proposed to guarantee the effectiveness of xASTNN,
including statement subtree sequence for code naturalness, gated recursive unit
for syntactical information, and gated recurrent unit for sequential
information. Third, a dynamic batching algorithm is introduced to significantly
reduce the time complexity of xASTNN. Two code comprehension downstream tasks,
code classification and code clone detection, are adopted for evaluation. The
results demonstrate that our xASTNN can improve the state-of-the-art while
being faster than the baselines.
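The three designs named in the abstract compose into a single pipeline: statement subtrees in source order, a gated recursive unit bottom-up within each subtree, and a GRU left-to-right over the subtree vectors. The sketch below is a minimal PyTorch rendering of that data flow under assumptions of ours (the child-aggregation gate, the mean over children, all dimensions); it is an illustration, not the authors' implementation, and it omits the dynamic batching algorithm entirely.

```python
# Minimal sketch of the xASTNN pipeline: statement subtrees -> gated recursive
# unit (bottom-up per subtree) -> GRU (over the subtree sequence) -> pooled
# code vector. Gating scheme and dimensions are assumptions for illustration.
import torch
import torch.nn as nn

class Node:
    """One AST node: an integer type id plus child nodes."""
    def __init__(self, type_id, children=()):
        self.type_id = type_id
        self.children = list(children)

class GatedRecursiveUnit(nn.Module):
    """Encodes a statement subtree bottom-up, gating between the node's own
    embedding and the mean of its children's encodings (assumed scheme)."""
    def __init__(self, vocab_size, dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.gate = nn.Linear(2 * dim, dim)
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, node):
        h_self = self.embed(torch.tensor(node.type_id))
        if node.children:
            h_kids = torch.stack([self(c) for c in node.children]).mean(0)
        else:
            h_kids = torch.zeros_like(h_self)
        both = torch.cat([h_self, h_kids])
        z = torch.sigmoid(self.gate(both))       # how much syntactic context to keep
        return z * torch.tanh(self.proj(both)) + (1 - z) * h_self

class XASTNNSketch(nn.Module):
    def __init__(self, vocab_size=500, dim=128):
        super().__init__()
        self.subtree_encoder = GatedRecursiveUnit(vocab_size, dim)
        self.gru = nn.GRU(dim, dim, batch_first=True, bidirectional=True)

    def forward(self, statement_subtrees):
        # One vector per statement subtree, in source order ("code naturalness").
        vecs = torch.stack([self.subtree_encoder(t) for t in statement_subtrees])
        out, _ = self.gru(vecs.unsqueeze(0))     # sequential information
        return out.max(dim=1).values.squeeze(0)  # max-pool to a code vector

# Toy example: a function body with two statement subtrees.
stmts = [Node(1, [Node(2), Node(3)]), Node(4, [Node(5)])]
print(XASTNNSketch()(stmts).shape)  # torch.Size([256])
```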
Related papers
- Abstract Syntax Tree for Programming Language Understanding and Representation: How Far Are We? [23.52632194060246]
Programming language understanding and representation (a.k.a. code representation learning) has long been a popular and challenging task in software engineering.
The abstract syntax tree (AST), a fundamental code feature, illustrates the syntactic information of the source code and has been widely used in code representation learning.
We compare the performance of models trained with token-sequence-based code representations (Token for short) against models trained with AST-based code representations on three popular types of code-related tasks.
arXiv Detail & Related papers (2023-12-01T08:37:27Z)
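The two representations this study compares are easy to make concrete with Python's standard library. The extraction sketch below is ours, not the paper's experimental setup: a flat lexical token stream versus a pre-order traversal of AST node types.

```python
# Token-based vs AST-based code representations, extracted with the stdlib.
import ast, io, tokenize

src = "def add(a, b):\n    return a + b\n"

# Token-based: the lexical stream, structure discarded (whitespace filtered).
tokens = [t.string for t in tokenize.generate_tokens(io.StringIO(src).readline)
          if t.string.strip()]

# AST-based: node types in pre-order, identifiers and layout discarded.
def preorder(node):
    yield type(node).__name__
    for child in ast.iter_child_nodes(node):
        yield from preorder(child)

node_types = list(preorder(ast.parse(src)))

print(tokens)      # ['def', 'add', '(', 'a', ',', 'b', ')', ':', 'return', 'a', '+', 'b']
print(node_types)  # ['Module', 'FunctionDef', 'arguments', 'arg', 'arg', 'Return', ...]
```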
- Latent Space Representations of Neural Algorithmic Reasoners [15.920449080528536]
We perform a detailed analysis of the structure of the latent space induced by the GNN when executing algorithms.
We identify two possible failure modes: (i) loss of resolution, making it hard to distinguish similar values; (ii) inability to deal with values outside the range observed during training.
We show that mitigating these failure modes leads to improvements on the majority of algorithms in the standard CLRS-30 benchmark when using the state-of-the-art Triplet-GMPNN processor.
arXiv Detail & Related papers (2023-07-17T22:09:12Z)
- Compacting Binary Neural Networks by Sparse Kernel Selection [58.84313343190488]
This paper is motivated by a previously revealed phenomenon that the binary kernels in successful BNNs are nearly power-law distributed.
We develop the Permutation Straight-Through Estimator (PSTE) that is able to not only optimize the selection process end-to-end but also maintain the non-repetitive occupancy of selected codewords.
Experiments verify that our method reduces both the model size and bit-wise computational costs, and achieves accuracy improvements compared with state-of-the-art BNNs under comparable budgets.
arXiv Detail & Related papers (2023-03-25T13:53:02Z)
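The underlying idea of kernel selection, snapping each kernel to a codeword from a small codebook rather than binarizing it independently, can be sketched with a plain nearest-codeword assignment. This is a generic illustration with a random codebook; PSTE's end-to-end optimization and its non-repetition guarantee are not reproduced here.

```python
# Generic "kernel selection" sketch: each real-valued kernel is replaced by
# its nearest binary codeword from a small shared codebook.
import numpy as np

rng = np.random.default_rng(0)
kernels = rng.normal(size=(64, 9))               # 64 real 3x3 kernels, flattened
codebook = rng.choice([-1.0, 1.0], size=(8, 9))  # 8 candidate binary codewords

# Nearest codeword by Euclidean distance.
d = ((kernels[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
assignment = d.argmin(axis=1)                    # which codeword each kernel uses
compact = codebook[assignment]                   # the selected binary kernels

# Storage drops from 64 full kernels to 8 codewords plus 64 small indices.
print(assignment[:10], compact.shape)            # e.g. [3 1 ...] (64, 9)
```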
- Towards Better Out-of-Distribution Generalization of Neural Algorithmic Reasoning Tasks [51.8723187709964]
We study the OOD generalization of neural algorithmic reasoning tasks.
The goal is to learn an algorithm from input-output pairs using deep neural networks.
arXiv Detail & Related papers (2022-11-01T18:33:20Z)
- Rate Coding or Direct Coding: Which One is Better for Accurate, Robust, and Energy-efficient Spiking Neural Networks? [4.872468969809081]
Most works on Spiking Neural Networks (SNNs) focus on image classification tasks, so various coding techniques have been proposed to convert an image into temporal binary spikes.
Among them, rate coding and direct coding are regarded as prospective candidates for building a practical SNN system.
We conduct a comprehensive analysis of the two codings from three perspectives: accuracy, adversarial robustness, and energy-efficiency.
arXiv Detail & Related papers (2022-01-31T16:18:07Z)
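The two codings have simple textbook forms, sketched below with an assumed image shape and timestep count: rate coding samples Bernoulli spikes with probability equal to pixel intensity at each timestep, while direct coding repeats the analog frame as input current. The paper's SNN models and energy measurements are not reproduced.

```python
# Rate coding vs direct coding of an image for an SNN, in minimal form.
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((28, 28))   # pixel intensities in [0, 1]
T = 8                          # number of timesteps

# Rate coding: stochastic binary spikes; information is carried by firing rate.
rate_spikes = (rng.random((T, 28, 28)) < image).astype(np.float32)

# Direct coding: the same analog frame fed as input current at every timestep.
direct_input = np.repeat(image[None], T, axis=0).astype(np.float32)

# Averaged over time, rate coding approximates the original intensities.
print(np.abs(rate_spikes.mean(0) - image).mean())  # shrinks as T grows
```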
- FATNN: Fast and Accurate Ternary Neural Networks [89.07796377047619]
Ternary Neural Networks (TNNs) have received much attention due to being potentially orders of magnitude faster in inference, as well as more power efficient, than full-precision counterparts.
In this work, we show that, under some mild constraints, computational complexity of the ternary inner product can be reduced by a factor of 2.
We elaborately design an implementation-dependent ternary quantization algorithm to mitigate the performance gap.
arXiv Detail & Related papers (2020-08-12T04:26:18Z)
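The standard trick behind fast ternary inference stores each vector in {-1, 0, +1} as two bitmasks, reducing the inner product to AND and popcount operations. The sketch below shows that baseline encoding only; the constrained factor-2 reduction FATNN derives is not reproduced.

```python
# Ternary inner product via bitmasks and popcount (int.bit_count, Python 3.10+).
def encode(v):
    pos = neg = 0
    for i, x in enumerate(v):
        if x > 0:
            pos |= 1 << i   # bitmask of +1 positions
        elif x < 0:
            neg |= 1 << i   # bitmask of -1 positions
    return pos, neg

def ternary_dot(a, b):
    ap, an = encode(a)
    bp, bn = encode(b)
    # (+1,+1) and (-1,-1) pairs contribute +1; (+1,-1) and (-1,+1) contribute -1.
    return ((ap & bp).bit_count() + (an & bn).bit_count()
            - (ap & bn).bit_count() - (an & bp).bit_count())

a = [1, 0, -1, 1, -1, 0]
b = [1, -1, -1, 0, 1, 1]
assert ternary_dot(a, b) == sum(x * y for x, y in zip(a, b))  # == 1
```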
- PolyDL: Polyhedral Optimizations for Creation of High Performance DL primitives [55.79741270235602]
We present compiler algorithms to automatically generate high performance implementations of Deep Learning primitives.
We develop novel data reuse analysis algorithms using the polyhedral model.
We also show that such a hybrid compiler plus a minimal library-use approach results in state-of-the-art performance.
arXiv Detail & Related papers (2020-06-02T06:44:09Z)
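Data reuse is the quantity a polyhedral analysis optimizes, and loop tiling is the canonical transformation it emits. The plain-Python sketch below shows tiling on a matrix multiply so that each tile of the operands stays hot in cache; it stands in for the generated high-performance code and does not reproduce PolyDL's algorithms.

```python
# Loop tiling for data reuse: same computation as a naive triple loop, but
# restructured so T-sized blocks of A, B, and C are revisited while cached.
def matmul_tiled(A, B, n, T=4):
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, T):                        # iterate over tiles...
        for kk in range(0, n, T):
            for jj in range(0, n, T):
                for i in range(ii, min(ii + T, n)):  # ...then within a tile
                    for k in range(kk, min(kk + T, n)):
                        a = A[i][k]                  # reused across the j loop
                        for j in range(jj, min(jj + T, n)):
                            C[i][j] += a * B[k][j]
    return C

A = [[1.0 if i == j else 0.0 for j in range(6)] for i in range(6)]  # identity
print(matmul_tiled(A, A, 6)[0])  # [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```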
- Improved Code Summarization via a Graph Neural Network [96.03715569092523]
In general, source code summarization techniques take source code as input and output a natural language description.
We present an approach that uses a graph-based neural architecture that better matches the default structure of the AST to generate these summaries.
arXiv Detail & Related papers (2020-04-06T17:36:42Z)
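Feeding an AST to a graph-based neural architecture starts with flattening it into node features plus an edge list. The sketch below does this with Python's own ast module as an assumed stand-in parser; the summarization model itself is not shown.

```python
# Turning an AST into GNN input: one node-type feature per AST node and a
# parent->child edge list (ast.walk visits nodes breadth-first).
import ast

def ast_to_graph(src):
    tree = ast.parse(src)
    nodes, edges, index = [], [], {}
    for n in ast.walk(tree):                 # assign an id to every node
        index[n] = len(nodes)
        nodes.append(type(n).__name__)
    for n in ast.walk(tree):                 # record parent -> child edges
        for c in ast.iter_child_nodes(n):
            edges.append((index[n], index[c]))
    return nodes, edges

nodes, edges = ast_to_graph("def f(x):\n    return x * 2\n")
print(nodes[:4], edges[:3])  # ['Module', 'FunctionDef', 'arguments', 'Return'] [(0, 1), (1, 2), (1, 3)]
```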
- Learning to Hash with Graph Neural Networks for Recommender Systems [103.82479899868191]
Graph representation learning has attracted much attention in supporting high-quality candidate search at scale.
Despite its effectiveness in learning embedding vectors for objects in the user-item interaction network, the computational costs to infer users' preferences in continuous embedding space are tremendous.
We propose a simple yet effective discrete representation learning framework to jointly learn continuous and discrete codes.
arXiv Detail & Related papers (2020-03-04T06:59:56Z)
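The gain from discrete codes is that candidate search moves from float dot products to Hamming distance. The sketch below uses plain sign binarization, the simplest such scheme and an assumption of ours, not the paper's jointly learned codes; the GNN that would produce the embeddings is also omitted.

```python
# Discrete codes for fast candidate search: binarize embeddings, shortlist by
# Hamming distance, then re-rank the shortlist with exact scores if desired.
import numpy as np

rng = np.random.default_rng(0)
item_emb = rng.normal(size=(10000, 64))       # continuous item embeddings
user_emb = rng.normal(size=(64,))             # one user's embedding

item_codes = (item_emb > 0).astype(np.uint8)  # discrete codes: sign bits
user_code = (user_emb > 0).astype(np.uint8)

# Candidate search in Hamming space: cheap XOR/popcount-style comparisons.
hamming = (item_codes != user_code).sum(axis=1)
candidates = np.argsort(hamming)[:10]         # shortlist to re-rank exactly
print(candidates, hamming[candidates])
```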
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.