TreeCaps: Tree-Based Capsule Networks for Source Code Processing
- URL: http://arxiv.org/abs/2009.09777v4
- Date: Mon, 14 Dec 2020 15:12:16 GMT
- Title: TreeCaps: Tree-Based Capsule Networks for Source Code Processing
- Authors: Nghi D. Q. Bui, Yijun Yu, Lingxiao Jiang
- Abstract summary: We propose a new learning technique, named TreeCaps, by fusing together capsule networks with tree-based convolutional neural networks.
We find that TreeCaps is the most robust against semantic-preserving program transformations.
- Score: 28.61567319928316
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, program learning techniques have been proposed to process source
code based on syntactical structures (e.g., Abstract Syntax Trees) and/or
semantic information (e.g., Dependency Graphs). Although graphs may be better
at capturing various viewpoints of code semantics than trees, constructing
graph inputs from code needs static code semantic analysis that may not be
accurate and introduces noise during learning. Although syntax trees are
precisely defined according to the language grammar and easier to construct and
process than graphs, previous tree-based learning techniques have not been able
to learn semantic information from trees to achieve better accuracy than
graph-based techniques. We propose a new learning technique, named TreeCaps, by
fusing together capsule networks with tree-based convolutional neural networks,
to achieve learning accuracy higher than existing graph-based techniques while
being based only on trees. TreeCaps introduces novel variable-to-static routing
algorithms into the capsule networks to compensate for the loss of previous
routing algorithms. Aside from accuracy, we also find that TreeCaps is the most
robust against semantic-preserving program transformations that
change code syntax without modifying the semantics. Evaluated on a large number
of Java and C/C++ programs, TreeCaps models outperform prior deep learning
models of program source code, in terms of both accuracy and robustness for
program comprehension tasks such as code functionality classification and
function name prediction.
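For readers unfamiliar with capsule networks, the dynamic-routing-by-agreement procedure that TreeCaps builds on (and that its variable-to-static routing is designed to replace) can be sketched in a few lines. This is a generic NumPy illustration of standard capsule routing, not the paper's algorithm; all shapes and names here are illustrative:

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    # Non-linear squashing: short vectors shrink toward 0,
    # long vectors approach (but never reach) unit length.
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, num_iters=3):
    """Route prediction vectors u_hat[i, j, :] from input capsule i
    to output capsule j by iterative agreement."""
    num_in, num_out, dim = u_hat.shape
    b = np.zeros((num_in, num_out))  # routing logits
    for _ in range(num_iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # coupling coefficients
        s = (c[:, :, None] * u_hat).sum(axis=0)               # weighted sum per output capsule
        v = squash(s)                                         # output capsule vectors
        b = b + (u_hat * v[None, :, :]).sum(axis=-1)          # reward agreement
    return v

rng = np.random.default_rng(0)
u_hat = rng.normal(size=(8, 4, 16))  # 8 input capsules, 4 output capsules, dim 16
v = dynamic_routing(u_hat)
print(v.shape)  # prints (4, 16)
```

The key property is that routing weights are computed iteratively at inference time rather than learned, which is the step TreeCaps's variable-to-static routing revisits for tree-shaped inputs.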
Related papers
- Joint Language Semantic and Structure Embedding for Knowledge Graph Completion [66.15933600765835]
We propose to jointly embed the semantics in the natural language description of the knowledge triplets with their structure information.
Our method embeds knowledge graphs for the completion task via fine-tuning pre-trained language models.
Our experiments on a variety of knowledge graph benchmarks have demonstrated the state-of-the-art performance of our method.
arXiv Detail & Related papers (2022-09-19T02:41:02Z)
- Incorporating Constituent Syntax for Coreference Resolution [50.71868417008133]
We propose a graph-based method to incorporate constituent syntactic structures.
We also explore to utilise higher-order neighbourhood information to encode rich structures in constituent trees.
Experiments on the English and Chinese portions of OntoNotes 5.0 benchmark show that our proposed model either beats a strong baseline or achieves new state-of-the-art performance.
arXiv Detail & Related papers (2022-02-22T07:40:42Z)
- Software Vulnerability Detection via Deep Learning over Disaggregated Code Graph Representation [57.92972327649165]
This work explores a deep learning approach to automatically learn the insecure patterns from code corpora.
Because code naturally admits graph structures with parsing, we develop a novel graph neural network (GNN) to exploit both the semantic context and structural regularity of a program.
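The blurb above does not specify the paper's architecture, but the basic GNN operation it relies on, aggregating each code node's neighbourhood and transforming the result, can be sketched generically. A minimal NumPy message-passing layer (our own illustration, not the paper's model):

```python
import numpy as np

def gnn_layer(H, A, W):
    """One round of neighbour aggregation over a code graph:
    each node averages its neighbours' features (plus its own)
    and applies a linear map followed by ReLU."""
    A_hat = A + np.eye(A.shape[0])           # add self-loops
    deg = A_hat.sum(axis=1, keepdims=True)   # node degrees
    H_new = (A_hat / deg) @ H @ W            # mean aggregation + transform
    return np.maximum(H_new, 0.0)            # ReLU non-linearity

# Tiny graph: 3 code/AST nodes in a chain, 4-dim input features.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
H = np.eye(3, 4)          # one-hot-ish node features
W = np.ones((4, 2))       # learned weights, here fixed for illustration
print(gnn_layer(H, A, W).shape)  # prints (3, 2)
```

Stacking several such layers lets information about insecure patterns propagate along both syntactic and semantic edges of the program graph.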
arXiv Detail & Related papers (2021-09-07T21:24:36Z)
- Structural Optimization Makes Graph Classification Simpler and Better [5.770986723520119]
We investigate the feasibility of improving graph classification performance while simplifying the model learning process.
Inspired by progress in structural information assessment, we optimize the given data sample from graphs to encoding trees.
We present an implementation of the scheme in a tree kernel and a convolutional network to perform graph classification.
arXiv Detail & Related papers (2021-09-05T08:54:38Z)
- Recursive Tree Grammar Autoencoders [3.791857415239352]
We propose a novel autoencoder approach that encodes trees via a bottom-up grammar and decodes trees via a tree grammar.
We show experimentally that our proposed method improves the autoencoding error, training time, and optimization score on four benchmark datasets.
arXiv Detail & Related papers (2020-12-03T17:37:25Z)
- GraphCodeBERT: Pre-training Code Representations with Data Flow [97.00641522327699]
We present GraphCodeBERT, a pre-trained model for programming language that considers the inherent structure of code.
We use data flow in the pre-training stage, which is a semantic-level structure of code that encodes the relation of "where-the-value-comes-from" between variables.
We evaluate our model on four tasks, including code search, clone detection, code translation, and code refinement.
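The "where-the-value-comes-from" relation can be made concrete on straight-line code: each variable read points back to its most recent assignment. A toy extraction in Python (our simplified illustration; GraphCodeBERT's actual construction works on parsed ASTs of real languages):

```python
import re

def value_flow_edges(lines):
    """Return (use_site, def_site) edges: each variable read on a line
    points back to the line that last assigned it. Lines are assumed to
    be simple 'var = expression' assignments."""
    last_def = {}  # variable -> line index of its most recent definition
    edges = []
    for i, line in enumerate(lines):
        lhs, rhs = [part.strip() for part in line.split("=", 1)]
        for var in re.findall(r"[a-z]\w*", rhs):   # variables read on this line
            if var in last_def:
                edges.append(((var, i), (var, last_def[var])))
        last_def[lhs] = i                          # record new definition
    return edges

code = ["x = 1", "y = x", "x = x + y"]
print(value_flow_edges(code))
# prints [(('x', 1), ('x', 0)), (('x', 2), ('x', 0)), (('y', 2), ('y', 1))]
```

These edges form the semantic-level graph that is fed to the model alongside the token sequence during pre-training.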
arXiv Detail & Related papers (2020-09-17T15:25:56Z)
- MurTree: Optimal Classification Trees via Dynamic Programming and Search [61.817059565926336]
We present a novel algorithm for learning optimal classification trees based on dynamic programming and search.
Our approach uses only a fraction of the time required by the state-of-the-art and can handle datasets with tens of thousands of instances.
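The core subproblem behind optimal classification trees can be shown with a toy exhaustive search: the minimum misclassification cost of a depth-limited tree over binary features, computed recursively per data subset. This only illustrates the recursive decomposition; MurTree's actual algorithm adds dynamic-programming caches, bounds, and specialised depth-two solvers:

```python
def best_misclassifications(X, y, depth, idx=None):
    """Minimum misclassifications achievable by any decision tree of
    the given depth over binary features (toy exhaustive recursion)."""
    idx = tuple(range(len(y))) if idx is None else idx
    labels = [y[i] for i in idx]
    # Cost of stopping here with a majority-vote leaf.
    leaf = len(idx) - max(labels.count(0), labels.count(1))
    if depth == 0 or leaf == 0:
        return leaf
    best = leaf
    for f in range(len(X[0])):
        left = tuple(i for i in idx if X[i][f] == 0)
        right = tuple(i for i in idx if X[i][f] == 1)
        if not left or not right:  # split does nothing, skip it
            continue
        best = min(best,
                   best_misclassifications(X, y, depth - 1, left)
                   + best_misclassifications(X, y, depth - 1, right))
    return best

# XOR: no single split separates it, but depth 2 is perfect.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 1, 1, 0]
print(best_misclassifications(X, y, 1), best_misclassifications(X, y, 2))
# prints 2 0
```

Memoising the recursion on the `idx` subset is what turns this exponential search into the dynamic program the paper exploits.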
arXiv Detail & Related papers (2020-07-24T17:06:55Z)
- Learning to map source code to software vulnerability using code-as-a-graph [67.62847721118142]
We explore the applicability of Graph Neural Networks in learning the nuances of source code from a security perspective.
We show that a code-as-graph encoding is more meaningful for vulnerability detection than existing code-as-photo and linear sequence encoding approaches.
arXiv Detail & Related papers (2020-06-15T16:05:27Z)
- Tree Echo State Autoencoders with Grammars [3.7280152311394827]
The non-vectorial and discrete nature of trees makes it challenging to construct functions with tree-formed output.
Existing autoencoding approaches fail to take the specific grammatical structure of tree domains into account.
We propose tree echo state autoencoders (TES-AE), which are guided by a tree grammar and can be trained within seconds by virtue of reservoir computing.
arXiv Detail & Related papers (2020-04-19T18:04:33Z)
- Graph-to-Tree Neural Networks for Learning Structured Input-Output Translation with Applications to Semantic Parsing and Math Word Problem [33.610361579567794]
We present a novel Graph-to-Tree Neural Network, namely Graph2Tree, consisting of a graph encoder and a hierarchical tree decoder, that encodes an augmented graph-structured input and decodes a tree-structured output.
Our experiments demonstrate that our Graph2Tree model outperforms or matches the performance of other state-of-the-art models on these tasks.
arXiv Detail & Related papers (2020-04-07T17:36:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.