Code Completion by Modeling Flattened Abstract Syntax Trees as Graphs
- URL: http://arxiv.org/abs/2103.09499v1
- Date: Wed, 17 Mar 2021 08:11:09 GMT
- Title: Code Completion by Modeling Flattened Abstract Syntax Trees as Graphs
- Authors: Yanlin Wang, Hui Li
- Abstract summary: We propose a new code completion approach named CCAG, which models the flattened sequence of a partial AST as an AST graph.
CCAG uses our proposed AST Graph Attention Block to capture different dependencies in the AST graph for representation learning in code completion.
The experimental results show that CCAG outperforms state-of-the-art approaches and is able to provide intelligent code completion.
- Score: 3.9596727975165438
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Code completion has become an essential component of integrated development
environments. Contemporary code completion methods rely on the abstract syntax
tree (AST) to generate syntactically correct code. However, they cannot fully
capture the sequential and repetitive patterns of writing code and the
structural information of the AST. To alleviate these problems, we propose a
new code completion approach named CCAG, which models the flattened sequence of
a partial AST as an AST graph. CCAG uses our proposed AST Graph Attention Block
to capture different dependencies in the AST graph for representation learning
in code completion. The sub-tasks of code completion are optimized via
multi-task learning in CCAG, and the task balance is automatically achieved
using uncertainty without the need to tune task weights. The experimental
results show that CCAG outperforms state-of-the-art approaches and is able
to provide intelligent code completion.
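CCAG's central idea, flattening a partial AST into a node sequence and then treating that sequence as a graph with both structural and sequential edges, can be illustrated with Python's built-in `ast` module. This is a minimal sketch of the idea, not the authors' implementation: the node labels and the two edge types (parent-child and next-in-sequence) are illustrative assumptions.

```python
import ast

def flatten_ast_as_graph(source):
    """Flatten a Python AST in pre-order and connect the flattened nodes
    with two edge types: parent-child (structural) edges and next-node
    (sequential) edges. Edge choices are illustrative, not CCAG's exact
    design."""
    tree = ast.parse(source)
    nodes, child_edges = [], []

    def visit(node, parent_idx):
        idx = len(nodes)
        nodes.append(type(node).__name__)
        if parent_idx is not None:
            child_edges.append((parent_idx, idx))
        for child in ast.iter_child_nodes(node):
            visit(child, idx)

    visit(tree, None)
    # Sequential edges link consecutive nodes of the flattened sequence,
    # capturing the order in which code is written.
    seq_edges = [(i, i + 1) for i in range(len(nodes) - 1)]
    return nodes, child_edges, seq_edges

nodes, child_edges, seq_edges = flatten_ast_as_graph("x = 1")
```

A graph attention network can then attend over both edge sets jointly, which is what lets such a model see structure and sequence at once.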
Related papers
- Improving FIM Code Completions via Context & Curriculum Based Learning [6.779631208983878]
We develop a curriculum dataset by extracting hard-to-complete patterns from code repositories.
We generate context examples using semantic and static analysis tools.
We validate our approach through online A/B testing, demonstrating tangible improvements in Completion Acceptance Rate (CAR) and Completion Persistence Rate (CPR).
arXiv Detail & Related papers (2024-12-21T11:30:54Z)
- Instance-Aware Graph Prompt Learning [71.26108600288308]
We introduce Instance-Aware Graph Prompt Learning (IA-GPL) in this paper.
The process involves generating intermediate prompts for each instance using a lightweight architecture.
Experiments conducted on multiple datasets and settings showcase the superior performance of IA-GPL compared to state-of-the-art baselines.
arXiv Detail & Related papers (2024-11-26T18:38:38Z)
- COrAL: Order-Agnostic Language Modeling for Efficient Iterative Refinement [80.18490952057125]
Iterative refinement has emerged as an effective paradigm for enhancing the capabilities of large language models (LLMs) on complex tasks.
We propose Context-Wise Order-Agnostic Language Modeling (COrAL) to overcome these challenges.
Our approach models multiple token dependencies within manageable context windows, enabling the model to perform iterative refinement internally.
arXiv Detail & Related papers (2024-10-12T23:56:19Z)
- NAMER: Non-Autoregressive Modeling for Handwritten Mathematical Expression Recognition [80.22784377150465]
Handwritten Mathematical Expression Recognition (HMER) has gained considerable attention in pattern recognition for its diverse applications in document understanding.
This paper makes the first attempt to build a novel bottom-up Non-AutoRegressive Modeling approach for HMER, called NAMER.
NAMER comprises a Visual Aware Tokenizer (VAT) and a Parallel Graph Decoder (PGD).
arXiv Detail & Related papers (2024-07-16T04:52:39Z)
- AlloyASG: Alloy Predicate Code Representation as a Compact Structurally Balanced Graph [0.6445605125467574]
We introduce a novel code representation schema, the Complex Structurally Balanced Abstract Semantic Graph (CSBASG).
CSBASG represents code as a complex-weighted directed graph in which each semantic element is a node.
We evaluate the efficiency of our CSBASG representation for Alloy models in terms of its compactness compared to ASTs.
arXiv Detail & Related papers (2024-02-29T22:41:09Z)
- Abstract Syntax Tree for Programming Language Understanding and Representation: How Far Are We? [23.52632194060246]
Programming language understanding and representation (a.k.a code representation learning) has always been a hot and challenging task in software engineering.
The abstract syntax tree (AST), a fundamental code feature, illustrates the syntactic information of the source code and has been widely used in code representation learning.
We compare the performance of models trained with code token sequence (Token for short) based code representation and AST-based code representation on three popular types of code-related tasks.
arXiv Detail & Related papers (2023-12-01T08:37:27Z)
- Single-Stage Visual Relationship Learning using Conditional Queries [60.90880759475021]
TraCQ is a new formulation for scene graph generation that avoids the multi-task learning problem and the need to model the entity pair distribution.
We employ a DETR-based encoder-decoder design with conditional queries to significantly reduce the entity label space as well.
Experimental results show that TraCQ not only outperforms existing single-stage scene graph generation methods, but also beats many state-of-the-art two-stage methods on the Visual Genome dataset.
arXiv Detail & Related papers (2023-06-09T06:02:01Z)
- UniXcoder: Unified Cross-Modal Pre-training for Code Representation [65.6846553962117]
We present UniXcoder, a unified cross-modal pre-trained model for programming language.
We propose a one-to-one mapping method to transform an AST into a sequence structure that retains all structural information from the tree.
We evaluate UniXcoder on five code-related tasks over nine datasets.
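The idea of a one-to-one (i.e., invertible) AST-to-sequence mapping can be sketched with a bracketed pre-order traversal: explicit open/close markers make the flat token sequence unambiguous, so the tree is recoverable. This is a simplified illustration of the concept, not UniXcoder's actual serialization format.

```python
import ast

def ast_to_sequence(node):
    """Serialize an AST as a flat token list with explicit open/close
    markers for internal nodes, so the tree structure can be rebuilt
    unambiguously from the sequence (a toy take on a one-to-one
    AST-to-sequence mapping)."""
    name = type(node).__name__
    children = list(ast.iter_child_nodes(node))
    if not children:
        return [name]  # leaf nodes need no markers
    seq = [f"<{name}>"]
    for child in children:
        seq.extend(ast_to_sequence(child))
    seq.append(f"</{name}>")
    return seq

tokens = ast_to_sequence(ast.parse("x = 1"))
```

Because every internal node contributes a matched marker pair and leaves appear bare, the mapping loses no structural information, unlike a plain pre-order dump of node names.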
arXiv Detail & Related papers (2022-03-08T04:48:07Z)
- GN-Transformer: Fusing Sequence and Graph Representation for Improved Code Summarization [0.0]
We propose a novel method, GN-Transformer, to learn end-to-end on a fused sequence and graph modality.
The proposed methods achieve state-of-the-art performance in two code summarization datasets and across three automatic code summarization metrics.
arXiv Detail & Related papers (2021-11-17T02:51:37Z)
- CLSEBERT: Contrastive Learning for Syntax Enhanced Code Pre-Trained Model [23.947178895479464]
We propose CLSEBERT, a Contrastive Learning Framework for Syntax Enhanced Code Pre-Trained Model.
In the pre-training stage, we consider the code syntax and hierarchy contained in the Abstract Syntax Tree (AST).
We also introduce two novel pre-training objectives. One is to predict the edges between nodes in the abstract syntax tree, and the other is to predict the types of code tokens.
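The edge-prediction objective needs labels that say, for a pair of AST nodes, whether one is the parent of the other. The following sketch derives such labels from Python source with the standard `ast` module; it is a simplified stand-in for CLSEBERT's actual pre-training setup, which operates on model token representations.

```python
import ast
import itertools

def ast_edge_labels(source):
    """Build binary labels for an edge-prediction objective: for every
    pair of AST nodes, label 1 if one is the direct parent of the
    other, else 0 (a simplified stand-in for CLSEBERT's objective)."""
    tree = ast.parse(source)
    nodes = list(ast.walk(tree))
    parents = set()
    for node in nodes:
        for child in ast.iter_child_nodes(node):
            parents.add((id(node), id(child)))
    labels = {}
    for a, b in itertools.combinations(range(len(nodes)), 2):
        na, nb = nodes[a], nodes[b]
        connected = (id(na), id(nb)) in parents or (id(nb), id(na)) in parents
        labels[(a, b)] = int(connected)
    return nodes, labels

nodes, labels = ast_edge_labels("y = 2")
```

A model trained to recover these labels from token embeddings is pushed to encode the tree structure, which is the point of the objective.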
arXiv Detail & Related papers (2021-08-10T10:08:21Z)
- Improving Code Summarization with Block-wise Abstract Syntax Tree Splitting [15.28941592388958]
Abstract Syntax Tree (AST), which depicts the source code's syntactic structure, has been incorporated to guide the generation of code summaries.
Existing AST based methods suffer from the difficulty of training and generate inadequate code summaries.
We present the Block-wise Abstract Syntax Tree Splitting method (BASTS), which fully utilizes the rich tree-form syntax structure in ASTs.
arXiv Detail & Related papers (2021-03-14T05:04:06Z)
- GraphCodeBERT: Pre-training Code Representations with Data Flow [97.00641522327699]
We present GraphCodeBERT, a pre-trained model for programming language that considers the inherent structure of code.
We use data flow in the pre-training stage, which is a semantic-level structure of code that encodes the relation of "where-the-value-comes-from" between variables.
We evaluate our model on four tasks, including code search, clone detection, code translation, and code refinement.
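The "where-the-value-comes-from" relation can be approximated for straight-line Python code by linking each variable read to the most recent assignment of that name. This is a toy sketch of the idea (no branches, loops, or scoping), not GraphCodeBERT's actual data-flow extraction pipeline.

```python
import ast

def data_flow_edges(source):
    """Link each variable read to the most recent assignment of that
    name: a toy, straight-line-code approximation of the
    'where-the-value-comes-from' relation."""
    edges = []     # (variable, def_line, use_line)
    last_def = {}  # variable name -> line of its latest assignment
    for stmt in ast.parse(source).body:
        # Record reads before writes so `a = a + 1` points at the
        # previous definition of `a`, not the one being made.
        for node in ast.walk(stmt):
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
                if node.id in last_def:
                    edges.append((node.id, last_def[node.id], node.lineno))
        for node in ast.walk(stmt):
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
                last_def[node.id] = node.lineno
    return edges

edges = data_flow_edges("a = 1\nb = a + 1\nc = a + b")
```

Such edges form a semantic-level graph over variables that, unlike the AST, directly encodes how values propagate through the program.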
arXiv Detail & Related papers (2020-09-17T15:25:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.