Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation
- URL: http://arxiv.org/abs/2305.00909v4
- Date: Wed, 19 Jul 2023 02:41:58 GMT
- Title: Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation
- Authors: Wenqing Zheng, S P Sharan, Ajay Kumar Jaiswal, Kevin Wang, Yihan Xi, Dejia Xu, Zhangyang Wang
- Abstract summary: ChainCoder is a program synthesis language model that generates Python code progressively.
A tailored transformer architecture is leveraged to jointly encode the natural language descriptions and syntactically aligned I/O data samples.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For a complicated algorithm, its implementation by a human programmer
usually starts with outlining a rough control flow, followed by iterative
enrichments, eventually yielding carefully generated syntactic structures and
variables in a hierarchy. However, state-of-the-art large language models
generate code in a single pass, without intermediate warm-ups to reflect the
structured thought process of "outline-then-detail". Inspired by the recent
success of chain-of-thought prompting, we propose ChainCoder, a program
synthesis language model that generates Python code progressively, i.e., from
coarse to fine in multiple passes. We first decompose source code into layout
frame components and accessory components via abstract syntax tree parsing to
construct a hierarchical representation. We then reformulate our prediction
target as a multi-pass objective, in which each pass generates a subsequence
that is concatenated into the hierarchy. Finally, a tailored transformer
architecture is leveraged to jointly encode the natural language descriptions
and syntactically aligned I/O data samples. Extensive evaluations show that
ChainCoder outperforms state-of-the-art models, demonstrating that our
progressive generation eases the reasoning procedure and guides the language
model to generate higher-quality solutions. Our code is available at:
https://github.com/VITA-Group/ChainCoder.
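The frame/accessory decomposition can be pictured with Python's standard `ast` module. The sketch below is an illustrative assumption rather than ChainCoder's actual tokenizer: it keeps control-flow headers as the coarse first-pass outline and hides the remaining statements behind placeholders, which later passes would fill in and concatenate back into the hierarchy.

```python
# Illustrative sketch only: ChainCoder's real frame/accessory tokenization
# is defined in the paper and repository. This merely shows the idea of a
# coarse first pass (control-flow outline) with placeholders for details.
# Requires Python >= 3.9 for ast.unparse.
import ast

# Assumed split: control-flow constructs form the coarse "layout frame";
# everything else is an "accessory" detail deferred to a later pass.
FRAME_NODES = (ast.FunctionDef, ast.If, ast.For, ast.While, ast.Return)

def outline(source: str) -> str:
    """First pass: keep frame statement headers, elide accessory details."""
    lines = []

    def visit(node: ast.AST, depth: int) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, FRAME_NODES):
                # Keep only the header line, e.g. "for x in nums:".
                header = ast.unparse(child).splitlines()[0]
                lines.append("    " * depth + header)
                visit(child, depth + 1)
            elif isinstance(child, ast.stmt):
                # Accessory statement: a placeholder in the coarse pass.
                lines.append("    " * depth + "...")

    visit(ast.parse(source), 0)
    return "\n".join(lines)

example = """
def solve(nums):
    total = 0
    for x in nums:
        if x > 0:
            total += x
    return total
"""

print(outline(example))
```

On this example the first pass yields only the skeleton (function header, loop, branch, return, with `...` for the elided assignments); a second, finer pass would replace each placeholder with concrete statements, so the concatenated subsequences reconstruct the full program.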
Related papers
- SparseCoder: Identifier-Aware Sparse Transformer for File-Level Code Summarization [51.67317895094664]
This paper studies file-level code summarization, which can assist programmers in understanding and maintaining large source code projects.
We propose SparseCoder, an identifier-aware sparse transformer for effectively handling long code sequences.
arXiv Detail & Related papers (2024-01-26T09:23:27Z)
- ReACC: A Retrieval-Augmented Code Completion Framework [53.49707123661763]
We propose a retrieval-augmented code completion framework, leveraging both lexical copying and referring to code with similar semantics by retrieval.
We evaluate our approach on the code completion task in the Python and Java programming languages, achieving state-of-the-art performance on the CodeXGLUE benchmark.
arXiv Detail & Related papers (2022-03-15T08:25:08Z)
- Contrastive Learning for Source Code with Structural and Functional Properties [66.10710134948478]
We present BOOST, a self-supervised model whose pre-training focuses on the characteristics of source code.
We employ automated, structure-guided code transformation algorithms that generate functionally equivalent code looking drastically different from the original.
We train our model with a contrastive learning objective that pulls functionally equivalent code closer together and pushes distinct code further apart.
arXiv Detail & Related papers (2021-10-08T02:56:43Z)
- Autoencoders as Tools for Program Synthesis [0.43012765978447565]
We introduce a variational autoencoder model for program synthesis of industry-grade programming languages.
Our model incorporates the internal hierarchical structure of source code and operates on parse trees.
arXiv Detail & Related papers (2021-08-16T14:51:11Z)
- Improving Code Summarization with Block-wise Abstract Syntax Tree Splitting [15.28941592388958]
Abstract Syntax Tree (AST), which depicts the source code's syntactic structure, has been incorporated to guide the generation of code summaries.
Existing AST-based methods are difficult to train and generate inadequate code summaries.
We present the Block-wise Abstract Syntax Tree Splitting method (BASTS), which fully utilizes the rich tree-form syntax structure in ASTs.
arXiv Detail & Related papers (2021-03-14T05:04:06Z)
- Hierarchical Poset Decoding for Compositional Generalization in Language [52.13611501363484]
We formalize human language understanding as a structured prediction task where the output is a partially ordered set (poset).
Current encoder-decoder architectures do not take the poset structure of semantics into account properly.
We propose a novel hierarchical poset decoding paradigm for compositional generalization in language.
arXiv Detail & Related papers (2020-10-15T14:34:26Z)
- GraphCodeBERT: Pre-training Code Representations with Data Flow [97.00641522327699]
We present GraphCodeBERT, a pre-trained model for programming languages that considers the inherent structure of code.
We use data flow in the pre-training stage, a semantic-level structure of code that encodes the "where-the-value-comes-from" relation between variables; a toy sketch of this relation follows the list.
We evaluate our model on four tasks, including code search, clone detection, code translation, and code refinement.
arXiv Detail & Related papers (2020-09-17T15:25:56Z)
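To make the GraphCodeBERT entry concrete: its data-flow graph is built from "where-the-value-comes-from" edges between variable occurrences. The actual pipeline extracts these edges from tree-sitter parses over several languages; the toy version below, restricted to straight-line Python and the stdlib `ast` module, is an assumption for illustration only.

```python
# Toy sketch only: GraphCodeBERT builds data flow from tree-sitter parses
# across several languages; this Python-only, straight-line version is an
# illustrative assumption, not the paper's pipeline.
import ast

def value_sources(source: str):
    """Link each variable read to the assignment its value comes from."""
    last_def = {}  # variable name -> line of its most recent assignment
    edges = []     # (name, use_line, def_line)
    for stmt in ast.parse(source).body:  # top-level straight-line code only
        if isinstance(stmt, ast.Assign):
            # Reads happen on the right-hand side first ...
            for node in ast.walk(stmt.value):
                if (isinstance(node, ast.Name)
                        and isinstance(node.ctx, ast.Load)
                        and node.id in last_def):
                    edges.append((node.id, node.lineno, last_def[node.id]))
            # ... then the left-hand side (re)defines the variable.
            for target in stmt.targets:
                if isinstance(target, ast.Name):
                    last_def[target.id] = stmt.lineno
    return edges

code = "a = 1\nb = a + 2\na = b * a\n"
for name, use, origin in value_sources(code):
    print(f"{name} on line {use} comes from line {origin}")
```

This prints, for example, "a on line 2 comes from line 1"; GraphCodeBERT turns such edges into a graph over variable tokens and uses it to guide attention during pre-training.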