Autoencoders as Tools for Program Synthesis
- URL: http://arxiv.org/abs/2108.07129v1
- Date: Mon, 16 Aug 2021 14:51:11 GMT
- Title: Autoencoders as Tools for Program Synthesis
- Authors: Sander de Bruin, Vadim Liventsev, Milan Petković
- Abstract summary: We introduce a variational autoencoder model for program synthesis of industry-grade programming languages.
Our model incorporates the internal hierarchical structure of source code and operates on parse trees.
- Score: 0.43012765978447565
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently there have been many advances in research on language modeling of
source code. Applications range from code suggestion and completion to code
summarization. However, complete program synthesis of industry-grade
programming languages has not been researched extensively. In this work, we
introduce a variational autoencoder model for program synthesis of
industry-grade programming languages. Our model incorporates the internal
hierarchical structure of source code and operates on parse trees. By learning
a latent representation of source code over trees, we capture more information
and achieve higher performance than standard autoregressive autoencoder
models. Furthermore, due to the tree-structured nature of our model, the
autoregressive operations are performed on paths of trees instead of linear
sequences. Therefore, the size of the sequences that the autoregressive model
processes scales proportionally to the width and depth of the tree instead of
the total size of the tree, which mitigates the common problem of exploding and
vanishing gradients.
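To make the path-based intuition concrete, here is a minimal sketch (not code from the paper; it uses Python's standard `ast` module and a toy snippet purely for illustration) that enumerates the root-to-leaf paths of a parse tree and compares the longest path to the total number of nodes:

```python
import ast

def root_to_leaf_paths(node, prefix=()):
    """Yield every root-to-leaf path of an AST as a tuple of node-type names."""
    prefix = prefix + (type(node).__name__,)
    children = list(ast.iter_child_nodes(node))
    if not children:
        yield prefix
        return
    for child in children:
        yield from root_to_leaf_paths(child, prefix)

source = (
    "def add(a, b):\n"
    "    total = a + b\n"
    "    return total\n"
)
tree = ast.parse(source)

paths = list(root_to_leaf_paths(tree))
total_nodes = sum(1 for _ in ast.walk(tree))
longest_path = max(len(p) for p in paths)

# Each root-to-leaf path is bounded by the tree depth, so an autoregressive
# model that consumes paths sees sequences far shorter than the whole program.
print(f"nodes: {total_nodes}, paths: {len(paths)}, longest path: {longest_path}")
```

For a reasonably balanced parse tree, the longest path grows with the tree's depth rather than with the total number of nodes, which is the intuition behind the claim that path-wise autoregression keeps sequences short and mitigates exploding and vanishing gradients.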
Related papers
- CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models [106.11371409170818]
Large language models (LLMs) can act as agents with capabilities to self-refine and improve generated code autonomously.
We propose CodeTree, a framework for LLM agents to efficiently explore the search space in different stages of the code generation process.
Specifically, we adopted a unified tree structure to explicitly explore different coding strategies, generate corresponding coding solutions, and subsequently refine the solutions.
arXiv Detail & Related papers (2024-11-07T00:09:54Z) - SparseCoder: Identifier-Aware Sparse Transformer for File-Level Code Summarization [51.67317895094664]
This paper studies file-level code summarization, which can assist programmers in understanding and maintaining large source code projects.
We propose SparseCoder, an identifier-aware sparse transformer for effectively handling long code sequences.
arXiv Detail & Related papers (2024-01-26T09:23:27Z) - LILO: Learning Interpretable Libraries by Compressing and Documenting Code [71.55208585024198]
We introduce LILO, a neurosymbolic framework that iteratively synthesizes, compresses, and documents code.
LILO combines LLM-guided program synthesis with recent algorithmic advances in automated refactoring from Stitch.
We find that AutoDoc, LILO's auto-documentation procedure, boosts performance by helping the synthesizer interpret and deploy learned abstractions.
arXiv Detail & Related papers (2023-10-30T17:55:02Z) - Wasserstein Auto-Encoders of Merge Trees (and Persistence Diagrams) [5.384630221560809]
This paper presents a computational framework for the Wasserstein auto-encoding of merge trees (MT-WAE).
In contrast to traditional auto-encoders which operate on vectorized data, our formulation explicitly manipulates merge trees on their associated metric space at each layer of the network.
Experiments on public ensembles demonstrate the efficiency of our algorithms, with MT-WAE computations on the order of minutes on average.
arXiv Detail & Related papers (2023-07-05T09:46:52Z) - Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation [61.50286000143233]
ChainCoder is a program synthesis language model that generates Python code progressively.
A tailored transformer architecture is leveraged to jointly encode the natural language descriptions and syntactically aligned I/O data samples.
arXiv Detail & Related papers (2023-04-28T01:47:09Z) - Structural Optimization Makes Graph Classification Simpler and Better [5.770986723520119]
We investigate the feasibility of improving graph classification performance while simplifying the model learning process.
Inspired by progress in structural information assessment, we optimize the given data sample from graphs to encoding trees.
We present an implementation of the scheme in a tree kernel and a convolutional network to perform graph classification.
arXiv Detail & Related papers (2021-09-05T08:54:38Z) - Recursive Tree Grammar Autoencoders [3.791857415239352]
We propose a novel autoencoder approach that encodes trees via a bottom-up parser and decodes trees via a tree grammar.
We show experimentally that our proposed method improves the autoencoding error, training time, and optimization score on four benchmark datasets.
arXiv Detail & Related papers (2020-12-03T17:37:25Z) - Recursive Top-Down Production for Sentence Generation with Latent Trees [77.56794870399288]
We model the production property of context-free grammars for natural and synthetic languages.
We present a dynamic programming algorithm that marginalises over latent binary tree structures with $N$ leaves.
We also present experimental results on German-English translation on the Multi30k dataset.
arXiv Detail & Related papers (2020-10-09T17:47:16Z) - GraphCodeBERT: Pre-training Code Representations with Data Flow [97.00641522327699]
We present GraphCodeBERT, a pre-trained model for programming language that considers the inherent structure of code.
We use data flow in the pre-training stage, which is a semantic-level structure of code that encodes the relation of "where-the-value-comes-from" between variables.
We evaluate our model on four tasks, including code search, clone detection, code translation, and code refinement.
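(A toy sketch of this "where-the-value-comes-from" relation appears after this list.)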
arXiv Detail & Related papers (2020-09-17T15:25:56Z) - Tree Echo State Autoencoders with Grammars [3.7280152311394827]
The non-vectorial and discrete nature of trees makes it challenging to construct functions with tree-formed output.
Existing autoencoding approaches fail to take the specific grammatical structure of tree domains into account.
We propose tree echo state autoencoders (TES-AE), which are guided by a tree grammar and can be trained within seconds by virtue of reservoir computing.
arXiv Detail & Related papers (2020-04-19T18:04:33Z)
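As an aside on the "where-the-value-comes-from" relation mentioned in the GraphCodeBERT entry above, here is a toy sketch (our own illustration under simplifying assumptions, not code from that paper; it only handles flat top-level assignments) that links each variable read on an assignment's right-hand side to the line of the latest preceding assignment of that variable:

```python
import ast

def value_origin_edges(source: str):
    """Return (use_line, var, def_line) triples: each variable read on a
    right-hand side is linked to the line of the latest assignment to it."""
    last_def = {}   # variable name -> line of its latest assignment so far
    edges = []
    for stmt in ast.parse(source).body:          # top-level statements, in order
        if not isinstance(stmt, ast.Assign):
            continue
        # reads on the right-hand side take their value from earlier definitions
        for node in ast.walk(stmt.value):
            if isinstance(node, ast.Name) and node.id in last_def:
                edges.append((stmt.lineno, node.id, last_def[node.id]))
        # the targets of this assignment then become the newest definitions
        for target in stmt.targets:
            if isinstance(target, ast.Name):
                last_def[target.id] = stmt.lineno
    return edges

snippet = "x = 1\ny = x + 2\nz = x + y\n"
for use_line, name, def_line in value_origin_edges(snippet):
    print(f"line {use_line}: value of '{name}' comes from line {def_line}")
```

GraphCodeBERT derives such relations from a full data-flow graph and feeds them to the transformer during pre-training; the sketch only illustrates the underlying relation between variables.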
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.