Autoencoders as Tools for Program Synthesis
- URL: http://arxiv.org/abs/2108.07129v1
- Date: Mon, 16 Aug 2021 14:51:11 GMT
- Title: Autoencoders as Tools for Program Synthesis
- Authors: Sander de Bruin, Vadim Liventsev, Milan Petković
- Abstract summary: We introduce a variational autoencoder model for program synthesis of industry-grade programming languages.
Our model incorporates the internal hierarchical structure of source code and operates on parse trees.
- Score: 0.43012765978447565
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently there have been many advances in research on language modeling of
source code. Applications range from code suggestion and completion to code
summarization. However, complete program synthesis of industry-grade
programming languages has not been researched extensively. In this work, we
introduce a variational autoencoder model for program synthesis of
industry-grade programming languages. Our model incorporates the internal
hierarchical structure of source code and operates on parse trees. By learning
a latent representation of source code over trees, we capture more information
and achieve higher performance than standard autoregressive autoencoder
models. Furthermore, due to the tree-structured nature of our model, the
autoregressive operations are performed on paths of trees instead of linear
sequences. Therefore, the size of the sequences that the autoregressive model
processes scales proportionally to the width and depth of the tree instead of
the total size of the tree, which mitigates the common problem of exploding and
vanishing gradients.
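To make the path-based intuition concrete, here is a minimal sketch (not code from the paper; it uses Python's standard `ast` module and a toy snippet purely for illustration) that enumerates the root-to-leaf paths of a parse tree and compares the longest path to the total number of nodes:

```python
import ast

def root_to_leaf_paths(node, prefix=()):
    """Yield every root-to-leaf path of an AST as a tuple of node-type names."""
    prefix = prefix + (type(node).__name__,)
    children = list(ast.iter_child_nodes(node))
    if not children:
        yield prefix
        return
    for child in children:
        yield from root_to_leaf_paths(child, prefix)

source = (
    "def add(a, b):\n"
    "    total = a + b\n"
    "    return total\n"
)
tree = ast.parse(source)

paths = list(root_to_leaf_paths(tree))
total_nodes = sum(1 for _ in ast.walk(tree))
longest_path = max(len(p) for p in paths)

# Each root-to-leaf path is bounded by the tree depth, so an autoregressive
# model that consumes paths sees sequences far shorter than the whole program.
print(f"nodes: {total_nodes}, paths: {len(paths)}, longest path: {longest_path}")
```

For a reasonably balanced parse tree, the longest path grows with the tree's depth rather than with the total number of nodes, which is the intuition behind the claim that path-wise autoregression keeps sequences short and mitigates exploding and vanishing gradients.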
Related papers
- CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models [106.11371409170818]
Large language models (LLMs) can act as agents with capabilities to self-refine and improve generated code autonomously.
We propose CodeTree, a framework for LLM agents to efficiently explore the search space in different stages of the code generation process.
Specifically, we adopted a unified tree structure to explicitly explore different coding strategies, generate corresponding coding solutions, and subsequently refine the solutions.
arXiv Detail & Related papers (2024-11-07T00:09:54Z) - SparseCoder: Identifier-Aware Sparse Transformer for File-Level Code Summarization [51.67317895094664]
This paper studies file-level code summarization, which can assist programmers in understanding and maintaining large source code projects.
We propose SparseCoder, an identifier-aware sparse transformer for effectively handling long code sequences.
arXiv Detail & Related papers (2024-01-26T09:23:27Z) - LILO: Learning Interpretable Libraries by Compressing and Documenting Code [71.55208585024198]
We introduce LILO, a neurosymbolic framework that iteratively synthesizes, compresses, and documents code.
LILO combines LLM-guided program synthesis with recent algorithmic advances in automated refactoring from Stitch.
We find that AutoDoc, LILO's auto-documentation procedure, boosts performance by helping the synthesizer interpret and deploy learned abstractions.
arXiv Detail & Related papers (2023-10-30T17:55:02Z) - Wasserstein Auto-Encoders of Merge Trees (and Persistence Diagrams) [5.384630221560809]
This paper presents a computational framework for the Wasserstein auto-encoding of merge trees (MT-WAE).
In contrast to traditional auto-encoders which operate on vectorized data, our formulation explicitly manipulates merge trees on their associated metric space at each layer of the network.
Experiments on public ensembles demonstrate the efficiency of our algorithms, with MT-WAE computations on the order of minutes on average.
arXiv Detail & Related papers (2023-07-05T09:46:52Z) - Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation [61.50286000143233]
ChainCoder is a program synthesis language model that generates Python code progressively.
A tailored transformer architecture is leveraged to jointly encode the natural language descriptions and syntactically aligned I/O data samples.
arXiv Detail & Related papers (2023-04-28T01:47:09Z) - Structural Optimization Makes Graph Classification Simpler and Better [5.770986723520119]
We investigate the feasibility of improving graph classification performance while simplifying the model learning process.
Inspired by progress in structural information assessment, we optimize the given data sample from graphs to encoding trees.
We present an implementation of the scheme in a tree kernel and a convolutional network to perform graph classification.
arXiv Detail & Related papers (2021-09-05T08:54:38Z) - Recursive Tree Grammar Autoencoders [3.791857415239352]
We propose a novel autoencoder approach that encodes trees via a bottom-up parser and decodes trees via a tree grammar.
We show experimentally that our proposed method improves the autoencoding error, training time, and optimization score on four benchmark datasets.
arXiv Detail & Related papers (2020-12-03T17:37:25Z) - Recursive Top-Down Production for Sentence Generation with Latent Trees [77.56794870399288]
We model the production property of context-free grammars for natural and synthetic languages.
We present a dynamic programming algorithm that marginalises over latent binary tree structures with $N$ leaves.
We also present experimental results on German-English translation on the Multi30k dataset.
arXiv Detail & Related papers (2020-10-09T17:47:16Z) - GraphCodeBERT: Pre-training Code Representations with Data Flow [97.00641522327699]
We present GraphCodeBERT, a pre-trained model for programming language that considers the inherent structure of code.
We use data flow in the pre-training stage, which is a semantic-level structure of code that encodes the relation of "where-the-value-comes-from" between variables.
We evaluate our model on four tasks, including code search, clone detection, code translation, and code refinement.
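(A toy sketch of this "where-the-value-comes-from" relation appears after this list.)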
arXiv Detail & Related papers (2020-09-17T15:25:56Z) - Tree Echo State Autoencoders with Grammars [3.7280152311394827]
The non-vectorial and discrete nature of trees makes it challenging to construct functions with tree-formed output.
Existing autoencoding approaches fail to take the specific grammatical structure of tree domains into account.
We propose tree echo state autoencoders (TES-AE), which are guided by a tree grammar and can be trained within seconds by virtue of reservoir computing.
arXiv Detail & Related papers (2020-04-19T18:04:33Z)
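As an aside on the "where-the-value-comes-from" relation mentioned in the GraphCodeBERT entry above, here is a toy sketch (our own illustration under simplifying assumptions, not code from that paper; it only handles flat top-level assignments) that links each variable read on an assignment's right-hand side to the line of the latest preceding assignment of that variable:

```python
import ast

def value_origin_edges(source: str):
    """Return (use_line, var, def_line) triples: each variable read on a
    right-hand side is linked to the line of the latest assignment to it."""
    last_def = {}   # variable name -> line of its latest assignment so far
    edges = []
    for stmt in ast.parse(source).body:          # top-level statements, in order
        if not isinstance(stmt, ast.Assign):
            continue
        # reads on the right-hand side take their value from earlier definitions
        for node in ast.walk(stmt.value):
            if isinstance(node, ast.Name) and node.id in last_def:
                edges.append((stmt.lineno, node.id, last_def[node.id]))
        # the targets of this assignment then become the newest definitions
        for target in stmt.targets:
            if isinstance(target, ast.Name):
                last_def[target.id] = stmt.lineno
    return edges

snippet = "x = 1\ny = x + 2\nz = x + y\n"
for use_line, name, def_line in value_origin_edges(snippet):
    print(f"line {use_line}: value of '{name}' comes from line {def_line}")
```

GraphCodeBERT derives such relations from a full data-flow graph and feeds them to the transformer during pre-training; the sketch only illustrates the underlying relation between variables.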
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.