GAP-Gen: Guided Automatic Python Code Generation
- URL: http://arxiv.org/abs/2201.08810v2
- Date: Wed, 10 May 2023 01:01:43 GMT
- Title: GAP-Gen: Guided Automatic Python Code Generation
- Authors: Junchen Zhao, Yurun Song, Junlin Wang, Ian G. Harris
- Abstract summary: We propose a Guided Automatic Python Code Generation method based on Python syntactic constraints and semantic constraints.
GAP-Gen fine-tunes the transformer-based language models T5 and CodeT5 using the Code-to-Docstring datasets.
Our experiments show that GAP-Gen achieves better results on the automatic Python code generation task than previous works.
- Score: 3.574838772430975
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Automatic code generation from natural language descriptions can be highly
beneficial during the process of software development. In this work, we propose
GAP-Gen, a Guided Automatic Python Code Generation method based on Python
syntactic constraints and semantic constraints. We first introduce Python
syntactic constraints in the form of Syntax-Flow, a simplified version of the
Abstract Syntax Tree (AST) that reduces the tree's size and complexity while
retaining the crucial syntactic information of Python code. In addition to
Syntax-Flow, we introduce Variable-Flow, which abstracts variable and function
names consistently throughout the code. Rather than pretraining, we focus on
modifying the fine-tuning process, which reduces computational requirements
while retaining high generation performance on the automatic Python code
generation task. GAP-Gen fine-tunes the transformer-based language models T5
and CodeT5 using the Code-to-Docstring datasets CodeSearchNet, CodeSearchNet
AdvTest, and Code-Docstring Corpus from EdinburghNLP. Our experiments show that
GAP-Gen achieves better results on the automatic Python code generation task
than previous works.
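To make the two constraints in the abstract concrete, here is a minimal sketch built on Python's standard `ast` module (Python 3.9+ for `ast.unparse`). The abstract does not specify the exact Syntax-Flow or Variable-Flow formats, so everything below is an assumption for illustration only: the helper names `syntax_flow` and `variable_flow`, the choice to keep only statement-level node types, and the `func_N` / `var_N` placeholder scheme are hypothetical and are not taken from the paper.

```python
import ast
import builtins


def syntax_flow(source: str) -> list[str]:
    """Reduce the AST to a flat, depth-first list of statement node types.

    Assumption: "simplified AST" is approximated here by dropping every
    expression-level node and keeping only statements.
    """
    flow: list[str] = []

    def visit(node: ast.AST) -> None:
        if isinstance(node, ast.stmt):
            flow.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return flow


class _Renamer(ast.NodeTransformer):
    """Map each user-defined name to one placeholder, reused everywhere."""

    def __init__(self) -> None:
        self.mapping: dict[str, str] = {}
        self.counts = {"func": 0, "var": 0}

    def _alias(self, name: str, kind: str) -> str:
        if name not in self.mapping:
            self.mapping[name] = f"{kind}_{self.counts[kind]}"
            self.counts[kind] += 1
        return self.mapping[name]

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.FunctionDef:
        node.name = self._alias(node.name, "func")
        self.generic_visit(node)  # rename arguments and body as well
        return node

    def visit_arg(self, node: ast.arg) -> ast.arg:
        node.arg = self._alias(node.arg, "var")
        return node

    def visit_Name(self, node: ast.Name) -> ast.Name:
        # Leave builtins such as len or print untouched.
        if not hasattr(builtins, node.id):
            node.id = self._alias(node.id, "var")
        return node


def variable_flow(source: str) -> str:
    """Return the source with variable and function names abstracted."""
    tree = _Renamer().visit(ast.parse(source))
    return ast.unparse(tree)


SOURCE = """
def add_all(items):
    total = 0
    for item in items:
        total += item
    return total
"""

if __name__ == "__main__":
    print(syntax_flow(SOURCE))
    print(variable_flow(SOURCE))
```

In this sketch the same identifier always maps to the same placeholder, which is the "consistent" property the abstract describes. A simplification to keep in mind: a function that is called before its definition is visited would be labelled as a variable, and attribute names, imports, and keyword arguments are left unchanged.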
Related papers
- Automatic Generation of Python Programs Using Context-Free Grammars [0.1227734309612871]
TinyPy Generator is a tool that generates random Python programs using a context-free grammar.
Our system uses custom production rules to generate code with different levels of complexity.
TinyPy Generator is useful in the field of machine learning, where it can generate substantial amounts of Python code for training Python language models.
arXiv Detail & Related papers (2024-03-11T08:25:52Z) - LILO: Learning Interpretable Libraries by Compressing and Documenting Code [71.55208585024198]
We introduce LILO, a neurosymbolic framework that iteratively synthesizes, compresses, and documents code.
LILO combines LLM-guided program synthesis with recent algorithmic advances in automated refactoring from Stitch.
We find that AutoDoc boosts performance by helping LILO's synthesizer to interpret and deploy learned abstractions.
arXiv Detail & Related papers (2023-10-30T17:55:02Z) - InterCode: Standardizing and Benchmarking Interactive Coding with Execution Feedback [50.725076393314964]
We introduce InterCode, a lightweight, flexible, and easy-to-use framework of interactive coding as a standard reinforcement learning environment.
Our framework is language- and platform-agnostic and uses self-contained Docker environments to provide safe and reproducible execution.
We demonstrate InterCode's viability as a testbed by evaluating multiple state-of-the-art LLMs configured with different prompting strategies.
arXiv Detail & Related papers (2023-06-26T17:59:50Z) - Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation [61.50286000143233]
ChainCoder is a program synthesis language model that generates Python code progressively.
A tailored transformer architecture is leveraged to jointly encode the natural language descriptions and syntactically aligned I/O data samples.
arXiv Detail & Related papers (2023-04-28T01:47:09Z) - Python Code Generation by Asking Clarification Questions [57.63906360576212]
In this work, we introduce a novel and more realistic setup for this task.
We hypothesize that the under-specification of a natural language description can be resolved by asking clarification questions.
We collect and introduce a new dataset named CodeClarQA containing pairs of natural language descriptions and code with created synthetic clarification questions and answers.
arXiv Detail & Related papers (2022-12-19T22:08:36Z) - Binding Language Models in Symbolic Languages [146.3027328556881]
Binder is a training-free neural-symbolic framework that maps the task input to a program.
In the parsing stage, Codex is able to identify the part of the task input that cannot be answered by the original programming language.
In the execution stage, Codex can perform versatile functionalities given proper prompts in the API calls.
arXiv Detail & Related papers (2022-10-06T12:55:17Z) - ReACC: A Retrieval-Augmented Code Completion Framework [53.49707123661763]
We propose a retrieval-augmented code completion framework, leveraging both lexical copying and referring to code with similar semantics by retrieval.
We evaluate our approach on the code completion task in the Python and Java programming languages, achieving state-of-the-art performance on the CodeXGLUE benchmark.
arXiv Detail & Related papers (2022-03-15T08:25:08Z) - Automatic Code Generation using Pre-Trained Language Models [0.0]
We propose an end-to-end machine learning model for code generation in the Python language built on top of pre-trained language models.
We demonstrate that a fine-tuned model can perform well in code generation tasks, achieving a BLEU score of 0.22, an improvement of 46% over a reasonable sequence-to-sequence baseline.
arXiv Detail & Related papers (2021-02-21T07:21:26Z) - PyMT5: multi-mode translation of natural language and Python code with transformers [7.973871379728246]
PyMT5 is a Python method text-to-text transfer transformer.
It can both predict whole methods from natural language documentation strings (docstrings) and summarize code into docstrings of any common style.
arXiv Detail & Related papers (2020-10-07T04:10:58Z)