Related papers: GAP-Gen: Guided Automatic Python Code Generation

URL: http://arxiv.org/abs/2201.08810v2
Date: Wed, 10 May 2023 01:01:43 GMT
Title: GAP-Gen: Guided Automatic Python Code Generation
Authors: Junchen Zhao, Yurun Song, Junlin Wang, Ian G. Harris
Abstract summary: We propose a Guided Automatic Python Code Generation method based on Python syntactic constraints and semantic constraints. GAP-Gen fine-tunes the transformer-based language models T5 and CodeT5 using Code-to-Docstring datasets. Our experiments show that GAP-Gen achieves better results on the automatic Python code generation task than previous works.
Score: 3.574838772430975
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Automatic code generation from natural language descriptions can be highly beneficial during the process of software development. In this work, we propose GAP-Gen, a Guided Automatic Python Code Generation method based on Python syntactic constraints and semantic constraints. We first introduce Python syntactic constraints in the form of Syntax-Flow, a simplified version of the Abstract Syntax Tree (AST) that reduces the AST's size and complexity while maintaining crucial syntactic information of Python code. In addition to Syntax-Flow, we introduce Variable-Flow, which abstracts variable and function names consistently throughout the code. In our work, rather than pretraining, we focus on modifying the fine-tuning process, which reduces computational requirements while retaining high generation performance on the automatic Python code generation task. GAP-Gen fine-tunes the transformer-based language models T5 and CodeT5 using the Code-to-Docstring datasets CodeSearchNet, CodeSearchNet AdvTest and Code-Docstring Corpus from EdinburghNLP. Our experiments show that GAP-Gen achieves better results on the automatic Python code generation task than previous works.
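
Illustrative sketch (not the authors' implementation): the abstract describes Syntax-Flow as a simplified AST and Variable-Flow as a consistent abstraction of variable and function names, but gives no exact format for either. The Python below only approximates the two ideas with the standard `ast` module. The function names `syntax_flow` and `abstract_names`, the `KEPT_NODES` selection, and the `func_N` / `var_N` placeholder scheme are all assumptions made for illustration.

```python
import ast

# Assumption: which node types count as "crucial syntactic information".
KEPT_NODES = (ast.FunctionDef, ast.Assign, ast.For, ast.While,
              ast.If, ast.Return, ast.Call, ast.BinOp)


class _FlowCollector(ast.NodeVisitor):
    """Collect the type names of selected nodes in depth-first pre-order."""

    def __init__(self):
        self.flow = []

    def generic_visit(self, node):
        if isinstance(node, KEPT_NODES):
            self.flow.append(type(node).__name__)
        super().generic_visit(node)  # continue into child nodes


def syntax_flow(source: str) -> list[str]:
    """Reduce source code to a flat sequence of selected AST node types."""
    collector = _FlowCollector()
    collector.visit(ast.parse(source))
    return collector.flow


class _Renamer(ast.NodeTransformer):
    """Replace names defined in the code with consistent placeholders."""

    def __init__(self, bound):
        self.bound = bound    # names defined in the source (not builtins)
        self.mapping = {}     # original name -> placeholder

    def _alias(self, name, prefix):
        if name not in self.mapping:
            self.mapping[name] = f"{prefix}_{len(self.mapping)}"
        return self.mapping[name]

    def visit_FunctionDef(self, node):
        node.name = self._alias(node.name, "func")
        self.generic_visit(node)
        return node

    def visit_arg(self, node):
        node.arg = self._alias(node.arg, "var")
        return node

    def visit_Name(self, node):
        if node.id in self.bound:
            node.id = self._alias(node.id, "var")
        return node


def abstract_names(source: str) -> str:
    """Rewrite source so each defined name maps to one placeholder."""
    tree = ast.parse(source)
    bound = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            bound.add(node.name)
        elif isinstance(node, ast.arg):
            bound.add(node.arg)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            bound.add(node.id)
    return ast.unparse(_Renamer(bound).visit(tree))  # needs Python 3.9+


SOURCE = """
def add_total(items, bonus):
    total = bonus
    for item in items:
        total = total + item
    return total
"""

flow = syntax_flow(SOURCE)
abstracted = abstract_names(SOURCE)
print(flow)
print(abstracted)
```

In this sketch, names that are only read and never defined in the snippet (for example builtins such as `len`) are left untouched, so the rewritten code stays executable. Whether GAP-Gen treats such names the same way is not stated in the abstract.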

Related papers

Effective LLM-Driven Code Generation with Pythoness [0.0]
Pythoness is an embedded domain-specific language for code generation using large language models (LLMs). In Pythoness, developers operate at the level of behavioral specifications when writing functions, classes, or an entire program. We show that Pythoness can successfully leverage a combination of tests and code generation to yield higher quality code than specifications alone.
arXiv Detail & Related papers (2025-01-03T23:14:46Z)
Automatic Generation of Python Programs Using Context-Free Grammars [0.1227734309612871]
TinyPy Generator is a tool that generates random Python programs using a context-free grammar. Our system uses custom production rules to generate code with different levels of complexity. TinyPy Generator is useful in the field of machine learning, where it can generate substantial amounts of Python code for training Python language models.
arXiv Detail & Related papers (2024-03-11T08:25:52Z)
LILO: Learning Interpretable Libraries by Compressing and Documenting Code [71.55208585024198]
We introduce LILO, a neurosymbolic framework that iteratively synthesizes, compresses, and documents code. LILO combines LLM-guided program synthesis with recent algorithmic advances in automated refactoring from Stitch. We find that AutoDoc boosts performance by helping LILO's synthesizer to interpret and deploy learned abstractions.
arXiv Detail & Related papers (2023-10-30T17:55:02Z)
InterCode: Standardizing and Benchmarking Interactive Coding with Execution Feedback [50.725076393314964]
We introduce InterCode, a lightweight, flexible, and easy-to-use framework of interactive coding as a standard reinforcement learning environment. Our framework is language and platform agnostic and uses self-contained Docker environments to provide safe and reproducible execution. We demonstrate InterCode's viability as a testbed by evaluating multiple state-of-the-art LLMs configured with different prompting strategies.
arXiv Detail & Related papers (2023-06-26T17:59:50Z)
Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation [61.50286000143233]
ChainCoder is a program synthesis language model that generates Python code progressively. A tailored transformer architecture is leveraged to jointly encode the natural language descriptions and syntactically aligned I/O data samples.
arXiv Detail & Related papers (2023-04-28T01:47:09Z)
Python Code Generation by Asking Clarification Questions [57.63906360576212]
In this work, we introduce a novel and more realistic setup for this task. We hypothesize that the under-specification of a natural language description can be resolved by asking clarification questions. We collect and introduce a new dataset named CodeClarQA containing pairs of natural language descriptions and code with created synthetic clarification questions and answers.
arXiv Detail & Related papers (2022-12-19T22:08:36Z)
Binding Language Models in Symbolic Languages [146.3027328556881]
Binder is a training-free neural-symbolic framework that maps the task input to a program. In the parsing stage, Codex is able to identify the part of the task input that cannot be answered by the original programming language. In the execution stage, Codex can perform versatile functionalities given proper prompts in the API calls.
arXiv Detail & Related papers (2022-10-06T12:55:17Z)
ReACC: A Retrieval-Augmented Code Completion Framework [53.49707123661763]
We propose a retrieval-augmented code completion framework, leveraging both lexical copying and referring to code with similar semantics by retrieval. We evaluate our approach on the code completion task in the Python and Java programming languages, achieving state-of-the-art performance on the CodeXGLUE benchmark.
arXiv Detail & Related papers (2022-03-15T08:25:08Z)
Automatic Code Generation using Pre-Trained Language Models [0.0]
We propose an end-to-end machine learning model for code generation in the Python language built on top of pre-trained language models. We demonstrate that a fine-tuned model can perform well in code generation tasks, achieving a BLEU score of 0.22, an improvement of 46% over a reasonable sequence-to-sequence baseline.
arXiv Detail & Related papers (2021-02-21T07:21:26Z)
PyMT5: multi-mode translation of natural language and Python code with transformers [7.973871379728246]
PyMT5 is a Python method text-to-text transfer transformer. It can both predict whole methods from natural language documentation strings (docstrings) and summarize code into docstrings of any common style.
arXiv Detail & Related papers (2020-10-07T04:10:58Z)

This list is automatically generated from the titles and abstracts of the papers on this site.