Automatic Generation of Python Programs Using Context-Free Grammars
- URL: http://arxiv.org/abs/2403.06503v1
- Date: Mon, 11 Mar 2024 08:25:52 GMT
- Title: Automatic Generation of Python Programs Using Context-Free Grammars
- Authors: Kamel Yamani, Marwa Na\"ir, Riyadh Baghdadi
- Abstract summary: TinyPy Generator is a tool that generates random Python programs using a context-free grammar.
Our system uses custom production rules to generate code with different levels of complexity.
TinyPy Generator is useful in the field of machine learning, where it can generate substantial amounts of Python code for training Python language models.
- Score: 0.1227734309612871
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In recent years, data has emerged as the new gold, serving as a powerful tool
for creating intelligent systems. However, procuring high-quality data remains
challenging, especially for code. To address this, we developed TinyPy
Generator, a tool that generates random Python programs using a context-free
grammar. The generated programs are guaranteed to be correct by construction.
Our system uses custom production rules (in the Backus-Naur Form (BNF) format)
to recursively generate code. This allows us to generate code with different
levels of complexity, ranging from code containing only assignments to more
complex code containing conditionals and loops. Our proposed tool enables
effortless large-scale Python code generation, beneficial for a wide range of
applications. TinyPy Generator is particularly useful in the field of machine
learning, where it can generate substantial amounts of Python code for training
Python language models. Additionally, researchers who are studying programming
languages can utilize this tool to create datasets for their experiments, which
can help validate the robustness of code interpreters or compilers. Unlike
existing research, we have open-sourced our implementation. This allows
customization according to user needs and extends potential usage to other
languages.
Related papers
- A Comprehensive Guide to Combining R and Python code for Data Science, Machine Learning and Reinforcement Learning [42.350737545269105]
We show how to run Python's scikit-learn, pytorch and OpenAI gym libraries for building Machine Learning, Deep Learning, and Reinforcement Learning projects easily.
arXiv Detail & Related papers (2024-07-19T23:01:48Z) - CodeGRAG: Extracting Composed Syntax Graphs for Retrieval Augmented Cross-Lingual Code Generation [60.799992690487336]
We propose Syntax Graph Retrieval Augmented Code Generation (CodeGRAG) to enhance the performance of LLMs in single-round code generation tasks.
CodeGRAG significantly improves the code generation ability of LLMs and can even offer performance gain for cross-lingual code generation.
arXiv Detail & Related papers (2024-05-03T02:48:55Z) - Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning [84.12154024070024]
We propose natural language embedded programs (NLEP) as a unifying framework for addressing math/symbolic reasoning, natural language understanding, and instruction following tasks.
Our approach prompts a language model to generate full Python programs that define functions over data structures which contain natural language representations of structured knowledge.
A Python interpreter then executes the generated code and prints the output.
arXiv Detail & Related papers (2023-09-19T17:54:21Z) - COMEX: A Tool for Generating Customized Source Code Representations [7.151800146054561]
COMEX is a framework that allows researchers and developers to create and combine multiple code-views.
It can analyze both method-level snippets and program-level snippets by using both intra-procedural and inter-procedural snippets.
It is built on tree-sitter - a widely used incremental analysis tool that supports over 40 languages.
arXiv Detail & Related papers (2023-07-10T16:46:34Z) - Python Code Generation by Asking Clarification Questions [57.63906360576212]
In this work, we introduce a novel and more realistic setup for this task.
We hypothesize that the under-specification of a natural language description can be resolved by asking clarification questions.
We collect and introduce a new dataset named CodeClarQA containing pairs of natural language descriptions and code with created synthetic clarification questions and answers.
arXiv Detail & Related papers (2022-12-19T22:08:36Z) - Multi-lingual Evaluation of Code Generation Models [82.7357812992118]
We present new benchmarks on evaluation code generation models: MBXP and Multilingual HumanEval, and MathQA-X.
These datasets cover over 10 programming languages.
We are able to assess the performance of code generation models in a multi-lingual fashion.
arXiv Detail & Related papers (2022-10-26T17:17:06Z) - ReACC: A Retrieval-Augmented Code Completion Framework [53.49707123661763]
We propose a retrieval-augmented code completion framework, leveraging both lexical copying and referring to code with similar semantics by retrieval.
We evaluate our approach in the code completion task in Python and Java programming languages, achieving a state-of-the-art performance on CodeXGLUE benchmark.
arXiv Detail & Related papers (2022-03-15T08:25:08Z) - GAP-Gen: Guided Automatic Python Code Generation [3.574838772430975]
We propose a Guided Automatic Python Code Generation method based on Python syntactic constraints and semantic constraints.
GAP-Gen fine-tunes the transformer based language models T5 and CodeT5 using the Code-to-Docstring datasets.
Our experiments show that GAP-Gen achieves better results on automatic Python code generation task than previous works.
arXiv Detail & Related papers (2022-01-19T06:32:47Z) - Lyra: A Benchmark for Turducken-Style Code Generation [15.810088578588028]
In software development, one programming language is often embedded in another.
This paper defines a new code generation task: given a natural language comment, this task aims to generate a program in a base language with an embedded language.
To our knowledge, this is the first turducken-style code generation task.
arXiv Detail & Related papers (2021-08-27T07:22:55Z) - Natural Language-guided Programming [1.3955252961896318]
We put forward a vision based on a new breed of developer tools that have the potential to largely automate this process.
Key idea is to adapt code autocompletion tools such that they take into account not only the developer's already-written code but also the intent of the task the developer is trying to achieve next.
We call this practice of enriching the code with natural language intent to facilitate its completion natural language-guided programming.
arXiv Detail & Related papers (2021-08-11T13:06:33Z) - Incorporating External Knowledge through Pre-training for Natural
Language to Code Generation [97.97049697457425]
Open-domain code generation aims to generate code in a general-purpose programming language from natural language (NL) intents.
We explore the effectiveness of incorporating two varieties of external knowledge into NL-to-code generation: automatically mined NL-code pairs from the online programming QA forum StackOverflow and programming language API documentation.
Our evaluations show that combining the two sources with data augmentation and retrieval-based data re-sampling improves the current state-of-the-art by up to 2.2% absolute BLEU score on the code generation testbed CoNaLa.
arXiv Detail & Related papers (2020-04-20T01:45:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.