Lyra: A Benchmark for Turducken-Style Code Generation
- URL: http://arxiv.org/abs/2108.12144v1
- Date: Fri, 27 Aug 2021 07:22:55 GMT
- Title: Lyra: A Benchmark for Turducken-Style Code Generation
- Authors: Qingyuan Liang, Zeyu Sun, Qihao Zhu, Wenjie Zhang, Lian Yu, Yingfei
Xiong, Lu Zhang
- Abstract summary: In software development, one programming language is often embedded in another.
This paper defines a new code generation task: given a natural language comment, this task aims to generate a program in a base language with an embedded language.
To our knowledge, this is the first turducken-style code generation task.
- Score: 15.810088578588028
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Code generation is crucial to reduce manual software development efforts.
Recently, neural techniques have been used to generate source code
automatically. While promising, these approaches are evaluated on tasks for
generating code in single programming languages. However, in actual
development, one programming language is often embedded in another. For
example, SQL statements are often embedded as strings in base programming
languages such as Python and Java, and JavaScript programs are often embedded
in server-side programming languages such as PHP, Java, and Python. We call
this turducken-style programming. In this paper, we define a new code
generation task: given a natural language comment, this task aims to generate a
program in a base language with an embedded language. To our knowledge, this is
the first turducken-style code generation task. For this task, we present Lyra:
a dataset in Python with embedded SQL. This dataset contains 2,000 carefully
annotated database manipulation programs from real usage projects. Each program
is paired with both a Chinese comment and an English comment. In our
experiment, we adopted Transformer, a state-of-the-art technique, as the
baseline. In the best setting, Transformer achieves 0.5% and 1.5% AST exact
matching accuracy using Chinese and English comments, respectively. Therefore,
we believe that Lyra provides a new challenge for code generation.
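To make the task concrete, here is an illustrative sketch of the turducken style the paper targets; it is not an example from the Lyra dataset, and the table schema, function name, and use of sqlite3 are invented for illustration. A natural language comment is paired with a Python program that embeds an SQL statement as a string:

```python
import sqlite3

# English comment: "Return the names of all users older than the given age,
# ordered alphabetically."
def get_user_names_older_than(conn: sqlite3.Connection, age: int) -> list[str]:
    # The SQL statement is embedded in the base language (Python) as a string;
    # generating such a program requires producing both languages correctly at once.
    sql = (
        "SELECT name FROM users "
        "WHERE age > ? "
        "ORDER BY name ASC"
    )
    return [row[0] for row in conn.execute(sql, (age,))]
```

A generator must therefore get the Python control structure right while also emitting a syntactically and semantically valid SQL query inside the string literal, which is what makes the task harder than single-language code generation.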
Related papers
- Benchmarking LLM Code Generation for Audio Programming with Visual Dataflow Languages [1.559169421643164]
Node-based programming languages are increasingly popular in media arts coding domains.
Using LLM-based code generation to further lower the barrier to creative output is an exciting opportunity.
The best strategy for code generation for visual node-based programming languages is still an open question.
arXiv Detail & Related papers (2024-09-01T22:11:23Z)
- CRUXEval-X: A Benchmark for Multilingual Code Reasoning, Understanding and Execution [50.7413285637879]
The CRUXEVAL-X code reasoning benchmark contains 19 programming languages.
It comprises at least 600 subjects for each language, along with 19K content-consistent tests in total.
Even a model trained solely on Python can achieve at most 34.4% Pass@1 in other languages.
arXiv Detail & Related papers (2024-08-23T11:43:00Z)
- Automatic Generation of Python Programs Using Context-Free Grammars [0.1227734309612871]
TinyPy Generator is a tool that generates random Python programs using a context-free grammar.
Our system uses custom production rules to generate code with different levels of complexity.
TinyPy Generator is useful in the field of machine learning, where it can generate substantial amounts of Python code for training Python language models.
arXiv Detail & Related papers (2024-03-11T08:25:52Z)
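As a hedged illustration of the grammar-driven approach described above, here is a minimal sketch of random program generation from a context-free grammar; the production rules below are invented for illustration and are not TinyPy Generator's actual grammar.

```python
import random

# A toy context-free grammar for a Python-like assignment fragment.
# These production rules are illustrative only.
GRAMMAR = {
    "stmt": [["var", " = ", "expr"]],
    "expr": [["term"], ["term", " + ", "expr"], ["term", " * ", "expr"]],
    "term": [["var"], ["num"]],
    "var": [["x"], ["y"], ["z"]],
    "num": [["1"], ["2"], ["42"]],
}

def expand(symbol: str, rng: random.Random, depth: int = 0) -> str:
    """Recursively expand a nonterminal; symbols not in the grammar are terminals."""
    if symbol not in GRAMMAR:
        return symbol
    rules = GRAMMAR[symbol]
    # Past a depth bound, always take the first (shortest) rule so expansion terminates.
    rule = rules[0] if depth > 3 else rng.choice(rules)
    return "".join(expand(s, rng, depth + 1) for s in rule)

rng = random.Random(0)
for _ in range(3):
    print(expand("stmt", rng))  # e.g. "x = y + 2"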
- Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning [84.12154024070024]
We propose natural language embedded programs (NLEP) as a unifying framework for addressing math/symbolic reasoning, natural language understanding, and instruction following tasks.
Our approach prompts a language model to generate full Python programs that define functions over data structures which contain natural language representations of structured knowledge.
A Python interpreter then executes the generated code and prints the output.
arXiv Detail & Related papers (2023-09-19T17:54:21Z)
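To illustrate the execute-and-print step described above, here is a minimal sketch; the "generated" program is hand-written as a stand-in for model output, and the simple exec-based harness is an assumption about the workflow, not the NLEP framework's actual implementation.

```python
# Stand-in for a model-generated NLEP-style program (hand-written here):
# structured knowledge lives in a Python data structure, a function
# computes over it, and the answer is printed on execution.
generated_program = '''
capitals = {"France": "Paris", "Japan": "Tokyo", "Peru": "Lima"}

def answer(country: str) -> str:
    return f"The capital of {country} is {capitals[country]}."

print(answer("Japan"))
'''

# The harness plays the role of the Python interpreter step: it executes
# the generated code, which prints its own output.
exec(compile(generated_program, "<nlep>", "exec"))  # prints: The capital of Japan is Tokyo.
```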
- Natural Language to Code Translation with Execution [82.52142893010563]
We propose execution result-based minimum Bayes risk decoding for program selection.
We show that it improves the few-shot performance of pretrained code models on natural-language-to-code tasks.
arXiv Detail & Related papers (2022-04-25T06:06:08Z)
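As a rough sketch of execution-based program selection in the spirit of the paper above: score each sampled candidate by how often its execution outputs agree with the other candidates' outputs, and keep the top scorer. The candidate programs and test inputs below are toy stand-ins, and exact-match agreement is an assumed similarity measure rather than the paper's exact formulation.

```python
# Toy stand-ins for sampled candidate programs (each defines f: int -> int).
candidates = [
    "def f(x): return x * 2",   # correct
    "def f(x): return x + x",   # also correct: agrees with the first
    "def f(x): return x ** 2",  # disagrees on most inputs
]

test_inputs = [0, 1, 2, 3]

def run(program: str, x: int):
    """Execute a candidate and return f(x), or None if execution fails."""
    env: dict = {}
    try:
        exec(program, env)
        return env["f"](x)
    except Exception:
        return None

# Execution-based MBR-style selection: a candidate's score is the number of
# other candidates whose outputs on the test inputs match its own exactly.
outputs = [tuple(run(p, x) for x in test_inputs) for p in candidates]
scores = [sum(o == other for other in outputs) - 1 for o in outputs]
best = candidates[max(range(len(candidates)), key=scores.__getitem__)]
print(best)  # the two doubling implementations agree, so one of them wins
```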
- MCoNaLa: A Benchmark for Code Generation from Multiple Natural Languages [76.93265104421559]
We benchmark code generation from natural language commands extending beyond English.
We annotated a total of 896 NL-code pairs in three languages: Spanish, Japanese, and Russian.
While the difficulties vary across these three languages, all systems lag significantly behind their English counterparts.
arXiv Detail & Related papers (2022-03-16T04:21:50Z)
- AVATAR: A Parallel Corpus for Java-Python Program Translation [77.86173793901139]
Program translation refers to migrating source code from one language to another.
We present AVATAR, a collection of 9,515 programming problems and their solutions written in two popular languages, Java and Python.
arXiv Detail & Related papers (2021-08-26T05:44:20Z)
- Natural Language-guided Programming [1.3955252961896318]
We put forward a vision based on a new breed of developer tools that have the potential to largely automate this process.
The key idea is to adapt code autocompletion tools so that they take into account not only the developer's already-written code but also the intent of the task the developer is trying to achieve next.
We call this practice of enriching code with natural language intent, so as to facilitate its completion, natural language-guided programming.
arXiv Detail & Related papers (2021-08-11T13:06:33Z)
- Incorporating External Knowledge through Pre-training for Natural Language to Code Generation [97.97049697457425]
Open-domain code generation aims to generate code in a general-purpose programming language from natural language (NL) intents.
We explore the effectiveness of incorporating two varieties of external knowledge into NL-to-code generation: automatically mined NL-code pairs from the online programming QA forum StackOverflow and programming language API documentation.
Our evaluations show that combining the two sources with data augmentation and retrieval-based data re-sampling improves the current state-of-the-art by up to 2.2% absolute BLEU score on the code generation testbed CoNaLa.
arXiv Detail & Related papers (2020-04-20T01:45:27Z)