Automatic Generation of Programming Exercises and Code Explanations with Large Language Models
- URL: http://arxiv.org/abs/2206.11861v1
- Date: Fri, 3 Jun 2022 11:00:43 GMT
- Title: Automatic Generation of Programming Exercises and Code Explanations with Large Language Models
- Authors: Sami Sarsa, Paul Denny, Arto Hellas, Juho Leinonen
- Abstract summary: OpenAI Codex is a recent large language model from the GPT-3 family for translating code into natural language and vice versa.
We explore the natural language generation capabilities of Codex in two different phases of the life of a programming exercise.
We find the majority of this automatically generated content both novel and sensible, and in many cases ready to use as is.
- Score: 4.947560475228859
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: OpenAI Codex is a recent large language model from the GPT-3 family for
translating code into natural language and vice versa. Recent explorations of
Codex have highlighted that given typical introductory programming exercise
problem statements as input, the model can generate code solutions well above
the level of an average student. In this article, we explore the natural
language generation capabilities of Codex in two different phases of the life
of a programming exercise: automatically creating programming exercises
(including sample solutions and test cases) and explanations of written code,
assessing these qualitatively and quantitatively. We find the majority of this
automatically generated content both novel and sensible, and in many cases
ready to use as is. We further find that influencing the content of the created
programming exercises is remarkably easy with minor modifications to the input.
Our analysis suggests that there is significant value in massive generative
machine learning models as a tool for instructors, although some oversight
might be needed to ensure the quality of the generated content before it is
delivered to students. We further discuss the implications of OpenAI Codex and
similar tools for introductory programming education and highlight future
research streams that have the potential to improve the quality of the
educational experience for both teachers and students alike.
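As a concrete illustration of the priming technique the abstract describes, here is a minimal sketch using the current OpenAI Python SDK (Codex itself is no longer served, so a present-day chat model stands in); the model name, prompt wording, and theme/concept parameters are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of priming an LLM to generate a programming exercise.
# Assumes the OpenAI Python SDK (pip install openai) and an API key in the
# OPENAI_API_KEY environment variable; Codex is deprecated, so a current
# chat model stands in for it. Prompt wording and model name are illustrative.
from openai import OpenAI

client = OpenAI()

def generate_exercise(theme: str, concepts: str) -> str:
    """Ask the model for a problem statement, sample solution, and tests."""
    prompt = (
        f"Create an introductory Python programming exercise about {theme} "
        f"that practices {concepts}. Include: (1) a problem statement, "
        "(2) a sample solution, and (3) unit tests for the solution."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Changing the priming inputs steers the generated content, mirroring the
# paper's observation that minor input modifications influence the output.
print(generate_exercise("ice hockey", "loops and lists"))
```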
Related papers
- Large Language Models in Computer Science Education: A Systematic Literature Review [7.240148550817106]
Large language models (LLMs) are becoming increasingly capable at a wide range of Natural Language Processing (NLP) tasks.
Recently, these models have extended their capabilities to coding tasks, bridging the gap between natural languages (NL) and programming languages (PL).
arXiv Detail & Related papers (2024-10-21T17:49:50Z)
- Curriculum Learning for Small Code Language Models [0.09999629695552192]
This paper explores the potential of curriculum learning in enhancing the performance of code language models.
We demonstrate that a well-designed curriculum learning approach significantly improves the accuracy of small decoder-only code language models.
arXiv Detail & Related papers (2024-07-14T13:32:24Z)
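To make the curriculum idea concrete: a generic sketch of easy-to-hard staging, where the difficulty proxy (token count) and the three-stage schedule are illustrative assumptions rather than the paper's actual design.

```python
# Generic curriculum-learning sketch: order training samples from easy to
# hard and expose them to the model in stages. The difficulty proxy (token
# count) and stage schedule are illustrative assumptions, not the paper's
# actual design.
from typing import Iterator

def curriculum_stages(samples: list[str], num_stages: int = 3) -> Iterator[list[str]]:
    """Yield progressively larger, progressively harder training subsets."""
    ranked = sorted(samples, key=lambda code: len(code.split()))  # easy first
    for stage in range(1, num_stages + 1):
        cutoff = len(ranked) * stage // num_stages
        yield ranked[:cutoff]  # each stage re-includes the easier samples

corpus = ["x = 1", "def f(a, b):\n    return a + b", "class Stack: ..."]
for stage, subset in enumerate(curriculum_stages(corpus), start=1):
    print(f"stage {stage}: training on {len(subset)} samples")
```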
- CodeGRAG: Bridging the Gap between Natural Language and Programming Language via Graphical Retrieval Augmented Generation [58.84212778960507]
We propose CodeGRAG, a Graphical Retrieval Augmented Code Generation framework to enhance the performance of LLMs.
CodeGRAG builds a graphical view of code blocks from their control flow and data flow to bridge the gap between programming languages and natural language.
Experiments and ablations on four datasets covering both C++ and Python validate the hard meta-graph prompt, the soft prompting technique, and the effectiveness of the objectives for the pretrained GNN expert.
arXiv Detail & Related papers (2024-05-03T02:48:55Z)
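To make the code-as-graph idea concrete: a toy sketch that derives statement-level edges from a Python function with the standard ast module. CodeGRAG's real control/data-flow graphs and GNN encoding are substantially richer; this only illustrates treating code as a graph rather than a token sequence.

```python
# Toy illustration of giving code a "graphical view": derive parent-child
# edges over AST nodes with the standard ast module. This is AST structure
# standing in for the richer control/data-flow graphs CodeGRAG constructs.
import ast

def flow_edges(source: str) -> list[tuple[str, str]]:
    """Return (parent, child) edges over the nodes of the parsed code."""
    tree = ast.parse(source)
    edges = []
    for parent in ast.walk(tree):
        for child in ast.iter_child_nodes(parent):
            edges.append((type(parent).__name__, type(child).__name__))
    return edges

code = "def clamp(x, lo, hi):\n    if x < lo:\n        return lo\n    return min(x, hi)"
for parent, child in flow_edges(code):
    print(f"{parent} -> {child}")
```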
- L2CEval: Evaluating Language-to-Code Generation Capabilities of Large Language Models [102.00201523306986]
We present L2CEval, a systematic evaluation of the language-to-code generation capabilities of large language models (LLMs).
We analyze the factors that potentially affect their performance, such as model size, pretraining data, instruction tuning, and different prompting methods.
In addition to assessing model performance, we measure confidence calibration for the models and conduct human evaluations of the output programs.
arXiv Detail & Related papers (2023-09-29T17:57:00Z)
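One ingredient of such an evaluation, sketched below, is confidence calibration: a standard expected calibration error (ECE) computation over made-up confidences and pass/fail outcomes, not L2CEval's exact protocol.

```python
# Sketch of measuring confidence calibration for code generation: expected
# calibration error (ECE) over model confidences vs. execution correctness.
# The binning scheme is the standard one; the data here is invented.
import numpy as np

def expected_calibration_error(conf: np.ndarray, correct: np.ndarray, bins: int = 10) -> float:
    """Weighted average gap between mean confidence and accuracy per bin."""
    edges = np.linspace(0.0, 1.0, bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            gap = abs(conf[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap
    return ece

conf = np.array([0.9, 0.8, 0.6, 0.4])          # model confidence per program
correct = np.array([1, 1, 0, 1], dtype=float)  # did the program pass its tests?
print(f"ECE = {expected_calibration_error(conf, correct):.3f}")
```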
- A Comparative Study of Code Generation using ChatGPT 3.5 across 10 Programming Languages [0.0]
Large Language Models (LLMs) are advanced Artificial Intelligence (AI) systems that have undergone extensive training.
This research investigates the coding proficiency of ChatGPT 3.5, an LLM released by OpenAI in November 2022.
The model's skill in creating code snippets is evaluated across 10 programming languages and 4 software domains.
arXiv Detail & Related papers (2023-08-08T15:02:32Z)
- Computing Education in the Era of Generative AI [6.058132379003054]
Recent advances in artificial intelligence have resulted in code generation models that can produce source code from natural language problem descriptions.
We discuss the challenges and opportunities such models present to computing educators.
We consider likely impacts of such models upon pedagogical practice in the context of the most recent advances at the time of writing.
arXiv Detail & Related papers (2023-06-05T05:43:35Z)
- Enhancing Automated Program Repair through Fine-tuning and Prompt Engineering [2.3826139428423576]
Sequence-to-sequence models have been used to transform erroneous programs into correct ones when trained with a large enough dataset.
Some recent studies demonstrated strong empirical evidence that code review could improve the program repair further.
We investigate whether this inherent knowledge of PL and NL can be utilized to improve automated program repair.
arXiv Detail & Related papers (2023-04-16T17:29:51Z)
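A minimal sketch of the input side of review-augmented repair: pairing buggy code with a natural-language review comment for a sequence-to-sequence model. The separator tokens are hypothetical; real systems define their own encodings.

```python
# Sketch of review-augmented program repair inputs: combine the buggy code
# (PL) with a review comment (NL) so a sequence-to-sequence model can
# condition on both. The separator tokens are illustrative assumptions.
def build_repair_input(buggy_code: str, review_comment: str) -> str:
    """Concatenate PL and NL context into one model input string."""
    return f"<code> {buggy_code} </code> <review> {review_comment} </review>"

buggy = "def mean(xs): return sum(xs) / len(xs) + 1"
review = "Remove the stray '+ 1'; the mean should be sum divided by count."
print(build_repair_input(buggy, review))
```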
- Python Code Generation by Asking Clarification Questions [57.63906360576212]
In this work, we introduce a novel and more realistic setup for this task.
We hypothesize that the under-specification of a natural language description can be resolved by asking clarification questions.
We collect and introduce a new dataset named CodeClarQA, containing pairs of natural language descriptions and code together with synthetic clarification questions and answers.
arXiv Detail & Related papers (2022-12-19T22:08:36Z)
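A toy sketch of the clarify-then-generate loop this setup motivates: if the description omits details the code needs, ask before generating. The keyword heuristic is a hypothetical stand-in for the paper's learned components.

```python
# Toy clarify-then-generate loop: detect an under-specified request and ask
# a clarification question before producing code. The keyword heuristic is
# a stand-in for the learned under-specification models the paper studies.
def needs_clarification(description: str) -> str | None:
    """Return a clarification question if the spec looks under-specified."""
    if "sort" in description and "order" not in description:
        return "Should the result be sorted in ascending or descending order?"
    return None

spec = "Write a function that sorts a list of user records by age."
question = needs_clarification(spec)
print(question or "Specification looks complete; generating code.")
```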
- MCoNaLa: A Benchmark for Code Generation from Multiple Natural Languages [76.93265104421559]
We benchmark code generation from natural language commands extending beyond English.
We annotated a total of 896 NL-code pairs in three languages: Spanish, Japanese, and Russian.
While the difficulties vary across these three languages, all systems lag significantly behind their English counterparts.
arXiv Detail & Related papers (2022-03-16T04:21:50Z)
- Measuring Coding Challenge Competence With APPS [54.22600767666257]
We introduce APPS, a benchmark for code generation.
Our benchmark includes 10,000 problems, which range from having simple one-line solutions to being substantial algorithmic challenges.
Recent models such as GPT-Neo can pass approximately 15% of the test cases of introductory problems.
arXiv Detail & Related papers (2021-05-20T17:58:42Z)
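In the spirit of such test-case-based benchmarks, a minimal evaluation harness: run a candidate solution against input/output pairs and report the fraction passed. The sample problem and tests are invented; real harnesses also sandbox execution and enforce time limits.

```python
# Minimal harness in the spirit of test-case-based benchmarks like APPS:
# execute a generated solution against input/output pairs and report the
# fraction of test cases it passes. The problem and tests are made up.
def pass_rate(solution_src: str, tests: list[tuple[int, int]]) -> float:
    """Exec the candidate, call its solve(), and compare against expected."""
    namespace: dict = {}
    exec(solution_src, namespace)  # trusted input only; sandbox in practice
    solve = namespace["solve"]
    passed = sum(1 for arg, expected in tests if solve(arg) == expected)
    return passed / len(tests)

candidate = "def solve(n):\n    return n * (n + 1) // 2"  # sum of 1..n
tests = [(1, 1), (4, 10), (10, 55)]
print(f"pass rate: {pass_rate(candidate, tests):.0%}")
```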
- Incorporating External Knowledge through Pre-training for Natural Language to Code Generation [97.97049697457425]
Open-domain code generation aims to generate code in a general-purpose programming language from natural language (NL) intents.
We explore the effectiveness of incorporating two varieties of external knowledge into NL-to-code generation: automatically mined NL-code pairs from the online programming QA forum StackOverflow and programming language API documentation.
Our evaluations show that combining the two sources with data augmentation and retrieval-based data re-sampling improves the current state-of-the-art by up to 2.2% absolute BLEU score on the code generation testbed CoNaLa.
arXiv Detail & Related papers (2020-04-20T01:45:27Z)
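A sketch of the kind of corpus-level BLEU measurement used on such testbeds, assuming the sacrebleu package; the snippets are invented, and CoNaLa's official evaluation applies its own tokenization.

```python
# Sketch of corpus-level BLEU scoring for generated code, assuming the
# sacrebleu package (pip install sacrebleu). Snippets are invented; the
# official CoNaLa evaluation uses its own tokenization of code.
import sacrebleu

hypotheses = ["x = sorted(d.items(), key=lambda kv: kv[1])"]
references = [["x = sorted(d.items(), key=lambda kv: kv[1])"]]

# corpus_bleu takes one list of hypotheses and a list of reference streams,
# each stream aligned one-to-one with the hypotheses.
bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.1f}")
```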