Automatic Generation of Programming Exercises and Code Explanations with Large Language Models
- URL: http://arxiv.org/abs/2206.11861v1
- Date: Fri, 3 Jun 2022 11:00:43 GMT
- Title: Automatic Generation of Programming Exercises and Code Explanations with Large Language Models
- Authors: Sami Sarsa, Paul Denny, Arto Hellas, Juho Leinonen
- Abstract summary: OpenAI Codex is a recent large language model from the GPT-3 family for translating code into natural language and vice versa.
We explore the natural language generation capabilities of Codex in two different phases of the life of a programming exercise.
We find the majority of this automatically generated content both novel and sensible, and in many cases ready to use as is.
- Score: 4.947560475228859
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: OpenAI Codex is a recent large language model from the GPT-3 family for
translating code into natural language and vice versa. Recent explorations of
Codex have highlighted that given typical introductory programming exercise
problem statements as input, the model can generate code solutions well above
the level of an average student. In this article, we explore the natural
language generation capabilities of Codex in two different phases of the life
of a programming exercise: automatically creating programming exercises
(including sample solutions and test cases) and explanations of written code,
assessing these qualitatively and quantitatively. We find the majority of this
automatically generated content both novel and sensible, and in many cases
ready to use as is. We further find that influencing the content of the created
programming exercises is remarkably easy with minor modifications to the input.
Our analysis suggests that there is significant value in massive generative
machine learning models as a tool for instructors, although some oversight
might be needed to ensure the quality of the generated content before it is
delivered to students. We further discuss the implications of OpenAI Codex and
similar tools for introductory programming education and highlight future
research streams that have the potential to improve the quality of the
educational experience for both teachers and students alike.
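As a concrete illustration of the priming technique the abstract describes, here is a minimal sketch using the current OpenAI Python SDK (Codex itself is no longer served, so a present-day chat model stands in); the model name, prompt wording, and theme/concept parameters are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of priming an LLM to generate a programming exercise.
# Assumes the OpenAI Python SDK (pip install openai) and an API key in the
# OPENAI_API_KEY environment variable; Codex is deprecated, so a current
# chat model stands in for it. Prompt wording and model name are illustrative.
from openai import OpenAI

client = OpenAI()

def generate_exercise(theme: str, concepts: str) -> str:
    """Ask the model for a problem statement, sample solution, and tests."""
    prompt = (
        f"Create an introductory Python programming exercise about {theme} "
        f"that practices {concepts}. Include: (1) a problem statement, "
        "(2) a sample solution, and (3) unit tests for the solution."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Changing the priming inputs steers the generated content, mirroring the
# paper's observation that minor input modifications influence the output.
print(generate_exercise("ice hockey", "loops and lists"))
```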
Related papers
- Large Language Models in Computer Science Education: A Systematic Literature Review [7.240148550817106]
Large language models (LLMs) are becoming increasingly capable at a wide range of Natural Language Processing (NLP) tasks.
Recently, these models have extended their capabilities to coding tasks, bridging the gap between natural languages (NL) and programming languages (PL).
arXiv Detail & Related papers (2024-10-21T17:49:50Z)
- Curriculum Learning for Small Code Language Models [0.09999629695552192]
This paper explores the potential of curriculum learning in enhancing the performance of code language models.
We demonstrate that a well-designed curriculum learning approach significantly improves the accuracy of small decoder-only code language models.
arXiv Detail & Related papers (2024-07-14T13:32:24Z)
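To make the curriculum idea concrete: a generic sketch of easy-to-hard staging, where the difficulty proxy (token count) and the three-stage schedule are illustrative assumptions rather than the paper's actual design.

```python
# Generic curriculum-learning sketch: order training samples from easy to
# hard and expose them to the model in stages. The difficulty proxy (token
# count) and stage schedule are illustrative assumptions, not the paper's
# actual design.
from typing import Iterator

def curriculum_stages(samples: list[str], num_stages: int = 3) -> Iterator[list[str]]:
    """Yield progressively larger, progressively harder training subsets."""
    ranked = sorted(samples, key=lambda code: len(code.split()))  # easy first
    for stage in range(1, num_stages + 1):
        cutoff = len(ranked) * stage // num_stages
        yield ranked[:cutoff]  # each stage re-includes the easier samples

corpus = ["x = 1", "def f(a, b):\n    return a + b", "class Stack: ..."]
for stage, subset in enumerate(curriculum_stages(corpus), start=1):
    print(f"stage {stage}: training on {len(subset)} samples")
```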
- CodeGRAG: Bridging the Gap between Natural Language and Programming Language via Graphical Retrieval Augmented Generation [58.84212778960507]
We propose CodeGRAG, a Graphical Retrieval Augmented Code Generation framework to enhance the performance of LLMs.
CodeGRAG builds a graphical view of code blocks from their control flow and data flow to bridge the gap between programming languages and natural language.
Experiments and ablations on four datasets covering both C++ and Python validate the hard meta-graph prompt, the soft prompting technique, and the effectiveness of the objectives for the pretrained GNN expert.
arXiv Detail & Related papers (2024-05-03T02:48:55Z)
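To make the code-as-graph idea concrete: a toy sketch that derives statement-level edges from a Python function with the standard ast module. CodeGRAG's real control/data-flow graphs and GNN encoding are substantially richer; this only illustrates treating code as a graph rather than a token sequence.

```python
# Toy illustration of giving code a "graphical view": derive parent-child
# edges over AST nodes with the standard ast module. This is AST structure
# standing in for the richer control/data-flow graphs CodeGRAG constructs.
import ast

def flow_edges(source: str) -> list[tuple[str, str]]:
    """Return (parent, child) edges over the nodes of the parsed code."""
    tree = ast.parse(source)
    edges = []
    for parent in ast.walk(tree):
        for child in ast.iter_child_nodes(parent):
            edges.append((type(parent).__name__, type(child).__name__))
    return edges

code = "def clamp(x, lo, hi):\n    if x < lo:\n        return lo\n    return min(x, hi)"
for parent, child in flow_edges(code):
    print(f"{parent} -> {child}")
```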
- L2CEval: Evaluating Language-to-Code Generation Capabilities of Large Language Models [102.00201523306986]
We present L2CEval, a systematic evaluation of the language-to-code generation capabilities of large language models (LLMs).
We analyze the factors that potentially affect their performance, such as model size, pretraining data, instruction tuning, and different prompting methods.
In addition to assessing model performance, we measure confidence calibration for the models and conduct human evaluations of the output programs.
arXiv Detail & Related papers (2023-09-29T17:57:00Z)
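One ingredient of such an evaluation, sketched below, is confidence calibration: a standard expected calibration error (ECE) computation over made-up confidences and pass/fail outcomes, not L2CEval's exact protocol.

```python
# Sketch of measuring confidence calibration for code generation: expected
# calibration error (ECE) over model confidences vs. execution correctness.
# The binning scheme is the standard one; the data here is invented.
import numpy as np

def expected_calibration_error(conf: np.ndarray, correct: np.ndarray, bins: int = 10) -> float:
    """Weighted average gap between mean confidence and accuracy per bin."""
    edges = np.linspace(0.0, 1.0, bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            gap = abs(conf[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap
    return ece

conf = np.array([0.9, 0.8, 0.6, 0.4])          # model confidence per program
correct = np.array([1, 1, 0, 1], dtype=float)  # did the program pass its tests?
print(f"ECE = {expected_calibration_error(conf, correct):.3f}")
```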
- A Comparative Study of Code Generation using ChatGPT 3.5 across 10 Programming Languages [0.0]
Large Language Models (LLMs) are advanced Artificial Intelligence (AI) systems that have undergone extensive training.
This research investigates the coding proficiency of ChatGPT 3.5, an LLM released by OpenAI in November 2022.
The model's skill in creating code snippets is evaluated across 10 programming languages and 4 software domains.
arXiv Detail & Related papers (2023-08-08T15:02:32Z)
- Computing Education in the Era of Generative AI [6.058132379003054]
Recent advances in artificial intelligence have resulted in code generation models that can produce source code from natural language problem descriptions.
We discuss the challenges and opportunities such models present to computing educators.
We consider likely impacts of such models upon pedagogical practice in the context of the most recent advances at the time of writing.
arXiv Detail & Related papers (2023-06-05T05:43:35Z)
- Enhancing Automated Program Repair through Fine-tuning and Prompt Engineering [2.3826139428423576]
Sequence-to-sequence models have been used to transform erroneous programs into correct ones when trained with a large enough dataset.
Some recent studies demonstrated strong empirical evidence that code review could improve the program repair further.
We investigate whether this inherent knowledge of PL and NL can be utilized to improve automated program repair.
arXiv Detail & Related papers (2023-04-16T17:29:51Z)
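A minimal sketch of the input side of review-augmented repair: pairing buggy code with a natural-language review comment for a sequence-to-sequence model. The separator tokens are hypothetical; real systems define their own encodings.

```python
# Sketch of review-augmented program repair inputs: combine the buggy code
# (PL) with a review comment (NL) so a sequence-to-sequence model can
# condition on both. The separator tokens are illustrative assumptions.
def build_repair_input(buggy_code: str, review_comment: str) -> str:
    """Concatenate PL and NL context into one model input string."""
    return f"<code> {buggy_code} </code> <review> {review_comment} </review>"

buggy = "def mean(xs): return sum(xs) / len(xs) + 1"
review = "Remove the stray '+ 1'; the mean should be sum divided by count."
print(build_repair_input(buggy, review))
```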
- Python Code Generation by Asking Clarification Questions [57.63906360576212]
In this work, we introduce a novel and more realistic setup for this task.
We hypothesize that the under-specification of a natural language description can be resolved by asking clarification questions.
We collect and introduce a new dataset named CodeClarQA, containing pairs of natural language descriptions and code together with synthetic clarification questions and answers.
arXiv Detail & Related papers (2022-12-19T22:08:36Z)
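A toy sketch of the clarify-then-generate loop this setup motivates: if the description omits details the code needs, ask before generating. The keyword heuristic is a hypothetical stand-in for the paper's learned components.

```python
# Toy clarify-then-generate loop: detect an under-specified request and ask
# a clarification question before producing code. The keyword heuristic is
# a stand-in for the learned under-specification models the paper studies.
def needs_clarification(description: str) -> str | None:
    """Return a clarification question if the spec looks under-specified."""
    if "sort" in description and "order" not in description:
        return "Should the result be sorted in ascending or descending order?"
    return None

spec = "Write a function that sorts a list of user records by age."
question = needs_clarification(spec)
print(question or "Specification looks complete; generating code.")
```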
- MCoNaLa: A Benchmark for Code Generation from Multiple Natural Languages [76.93265104421559]
We benchmark code generation from natural language commands extending beyond English.
We annotated a total of 896 NL-code pairs in three languages: Spanish, Japanese, and Russian.
While the difficulties vary across these three languages, all systems lag significantly behind their English counterparts.
arXiv Detail & Related papers (2022-03-16T04:21:50Z)
- Measuring Coding Challenge Competence With APPS [54.22600767666257]
We introduce APPS, a benchmark for code generation.
Our benchmark includes 10,000 problems, which range from having simple one-line solutions to being substantial algorithmic challenges.
Recent models such as GPT-Neo can pass approximately 15% of the test cases of introductory problems.
arXiv Detail & Related papers (2021-05-20T17:58:42Z)
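In the spirit of such test-case-based benchmarks, a minimal evaluation harness: run a candidate solution against input/output pairs and report the fraction passed. The sample problem and tests are invented; real harnesses also sandbox execution and enforce time limits.

```python
# Minimal harness in the spirit of test-case-based benchmarks like APPS:
# execute a generated solution against input/output pairs and report the
# fraction of test cases it passes. The problem and tests are made up.
def pass_rate(solution_src: str, tests: list[tuple[int, int]]) -> float:
    """Exec the candidate, call its solve(), and compare against expected."""
    namespace: dict = {}
    exec(solution_src, namespace)  # trusted input only; sandbox in practice
    solve = namespace["solve"]
    passed = sum(1 for arg, expected in tests if solve(arg) == expected)
    return passed / len(tests)

candidate = "def solve(n):\n    return n * (n + 1) // 2"  # sum of 1..n
tests = [(1, 1), (4, 10), (10, 55)]
print(f"pass rate: {pass_rate(candidate, tests):.0%}")
```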
- Incorporating External Knowledge through Pre-training for Natural Language to Code Generation [97.97049697457425]
Open-domain code generation aims to generate code in a general-purpose programming language from natural language (NL) intents.
We explore the effectiveness of incorporating two varieties of external knowledge into NL-to-code generation: automatically mined NL-code pairs from the online programming QA forum StackOverflow and programming language API documentation.
Our evaluations show that combining the two sources with data augmentation and retrieval-based data re-sampling improves the current state-of-the-art by up to 2.2% absolute BLEU score on the code generation testbed CoNaLa.
arXiv Detail & Related papers (2020-04-20T01:45:27Z)
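A sketch of the kind of corpus-level BLEU measurement used on such testbeds, assuming the sacrebleu package; the snippets are invented, and CoNaLa's official evaluation applies its own tokenization.

```python
# Sketch of corpus-level BLEU scoring for generated code, assuming the
# sacrebleu package (pip install sacrebleu). Snippets are invented; the
# official CoNaLa evaluation uses its own tokenization of code.
import sacrebleu

hypotheses = ["x = sorted(d.items(), key=lambda kv: kv[1])"]
references = [["x = sorted(d.items(), key=lambda kv: kv[1])"]]

# corpus_bleu takes one list of hypotheses and a list of reference streams,
# each stream aligned one-to-one with the hypotheses.
bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.1f}")
```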