Let's Ask AI About Their Programs: Exploring ChatGPT's Answers To Program Comprehension Questions
- URL: http://arxiv.org/abs/2404.11734v1
- Date: Wed, 17 Apr 2024 20:37:00 GMT
- Title: Let's Ask AI About Their Programs: Exploring ChatGPT's Answers To Program Comprehension Questions
- Authors: Teemu Lehtinen, Charles Koutcheme, Arto Hellas,
- Abstract summary: We explore the capability of the state-of-the-art LLMs in answering QLCs that are generated from code that the LLMs have created.
Our results show that although the state-of-the-art LLMs can create programs and trace program execution when prompted, they easily succumb to similar errors that have previously been recorded for novice programmers.
- Score: 2.377308748205625
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent research has explored the creation of questions from code submitted by students. These Questions about Learners' Code (QLCs) are created through program analysis, exploring execution paths, and then creating code comprehension questions from these paths and the broader code structure. Responding to the questions requires reading and tracing the code, which is known to support students' learning. At the same time, computing education researchers have witnessed the emergence of Large Language Models (LLMs) that have taken the community by storm. Researchers have demonstrated the applicability of these models especially in the introductory programming context, outlining their performance in solving introductory programming problems and their utility in creating new learning resources. In this work, we explore the capability of the state-of-the-art LLMs (GPT-3.5 and GPT-4) in answering QLCs that are generated from code that the LLMs have created. Our results show that although the state-of-the-art LLMs can create programs and trace program execution when prompted, they easily succumb to similar errors that have previously been recorded for novice programmers. These results demonstrate the fallibility of these models and perhaps dampen the expectations fueled by the recent LLM hype. At the same time, we also highlight future research possibilities such as using LLMs to mimic students as their behavior can indeed be similar for some specific tasks.
Related papers
- What's Wrong with Your Code Generated by Large Language Models? An Extensive Study [80.18342600996601]
Large language models (LLMs) produce code that is shorter yet more complicated as compared to canonical solutions.
We develop a taxonomy of bugs for incorrect codes that includes three categories and 12 sub-categories, and analyze the root cause for common bug types.
We propose a novel training-free iterative method that introduces self-critique, enabling LLMs to critique and correct their generated code based on bug types and compiler feedback.
arXiv Detail & Related papers (2024-07-08T17:27:17Z) - CSEPrompts: A Benchmark of Introductory Computer Science Prompts [11.665831944836118]
Recent advances in AI, machine learning, and NLP have led to the development of a new generation of Large Language Models (LLMs)
Commercial applications have made this technology available to the general public, thus making it possible to use LLMs to produce high-quality texts for academic and professional purposes.
Schools and universities are aware of the increasing use of AI-generated content by students and they have been researching the impact of this new technology and its potential misuse.
arXiv Detail & Related papers (2024-04-03T07:55:57Z) - An Exploratory Study on Upper-Level Computing Students' Use of Large Language Models as Tools in a Semester-Long Project [2.7325338323814328]
The purpose of this study is to explore computing students' experiences and approaches to using LLMs during a semester-long software engineering project.
We collected data from a senior-level software engineering course at Purdue University.
We analyzed the data to identify themes related to students' usage patterns and learning outcomes.
arXiv Detail & Related papers (2024-03-27T15:21:58Z) - InfiBench: Evaluating the Question-Answering Capabilities of Code Large Language Models [56.723509505549536]
InfiBench is the first large-scale freeform question-answering (QA) benchmark for code to our knowledge.
It comprises 234 carefully selected high-quality Stack Overflow questions that span across 15 programming languages.
We conduct a systematic evaluation for over 100 latest code LLMs on InfiBench, leading to a series of novel and insightful findings.
arXiv Detail & Related papers (2024-03-11T02:06:30Z) - Small Models, Big Insights: Leveraging Slim Proxy Models To Decide When and What to Retrieve for LLMs [60.40396361115776]
This paper introduces a novel collaborative approach, namely SlimPLM, that detects missing knowledge in large language models (LLMs) with a slim proxy model.
We employ a proxy model which has far fewer parameters, and take its answers as answers.
Heuristic answers are then utilized to predict the knowledge required to answer the user question, as well as the known and unknown knowledge within the LLM.
arXiv Detail & Related papers (2024-02-19T11:11:08Z) - Interactions with Prompt Problems: A New Way to Teach Programming with
Large Language Models [4.1599514827277355]
We propose a new way to teach programming with Prompt Problems.
Students receive a problem visually, indicating how input should be transformed to output, and must translate that to a prompt for an LLM to decipher.
The problem is considered correct when the code that is generated by the student prompt can pass all test cases.
arXiv Detail & Related papers (2024-01-19T15:32:46Z) - If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code
Empowers Large Language Models to Serve as Intelligent Agents [81.60906807941188]
Large language models (LLMs) are trained on a combination of natural language and formal language (code)
Code translates high-level goals into executable steps, featuring standard syntax, logical consistency, abstraction, and modularity.
arXiv Detail & Related papers (2024-01-01T16:51:20Z) - Next-Step Hint Generation for Introductory Programming Using Large
Language Models [0.8002196839441036]
Large Language Models possess skills such as answering questions, writing essays or solving programming exercises.
This work explores how LLMs can contribute to programming education by supporting students with automated next-step hints.
arXiv Detail & Related papers (2023-12-03T17:51:07Z) - An In-Context Schema Understanding Method for Knowledge Base Question
Answering [70.87993081445127]
Large Language Models (LLMs) have shown strong capabilities in language understanding and can be used to solve this task.
Existing methods bypass this challenge by initially employing LLMs to generate drafts of logic forms without schema-specific details.
We propose a simple In-Context Understanding (ICSU) method that enables LLMs to directly understand schemas by leveraging in-context learning.
arXiv Detail & Related papers (2023-10-22T04:19:17Z) - Check Your Facts and Try Again: Improving Large Language Models with
External Knowledge and Automated Feedback [127.75419038610455]
Large language models (LLMs) are able to generate human-like, fluent responses for many downstream tasks.
This paper proposes a LLM-Augmenter system, which augments a black-box LLM with a set of plug-and-play modules.
arXiv Detail & Related papers (2023-02-24T18:48:43Z) - Automatically Generating CS Learning Materials with Large Language
Models [4.526618922750769]
Large Language Models (LLMs) enable software developers to generate code based on a natural language prompt.
LLMs may enable students to interact with code in new ways while helping instructors scale their learning materials.
LLMs also introduce new implications for academic integrity, curriculum design, and software engineering careers.
arXiv Detail & Related papers (2022-12-09T20:37:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.