A Knowledge-Component-Based Methodology for Evaluating AI Assistants
- URL: http://arxiv.org/abs/2406.05603v1
- Date: Sun, 9 Jun 2024 00:58:39 GMT
- Title: A Knowledge-Component-Based Methodology for Evaluating AI Assistants
- Authors: Laryn Qi, J. D. Zamfirescu-Pereira, Taehan Kim, Björn Hartmann, John DeNero, Narges Norouzi
- Abstract summary: We evaluate an automatic hint generator for CS1 programming assignments powered by GPT-4.
This system provides natural language guidance about how students can improve their incorrect solutions to short programming exercises.
- Score: 9.412070852474313
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We evaluate an automatic hint generator for CS1 programming assignments powered by GPT-4, a large language model. This system provides natural language guidance about how students can improve their incorrect solutions to short programming exercises. A hint can be requested each time a student fails a test case. Our evaluation addresses three Research Questions: RQ1: Do the hints help students improve their code? RQ2: How effectively do the hints capture problems in student code? RQ3: Are the issues that students resolve the same as the issues addressed in the hints? To address these research questions quantitatively, we identified a set of fine-grained knowledge components and determined which ones apply to each exercise, incorrect solution, and generated hint. Comparing data from two large CS1 offerings, we found that access to the hints helps students to address problems with their code more quickly, that hints are able to consistently capture the most pressing errors in students' code, and that hints that address a few issues at once rather than a single bug are more likely to lead to direct student progress.
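The quantitative analysis can be pictured as set arithmetic over knowledge components (KCs): each incorrect submission and each hint is tagged with the KCs that apply, and progress is measured by which KCs disappear between submissions. The Python sketch below is only an illustration of that idea; the KC names, the `hint_coverage` helper, and the data are invented, not taken from the paper.

```python
# Illustrative sketch of a KC-based analysis (hypothetical helper names):
# tag an incorrect submission and its hint with knowledge components, then
# check whether the issues the student fixed were the ones the hint raised.

def resolved_kcs(before: set[str], after: set[str]) -> set[str]:
    """KCs that applied to the previous submission but not to the next one."""
    return before - after

def hint_coverage(hint_kcs: set[str], resolved: set[str]) -> float:
    """Fraction of the resolved issues that the hint explicitly addressed."""
    return len(hint_kcs & resolved) / len(resolved) if resolved else 0.0

# Invented example: a buggy submission, the hint's KCs, and the next attempt.
submission_kcs = {"off-by-one-loop-bound", "missing-return", "wrong-operator"}
hint_kcs = {"off-by-one-loop-bound", "missing-return"}
next_kcs = {"wrong-operator"}  # two issues fixed, one remains

resolved = resolved_kcs(submission_kcs, next_kcs)
print(resolved)                           # the two fixed KCs (order may vary)
print(hint_coverage(hint_kcs, resolved))  # 1.0: the hint named both fixed issues
```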
Related papers
- One Step at a Time: Combining LLMs and Static Analysis to Generate Next-Step Hints for Programming Tasks [5.069252018619403]
Students often struggle with solving programming problems when learning to code, especially when they have to do it online.
Such help can be provided as next-step hints, which show a student the specific small step to take next toward a correct solution.
We propose a novel system to provide both textual and code hints for programming tasks.
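A rough sketch of how static analysis and an LLM might be combined for next-step hints, under the assumption that concrete analyzer findings are injected into the prompt. This is not the authors' implementation, and `call_llm` is a hypothetical stand-in for any chat-completion client.

```python
import ast

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM client call."""
    raise NotImplementedError("wire up a real chat-completion client here")

def static_findings(code: str) -> list[str]:
    """Collect a few concrete, checkable facts about the student's code."""
    try:
        tree = ast.parse(code)
    except SyntaxError as e:
        return [f"SyntaxError on line {e.lineno}: {e.msg}"]
    findings = []
    for node in ast.walk(tree):
        # Illustrative check: a function that never returns a value.
        if isinstance(node, ast.FunctionDef) and not any(
            isinstance(n, ast.Return) for n in ast.walk(node)
        ):
            findings.append(f"function '{node.name}' never returns a value")
    return findings

def next_step_hint(code: str, task: str) -> str:
    """Ground the hint prompt in analyzer findings rather than code alone."""
    prompt = (
        f"Task: {task}\nStudent code:\n{code}\n"
        f"Static analysis found: {static_findings(code)}\n"
        "Suggest one small next step without revealing the full solution."
    )
    return call_llm(prompt)
```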
arXiv Detail & Related papers (2024-10-11T21:41:57Z)
- Stepwise Verification and Remediation of Student Reasoning Errors with Large Language Model Tutors [78.53699244846285]
Large language models (LLMs) present an opportunity to scale high-quality personalized education to all.
However, LLMs struggle to precisely detect students' errors and to tailor their feedback to those errors.
Inspired by real-world teaching practice where teachers identify student errors and customize their response based on them, we focus on verifying student solutions.
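One way to picture stepwise verification, as a sketch rather than the paper's method: walk the student's solution step by step and aim feedback at the first step a verifier rejects. The toy verifier below checks arithmetic lines; the paper's verifier is LLM-based.

```python
from typing import Callable

def first_error(steps: list[str],
                verify: Callable[[str, list[str]], bool]) -> int | None:
    """Index of the first step the verifier rejects, or None if all pass."""
    for i, step in enumerate(steps):
        if not verify(step, steps[:i]):  # each step is checked in context
            return i
    return None

def toy_verify(step: str, _context: list[str]) -> bool:
    """Toy check for 'expression = value' arithmetic lines (illustrative)."""
    lhs, rhs = step.split("=")
    return eval(lhs) == float(rhs)  # eval is acceptable for this toy input

steps = ["2+3 = 5", "5*4 = 20", "20-1 = 18"]  # the last step is wrong
print(first_error(steps, toy_verify))         # 2 -> feedback targets step 3
```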
arXiv Detail & Related papers (2024-07-12T10:11:40Z)
- Instruct, Not Assist: LLM-based Multi-Turn Planning and Hierarchical Questioning for Socratic Code Debugging [27.70379206820154]
Socratic questioning is an effective teaching strategy, encouraging critical thinking and problem-solving.
Their agent, TreeInstruct, asks probing questions to help students independently identify and resolve errors.
It estimates a student's conceptual and syntactical knowledge to dynamically construct a question tree based on their responses and current knowledge state.
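A loose sketch of state-based questioning in that spirit (not TreeInstruct itself): keep a per-concept estimate of the student's knowledge, probe the weakest concept next, and update the estimate from the answer. The concepts, questions, and update rule are all invented.

```python
knowledge = {"loop-bounds": 0.8, "return-values": 0.3, "operators": 0.6}
questions = {
    "loop-bounds": "What is the last index your loop visits?",
    "return-values": "What does your function return for an empty list?",
    "operators": "What does '/' produce when both operands are ints?",
}

def next_question() -> str:
    """Probe the concept with the lowest knowledge estimate."""
    weakest = min(knowledge, key=knowledge.get)
    return questions[weakest]

def record_answer(concept: str, correct: bool, lr: float = 0.3) -> None:
    """Nudge the estimate toward 1.0 or 0.0 based on the student's answer."""
    target = 1.0 if correct else 0.0
    knowledge[concept] += lr * (target - knowledge[concept])

print(next_question())                # probes 'return-values', the weakest
record_answer("return-values", True)  # estimate rises from 0.3 to 0.51
```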
arXiv Detail & Related papers (2024-06-17T16:28:21Z)
- SCREWS: A Modular Framework for Reasoning with Revisions [58.698199183147935]
We present SCREWS, a modular framework for reasoning with revisions.
We show that SCREWS unifies several previous approaches under a common framework.
We evaluate our framework with state-of-the-art LLMs on a diverse set of reasoning tasks.
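Reading the abstract, the framework appears to decompose revision-based reasoning into pluggable stages. The sketch below assumes three stages named Sampling, Conditional Resampling, and Selection, with stubs standing in for the paper's LLM-backed modules.

```python
from typing import Callable

def screws(question: str,
           sample: Callable[[str], str],
           resample: Callable[[str, str], str],
           select: Callable[[str, list[str]], str]) -> str:
    draft = sample(question)                    # Sampling
    revision = resample(question, draft)        # Conditional Resampling
    return select(question, [draft, revision])  # Selection

# Toy instantiation: the selector may keep the original draft, guarding
# against a revision that makes the answer worse.
answer = screws(
    "What is 1 + 1?",
    sample=lambda q: "2",
    resample=lambda q, draft: "3",  # a bad revision
    select=lambda q, candidates: candidates[0],
)
print(answer)  # '2'
```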
arXiv Detail & Related papers (2023-09-20T15:59:54Z)
- Automated Questions About Learners' Own Code Help to Detect Fragile Knowledge [0.0]
Students are able to produce correctly functioning program code even though they have a fragile understanding of how it actually works.
Questions derived automatically from individual exercise submissions (QLCs) can probe whether, and how well, students understand the structure and logic of the code they just created.
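As a rough sketch of the idea (the QLC work defines its own templates; these are invented), such questions can be instantiated from the structures actually present in the learner's submission:

```python
import ast

def generate_qlcs(code: str) -> list[str]:
    """Instantiate question templates from constructs in the student's code."""
    questions = []
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.FunctionDef):
            questions.append(
                f"What does '{node.name}' return if its loop body never runs?"
            )
        elif isinstance(node, (ast.For, ast.While)):
            questions.append(
                f"How many times does the loop on line {node.lineno} run "
                "for an input list of length 3?"
            )
    return questions

student_code = """
def total(xs):
    s = 0
    for x in xs:
        s += x
    return s
"""
print(generate_qlcs(student_code))  # one question per function, one per loop
```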
arXiv Detail & Related papers (2023-06-28T14:49:16Z)
- Least-to-Most Prompting Enables Complex Reasoning in Large Language Models [52.59923418570378]
We propose a novel prompting strategy, least-to-most prompting, to overcome the challenge of easy-to-hard generalization.
We show that least-to-most prompting is capable of generalizing to more difficult problems than those seen in prompts.
By contrast, neural-symbolic models in the literature that specialize in solving SCAN are trained on the entire training set of over 15,000 examples.
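The two-stage structure of least-to-most prompting can be sketched as follows; `call_llm` is a hypothetical stand-in for any completion API, and the decomposition format is assumed to be one subproblem per line.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM completion call."""
    raise NotImplementedError("wire up a real client here")

def least_to_most(problem: str) -> str:
    # Stage 1: decompose the problem into easier subproblems, easiest first.
    decomposition = call_llm(
        f"List the subproblems needed to solve this, easiest first, "
        f"one per line:\n{problem}"
    )
    subproblems = [s for s in decomposition.splitlines() if s.strip()]

    # Stage 2: solve subproblems in order, feeding each answer back in.
    context = f"Problem: {problem}\n"
    answer = ""
    for sub in subproblems:
        answer = call_llm(f"{context}Subproblem: {sub}\nAnswer:")
        context += f"Subproblem: {sub}\nAnswer: {answer}\n"
    return answer  # the final subproblem is the original (hardest) goal
```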
arXiv Detail & Related papers (2022-05-21T15:34:53Z)
- Steps Before Syntax: Helping Novice Programmers Solve Problems using the PCDIT Framework [2.768397481213625]
Novice programmers often struggle with problem solving due to the high cognitive loads they face.
Many introductory programming courses do not explicitly teach problem solving, assuming that the skill is acquired along the way.
We present 'PCDIT', a non-linear problem solving framework that provides scaffolding to guide novice programmers through the process of transforming a problem specification into an implemented and tested solution for an imperative programming language.
arXiv Detail & Related papers (2021-09-18T10:31:15Z)
- ProtoTransformer: A Meta-Learning Approach to Providing Student Feedback [54.142719510638614]
In this paper, we frame the problem of providing feedback as few-shot classification.
A meta-learner adapts to give feedback to student code on a new programming question from just a few examples by instructors.
Our approach was successfully deployed to provide feedback on 16,000 student exam solutions in a programming course offered by a tier 1 university.
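Framing feedback as few-shot classification suggests a prototype-style sketch (in the spirit of prototypical networks, not the ProtoTransformer architecture itself): average the embeddings of a few instructor-labeled examples per feedback class and label a new submission by its nearest prototype. `embed` is a hypothetical code-embedding function.

```python
import numpy as np

def embed(code: str) -> np.ndarray:
    """Hypothetical code-embedding model."""
    raise NotImplementedError("plug in an embedding model here")

def prototypes(support: dict[str, list[str]]) -> dict[str, np.ndarray]:
    """Mean embedding of the few labeled examples per feedback class."""
    return {label: np.mean([embed(c) for c in codes], axis=0)
            for label, codes in support.items()}

def classify(code: str, protos: dict[str, np.ndarray]) -> str:
    """Label a new submission by its nearest feedback-class prototype."""
    z = embed(code)
    return min(protos, key=lambda label: np.linalg.norm(z - protos[label]))
```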
arXiv Detail & Related papers (2021-07-23T22:41:28Z)
- Few-Shot Complex Knowledge Base Question Answering via Meta Reinforcement Learning [55.08037694027792]
Complex question-answering (CQA) involves answering complex natural-language questions on a knowledge base (KB).
The conventional neural program induction (NPI) approach exhibits uneven performance across questions of different types.
This paper proposes a meta-reinforcement learning approach to program induction in CQA to tackle the potential distributional bias in questions.
arXiv Detail & Related papers (2020-10-29T18:34:55Z)
- Retrieve, Program, Repeat: Complex Knowledge Base Question Answering via Alternate Meta-learning [56.771557756836906]
We present a novel method that automatically learns a retrieval model alternately with the programmer from weak supervision.
Our system leads to state-of-the-art performance on a large-scale task for complex question answering over knowledge bases.
arXiv Detail & Related papers (2020-10-29T18:28:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.