Related papers: ChatGPT as a Solver and Grader of Programming Exams written in Spanish

ChatGPT as a Solver and Grader of Programming Exams written in Spanish

URL: http://arxiv.org/abs/2409.15112v1
Date: Mon, 23 Sep 2024 15:20:07 GMT
Title: ChatGPT as a Solver and Grader of Programming Exams written in Spanish
Authors: Pablo Fernández-Saborido, Marcos Fernández-Pichel, David E. Losada,
Abstract summary: We assess ChatGPT's capacities to solve and grade real programming exams. Our findings suggest that this AI model is only effective for solving simple coding tasks. Its proficiency in tackling complex problems or evaluating solutions authored by others are far from effective.
Score: 3.8984586307450093
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Evaluating the capabilities of Large Language Models (LLMs) to assist teachers and students in educational tasks is receiving increasing attention. In this paper, we assess ChatGPT's capacities to solve and grade real programming exams, from an accredited BSc degree in Computer Science, written in Spanish. Our findings suggest that this AI model is only effective for solving simple coding tasks. Its proficiency in tackling complex problems or evaluating solutions authored by others are far from effective. As part of this research, we also release a new corpus of programming tasks and the corresponding prompts for solving the problems or grading the solutions. This resource can be further exploited by other research teams.

Related papers

SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories [55.161075901665946]
Super aims to capture the realistic challenges faced by researchers working with Machine Learning (ML) and Natural Language Processing (NLP) research repositories. Our benchmark comprises three distinct problem sets: 45 end-to-end problems with annotated expert solutions, 152 sub problems derived from the expert set that focus on specific challenges, and 602 automatically generated problems for larger-scale development. We show that state-of-the-art approaches struggle to solve these problems with the best model (GPT-4o) solving only 16.3% of the end-to-end set, and 46.1% of the scenarios.
arXiv Detail & Related papers (2024-09-11T17:37:48Z)
Could ChatGPT get an Engineering Degree? Evaluating Higher Education Vulnerability to AI Assistants [175.9723801486487]
We evaluate whether two AI assistants, GPT-3.5 and GPT-4, can adequately answer assessment questions. GPT-4 answers an average of 65.8% of questions correctly, and can even produce the correct answer across at least one prompting strategy for 85.1% of questions. Our results call for revising program-level assessment design in higher education in light of advances in generative AI.
arXiv Detail & Related papers (2024-08-07T12:11:49Z)
Learning Task Decomposition to Assist Humans in Competitive Programming [90.4846613669734]
We introduce a novel objective for learning task decomposition, termed value (AssistV) We collect a dataset of human repair experiences on different decomposed solutions. Under 177 hours of human study, our method enables non-experts to solve 33.3% more problems, speeds them up by 3.3x, and empowers them to match unassisted experts.
arXiv Detail & Related papers (2024-06-07T03:27:51Z)
Do Language Models Exhibit the Same Cognitive Biases in Problem Solving as Human Learners? [140.9751389452011]
We study the biases of large language models (LLMs) in relation to those known in children when solving arithmetic word problems. We generate a novel set of word problems for each of these tests, using a neuro-symbolic approach that enables fine-grained control over the problem features.
arXiv Detail & Related papers (2024-01-31T18:48:20Z)
Kattis vs. ChatGPT: Assessment and Evaluation of Programming Tasks in the Age of Artificial Intelligence [0.0]
The effectiveness of using large language models for solving programming tasks has been underexplored. The present study examines ChatGPT's ability to generate code solutions at different difficulty levels for introductory programming courses. Results contribute to the ongoing debate on the utility of AI-powered tools in programming education.
arXiv Detail & Related papers (2023-12-02T11:09:17Z)
Exploring the Potential of Large Language Models to Generate Formative Programming Feedback [0.5371337604556311]
We explore the potential of large language models (LLMs) for computing educators and learners. To achieve these goals, we used students' programming sequences from a dataset gathered within a CS1 course as input for ChatGPT. Results show that ChatGPT performs reasonably well for some of the introductory programming tasks and student errors. However, educators should provide guidance on how to use the provided feedback, as it can contain misleading information for novices.
arXiv Detail & Related papers (2023-08-31T15:22:11Z)
Extending the Frontier of ChatGPT: Code Generation and Debugging [0.0]
ChatGPT, developed by OpenAI, has ushered in a new era by utilizing artificial intelligence (AI) to tackle diverse problem domains. This research paper delves into the efficacy of ChatGPT in solving programming problems, examining both the correctness and the efficiency of its solution in terms of time and memory complexity. The research reveals a commendable overall success rate of 71.875%, denoting the proportion of problems for which ChatGPT was able to provide correct solutions.
arXiv Detail & Related papers (2023-07-17T06:06:58Z)
ChatGPT, Can You Generate Solutions for my Coding Exercises? An Evaluation on its Effectiveness in an undergraduate Java Programming Course [4.779196219827508]
ChatGPT is a large-scale, deep learning-driven natural language processing model. Our evaluation involves analyzing ChatGPT-generated solutions for 80 diverse programming exercises.
arXiv Detail & Related papers (2023-05-23T04:38:37Z)
Exploring the Use of ChatGPT as a Tool for Learning and Assessment in Undergraduate Computer Science Curriculum: Opportunities and Challenges [0.3553493344868413]
This paper addresses the prospects and obstacles associated with utilizing ChatGPT as a tool for learning and assessment in undergraduate Computer Science curriculum. Group B students were given access to ChatGPT and were encouraged to use it to help solve the programming challenges. Results show that students using ChatGPT had an advantage in terms of earned scores, however there were inconsistencies and inaccuracies in the submitted code.
arXiv Detail & Related papers (2023-04-16T21:04:52Z)
JiuZhang: A Chinese Pre-trained Language Model for Mathematical Problem Understanding [74.12405417718054]
This paper aims to advance the mathematical intelligence of machines by presenting the first Chinese mathematical pre-trained language model(PLM) Unlike other standard NLP tasks, mathematical texts are difficult to understand, since they involve mathematical terminology, symbols and formulas in the problem statement. We design a novel curriculum pre-training approach for improving the learning of mathematical PLMs, consisting of both basic and advanced courses.
arXiv Detail & Related papers (2022-06-13T17:03:52Z)
ProtoTransformer: A Meta-Learning Approach to Providing Student Feedback [54.142719510638614]
In this paper, we frame the problem of providing feedback as few-shot classification. A meta-learner adapts to give feedback to student code on a new programming question from just a few examples by instructors. Our approach was successfully deployed to deliver feedback to 16,000 student exam-solutions in a programming course offered by a tier 1 university.
arXiv Detail & Related papers (2021-07-23T22:41:28Z)

This list is automatically generated from the titles and abstracts of the papers in this site.