Assessing Student Errors in Experimentation Using Artificial
Intelligence and Large Language Models: A Comparative Study with Human Raters
- URL: http://arxiv.org/abs/2308.06088v1
- Date: Fri, 11 Aug 2023 12:03:12 GMT
- Title: Assessing Student Errors in Experimentation Using Artificial
Intelligence and Large Language Models: A Comparative Study with Human Raters
- Authors: Arne Bewersdorff, Kathrin Seßler, Armin Baur, Enkelejda Kasneci,
Claudia Nerdel
- Abstract summary: We investigate the potential of Large Language Models (LLMs) for automatically identifying student errors.
An AI system based on the GPT-3.5 and GPT-4 series was developed and tested against human raters.
Our results indicate varying levels of accuracy in error detection between the AI system and human raters.
- Score: 9.899633398596672
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Identifying logical errors in complex, incomplete, or even
contradictory and overall heterogeneous data such as students' experimentation
protocols is
challenging. Recognizing the limitations of current evaluation methods, we
investigate the potential of Large Language Models (LLMs) for automatically
identifying student errors and streamlining teacher assessments. Our aim is to
provide a foundation for productive, personalized feedback. Using a dataset of
65 student protocols, an Artificial Intelligence (AI) system based on the
GPT-3.5 and GPT-4 series was developed and tested against human raters. Our
results indicate varying levels of accuracy in error detection between the AI
system and human raters. The AI system can accurately identify many fundamental
student errors: for instance, it reliably detects when a student focuses the
hypothesis not on the dependent variable but solely on an expected observation
(acc. = 0.90), when a student modifies the trials in an ongoing investigation
(acc. = 1.00), and whether a student conducts valid test trials (acc. = 0.82).
The identification of other, usually more complex errors, such as whether a
student conducts a valid control trial (acc. = 0.60),
poses a greater challenge. This research explores not only the utility of AI in
educational settings, but also contributes to the understanding of the
capabilities of LLMs in error detection in inquiry-based learning like
experimentation.
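The comparison reported above, scoring an AI system's error labels against human raters, reduces to per-category agreement. A minimal sketch of that calculation (the label data and category setup here are hypothetical illustrations, not the paper's dataset):

```python
# Per-error-category accuracy of AI labels against human rater labels.
# Labels are booleans: True = "this error is present in the protocol".

def category_accuracy(ai_labels, human_labels):
    """Fraction of protocols on which the AI agrees with the human rater."""
    if len(ai_labels) != len(human_labels):
        raise ValueError("label lists must be the same length")
    matches = sum(a == h for a, h in zip(ai_labels, human_labels))
    return matches / len(ai_labels)

# Hypothetical ratings for 10 student protocols in one error category.
ai    = [True, False, True, True, False, True, False, False, True, True]
human = [True, False, True, False, False, True, False, True, True, True]

print(category_accuracy(ai, human))  # 0.8
```

Computing this per error category, rather than pooled, is what makes the reported pattern visible: simple errors score near 1.00 while complex ones like control-trial validity drop toward 0.60.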
Related papers
- Stepwise Verification and Remediation of Student Reasoning Errors with Large Language Model Tutors [78.53699244846285]
Large language models (LLMs) present an opportunity to scale high-quality personalized education to all.
LLMs struggle to precisely detect a student's errors and tailor their feedback to those errors.
Inspired by real-world teaching practice where teachers identify student errors and customize their response based on them, we focus on verifying student solutions.
arXiv Detail & Related papers (2024-07-12T10:11:40Z) - Beyond human subjectivity and error: a novel AI grading system [67.410870290301]
The grading of open-ended questions is a high-effort, high-impact task in education.
Recent breakthroughs in AI technology might facilitate such automation, but this has not been demonstrated at scale.
We introduce a novel automatic short answer grading (ASAG) system.
arXiv Detail & Related papers (2024-05-07T13:49:59Z) - Determining the Difficulties of Students With Dyslexia via Virtual
Reality and Artificial Intelligence: An Exploratory Analysis [0.0]
The VRAIlexia project has been created to tackle this issue by proposing two different tools.
The first one has been created and is being distributed among dyslexic students in Higher Education Institutions for conducting specific psychological and psychometric tests.
The second tool applies specific artificial intelligence algorithms to the data gathered via the application and other surveys.
arXiv Detail & Related papers (2024-01-15T20:26:09Z) - Student Mastery or AI Deception? Analyzing ChatGPT's Assessment
Proficiency and Evaluating Detection Strategies [1.633179643849375]
Generative AI systems such as ChatGPT have a disruptive effect on learning and assessment.
This work investigates the performance of ChatGPT by evaluating it across three courses.
arXiv Detail & Related papers (2023-11-27T20:10:13Z) - Efficiently Measuring the Cognitive Ability of LLMs: An Adaptive Testing
Perspective [63.92197404447808]
Large language models (LLMs) have shown some human-like cognitive abilities.
We propose an adaptive testing framework for LLM evaluation.
This approach dynamically adjusts the characteristics of the test questions, such as difficulty, based on the model's performance.
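The adaptive scheme summarized above can be sketched as a simple staircase loop that raises question difficulty after a correct answer and lowers it after an incorrect one (a hypothetical illustration of the idea, not the paper's actual framework):

```python
# Minimal adaptive-testing loop: difficulty steps up after a correct
# answer and down after an incorrect one, staircase-style.

def run_adaptive_test(answer_fn, n_questions=10, min_d=1, max_d=10):
    """answer_fn(difficulty) -> bool, True if the model answers correctly.
    Returns the sequence of difficulties presented."""
    difficulty = (min_d + max_d) // 2
    history = []
    for _ in range(n_questions):
        history.append(difficulty)
        if answer_fn(difficulty):
            difficulty = min(max_d, difficulty + 1)
        else:
            difficulty = max(min_d, difficulty - 1)
    return history

# Hypothetical model that succeeds on questions up to difficulty 7.
trace = run_adaptive_test(lambda d: d <= 7)
print(trace)  # converges to, then oscillates around, the ability level
```

The point of the adaptive design is efficiency: the test spends most of its questions near the model's ability boundary instead of wasting them on items that are far too easy or too hard.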
arXiv Detail & Related papers (2023-06-18T09:54:33Z) - Neural Causal Models for Counterfactual Identification and Estimation [62.30444687707919]
We study the evaluation of counterfactual statements through neural models.
First, we show that neural causal models (NCMs) are expressive enough.
Second, we develop an algorithm for simultaneously identifying and estimating counterfactual distributions.
arXiv Detail & Related papers (2022-09-30T18:29:09Z) - Cognitive Diagnosis with Explicit Student Vector Estimation and
Unsupervised Question Matrix Learning [53.79108239032941]
We propose an explicit student vector estimation (ESVE) method to estimate the student vectors of DINA.
We also propose an unsupervised method called bidirectional calibration algorithm (HBCA) to label the Q-matrix automatically.
The experimental results on two real-world datasets show that ESVE-DINA outperforms the DINA model on accuracy and that the Q-matrix labeled automatically by HBCA can achieve performance comparable to that obtained with the manually labeled Q-matrix.
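The DINA model referenced above predicts a correct response only when a student has mastered every skill the question requires (as specified by that question's Q-matrix row), softened by slip and guess probabilities. A minimal sketch of that response rule (parameter values here are illustrative, not taken from the paper):

```python
# DINA response rule: P(correct) = 1 - slip if the student has mastered
# every skill the question requires (per its Q-matrix row), else guess.

def dina_p_correct(skills, q_row, slip=0.1, guess=0.2):
    """skills: student mastery vector (0/1 per skill);
    q_row: Q-matrix row (1 where the question requires that skill)."""
    mastered_all = all(s == 1 for s, q in zip(skills, q_row) if q == 1)
    return (1 - slip) if mastered_all else guess

student = [1, 0, 1]                        # mastered skills 0 and 2
print(dina_p_correct(student, [1, 0, 1]))  # 0.9: all required skills mastered
print(dina_p_correct(student, [0, 1, 0]))  # 0.2: missing required skill 1
```

This conjunctive ("all-or-nothing") structure is why estimating the student vectors and labeling the Q-matrix are the two inputs the summarized paper targets with ESVE and HBCA.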
arXiv Detail & Related papers (2022-03-01T03:53:19Z) - Autonomous Reinforcement Learning: Formalism and Benchmarking [106.25788536376007]
Real-world embodied learning, such as that performed by humans and animals, is situated in a continual, non-episodic world.
Common benchmark tasks in RL are episodic, with the environment resetting between trials to provide the agent with multiple attempts.
This discrepancy presents a major challenge when attempting to take RL algorithms developed for episodic simulated environments and run them on real-world platforms.
arXiv Detail & Related papers (2021-12-17T16:28:06Z) - KANDINSKYPatterns -- An experimental exploration environment for Pattern
Analysis and Machine Intelligence [0.0]
We present KANDINSKYPatterns, named after the Russian artist Wassily Kandinsky, who made theoretical contributions to compositivity, i.e. the idea that all perceptions consist of geometrically elementary individual components.
KANDINSKYPatterns have computationally controllable properties, providing ground truth; at the same time, they are easily distinguishable by human observers, i.e., controlled patterns can be described by both humans and algorithms.
arXiv Detail & Related papers (2021-02-28T14:09:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.