A Benchmark for Math Misconceptions: Bridging Gaps in Middle School Algebra with AI-Supported Instruction
- URL: http://arxiv.org/abs/2412.03765v1
- Date: Wed, 04 Dec 2024 23:10:29 GMT
- Title: A Benchmark for Math Misconceptions: Bridging Gaps in Middle School Algebra with AI-Supported Instruction
- Authors: Otero Nancy, Druga Stefania, Lan Andrew
- Abstract summary: This study introduces an evaluation benchmark for middle school algebra to be used in artificial intelligence based educational platforms.
The data set comprises 55 misconceptions about algebra, common errors, and 220 diagnostic examples.
Four out of five educators expressed interest in using the data set with AI to diagnose student misconceptions or train teachers.
- Score: 0.0
- Abstract: This study introduces an evaluation benchmark for middle school algebra to be used in artificial intelligence (AI) based educational platforms. The goal is to support the design of AI systems that can enhance learners' conceptual understanding of algebra by taking into account their current level of algebra comprehension. The data set comprises 55 misconceptions about algebra, common errors, and 220 diagnostic examples identified in previous peer-reviewed studies. We provide an example application using a large language model (LLM), observing a range of precision and recall scores that depend on the topic and experimental setup and reach 83.9% when educator feedback is included and the evaluation is restricted by topic. We found that topics such as ratios and proportions prove as difficult for LLMs as they are for students. We included a human assessment of the LLM's results and feedback from five middle school math educators on the clarity and occurrence of misconceptions in the dataset and the potential use of AI in conjunction with the dataset. Most educators (80% or more) indicated that they encounter these misconceptions among their students, suggesting the relevance of the data set to teaching middle school algebra. Despite varying familiarity with AI tools, four out of five educators expressed interest in using the data set with AI to diagnose student misconceptions or train teachers. The results emphasize the importance of topic-constrained testing, the need for multimodal approaches, and the relevance of human expertise to gain practical insights when using AI for human learning.
Related papers
- MNIST-Fraction: Enhancing Math Education with AI-Driven Fraction Detection and Analysis [3.54834102467122]
We present a novel contribution to the field of mathematics education through the development of MNIST-Fraction.
MNIST-Fraction is a dataset inspired by the renowned MNIST, specifically tailored for the recognition and understanding of handwritten math fractions.
Our approach uses deep learning, specifically Convolutional Neural Networks (CNNs), to recognize and understand handwritten math fractions.
arXiv Detail & Related papers (2024-12-11T18:56:28Z)
- Embracing AI in Education: Understanding the Surge in Large Language Model Use by Secondary Students [53.20318273452059]
Large language models (LLMs) like OpenAI's ChatGPT have opened up new avenues in education.
Despite school restrictions, our survey of over 300 middle and high school students revealed that a remarkable 70% of students have utilized LLMs.
We propose a few ideas to address such issues, including subject-specific models, personalized learning, and AI classrooms.
arXiv Detail & Related papers (2024-11-27T19:19:34Z)
- DiVERT: Distractor Generation with Variational Errors Represented as Text for Math Multiple-choice Questions [42.148511874019256]
We introduce DiVERT, a novel variational approach that learns an interpretable representation of errors behind distractors in math multiple-choice questions (MCQs).
We show that DiVERT, despite using a base open-source LLM with 7B parameters, outperforms state-of-the-art approaches using GPT-4o on downstream distractor generation.
We also conduct a human evaluation with math educators and find that DiVERT leads to error labels that are of comparable quality to human-authored ones.
arXiv Detail & Related papers (2024-06-27T17:37:31Z)
- CourseAssist: Pedagogically Appropriate AI Tutor for Computer Science Education [1.052788652996288]
This poster introduces CourseAssist, a novel LLM-based tutoring system tailored for computer science education.
Unlike generic LLM systems, CourseAssist uses retrieval-augmented generation, user intent classification, and question decomposition to align AI responses with specific course materials and learning objectives.
arXiv Detail & Related papers (2024-05-01T20:43:06Z)
- Generative AI in Education: A Study of Educators' Awareness, Sentiments, and Influencing Factors [2.217351976766501]
This study delves into university instructors' experiences and attitudes toward AI language models.
We find no correlation between teaching style and attitude toward generative AI.
While CS educators show far more confidence in their technical understanding of generative AI tools, they show no more confidence in their ability to detect AI-generated work.
arXiv Detail & Related papers (2024-03-22T19:21:29Z)
- Determining the Difficulties of Students With Dyslexia via Virtual Reality and Artificial Intelligence: An Exploratory Analysis [0.0]
The VRAIlexia project has been created to tackle this issue by proposing two different tools.
The first one has been created and is being distributed among dyslexic students in Higher Education Institutions to conduct specific psychological and psychometric tests.
The second tool applies specific artificial intelligence algorithms to the data gathered via the application and other surveys.
arXiv Detail & Related papers (2024-01-15T20:26:09Z)
- Evaluating Language Models for Mathematics through Interactions [116.67206980096513]
We introduce CheckMate, a prototype platform for humans to interact with and evaluate large language models (LLMs).
We conduct a study with CheckMate to evaluate three language models (InstructGPT, ChatGPT, and GPT-4) as assistants in proving undergraduate-level mathematics.
We derive a taxonomy of human behaviours and uncover that despite a generally positive correlation, there are notable instances of divergence between correctness and perceived helpfulness.
arXiv Detail & Related papers (2023-06-02T17:12:25Z)
- Lila: A Unified Benchmark for Mathematical Reasoning [59.97570380432861]
LILA is a unified mathematical reasoning benchmark consisting of 23 diverse tasks along four dimensions.
We construct our benchmark by extending 20 benchmark datasets, collecting task instructions and solutions in the form of Python programs.
We introduce BHASKARA, a general-purpose mathematical reasoning model trained on LILA.
arXiv Detail & Related papers (2022-10-31T17:41:26Z)
- A Survey of Learning on Small Data: Generalization, Optimization, and Challenge [101.27154181792567]
Learning on small data that approximates the generalization ability of big data is one of the ultimate purposes of AI.
This survey follows the active sampling theory under a PAC framework to analyze the generalization error and label complexity of learning on small data.
Multiple data applications that may benefit from efficient small data representation are surveyed.
arXiv Detail & Related papers (2022-07-29T02:34:19Z)
- Personalized Education in the AI Era: What to Expect Next? [76.37000521334585]
The objective of personalized learning is to design an effective knowledge acquisition track that matches the learner's strengths and bypasses her weaknesses to meet her desired goal.
In recent years, the boost of artificial intelligence (AI) and machine learning (ML) has unfolded novel perspectives to enhance personalized education.
arXiv Detail & Related papers (2021-01-19T12:23:32Z)
- Explainable Active Learning (XAL): An Empirical Study of How Local Explanations Impact Annotator Experience [76.9910678786031]
We propose a novel paradigm of explainable active learning (XAL), by introducing techniques from the recently surging field of explainable AI (XAI) into an Active Learning setting.
Our study shows the benefits of AI explanations as interfaces for machine teaching (supporting trust calibration and enabling rich forms of teaching feedback) as well as potential drawbacks (an anchoring effect with the model judgment and cognitive workload).
arXiv Detail & Related papers (2020-01-24T22:52:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.