Related papers: MathAgent: Leveraging a Mixture-of-Math-Agent Framework for Real-World Multimodal Mathematical Error Detection

MathAgent: Leveraging a Mixture-of-Math-Agent Framework for Real-World Multimodal Mathematical Error Detection

URL: http://arxiv.org/abs/2503.18132v1
Date: Sun, 23 Mar 2025 16:25:08 GMT
Title: MathAgent: Leveraging a Mixture-of-Math-Agent Framework for Real-World Multimodal Mathematical Error Detection
Authors: Yibo Yan, Shen Wang, Jiahao Huo, Philip S. Yu, Xuming Hu, Qingsong Wen,
Abstract summary: We introduce MathAgent, a novel Mixture-of-Math-Agent framework designed specifically to address these challenges.<n>MathAgent decomposes error detection into three phases, each handled by a specialized agent.<n>We evaluate MathAgent on real-world educational data, demonstrating approximately 5% higher accuracy in error step identification.
Score: 53.325457460187046
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Mathematical error detection in educational settings presents a significant challenge for Multimodal Large Language Models (MLLMs), requiring a sophisticated understanding of both visual and textual mathematical content along with complex reasoning capabilities. Though effective in mathematical problem-solving, MLLMs often struggle with the nuanced task of identifying and categorizing student errors in multimodal mathematical contexts. Therefore, we introduce MathAgent, a novel Mixture-of-Math-Agent framework designed specifically to address these challenges. Our approach decomposes error detection into three phases, each handled by a specialized agent: an image-text consistency validator, a visual semantic interpreter, and an integrative error analyzer. This architecture enables more accurate processing of mathematical content by explicitly modeling relationships between multimodal problems and student solution steps. We evaluate MathAgent on real-world educational data, demonstrating approximately 5% higher accuracy in error step identification and 3% improvement in error categorization compared to baseline models. Besides, MathAgent has been successfully deployed in an educational platform that has served over one million K-12 students, achieving nearly 90% student satisfaction while generating significant cost savings by reducing manual error detection.

Related papers

StepMathAgent: A Step-Wise Agent for Evaluating Mathematical Processes through Tree-of-Error [60.82371607870152]
We propose a novel mathematical process evaluation agent based on Tree-of-Error, called StepMathAgent. StepMathAgent incorporates four internal core operations: logical step segmentation, step scoring, score aggregation and error tree generation, along with four external extension modules: difficulty calibration, simplicity evaluation, validation and format assessment. Experiments on StepMathBench show that our proposed StepMathAgent outperforms all state-of-the-art methods, demonstrating human-aligned evaluation preferences and broad applicability to various scenarios.
arXiv Detail & Related papers (2025-03-13T07:02:53Z)
MathMistake Checker: A Comprehensive Demonstration for Step-by-Step Math Problem Mistake Finding by Prompt-Guided LLMs [13.756898876556455]
We propose a novel system, MathMistake Checker, to automate step-by-step mistake finding in mathematical problems with lengthy answers. The system aims to simplify grading, increase efficiency, and enhance learning experiences from a pedagogical perspective.
arXiv Detail & Related papers (2025-03-06T10:19:01Z)
Error Classification of Large Language Models on Math Word Problems: A Dynamically Adaptive Framework [64.83955753606443]
Math Word Problems serve as a crucial benchmark for evaluating Large Language Models' reasoning abilities.<n>Current error classification methods rely on static and predefined categories.<n>We introduce MWPES-300K, a comprehensive dataset containing 304,865 error samples.
arXiv Detail & Related papers (2025-01-26T16:17:57Z)
ErrorRadar: Benchmarking Complex Mathematical Reasoning of Multimodal Large Language Models Via Error Detection [60.297079601066784]
We introduce ErrorRadar, the first benchmark designed to assess MLLMs' capabilities in error detection. ErrorRadar evaluates two sub-tasks: error step identification and error categorization. It consists of 2,500 high-quality multimodal K-12 mathematical problems, collected from real-world student interactions. Results indicate significant challenges still remain, as GPT-4o with best performance is still around 10% behind human evaluation.
arXiv Detail & Related papers (2024-10-06T14:59:09Z)
Math-PUMA: Progressive Upward Multimodal Alignment to Enhance Mathematical Reasoning [5.9767694994869425]
Multimodal Large Language Models (MLLMs) excel in solving text-based mathematical problems. They struggle with mathematical diagrams since they are primarily trained on natural scene images. We propose Math-PUMA, a methodology focused on Progressive Upward Multimodal Alignment.
arXiv Detail & Related papers (2024-08-16T10:11:05Z)
Mathify: Evaluating Large Language Models on Mathematical Problem Solving Tasks [34.09857430966818]
We introduce an extensive mathematics dataset called "MathQuest" sourced from the 11th and 12th standard Mathematics NCERT textbooks. We conduct fine-tuning experiments with three prominent large language models: LLaMA-2, WizardMath, and MAmmoTH. Our experiments reveal that among the three models, MAmmoTH-13B emerges as the most proficient, achieving the highest level of competence in solving the presented mathematical problems.
arXiv Detail & Related papers (2024-04-19T08:45:42Z)
Faith and Fate: Limits of Transformers on Compositionality [109.79516190693415]
We investigate the limits of transformer large language models across three representative compositional tasks. These tasks require breaking problems down into sub-steps and synthesizing these steps into a precise answer. Our empirical findings suggest that transformer LLMs solve compositional tasks by reducing multi-step compositional reasoning into linearized subgraph matching.
arXiv Detail & Related papers (2023-05-29T23:24:14Z)
Measuring Mathematical Problem Solving With the MATH Dataset [55.4376028963537]
We introduce MATH, a dataset of 12,500 challenging competition mathematics problems. Each problem has a full step-by-step solution which can be used to teach models to generate answer derivations and explanations. We also contribute a large auxiliary pretraining dataset which helps teach models the fundamentals of mathematics.
arXiv Detail & Related papers (2021-03-05T18:59:39Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.