Novice Learner and Expert Tutor: Evaluating Math Reasoning Abilities of
Large Language Models with Misconceptions
- URL: http://arxiv.org/abs/2310.02439v1
- Date: Tue, 3 Oct 2023 21:19:50 GMT
- Title: Novice Learner and Expert Tutor: Evaluating Math Reasoning Abilities of
Large Language Models with Misconceptions
- Authors: Naiming Liu, Shashank Sonkar, Zichao Wang, Simon Woodhead, Richard G.
Baraniuk
- Abstract summary: We propose novel evaluations for mathematical reasoning capabilities of Large Language Models (LLMs) based on mathematical misconceptions.
Our primary approach is to simulate LLMs as a novice learner and an expert tutor, aiming to identify the incorrect answer to a math question that results from a specific misconception.
- Score: 28.759189115877028
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose novel evaluations for the mathematical reasoning capabilities of Large
Language Models (LLMs) based on mathematical misconceptions. Our primary
approach is to simulate LLMs as a novice learner and an expert tutor, aiming to
identify the incorrect answer to a math question that results from a specific
misconception, and to recognize the misconception(s) behind an incorrect answer,
respectively. Contrary to traditional LLM-based mathematical evaluations that
focus on answering math questions correctly, our approach takes inspiration
from principles in the educational learning sciences. We explicitly ask LLMs to
mimic a novice learner by answering questions in a specific incorrect manner
based on incomplete knowledge, and to mimic an expert tutor by identifying the
misconception(s) corresponding to an incorrect answer to a question. Using
simple grade-school math problems, our experiments reveal that, while LLMs can
easily answer these questions correctly, they struggle to identify 1) the
incorrect answer corresponding to specific incomplete knowledge
(misconceptions) and 2) the misconceptions that explain particular incorrect
answers. Our study indicates new opportunities for enhancing LLMs' math
reasoning capabilities, especially for developing robust student simulation and
expert tutoring models in educational applications such as intelligent
tutoring systems.
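To make the two evaluation directions concrete, below is a minimal sketch of the novice-learner role (misconception in, incorrect answer out) and the expert-tutor role (incorrect answer in, misconception out), assuming an OpenAI-style chat-completions client. The prompt wording, model name, and the example question/misconception pair are illustrative assumptions, not the authors' exact templates or data.

```python
# Hedged sketch of the paper's two evaluation roles, assuming an
# OpenAI-style chat-completions API. Prompts, model name, and the
# example data are illustrative, not the authors' templates.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

NOVICE_PROMPT = (
    "You are a novice student who holds the following misconception: "
    "{misconception}\n"
    "Answer the question below incorrectly, exactly as a student with "
    "this misconception would.\n"
    "Question: {question}\n"
    "Give only the final (incorrect) answer."
)

TUTOR_PROMPT = (
    "You are an expert math tutor. A student answered the question below "
    "incorrectly.\n"
    "Question: {question}\n"
    "Student's answer: {wrong_answer}\n"
    "Name the misconception most likely to explain this answer."
)

def ask(prompt: str) -> str:
    """Send a single prompt and return the model's reply text."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any chat LLM could be evaluated
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content.strip()

# Novice-learner direction: misconception -> predicted incorrect answer.
question = "What is 1/2 + 1/3?"
misconception = "Adds numerators and denominators separately."
novice_answer = ask(NOVICE_PROMPT.format(
    misconception=misconception, question=question))
print("Simulated novice answer:", novice_answer)  # expected: 2/5

# Expert-tutor direction: incorrect answer -> inferred misconception.
diagnosis = ask(TUTOR_PROMPT.format(question=question, wrong_answer="2/5"))
print("Inferred misconception:", diagnosis)
```

Scoring then reduces to two checks: whether the simulated novice's answer matches the distractor implied by the seeded misconception, and whether the tutor's diagnosis names that misconception.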
Related papers
- Do Large Language Models Truly Grasp Mathematics? An Empirical Exploration From Cognitive Psychology [13.964263002704582]
We show that, even with Chain-of-Thought prompting, mainstream LLMs have a high error rate when solving modified CRT problems.
Specifically, the average accuracy rate dropped by up to 50% compared to the original questions.
This finding challenges the belief that LLMs have genuine mathematical reasoning abilities comparable to humans.
arXiv Detail & Related papers (2024-10-19T05:01:56Z) - Not All LLM Reasoners Are Created Equal [58.236453890457476]
We study the depth of grade-school math problem-solving capabilities of LLMs.
We evaluate their performance on pairs of existing math word problems, where answering the second problem correctly depends on correctly answering the first.
arXiv Detail & Related papers (2024-10-02T17:01:10Z) - Reasoning with Large Language Models, a Survey [2.831296564800826]
This paper reviews the rapidly expanding field of prompt-based reasoning with LLMs.
Our taxonomy identifies different ways to generate, evaluate, and control multi-step reasoning.
We find that self-improvement, self-reflection, and some meta abilities of the reasoning processes are possible through the judicious use of prompts.
arXiv Detail & Related papers (2024-07-16T08:49:35Z) - MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark [82.64129627675123]
MathBench is a new benchmark that rigorously assesses the mathematical capabilities of large language models.
MathBench spans a wide range of mathematical disciplines, offering a detailed evaluation of both theoretical understanding and practical problem-solving skills.
arXiv Detail & Related papers (2024-05-20T17:52:29Z) - Automate Knowledge Concept Tagging on Math Questions with LLMs [48.5585921817745]
Knowledge concept tagging for questions plays a crucial role in contemporary intelligent educational applications.
Traditionally, these annotations have been conducted manually with help from pedagogical experts.
In this paper, we explore automating the tagging task using Large Language Models (LLMs).
arXiv Detail & Related papers (2024-03-26T00:09:38Z) - GSM-Plus: A Comprehensive Benchmark for Evaluating the Robustness of LLMs as Mathematical Problem Solvers [68.77382332826167]
Large language models (LLMs) have achieved impressive performance across various mathematical reasoning benchmarks.
One essential and frequently observed phenomenon is that LLMs can behave incorrectly when the math questions are slightly changed.
This motivates us to evaluate the robustness of LLMs' math reasoning capability by testing a wide range of question variations.
arXiv Detail & Related papers (2024-02-29T15:26:14Z) - InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning [98.53491178426492]
We open-source our math reasoning LLMs, InternLM-Math, which are continually pre-trained from InternLM2.
We combine chain-of-thought reasoning, reward modeling, formal reasoning, data augmentation, and a code interpreter in a unified seq2seq format.
Our pre-trained model achieves 30.3 on the MiniF2F test set without fine-tuning.
arXiv Detail & Related papers (2024-02-09T11:22:08Z) - Three Questions Concerning the Use of Large Language Models to
Facilitate Mathematics Learning [4.376598435975689]
We discuss the challenges associated with employing large language models to enhance students' mathematical problem-solving skills.
LLMs can generate wrong reasoning processes and also have difficulty understanding the rationales of the given questions when attempting to correct students' answers.
arXiv Detail & Related papers (2023-10-20T16:05:35Z) - Democratizing Reasoning Ability: Tailored Learning from Large Language
Model [97.4921006089966]
We propose a tailored learning approach to distill such reasoning ability to smaller LMs.
We exploit the potential of the LLM as a reasoning teacher by building an interactive multi-round learning paradigm.
To exploit the reasoning potential of the smaller LM, we propose self-reflection learning to motivate the student to learn from self-made mistakes.
arXiv Detail & Related papers (2023-10-20T07:50:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the generated summaries (including all information) and is not responsible for any consequences of their use.