ChatGPT as a Math Questioner? Evaluating ChatGPT on Generating Pre-university Math Questions
- URL: http://arxiv.org/abs/2312.01661v2
- Date: Wed, 28 Feb 2024 04:33:33 GMT
- Title: ChatGPT as a Math Questioner? Evaluating ChatGPT on Generating Pre-university Math Questions
- Authors: Phuoc Pham Van Long, Duc Anh Vu, Nhat M. Hoang, Xuan Long Do, Anh Tuan Luu
- Abstract summary: Large language models (LLMs) have excelled in many NLP tasks involving logical and arithmetic reasoning.
Our analysis is categorized into two main settings: context-aware and context-unaware.
Our crawl yields TopicMath, a comprehensive and novel collection of pre-university math curriculums.
- Score: 20.261452062585985
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mathematical questioning is crucial for assessing students' problem-solving
skills. Since manually creating such questions requires substantial effort,
automatic methods have been explored. Existing state-of-the-art models rely on
fine-tuning strategies and struggle to generate questions that heavily involve
multiple steps of logical and arithmetic reasoning. Meanwhile, large language
models (LLMs) such as ChatGPT have excelled in many NLP tasks involving logical
and arithmetic reasoning. Nonetheless, their applications in generating
educational questions are underutilized, especially in the field of
mathematics. To bridge this gap, we take the first step to conduct an in-depth
analysis of ChatGPT in generating pre-university math questions. Our analysis
is categorized into two main settings: context-aware and context-unaware. In
the context-aware setting, we evaluate ChatGPT on existing math
question-answering benchmarks covering elementary, secondary, and tertiary
classes. In the context-unaware setting, we evaluate ChatGPT in generating math
questions for each lesson from pre-university math curriculums that we crawl.
Our crawl yields TopicMath, a comprehensive and novel collection of
pre-university math curriculums collected from 121 math topics and 428 lessons
from elementary, secondary, and tertiary classes. Through this analysis, we aim
to provide insight into the potential of ChatGPT as a math questioner.
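The two settings differ mainly in what the model receives as input: a passage from an existing benchmark item, or only a lesson from a curriculum. The sketch below contrasts them as prompt builders. The template wording, the optional answer field, and the function names are hypothetical illustrations, not the prompts used in the paper.

```python
from typing import Optional

# Hypothetical prompt templates contrasting the two evaluation settings.
# The wording is an illustrative assumption, not the paper's actual prompts.

def context_aware_prompt(context: str, answer: Optional[str] = None) -> str:
    """Context-aware: the model sees a passage taken from a QA benchmark item."""
    prompt = ("Write a math question based on the context below.\n"
              f"Context: {context}")
    if answer is not None:
        # Optionally condition on a target answer (an assumption for illustration).
        prompt += f"\nAnswer: {answer}"
    return prompt + "\nQuestion:"


def context_unaware_prompt(level: str, topic: str, lesson: str) -> str:
    """Context-unaware: the model sees only curriculum metadata (level, topic, lesson)."""
    return (f"Write a math question for {level} students on the lesson "
            f"'{lesson}' from the topic '{topic}'.\nQuestion:")


if __name__ == "__main__":
    print(context_aware_prompt("Tom has 3 apples and buys 5 more.", answer="8"))
    print()
    print(context_unaware_prompt("secondary", "Quadratic equations", "Solving by factoring"))
```

In the context-unaware setting the only grounding is the topic and lesson name, which is exactly the kind of information a crawled curriculum collection such as TopicMath can supply.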
Related papers
- FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI [2.0608396919601493]
FrontierMath is a benchmark of hundreds of original, exceptionally challenging mathematics problems crafted and vetted by expert mathematicians.
Current state-of-the-art AI models solve under 2% of problems, revealing a vast gap between AI capabilities and the prowess of the mathematical community.
As AI systems advance toward expert-level mathematical abilities, FrontierMath offers a rigorous testbed that quantifies their progress.
(2024-11-07T17:07:35Z)
- Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist [46.670206614087334]
We argue that if a model really understands a problem, it should be able to apply that understanding robustly across a diverse array of tasks.
MathCheck is a well-designed checklist for testing task generalization and reasoning.
MathCheck better reflects true mathematical abilities and represents mathematical intelligence more linearly.
(2024-07-11T17:58:58Z)
- MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark [82.64129627675123]
MathBench is a new benchmark that rigorously assesses the mathematical capabilities of large language models.
MathBench spans a wide range of mathematical disciplines, offering a detailed evaluation of both theoretical understanding and practical problem-solving skills.
(2024-05-20T17:52:29Z)
- FineMath: A Fine-Grained Mathematical Evaluation Benchmark for Chinese Large Language Models [44.63505885248145]
FineMath is a fine-grained mathematical evaluation benchmark dataset for assessing Chinese large language models (LLMs).
FineMath is designed to cover the key mathematical concepts taught in elementary school math, which are divided into 17 categories of math word problems.
All 17 categories of math word problems are manually annotated with difficulty levels according to the number of reasoning steps required to solve them.
(2024-03-12T15:32:39Z)
- MathScale: Scaling Instruction Tuning for Mathematical Reasoning [70.89605383298331]
Large language models (LLMs) have demonstrated remarkable capabilities in problem-solving.
However, their proficiency in solving mathematical problems remains inadequate.
We propose MathScale, a simple and scalable method to create high-quality mathematical reasoning data.
(2024-03-05T11:42:59Z)
- ChatGPT may excel in States Medical Licensing Examination but falters in basic Linear Algebra [2.3204178451683264]
The emergence of ChatGPT has been rapid, and although it has demonstrated positive impacts in certain domains, its influence is not universally advantageous.
Our analysis focuses on ChatGPT's capabilities in Mathematics Education, particularly in teaching basic Linear Algebra.
(2023-06-23T15:19:29Z)
- Investigating the Effectiveness of ChatGPT in Mathematical Reasoning and Problem Solving: Evidence from the Vietnamese National High School Graduation Examination [0.0]
The dataset included 250 questions divided into four levels: knowledge (K), comprehension (C), application (A), and high application (H).
The study found that ChatGPT performs well on questions about exponential and logarithmic functions, geometric progressions, and arithmetic progressions.
ChatGPT achieved its highest success rate on SAT Math (70%), followed by VNHSGE mathematics (58.8%).
(2023-06-10T02:01:02Z)
- Towards a Holistic Understanding of Mathematical Questions with Contrastive Pre-training [65.10741459705739]
We propose a novel contrastive pre-training approach for mathematical question representations, namely QuesCo.
We first design two-level question augmentations, including content-level and structure-level, which generate literally diverse question pairs with similar purposes.
Then, to fully exploit hierarchical information of knowledge concepts, we propose a knowledge hierarchy-aware rank strategy.
(2023-01-18T14:23:29Z)
- JiuZhang: A Chinese Pre-trained Language Model for Mathematical Problem Understanding [74.12405417718054]
This paper aims to advance the mathematical intelligence of machines by presenting the first Chinese mathematical pre-trained language model (PLM).
Unlike other standard NLP tasks, mathematical texts are difficult to understand, since they involve mathematical terminology, symbols and formulas in the problem statement.
We design a novel curriculum pre-training approach for improving the learning of mathematical PLMs, consisting of both basic and advanced courses.
(2022-06-13T17:03:52Z)
- A Neural Network Solves and Generates Mathematics Problems by Program Synthesis: Calculus, Differential Equations, Linear Algebra, and More [8.437319139670116]
We turn questions into programming tasks, automatically generate programs, and then execute them.
This is the first work to automatically solve, grade, and generate university-level Mathematics course questions at scale.
(2021-12-31T18:57:31Z)
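The program-synthesis pipeline in the last entry (turn a question into a program, then execute it) can be sketched in a few lines. Here `generate_program` is a hypothetical stand-in that looks up a hand-written program; in the cited work a neural model produces the program, and the example questions are illustrative, not taken from that paper.

```python
# Minimal sketch: turn a question into a program, then execute it.
# generate_program is a hypothetical stand-in for a neural program synthesizer;
# it returns hand-written programs from a lookup table.

def generate_program(question: str) -> str:
    programs = {
        "What is the sum of the first 100 positive integers?":
            "answer = sum(range(1, 101))",
        "How many ways can 3 books be chosen from 10?":
            "import math\nanswer = math.comb(10, 3)",
    }
    return programs[question]


def solve(question: str):
    program = generate_program(question)
    namespace: dict = {}
    # Run the program in a fresh namespace (not a security sandbox) and read `answer`.
    exec(program, namespace)
    return namespace["answer"]


if __name__ == "__main__":
    print(solve("What is the sum of the first 100 positive integers?"))  # 5050
    print(solve("How many ways can 3 books be chosen from 10?"))  # 120
```

Executing model-generated code with `exec` runs it with full interpreter privileges, so a real system would run it in an isolated sandbox.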