Investigating the Effectiveness of ChatGPT in Mathematical Reasoning and
Problem Solving: Evidence from the Vietnamese National High School Graduation
Examination
- URL: http://arxiv.org/abs/2306.06331v3
- Date: Tue, 31 Oct 2023 08:21:20 GMT
- Authors: Xuan-Quy Dao and Ngoc-Bich Le
- Abstract summary: The dataset included 250 questions divided into four levels: knowledge (K), comprehension (C), application (A), and high application (H).
The study found that ChatGPT performs well on questions about exponential and logarithmic functions, geometric progression, and arithmetic progression.
ChatGPT performed best on SAT Math, with a success rate of $70\%$, followed by VNHSGE mathematics ($58.8\%$).
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: This study offers a complete analysis of ChatGPT's mathematics abilities in
responding to multiple-choice questions for the Vietnamese National High School
Graduation Examination (VNHSGE) on a range of subjects and difficulty levels.
The dataset included 250 questions divided into four levels: knowledge (K),
comprehension (C), application (A), and high application (H), and it included
ten themes that covered diverse mathematical concepts. The outcomes demonstrate
that ChatGPT's performance varies depending on the difficulty level and
subject. It performed best on Level (K) questions, with an accuracy rate of
$83\%$, but its accuracy dropped as the difficulty level rose, falling to
$10\%$. The study also shows that ChatGPT performs well on questions about
exponential and logarithmic functions, geometric progression, and arithmetic
progression. The
study found that ChatGPT had difficulty correctly answering questions on topics
including derivatives and applications, spatial geometry, and Oxyz spatial
calculus. Additionally, this study compared ChatGPT's results with those of
Vietnamese students on the VNHSGE and with its performance in other math
competitions. ChatGPT performed best on SAT Math, with a success rate of
$70\%$, followed by VNHSGE mathematics ($58.8\%$). However, its success rates
were lower on other exams, such as AP
Statistics, the GRE Quantitative, AMC 10, AMC 12, and AP Calculus BC. These
results suggest that ChatGPT has the potential to be an effective teaching tool
for mathematics, but more work is needed to improve its handling of graphical
data and to address the challenges posed by increasingly difficult questions.
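The per-level accuracy rates quoted in the abstract can be illustrated with a minimal sketch. The function name `accuracy_by_level` and the toy records below are hypothetical and are not the paper's data or code; the final line assumes that the $58.8\%$ figure refers to the full 250-question set, which the abstract does not state explicitly.

```python
from collections import defaultdict


def accuracy_by_level(records):
    """Return {level: fraction correct} for (level, is_correct) pairs."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for level, is_correct in records:
        totals[level] += 1
        correct[level] += int(is_correct)
    return {level: correct[level] / totals[level] for level in totals}


# Hypothetical toy records, NOT results from the paper.
toy = [
    ("K", True), ("K", True), ("K", False),
    ("H", False), ("H", True), ("H", False), ("H", False),
]
print(accuracy_by_level(toy))

# Assumption: 58.8% covers all 250 questions, which would mean
# this many correct answers.
print(round(0.588 * 250))
```

Grouping by level before dividing keeps each rate tied to its own question count, which matters when the levels contain different numbers of questions.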
Related papers
- Benchmarking ChatGPT on Algorithmic Reasoning [58.50071292008407]
We evaluate ChatGPT's ability to solve algorithm problems from the CLRS benchmark suite that is designed for GNNs.
We find that ChatGPT outperforms specialist GNN models, using Python to successfully solve these problems.
arXiv Detail & Related papers (2024-04-04T13:39:06Z)
- MathScale: Scaling Instruction Tuning for Mathematical Reasoning [70.89605383298331]
Large language models (LLMs) have demonstrated remarkable capabilities in problem-solving.
However, their proficiency in solving mathematical problems remains inadequate.
We propose MathScale, a simple and scalable method to create high-quality mathematical reasoning data.
arXiv Detail & Related papers (2024-03-05T11:42:59Z)
- ChatGPT as a Math Questioner? Evaluating ChatGPT on Generating Pre-university Math Questions [20.261452062585985]
Large language models (LLMs) have excelled in many NLP tasks involving logical and arithmetic reasoning.
Our analysis is categorized into two main settings: context-aware and context-unaware.
Our crawling results in TopicMath, a comprehensive and novel collection of pre-university math curriculums.
arXiv Detail & Related papers (2023-12-04T06:23:37Z)
- ChatGPT Performance on Standardized Testing Exam -- A Proposed Strategy for Learners [0.0]
This study explores the problem solving capabilities of ChatGPT and its prospective applications in standardized test preparation, focusing on the GRE quantitative exam.
We investigate how ChatGPT performs across various question types in the GRE quantitative domain, and how modifying question prompts impacts its accuracy.
arXiv Detail & Related papers (2023-09-25T20:25:29Z)
- ChatGPT may excel in States Medical Licensing Examination but falters in basic Linear Algebra [2.3204178451683264]
The emergence of ChatGPT has been rapid, and although it has demonstrated positive impacts in certain domains, its influence is not universally advantageous.
Our analysis focuses on ChatGPT's capabilities in Mathematics Education, particularly in teaching basic Linear Algebra.
arXiv Detail & Related papers (2023-06-23T15:19:29Z)
- Can ChatGPT pass the Vietnamese National High School Graduation Examination? [0.0]
The study dataset included 30 essays in the literature test case and 1,700 multiple-choice questions designed for other subjects.
ChatGPT was able to pass the examination with an average score of 6-7, demonstrating the technology's potential to revolutionize the educational landscape.
arXiv Detail & Related papers (2023-06-15T14:47:03Z)
- ChatGPT Participates in a Computer Science Exam [16.665883787432858]
We ask ChatGPT to participate in an undergraduate computer science exam on "Algorithms and Data Structures".
We hand-copied its answers onto an exam sheet, which was subsequently graded in a blind setup alongside those of 200 participating students.
We find that ChatGPT narrowly passed the exam, obtaining 20.5 out of 40 points.
arXiv Detail & Related papers (2023-03-08T15:46:14Z)
- Can ChatGPT Understand Too? A Comparative Study on ChatGPT and Fine-tuned BERT [103.57103957631067]
ChatGPT has attracted great attention, as it can generate fluent and high-quality responses to human inquiries.
We evaluate ChatGPT's understanding ability by evaluating it on the most popular GLUE benchmark, and comparing it with 4 representative fine-tuned BERT-style models.
We find that: 1) ChatGPT falls short in handling paraphrase and similarity tasks; 2) ChatGPT outperforms all BERT models on inference tasks by a large margin; 3) ChatGPT achieves comparable performance compared with BERT on sentiment analysis and question answering tasks.
arXiv Detail & Related papers (2023-02-19T12:29:33Z)
- Is ChatGPT a General-Purpose Natural Language Processing Task Solver? [113.22611481694825]
Large language models (LLMs) have demonstrated the ability to perform a variety of natural language processing (NLP) tasks zero-shot.
Recently, the debut of ChatGPT has drawn a great deal of attention from the natural language processing (NLP) community.
It is not yet known whether ChatGPT can serve as a generalist model that can perform many NLP tasks zero-shot.
arXiv Detail & Related papers (2023-02-08T09:44:51Z)
- UniGeo: Unifying Geometry Logical Reasoning via Reformulating Mathematical Expression [127.68780714438103]
Two main geometry problems: calculation and proving, are usually treated as two specific tasks.
We construct a large-scale Unified Geometry problem benchmark, UniGeo, which contains 4,998 calculation problems and 9,543 proving problems.
We also present a unified multi-task Geometric Transformer framework, Geoformer, to tackle calculation and proving problems simultaneously.
arXiv Detail & Related papers (2022-12-06T04:37:51Z)
- JiuZhang: A Chinese Pre-trained Language Model for Mathematical Problem Understanding [74.12405417718054]
This paper aims to advance the mathematical intelligence of machines by presenting the first Chinese mathematical pre-trained language model (PLM).
Unlike other standard NLP tasks, mathematical texts are difficult to understand, since they involve mathematical terminology, symbols and formulas in the problem statement.
We design a novel curriculum pre-training approach for improving the learning of mathematical PLMs, consisting of both basic and advanced courses.
arXiv Detail & Related papers (2022-06-13T17:03:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.