Related papers: Investigating the Effectiveness of ChatGPT in Mathematical Reasoning and Problem Solving: Evidence from the Vietnamese National High School Graduation Examination

Investigating the Effectiveness of ChatGPT in Mathematical Reasoning and Problem Solving: Evidence from the Vietnamese National High School Graduation Examination

URL: http://arxiv.org/abs/2306.06331v3
Date: Tue, 31 Oct 2023 08:21:20 GMT
Title: Investigating the Effectiveness of ChatGPT in Mathematical Reasoning and Problem Solving: Evidence from the Vietnamese National High School Graduation Examination
Authors: Xuan-Quy Dao and Ngoc-Bich Le
Abstract summary: The dataset included 250 questions divided into four levels: knowledge (K), comprehension (C), application (A), and high application (H) The study found that ChatGPT significantly succeeds in providing responses to questions on subjects including exponential and logarithmic functions, geometric progression, and arithmetic progression. ChatGPT dominated in the SAT Math competition with a success rate of $70%$, followed by VNHSGE mathematics ($58.8%)$.
Score: 0.0
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: This study offers a complete analysis of ChatGPT's mathematics abilities in responding to multiple-choice questions for the Vietnamese National High School Graduation Examination (VNHSGE) on a range of subjects and difficulty levels. The dataset included 250 questions divided into four levels: knowledge (K), comprehension (C), application (A), and high application (H), and it included ten themes that covered diverse mathematical concepts. The outcomes demonstrate that ChatGPT's performance varies depending on the difficulty level and subject. It performed best on questions at Level (K), with an accuracy rate of $83\%$; but, as the difficulty level rose, it scored poorly, with an accuracy rate of $10\%$. The study has also shown that ChatGPT significantly succeeds in providing responses to questions on subjects including exponential and logarithmic functions, geometric progression, and arithmetic progression. The study found that ChatGPT had difficulty correctly answering questions on topics including derivatives and applications, spatial geometry, and Oxyz spatial calculus. Additionally, this study contrasted ChatGPT outcomes with Vietnamese students in VNHSGE and in other math competitions. ChatGPT dominated in the SAT Math competition with a success rate of $70\%$, followed by VNHSGE mathematics ($58.8\%)$. However, its success rates were lower on other exams, such as AP Statistics, the GRE Quantitative, AMC 10, AMC 12, and AP Calculus BC. These results suggest that ChatGPT has the potential to be an effective teaching tool for mathematics, but more work is needed to enhance its handling of graphical data and address the challenges presented by questions that are getting more challenging.

Related papers

On the robustness of ChatGPT in teaching Korean Mathematics [0.0]
ChatGPT achieves 66.72% accuracy, correctly answering 391 out of 586 questions. Our findings show that ChatGPT's ratings align with educational theory and test-taker perspectives. Future research should address linguistic biases and enhance accuracy across diverse languages.
arXiv Detail & Related papers (2025-02-17T15:31:27Z)
MATH-Perturb: Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations [90.07275414500154]
We observe significant performance drops on MATH-P-Hard across various models. We also raise concerns about a novel form of memorization where models blindly apply learned problem-solving skills.
arXiv Detail & Related papers (2025-02-10T13:31:46Z)
Benchmarking ChatGPT on Algorithmic Reasoning [58.50071292008407]
We evaluate ChatGPT's ability to solve algorithm problems from the CLRS benchmark suite that is designed for GNNs. We find that ChatGPT outperforms specialist GNN models, using Python to successfully solve these problems.
arXiv Detail & Related papers (2024-04-04T13:39:06Z)
MathScale: Scaling Instruction Tuning for Mathematical Reasoning [70.89605383298331]
Large language models (LLMs) have demonstrated remarkable capabilities in problem-solving. However, their proficiency in solving mathematical problems remains inadequate. We propose MathScale, a simple and scalable method to create high-quality mathematical reasoning data.
arXiv Detail & Related papers (2024-03-05T11:42:59Z)
ChatGPT as a Math Questioner? Evaluating ChatGPT on Generating Pre-university Math Questions [20.261452062585985]
Large language models (LLMs) have excelled in many NLP tasks involving logical and arithmetic reasoning. Our analysis is categorized into two main settings: context-aware and context-unaware. Our crawling results in TopicMath, a comprehensive and novel collection of pre-university math curriculums.
arXiv Detail & Related papers (2023-12-04T06:23:37Z)
ChatGPT Performance on Standardized Testing Exam -- A Proposed Strategy for Learners [0.0]
This study explores the problem solving capabilities of ChatGPT and its prospective applications in standardized test preparation, focusing on the GRE quantitative exam. We investigate how ChatGPT performs across various question types in the GRE quantitative domain, and how modifying question prompts impacts its accuracy.
arXiv Detail & Related papers (2023-09-25T20:25:29Z)
ChatGPT may excel in States Medical Licensing Examination but falters in basic Linear Algebra [2.3204178451683264]
The emergence of ChatGPT has been rapid, and although it has demonstrated positive impacts in certain domains, its influence is not universally advantageous. Our analysis focuses on ChatGPT's capabilities in Mathematics Education, particularly in teaching basic Linear Algebra.
arXiv Detail & Related papers (2023-06-23T15:19:29Z)
Can ChatGPT pass the Vietnamese National High School Graduation Examination? [0.0]
The study dataset included 30 essays in the literature test case and 1,700 multiple-choice questions designed for other subjects. ChatGPT was able to pass the examination with an average score of 6-7, demonstrating the technology's potential to revolutionize the educational landscape.
arXiv Detail & Related papers (2023-06-15T14:47:03Z)
ChatGPT Participates in a Computer Science Exam [16.665883787432858]
We ask ChatGPT to participate in an undergraduate computer science exam on ''Algorithms and Data Structures'' We hand-copied its answers onto an exam sheet, which was subsequently graded in a blind setup alongside those of 200 participating students. We find that ChatGPT narrowly passed the exam, obtaining 20.5 out of 40 points.
arXiv Detail & Related papers (2023-03-08T15:46:14Z)
Can ChatGPT Understand Too? A Comparative Study on ChatGPT and Fine-tuned BERT [103.57103957631067]
ChatGPT has attracted great attention, as it can generate fluent and high-quality responses to human inquiries. We evaluate ChatGPT's understanding ability by evaluating it on the most popular GLUE benchmark, and comparing it with 4 representative fine-tuned BERT-style models. We find that: 1) ChatGPT falls short in handling paraphrase and similarity tasks; 2) ChatGPT outperforms all BERT models on inference tasks by a large margin; 3) ChatGPT achieves comparable performance compared with BERT on sentiment analysis and question answering tasks.
arXiv Detail & Related papers (2023-02-19T12:29:33Z)
Is ChatGPT a General-Purpose Natural Language Processing Task Solver? [113.22611481694825]
Large language models (LLMs) have demonstrated the ability to perform a variety of natural language processing (NLP) tasks zero-shot. Recently, the debut of ChatGPT has drawn a great deal of attention from the natural language processing (NLP) community. It is not yet known whether ChatGPT can serve as a generalist model that can perform many NLP tasks zero-shot.
arXiv Detail & Related papers (2023-02-08T09:44:51Z)
UniGeo: Unifying Geometry Logical Reasoning via Reformulating Mathematical Expression [127.68780714438103]
Two main geometry problems: calculation and proving, are usually treated as two specific tasks. We construct a large-scale Unified Geometry problem benchmark, UniGeo, which contains 4,998 calculation problems and 9,543 proving problems. We also present a unified multi-task Geometric Transformer framework, Geoformer, to tackle calculation and proving problems simultaneously.
arXiv Detail & Related papers (2022-12-06T04:37:51Z)
JiuZhang: A Chinese Pre-trained Language Model for Mathematical Problem Understanding [74.12405417718054]
This paper aims to advance the mathematical intelligence of machines by presenting the first Chinese mathematical pre-trained language model(PLM) Unlike other standard NLP tasks, mathematical texts are difficult to understand, since they involve mathematical terminology, symbols and formulas in the problem statement. We design a novel curriculum pre-training approach for improving the learning of mathematical PLMs, consisting of both basic and advanced courses.
arXiv Detail & Related papers (2022-06-13T17:03:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.