On the robustness of ChatGPT in teaching Korean Mathematics
- URL: http://arxiv.org/abs/2502.11915v1
- Date: Mon, 17 Feb 2025 15:31:27 GMT
- Title: On the robustness of ChatGPT in teaching Korean Mathematics
- Authors: Phuong-Nam Nguyen, Quang Nguyen-The, An Vu-Minh, Diep-Anh Nguyen, Xuan-Lam Pham,
- Abstract summary: ChatGPT achieves 66.72% accuracy, correctly answering 391 out of 586 questions.
Our findings show that ChatGPT's ratings align with educational theory and test-taker perspectives.
Future research should address linguistic biases and enhance accuracy across diverse languages.
- Score: 0.0
- License:
- Abstract: ChatGPT, an Artificial Intelligence model, has the potential to revolutionize education. However, its effectiveness in solving non-English questions remains uncertain. This study evaluates ChatGPT's robustness using 586 Korean mathematics questions. ChatGPT achieves 66.72% accuracy, correctly answering 391 out of 586 questions. We also assess its ability to rate mathematics questions based on eleven criteria and perform a topic analysis. Our findings show that ChatGPT's ratings align with educational theory and test-taker perspectives. While ChatGPT performs well in question classification, it struggles with non-English contexts, highlighting areas for improvement. Future research should address linguistic biases and enhance accuracy across diverse languages. Domain-specific optimizations and multilingual training could improve ChatGPT's role in personalized education.
Related papers
- Evaluating ChatGPT as a Question Answering System: A Comprehensive
Analysis and Comparison with Existing Models [0.0]
This article scrutinizes ChatGPT as a Question Answering System (QAS)
The primary focus is on evaluating ChatGPT's proficiency in extracting responses from provided paragraphs.
The evaluation highlights hallucinations, where ChatGPT provides responses to questions without available answers in the provided context.
arXiv Detail & Related papers (2023-12-11T08:49:18Z) - ChatGPT Performance on Standardized Testing Exam -- A Proposed Strategy
for Learners [0.0]
This study explores the problem solving capabilities of ChatGPT and its prospective applications in standardized test preparation, focusing on the GRE quantitative exam.
We investigate how ChatGPT performs across various question types in the GRE quantitative domain, and how modifying question prompts impacts its accuracy.
arXiv Detail & Related papers (2023-09-25T20:25:29Z) - Can ChatGPT pass the Vietnamese National High School Graduation
Examination? [0.0]
The study dataset included 30 essays in the literature test case and 1,700 multiple-choice questions designed for other subjects.
ChatGPT was able to pass the examination with an average score of 6-7, demonstrating the technology's potential to revolutionize the educational landscape.
arXiv Detail & Related papers (2023-06-15T14:47:03Z) - Transformative Effects of ChatGPT on Modern Education: Emerging Era of
AI Chatbots [36.760677949631514]
ChatGPT was released to provide coherent and useful replies based on analysis of large volumes of data.
Our preliminary evaluation concludes that ChatGPT performed differently in each subject area including finance, coding and maths.
There are clear drawbacks in its use, such as the possibility of producing inaccurate or false data.
Academic regulations and evaluation practices need to be updated, should ChatGPT be used as a tool in education.
arXiv Detail & Related papers (2023-05-25T17:35:57Z) - ChatGPT in the Classroom: An Analysis of Its Strengths and Weaknesses
for Solving Undergraduate Computer Science Questions [5.962828109329824]
ChatGPT is an AI language model developed by OpenAI that can understand and generate human-like text.
There is concern that students may leverage ChatGPT to complete take-home assignments and exams and obtain favorable grades without genuinely acquiring knowledge.
arXiv Detail & Related papers (2023-04-28T17:26:32Z) - ChatLog: Carefully Evaluating the Evolution of ChatGPT Across Time [54.18651663847874]
ChatGPT has achieved great success and can be considered to have acquired an infrastructural status.
Existing benchmarks encounter two challenges: (1) Disregard for periodical evaluation and (2) Lack of fine-grained features.
We construct ChatLog, an ever-updating dataset with large-scale records of diverse long-form ChatGPT responses for 21 NLP benchmarks from March, 2023 to now.
arXiv Detail & Related papers (2023-04-27T11:33:48Z) - ChatGPT Beyond English: Towards a Comprehensive Evaluation of Large
Language Models in Multilingual Learning [70.57126720079971]
Large language models (LLMs) have emerged as the most important breakthroughs in natural language processing (NLP)
This paper evaluates ChatGPT on 7 different tasks, covering 37 diverse languages with high, medium, low, and extremely low resources.
Compared to the performance of previous models, our extensive experimental results demonstrate a worse performance of ChatGPT for different NLP tasks and languages.
arXiv Detail & Related papers (2023-04-12T05:08:52Z) - To ChatGPT, or not to ChatGPT: That is the question! [78.407861566006]
This study provides a comprehensive and contemporary assessment of the most recent techniques in ChatGPT detection.
We have curated a benchmark dataset consisting of prompts from ChatGPT and humans, including diverse questions from medical, open Q&A, and finance domains.
Our evaluation results demonstrate that none of the existing methods can effectively detect ChatGPT-generated content.
arXiv Detail & Related papers (2023-04-04T03:04:28Z) - Can ChatGPT Understand Too? A Comparative Study on ChatGPT and
Fine-tuned BERT [103.57103957631067]
ChatGPT has attracted great attention, as it can generate fluent and high-quality responses to human inquiries.
We evaluate ChatGPT's understanding ability by evaluating it on the most popular GLUE benchmark, and comparing it with 4 representative fine-tuned BERT-style models.
We find that: 1) ChatGPT falls short in handling paraphrase and similarity tasks; 2) ChatGPT outperforms all BERT models on inference tasks by a large margin; 3) ChatGPT achieves comparable performance compared with BERT on sentiment analysis and question answering tasks.
arXiv Detail & Related papers (2023-02-19T12:29:33Z) - Is ChatGPT a General-Purpose Natural Language Processing Task Solver? [113.22611481694825]
Large language models (LLMs) have demonstrated the ability to perform a variety of natural language processing (NLP) tasks zero-shot.
Recently, the debut of ChatGPT has drawn a great deal of attention from the natural language processing (NLP) community.
It is not yet known whether ChatGPT can serve as a generalist model that can perform many NLP tasks zero-shot.
arXiv Detail & Related papers (2023-02-08T09:44:51Z) - A Categorical Archive of ChatGPT Failures [47.64219291655723]
ChatGPT, developed by OpenAI, has been trained using massive amounts of data and simulates human conversation.
It has garnered significant attention due to its ability to effectively answer a broad range of human inquiries.
However, a comprehensive analysis of ChatGPT's failures is lacking, which is the focus of this study.
arXiv Detail & Related papers (2023-02-06T04:21:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.