ChatGPT Performance on Standardized Testing Exam -- A Proposed Strategy for Learners
- URL: http://arxiv.org/abs/2309.14519v1
- Date: Mon, 25 Sep 2023 20:25:29 GMT
- Title: ChatGPT Performance on Standardized Testing Exam -- A Proposed Strategy for Learners
- Authors: Umer Farooq, Saira Anwar
- Abstract summary: This study explores the problem-solving capabilities of ChatGPT and its prospective applications in standardized test preparation, focusing on the GRE quantitative exam.
We investigate how ChatGPT performs across various question types in the GRE quantitative domain, and how modifying question prompts impacts its accuracy.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This study explores the problem-solving capabilities of ChatGPT and its
prospective applications in standardized test preparation, focusing on the GRE
quantitative exam. Prior research has shown great potential in the use of
ChatGPT for academic purposes, with the prospect of revolutionizing how
students across various disciplines approach studying. We investigate how
ChatGPT performs across various question types in the GRE quantitative domain,
and how modifying question prompts impacts its accuracy. More specifically,
this study addressed two research questions: 1. How does ChatGPT perform in
answering GRE-based quantitative questions across various content areas? 2.
How does the accuracy of ChatGPT vary when the question prompts are modified?
The dataset, consisting of 100 randomly selected GRE quantitative questions,
was collected from the ETS official guide to GRE test preparation. We used
quantitative evaluation to answer our first research question, and a t-test to
examine the statistical association between prompt modification and ChatGPT's
accuracy. Results show a statistically significant improvement in ChatGPT's
accuracy after applying instruction priming and contextual prompts to the
original questions: ChatGPT reached 84% accuracy with the modified prompts,
compared to 69% with the original questions. The study discusses the areas
where ChatGPT struggled with certain questions, shows how prompt modifications
can help in preparing for standardized tests like the GRE, and provides future
directions for prompt modification.
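The paper reports only the aggregate figures above; as a rough illustration of
the kind of analysis it describes, here is a minimal Python sketch that (i)
wraps a raw question with an instruction-priming prefix and a contextual hint,
and (ii) runs a paired t-test on per-question correctness between the two
prompt conditions. The prompt wording and the simulated score vectors are
assumptions made for illustration; only the 69% and 84% accuracy levels come
from the abstract.

```python
# Minimal sketch (assumptions noted above): prompt modification plus a
# paired t-test on per-question correctness for 100 GRE questions.
import numpy as np
from scipy import stats

def modify_prompt(question: str) -> str:
    """Apply instruction priming and a contextual frame to a raw question.

    The exact wording is an illustrative guess, not the paper's template.
    """
    priming = ("You are solving a GRE quantitative reasoning problem. "
               "Reason step by step, then state only the final answer.")
    context = "Treat all quantities as real numbers unless stated otherwise."
    return f"{priming}\n{context}\n\nQuestion: {question}"

# Simulated binary scores (1 = correct) for the same 100 questions under both
# prompt conditions; the draws are made up, only the rates match the abstract.
rng = np.random.default_rng(0)
scores_original = rng.binomial(1, 0.69, size=100)  # ~69% accuracy (reported)
scores_modified = rng.binomial(1, 0.84, size=100)  # ~84% accuracy (reported)

# Paired t-test: each question is answered under both prompt conditions.
t_stat, p_value = stats.ttest_rel(scores_modified, scores_original)
print(f"original accuracy: {scores_original.mean():.2f}")
print(f"modified accuracy: {scores_modified.mean():.2f}")
print(f"paired t = {t_stat:.3f}, p = {p_value:.4f}")
```

Because the same questions appear in both conditions, a paired test is the
natural choice here; with binary outcomes, McNemar's test would be a
reasonable alternative to the t-test.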
Related papers
- A Study on the Vulnerability of Test Questions against ChatGPT-based Cheating
ChatGPT can answer text prompts fairly accurately, even performing very well on postgraduate-level questions.
Many educators have found that their take-home or remote tests and exams are vulnerable to ChatGPT-based cheating.
arXiv Detail & Related papers (2024-02-21T23:51:06Z)
- Evaluating ChatGPT as a Question Answering System: A Comprehensive Analysis and Comparison with Existing Models
This article scrutinizes ChatGPT as a Question Answering System (QAS).
The primary focus is on evaluating ChatGPT's proficiency in extracting responses from provided paragraphs.
The evaluation highlights hallucinations, where ChatGPT provides responses to questions without available answers in the provided context.
arXiv Detail & Related papers (2023-12-11T08:49:18Z)
- Comparative Analysis of ChatGPT, GPT-4, and Microsoft Bing Chatbots for GRE Test
This research paper presents an analysis of how well three artificial intelligence chatbots (Bing, ChatGPT, and GPT-4) perform when answering questions from standardized tests.
A total of 137 quantitative reasoning questions of various forms and 157 verbal questions were used to assess their capabilities.
arXiv Detail & Related papers (2023-11-26T05:27:35Z)
- DEMASQ: Unmasking the ChatGPT Wordsmith
We propose an effective ChatGPT detector named DEMASQ, which accurately identifies ChatGPT-generated content.
Our method addresses two critical factors: (i) the distinct biases in text composition observed in human- and machine-generated content and (ii) the alterations made by humans to evade previous detection methods.
arXiv Detail & Related papers (2023-11-08T21:13:05Z)
- Primacy Effect of ChatGPT
We study the primacy effect of ChatGPT: the tendency to select labels at earlier positions as the answer.
We hope that our experiments and analyses provide additional insights into building more reliable ChatGPT-based solutions.
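As a hypothetical companion to that summary, the sketch below shows one simple
way such position bias could be probed: rotate the answer options so every
option takes a turn at position 0, then count how often the model picks
whatever is listed first. The `ask_model` stub and all names here are
illustrative assumptions, not code from the paper.

```python
# Hypothetical sketch: measuring primacy (position) bias in a chat model.

def ask_model(question: str, options: list[str]) -> int:
    """Return the index of the option the model picks. Stub: wire this
    up to a real chat-completion API before use."""
    raise NotImplementedError

def primacy_rate(question: str, options: list[str]) -> float:
    """Rotate the option order so each option takes a turn at position 0,
    then report how often the model picks whatever sits first. An unbiased
    model should pick position 0 about 1/len(options) of the time."""
    n = len(options)
    first_position_picks = 0
    for shift in range(n):
        rotated = options[shift:] + options[:shift]
        if ask_model(question, rotated) == 0:
            first_position_picks += 1
    return first_position_picks / n
```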
arXiv Detail & Related papers (2023-10-20T00:37:28Z)
- Performance of ChatGPT on USMLE: Unlocking the Potential of Large Language Models for AI-Assisted Medical Education
This study determined how reliable ChatGPT can be for answering complex medical and clinical questions.
The paper evaluated the obtained results using a 2-way ANOVA and post-hoc analysis.
ChatGPT-generated answers were found to be more context-oriented than regular Google search results.
arXiv Detail & Related papers (2023-06-30T19:53:23Z)
- Investigating the Effectiveness of ChatGPT in Mathematical Reasoning and Problem Solving: Evidence from the Vietnamese National High School Graduation Examination
The dataset included 250 questions divided into four levels: knowledge (K), comprehension (C), application (A), and high application (H).
The study found that ChatGPT significantly succeeds in providing responses to questions on subjects including exponential and logarithmic functions, geometric progression, and arithmetic progression.
ChatGPT dominated in the SAT Math competition with a success rate of 70%, followed by VNHSGE mathematics (58.8%).
arXiv Detail & Related papers (2023-06-10T02:01:02Z)
- ChatLog: Carefully Evaluating the Evolution of ChatGPT Across Time
ChatGPT has achieved great success and can be considered to have acquired infrastructural status.
Existing benchmarks encounter two challenges: (1) Disregard for periodical evaluation and (2) Lack of fine-grained features.
We construct ChatLog, an ever-updating dataset with large-scale records of diverse long-form ChatGPT responses for 21 NLP benchmarks from March 2023 onward.
arXiv Detail & Related papers (2023-04-27T11:33:48Z)
- To ChatGPT, or not to ChatGPT: That is the question!
This study provides a comprehensive and contemporary assessment of the most recent techniques in ChatGPT detection.
We have curated a benchmark dataset consisting of prompts from ChatGPT and humans, including diverse questions from medical, open Q&A, and finance domains.
Our evaluation results demonstrate that none of the existing methods can effectively detect ChatGPT-generated content.
arXiv Detail & Related papers (2023-04-04T03:04:28Z)
- Can ChatGPT Understand Too? A Comparative Study on ChatGPT and Fine-tuned BERT
ChatGPT has attracted great attention, as it can generate fluent and high-quality responses to human inquiries.
We evaluate ChatGPT's understanding ability by testing it on the most popular GLUE benchmark and comparing it with 4 representative fine-tuned BERT-style models.
We find that: 1) ChatGPT falls short in handling paraphrase and similarity tasks; 2) ChatGPT outperforms all BERT models on inference tasks by a large margin; 3) ChatGPT achieves performance comparable to BERT on sentiment analysis and question answering tasks.
arXiv Detail & Related papers (2023-02-19T12:29:33Z)
- Is ChatGPT a General-Purpose Natural Language Processing Task Solver?
Large language models (LLMs) have demonstrated the ability to perform a variety of natural language processing (NLP) tasks zero-shot.
Recently, the debut of ChatGPT has drawn a great deal of attention from the natural language processing (NLP) community.
It is not yet known whether ChatGPT can serve as a generalist model that can perform many NLP tasks zero-shot.
arXiv Detail & Related papers (2023-02-08T09:44:51Z)
This list is automatically generated from the titles and abstracts of the papers listed on this site.