ChatGPT Participates in a Computer Science Exam
- URL: http://arxiv.org/abs/2303.09461v2
- Date: Wed, 22 Mar 2023 11:30:41 GMT
- Title: ChatGPT Participates in a Computer Science Exam
- Authors: Sebastian Bordt, Ulrike von Luxburg
- Abstract summary: We ask ChatGPT to participate in an undergraduate computer science exam on ''Algorithms and Data Structures''
We hand-copied its answers onto an exam sheet, which was subsequently graded in a blind setup alongside those of 200 participating students.
We find that ChatGPT narrowly passed the exam, obtaining 20.5 out of 40 points.
- Score: 16.665883787432858
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We asked ChatGPT to participate in an undergraduate computer science exam on
''Algorithms and Data Structures''. The program was evaluated on the entire
exam as posed to the students. We hand-copied its answers onto an exam sheet,
which was subsequently graded in a blind setup alongside those of 200
participating students. We find that ChatGPT narrowly passed the exam,
obtaining 20.5 out of 40 points. This impressive performance indicates that
ChatGPT can indeed succeed in challenging tasks like university exams. At the
same time, the questions in our exam are structurally similar to those of other
exams, solved homework problems, and teaching materials that can be found
online and might have been part of ChatGPT's training data. Therefore, it would
be inadequate to conclude from this experiment that ChatGPT has any
understanding of computer science. We also assess the improvements brought by
GPT-4. We find that GPT-4 would have obtained about 17\% more exam points than
GPT-3.5, reaching the performance of the average student. The transcripts of
our conversations with ChatGPT are available at
\url{https://github.com/tml-tuebingen/chatgpt-algorithm-exam}, and the entire
graded exam is in the appendix of this paper.
Related papers
- Evaluating GPT-4 at Grading Handwritten Solutions in Math Exams [48.99818550820575]
We leverage state-of-the-art multi-modal AI models, in particular GPT-4o, to automatically grade handwritten responses to college-level math exams.
Using real student responses to questions in a probability theory exam, we evaluate GPT-4o's alignment with ground-truth scores from human graders using various prompting techniques.
arXiv Detail & Related papers (2024-11-07T22:51:47Z) - A Study on the Vulnerability of Test Questions against ChatGPT-based
Cheating [14.113742357609285]
ChatGPT can answer text prompts fairly accurately, even performing very well on postgraduate-level questions.
Many educators have found that their take-home or remote tests and exams are vulnerable to ChatGPT-based cheating.
arXiv Detail & Related papers (2024-02-21T23:51:06Z) - Inappropriate Benefits and Identification of ChatGPT Misuse in
Programming Tests: A Controlled Experiment [0.0]
Students can ask ChatGPT to complete a programming task, generating a solution from other people's work without proper acknowledgment of the source(s)
We performed a controlled experiment measuring the inappropriate benefits of using ChatGPT in terms of completion time and programming performance.
arXiv Detail & Related papers (2023-08-11T06:42:29Z) - Can ChatGPT pass the Vietnamese National High School Graduation
Examination? [0.0]
The study dataset included 30 essays in the literature test case and 1,700 multiple-choice questions designed for other subjects.
ChatGPT was able to pass the examination with an average score of 6-7, demonstrating the technology's potential to revolutionize the educational landscape.
arXiv Detail & Related papers (2023-06-15T14:47:03Z) - Investigating the Effectiveness of ChatGPT in Mathematical Reasoning and
Problem Solving: Evidence from the Vietnamese National High School Graduation
Examination [0.0]
The dataset included 250 questions divided into four levels: knowledge (K), comprehension (C), application (A), and high application (H)
The study found that ChatGPT significantly succeeds in providing responses to questions on subjects including exponential and logarithmic functions, geometric progression, and arithmetic progression.
ChatGPT dominated in the SAT Math competition with a success rate of $70%$, followed by VNHSGE mathematics ($58.8%)$.
arXiv Detail & Related papers (2023-06-10T02:01:02Z) - Can ChatGPT Pass An Introductory Level Functional Language Programming
Course? [2.3456295046913405]
This paper aims to explore how well ChatGPT can perform in an introductory-level functional language programming course.
Our comprehensive evaluation provides valuable insights into ChatGPT's impact from both student and instructor perspectives.
arXiv Detail & Related papers (2023-04-29T20:30:32Z) - ChatLog: Carefully Evaluating the Evolution of ChatGPT Across Time [54.18651663847874]
ChatGPT has achieved great success and can be considered to have acquired an infrastructural status.
Existing benchmarks encounter two challenges: (1) Disregard for periodical evaluation and (2) Lack of fine-grained features.
We construct ChatLog, an ever-updating dataset with large-scale records of diverse long-form ChatGPT responses for 21 NLP benchmarks from March, 2023 to now.
arXiv Detail & Related papers (2023-04-27T11:33:48Z) - One Small Step for Generative AI, One Giant Leap for AGI: A Complete
Survey on ChatGPT in AIGC Era [95.2284704286191]
GPT-4 (a.k.a. ChatGPT plus) is one small step for generative AI (GAI) but one giant leap for artificial general intelligence (AGI)
Since its official release in November 2022, ChatGPT has quickly attracted numerous users with extensive media coverage.
This work is the first to survey ChatGPT with a comprehensive review of its underlying technology, applications, and challenges.
arXiv Detail & Related papers (2023-04-04T06:22:09Z) - To ChatGPT, or not to ChatGPT: That is the question! [78.407861566006]
This study provides a comprehensive and contemporary assessment of the most recent techniques in ChatGPT detection.
We have curated a benchmark dataset consisting of prompts from ChatGPT and humans, including diverse questions from medical, open Q&A, and finance domains.
Our evaluation results demonstrate that none of the existing methods can effectively detect ChatGPT-generated content.
arXiv Detail & Related papers (2023-04-04T03:04:28Z) - Can ChatGPT Understand Too? A Comparative Study on ChatGPT and
Fine-tuned BERT [103.57103957631067]
ChatGPT has attracted great attention, as it can generate fluent and high-quality responses to human inquiries.
We evaluate ChatGPT's understanding ability by evaluating it on the most popular GLUE benchmark, and comparing it with 4 representative fine-tuned BERT-style models.
We find that: 1) ChatGPT falls short in handling paraphrase and similarity tasks; 2) ChatGPT outperforms all BERT models on inference tasks by a large margin; 3) ChatGPT achieves comparable performance compared with BERT on sentiment analysis and question answering tasks.
arXiv Detail & Related papers (2023-02-19T12:29:33Z) - Is ChatGPT a General-Purpose Natural Language Processing Task Solver? [113.22611481694825]
Large language models (LLMs) have demonstrated the ability to perform a variety of natural language processing (NLP) tasks zero-shot.
Recently, the debut of ChatGPT has drawn a great deal of attention from the natural language processing (NLP) community.
It is not yet known whether ChatGPT can serve as a generalist model that can perform many NLP tasks zero-shot.
arXiv Detail & Related papers (2023-02-08T09:44:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.