Related papers: Can ChatGPT Pass a Theory of Computing Course?

URL: http://arxiv.org/abs/2407.07757v1
Date: Wed, 10 Jul 2024 15:34:06 GMT
Title: Can ChatGPT Pass a Theory of Computing Course?
Authors: Matei A. Golesteanu, Garrett B. Vowinkel, Ryan E. Dougherty
Abstract summary: We evaluate ChatGPT's ability to pass our own ToC course's exams. We create a database of sample ToC questions and responses to accommodate other ToC offerings' choices for topics and structure.
Score: 0.22940141855172028
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) have had considerable difficulty when prompted with mathematical questions, especially those within theory of computing (ToC) courses. In this paper, we detail two experiments regarding our own ToC course and the ChatGPT LLM. For the first, we evaluated ChatGPT's ability to pass our own ToC course's exams. For the second, we created a database of sample ToC questions and responses to accommodate other ToC offerings' choices for topics and structure. We scored each of ChatGPT's outputs on these questions. Overall, we determined that ChatGPT can pass our ToC course, and is adequate at understanding common formal definitions and answering "simple"-style questions, e.g., true/false and multiple choice. However, ChatGPT often makes nonsensical claims in open-ended responses, such as proofs.

Related papers

A Study on the Vulnerability of Test Questions against ChatGPT-based Cheating [14.113742357609285]
ChatGPT can answer text prompts fairly accurately, even performing very well on postgraduate-level questions. Many educators have found that their take-home or remote tests and exams are vulnerable to ChatGPT-based cheating.
arXiv Detail & Related papers (2024-02-21T23:51:06Z)
Primacy Effect of ChatGPT [69.49920102917598]
We study the primacy effect of ChatGPT: the tendency to select labels at earlier positions as the answer. We hope that our experiments and analyses provide additional insights into building more reliable ChatGPT-based solutions.
arXiv Detail & Related papers (2023-10-20T00:37:28Z)
Is Stack Overflow Obsolete? An Empirical Study of the Characteristics of ChatGPT Answers to Stack Overflow Questions [7.065853028825656]
We conducted the first in-depth analysis of ChatGPT answers to programming questions on Stack Overflow. We examined the correctness, consistency, comprehensiveness, and conciseness of ChatGPT answers. Our analysis shows that 52% of ChatGPT answers contain incorrect information and 77% are verbose.
arXiv Detail & Related papers (2023-08-04T13:23:20Z)
Unmasking the giant: A comprehensive evaluation of ChatGPT's proficiency in coding algorithms and data structures [0.6990493129893112]
We evaluate ChatGPT's ability to generate correct solutions to the problems fed to it, its code quality, and the nature of the run-time errors thrown by its code. We look into patterns in the test cases passed in order to gain some insight into how wrong ChatGPT's code is in these situations.
arXiv Detail & Related papers (2023-07-10T08:20:34Z)
Chatbots put to the test in math and logic problems: A preliminary comparison and assessment of ChatGPT-3.5, ChatGPT-4, and Google Bard [68.8204255655161]
We use 30 questions that are clear, without any ambiguities, fully described with plain text only, and have a unique, well defined correct answer. The answers are recorded and discussed, highlighting their strengths and weaknesses. It was found that ChatGPT-4 outperforms ChatGPT-3.5 in both sets of questions.
arXiv Detail & Related papers (2023-05-30T11:18:05Z)
Uncovering the Potential of ChatGPT for Discourse Analysis in Dialogue: An Empirical Study [51.079100495163736]
This paper systematically inspects ChatGPT's performance in two discourse analysis tasks: topic segmentation and discourse parsing. ChatGPT demonstrates proficiency in identifying topic structures in general-domain conversations yet struggles considerably in specific-domain conversations. Our deeper investigation indicates that ChatGPT can give more reasonable topic structures than human annotations but only linearly parses the hierarchical rhetorical structures.
arXiv Detail & Related papers (2023-05-15T07:14:41Z)
When do you need Chain-of-Thought Prompting for ChatGPT? [87.45382888430643]
Chain-of-Thought (CoT) prompting can effectively elicit complex multi-step reasoning from Large Language Models (LLMs). It is not clear whether CoT is still effective on more recent instruction-finetuned (IFT) LLMs such as ChatGPT.
arXiv Detail & Related papers (2023-04-06T17:47:29Z)
ChatGPT is a Knowledgeable but Inexperienced Solver: An Investigation of Commonsense Problem in Large Language Models [49.52083248451775]
Large language models (LLMs) have made significant progress in NLP. We specifically focus on ChatGPT, a widely used and easily accessible LLM. We conduct a series of experiments on 11 datasets to evaluate ChatGPT's commonsense abilities.
arXiv Detail & Related papers (2023-03-29T03:05:43Z)
Analyzing ChatGPT's Aptitude in an Introductory Computer Engineering Course [6.531546527140474]
ChatGPT is a tool that is able to generate plausible and human-sounding text answers to various questions. This work assesses ChatGPT's aptitude in answering quizzes, homework, exam, and laboratory questions in an introductory computer engineering course.
arXiv Detail & Related papers (2023-03-13T16:22:43Z)
ChatGPT Participates in a Computer Science Exam [16.665883787432858]
We ask ChatGPT to participate in an undergraduate computer science exam on "Algorithms and Data Structures". We hand-copied its answers onto an exam sheet, which was subsequently graded in a blind setup alongside those of 200 participating students. We find that ChatGPT narrowly passed the exam, obtaining 20.5 out of 40 points.
arXiv Detail & Related papers (2023-03-08T15:46:14Z)
Is ChatGPT a General-Purpose Natural Language Processing Task Solver? [113.22611481694825]
Large language models (LLMs) have demonstrated the ability to perform a variety of natural language processing (NLP) tasks zero-shot. Recently, the debut of ChatGPT has drawn a great deal of attention from the NLP community. It is not yet known whether ChatGPT can serve as a generalist model that can perform many NLP tasks zero-shot.
arXiv Detail & Related papers (2023-02-08T09:44:51Z)

This list is automatically generated from the titles and abstracts of the papers on this site.