Related papers: ChatGPT-4 with Code Interpreter can be used to solve introductory college-level vector calculus and electromagnetism problems

URL: http://arxiv.org/abs/2309.08881v1
Date: Sat, 16 Sep 2023 05:19:39 GMT
Title: ChatGPT-4 with Code Interpreter can be used to solve introductory college-level vector calculus and electromagnetism problems
Authors: Tanuj Kumar and Mikhail A. Kats
Abstract summary: We evaluated ChatGPT 3.5, 4, and 4 with Code Interpreter on a set of college-level engineering-math and electromagnetism problems. ChatGPT-4 with Code Interpreter was able to satisfactorily solve most problems we tested most of the time.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We evaluated ChatGPT 3.5, 4, and 4 with Code Interpreter on a set of college-level engineering-math and electromagnetism problems, such as those often given to sophomore electrical engineering majors. We selected a set of 13 problems, and had ChatGPT solve them multiple times, using a fresh instance (chat) each time. We found that ChatGPT-4 with Code Interpreter was able to satisfactorily solve most problems we tested most of the time -- a major improvement over the performance of ChatGPT-4 (or 3.5) without Code Interpreter. The performance of ChatGPT was observed to be somewhat stochastic, and we found that solving the same problem N times in new ChatGPT instances and taking the most-common answer was an effective strategy. Based on our findings and observations, we provide some recommendations for instructors and students of classes at this level.
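The majority-vote strategy described in the abstract (solve the same problem N times in fresh chats and keep the most common answer) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' procedure. `solve_once` is a hypothetical callable standing in for one fresh ChatGPT chat. Each reply is assumed to be already reduced to a final answer string that can be compared exactly. The canned replies in the demo are made-up values, not results from the paper.

```python
from collections import Counter
from itertools import cycle
from typing import Callable, Hashable, Sequence, TypeVar

H = TypeVar("H", bound=Hashable)


def majority_answer(answers: Sequence[H]) -> H:
    """Return the most common answer among repeated runs.

    Ties go to the answer that appeared first, because
    Counter.most_common orders equal counts by first insertion.
    """
    if not answers:
        raise ValueError("need at least one answer")
    return Counter(answers).most_common(1)[0][0]


def solve_by_vote(solve_once: Callable[[str], str], problem: str, n: int = 5) -> str:
    """Ask a solver n times and keep the most common answer.

    `solve_once` is a hypothetical stand-in for one fresh chat; it must
    return a final answer already reduced to a comparable string.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # Strip surrounding whitespace so trivially different strings still match.
    answers = [solve_once(problem).strip() for _ in range(n)]
    return majority_answer(answers)


if __name__ == "__main__":
    # Hypothetical canned replies standing in for three fresh chats.
    replies = cycle(["4.5 V", "9 V ", "4.5 V"])
    print(solve_by_vote(lambda _problem: next(replies), "example problem", n=3))  # prints 4.5 V
```

Exact string matching is the weak point of this sketch. Answers such as `4.5 V` and `4.50 V` would need a normalization step (for example, parsing to a number with units) before counting.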

Related papers

Evaluating ChatGPT-3.5 Efficiency in Solving Coding Problems of Different Complexity Levels: An Empirical Analysis
We assess the performance of ChatGPT's GPT-3.5-turbo model on LeetCode problems. First, ChatGPT solves fewer problems as difficulty rises. Second, prompt engineering improves its performance. Third, ChatGPT performs better in popular languages such as Python, Java, and C++ than in less common ones such as Elixir, Erlang, and Racket.
arXiv (2024-11-12T04:01:09Z)
Benchmarking ChatGPT on Algorithmic Reasoning
We evaluate ChatGPT's ability to solve algorithm problems from the CLRS benchmark suite, which was designed for graph neural networks (GNNs). We find that ChatGPT, using Python to solve these problems, outperforms specialist GNN models.
arXiv (2024-04-04T13:39:06Z)
Exploring ChatGPT's Capabilities on Vulnerability Management
We explore ChatGPT's capabilities on 6 tasks spanning the complete vulnerability management process, using a large-scale dataset of 70,346 samples. ChatGPT shows notable proficiency in some tasks, such as generating titles for software bug reports. Our findings reveal the difficulties ChatGPT encounters and point to promising future directions.
arXiv (2023-11-11T11:01:13Z)
Primacy Effect of ChatGPT
We study the primacy effect of ChatGPT: its tendency to select labels at earlier positions as the answer. We hope that our experiments and analyses provide additional insights into building more reliable ChatGPT-based solutions.
arXiv (2023-10-20T00:37:28Z)
Unreflected Acceptance -- Investigating the Negative Consequences of ChatGPT-Assisted Problem Solving in Physics Education
The impact of large language models (LLMs) on sensitive areas of everyday life, such as education, remains unclear. Our work focuses on higher physics education and examines problem-solving strategies.
arXiv (2023-08-21T16:14:34Z)
Unmasking the giant: A comprehensive evaluation of ChatGPT's proficiency in coding algorithms and data structures
We evaluate ChatGPT's ability to generate correct solutions to the problems given to it, the quality of its code, and the nature of the run-time errors its code throws. We also examine patterns in the test cases passed to gain insight into how wrong ChatGPT's code is in these situations.
arXiv (2023-07-10T08:20:34Z)
Chatbots put to the test in math and logic problems: A preliminary comparison and assessment of ChatGPT-3.5, ChatGPT-4, and Google Bard
We use 30 questions that are clear and unambiguous, fully described in plain text only, and have a unique, well-defined correct answer. The answers are recorded and discussed, highlighting their strengths and weaknesses. ChatGPT-4 was found to outperform ChatGPT-3.5 in both sets of questions.
arXiv (2023-05-30T11:18:05Z)
ChatGPT: A Study on its Utility for Ubiquitous Software Engineering Tasks
ChatGPT (Chat Generative Pre-trained Transformer) was launched by OpenAI on November 30, 2022. In this study, we explore how ChatGPT can be used to help with common software engineering tasks.
arXiv (2023-05-26T11:29:06Z)
When do you need Chain-of-Thought Prompting for ChatGPT?
Chain-of-Thought (CoT) prompting can effectively elicit complex multi-step reasoning from large language models (LLMs). However, it is not clear whether CoT is still effective on more recent instruction-finetuned (IFT) LLMs such as ChatGPT.
arXiv (2023-04-06T17:47:29Z)
AI and the FCI: Can ChatGPT Project an Understanding of Introductory Physics?
ChatGPT is a groundbreaking AI interface built on a large language model that was trained on an enormous corpus of human text to emulate human conversation. We present a preliminary analysis of how two versions of ChatGPT fare in the field of first-semester university physics.
arXiv (2023-03-02T08:43:11Z)
Is ChatGPT a General-Purpose Natural Language Processing Task Solver?
Large language models (LLMs) have demonstrated the ability to perform a variety of natural language processing (NLP) tasks zero-shot. Recently, the debut of ChatGPT has drawn a great deal of attention from the NLP community. It is not yet known whether ChatGPT can serve as a generalist model that can perform many NLP tasks zero-shot.
arXiv (2023-02-08T09:44:51Z)

This list is automatically generated from the titles and abstracts of the papers on this site.