Related papers: ProCoT: Stimulating Critical Thinking and Writing of Students through Engagement with Large Language Models (LLMs)

ProCoT: Stimulating Critical Thinking and Writing of Students through Engagement with Large Language Models (LLMs)

URL: http://arxiv.org/abs/2312.09801v2
Date: Wed, 1 May 2024 08:45:38 GMT
Title: ProCoT: Stimulating Critical Thinking and Writing of Students through Engagement with Large Language Models (LLMs)
Authors: Tosin Adewumi, Lama Alkhaled, Claudia Buck, Sergio Hernandez, Saga Brilioth, Mkpe Kekung, Yelvin Ragimov, Elisa Barney,
Abstract summary: We introduce a novel writing method called Probing Chain-of-Thought (ProCoT) It potentially prevents students from cheating using a Large Language Model (LLM) We conduct studies with ProCoT in two different courses with 65 students.
Score: 0.7545833157486899
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We introduce a novel writing method called Probing Chain-of-Thought (ProCoT), which potentially prevents students from cheating using a Large Language Model (LLM), such as ChatGPT, while enhancing their active learning. LLMs have disrupted education and many other fields. For fear of students cheating, many have resorted to banning their use. These LLMs are also known for hallucinations. We conduct studies with ProCoT in two different courses with 65 students. The students in each course were asked to prompt an LLM of their choice with one question from a set of four and required to affirm or refute statements in the LLM output by using peer-reviewed references. The results show two things: (1) ProCoT stimulates creative/critical thinking and writing of students through engagement with LLMs when we compare the LLM-only output to ProCoT output and (2) ProCoT can prevent cheating because of clear limitations in existing LLMs, particularly ChatGPT, when we compare students' ProCoT output to LLM ProCoT output. We also discover that most students prefer to give answers in fewer words than LLMs, which are typically verbose. The average word counts for students in the first course, ChatGPT (v3.5), and Phind (v8) are 208, 391 and 383, respectively.

Related papers

Can Large Language Models Help Students Prove Software Correctness? An Experimental Study with Dafny [79.56218230251953]
Students in computing education increasingly use large language models (LLMs) such as ChatGPT.<n>This paper investigates how students interact with an LLM when solving formal verification exercises in Dafny.
arXiv Detail & Related papers (2025-06-27T16:34:13Z)
Substance Beats Style: Why Beginning Students Fail to Code with LLMs [3.4817709155395327]
Existing work shows that beginners struggle to prompt LLMs to solve text-to-code tasks. This paper explores two competing hypotheses about the cause of student-LLM miscommunication.
arXiv Detail & Related papers (2024-10-15T20:36:30Z)
CS1-LLM: Integrating LLMs into CS1 Instruction [0.6282171844772422]
This experience report describes a CS1 course at a large research-intensive university that fully embraces the use of Large Language Models. To incorporate the LLMs, the course was intentionally altered to reduce emphasis on syntax and writing code from scratch. Students were given three large, open-ended projects in three separate domains that allowed them to showcase their creativity.
arXiv Detail & Related papers (2024-04-17T14:44:28Z)
GTBench: Uncovering the Strategic Reasoning Limitations of LLMs via Game-Theoretic Evaluations [87.99872683336395]
Large Language Models (LLMs) are integrated into critical real-world applications. This paper evaluates LLMs' reasoning abilities in competitive environments. We first propose GTBench, a language-driven environment composing 10 widely recognized tasks.
arXiv Detail & Related papers (2024-02-19T18:23:36Z)
Direct Evaluation of Chain-of-Thought in Multi-hop Reasoning with Knowledge Graphs [52.42505579545893]
Large language models (LLMs) demonstrate strong reasoning abilities when prompted to generate chain-of-thought explanations alongside answers. We propose a novel discriminative and generative CoT evaluation paradigm to assess LLMs' knowledge of reasoning and the accuracy of the generated CoT.
arXiv Detail & Related papers (2024-02-17T05:22:56Z)
Can Separators Improve Chain-of-Thought Prompting? [10.398343318429367]
Chain-of-thought (CoT) prompting is a simple and effective method for improving the reasoning capabilities of Large Language Models (LLMs) Inspired by human cognition, we introduce COT-SEP, a method that strategically employs separators at the end of each exemplar in CoT prompting.
arXiv Detail & Related papers (2024-02-16T12:46:16Z)
Large Language Models: A Survey [69.72787936480394]
Large Language Models (LLMs) have drawn a lot of attention due to their strong performance on a wide range of natural language tasks. LLMs' ability of general-purpose language understanding and generation is acquired by training billions of model's parameters on massive amounts of text data.
arXiv Detail & Related papers (2024-02-09T05:37:09Z)
AlignedCoT: Prompting Large Language Models via Native-Speaking Demonstrations [52.43593893122206]
Alignedcot is an in-context learning technique for invoking Large Language Models. It achieves consistent and correct step-wise prompts in zero-shot scenarios. We conduct experiments on mathematical reasoning and commonsense reasoning.
arXiv Detail & Related papers (2023-11-22T17:24:21Z)
Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves [57.974103113675795]
We present a method named Rephrase and Respond' (RaR) which allows Large Language Models to rephrase and expand questions posed by humans. RaR serves as a simple yet effective prompting method for improving performance. We show that RaR is complementary to the popular Chain-of-Thought (CoT) methods, both theoretically and empirically.
arXiv Detail & Related papers (2023-11-07T18:43:34Z)
Exploring the Responses of Large Language Models to Beginner Programmers' Help Requests [1.8260333137469122]
We assess how good large language models (LLMs) are at identifying issues in problematic code that students request help on. We collected a sample of help requests and code from an online programming course.
arXiv Detail & Related papers (2023-06-09T07:19:43Z)
StudentEval: A Benchmark of Student-Written Prompts for Large Language Models of Code [2.087827281461409]
StudentEval contains 1,749 prompts for 48 problems, written by 80 students who have only completed one semester of Python programming. We analyze the prompts and find significant variation in students' prompting techniques.
arXiv Detail & Related papers (2023-06-07T16:03:55Z)
Statistical Knowledge Assessment for Large Language Models [79.07989821512128]
Given varying prompts regarding a factoid question, can a large language model (LLM) reliably generate factually correct answers? We propose KaRR, a statistical approach to assess factual knowledge for LLMs. Our results reveal that the knowledge in LLMs with the same backbone architecture adheres to the scaling law, while tuning on instruction-following data sometimes compromises the model's capability to generate factually correct text reliably.
arXiv Detail & Related papers (2023-05-17T18:54:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.