Substance Beats Style: Why Beginning Students Fail to Code with LLMs
- URL: http://arxiv.org/abs/2410.19792v1
- Date: Tue, 15 Oct 2024 20:36:30 GMT
- Title: Substance Beats Style: Why Beginning Students Fail to Code with LLMs
- Authors: Francesca Lucchetti, Zixuan Wu, Arjun Guha, Molly Q Feldman, Carolyn Jane Anderson
- Abstract summary: Existing work shows that beginners struggle to prompt LLMs to solve text-to-code tasks.
This paper explores two competing hypotheses about the cause of student-LLM miscommunication.
- Score: 3.4817709155395327
- License:
- Abstract: Although LLMs are increasing the productivity of professional programmers, existing work shows that beginners struggle to prompt LLMs to solve text-to-code tasks. Why is this the case? This paper explores two competing hypotheses about the cause of student-LLM miscommunication: (1) students simply lack the technical vocabulary needed to write good prompts, and (2) students do not understand the extent of information that LLMs need to solve code generation tasks. We study (1) with a causal intervention experiment on technical vocabulary and (2) by analyzing graphs that abstract how students edit prompts and the different failures that they encounter. We find that substance beats style: a poor grasp of technical vocabulary is merely correlated with prompt failure; that the information content of prompts predicts success; that students get stuck making trivial edits; and more. Our findings have implications for the use of LLMs in programming education, and for efforts to make computing more accessible with LLMs.
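The paper's second analysis abstracts how students edit prompts into graphs of revisions and failures. As a rough illustration of that idea (a sketch of my own, not the authors' pipeline; the similarity threshold and data structures are hypothetical), each prompt version can be a node, each revision an edge labeled trivial or substantive by text similarity, and "getting stuck" shows up as long runs of trivial edits that keep failing:

```python
# Illustrative sketch only, not the authors' code: model a student's
# prompt-revision history as a graph and measure trivial-edit runs.
from dataclasses import dataclass, field
from difflib import SequenceMatcher

TRIVIAL_SIMILARITY = 0.9  # hypothetical threshold for calling an edit "trivial"

@dataclass
class PromptNode:
    text: str
    passed_tests: bool
    edits_out: list = field(default_factory=list)  # (next PromptNode, is_trivial)

def add_revision(prev: PromptNode, new_text: str, passed: bool) -> PromptNode:
    """Attach a revised prompt to the graph, labeling the edit by size."""
    similarity = SequenceMatcher(None, prev.text, new_text).ratio()
    node = PromptNode(new_text, passed)
    prev.edits_out.append((node, similarity >= TRIVIAL_SIMILARITY))
    return node

def longest_trivial_failing_run(start: PromptNode) -> int:
    """Longest chain of trivial edits whose prompts all still fail the tests."""
    best, stack = 0, [(start, 0)]
    while stack:
        node, run = stack.pop()
        for nxt, trivial in node.edits_out:
            nxt_run = run + 1 if (trivial and not nxt.passed_tests) else 0
            best = max(best, nxt_run)
            stack.append((nxt, nxt_run))
    return best

# A student makes two near-identical failing edits in a row.
v1 = PromptNode("write a function that sorts a list", passed_tests=False)
v2 = add_revision(v1, "write a function that sorts the list", passed=False)
v3 = add_revision(v2, "write a function that sorts the list!", passed=False)
print(longest_trivial_failing_run(v1))  # -> 2
```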
Related papers
- Not All LLM Reasoners Are Created Equal [58.236453890457476]
We study the depth of grade-school math problem-solving capabilities of LLMs.
We evaluate their performance on pairs of existing math word problems chained together, so that solving the second problem depends on correctly answering the first.
arXiv Detail & Related papers (2024-10-02T17:01:10Z)
- Not the Silver Bullet: LLM-enhanced Programming Error Messages are Ineffective in Practice [1.106787864231365]
Despite promising evidence on synthetic benchmarks, we found that GPT-4 generated error messages outperformed conventional compiler error messages in only 1 of the 6 tasks.
arXiv Detail & Related papers (2024-09-27T11:45:56Z)
- Revisiting the Graph Reasoning Ability of Large Language Models: Case Studies in Translation, Connectivity and Shortest Path [53.71787069694794]
We focus on the graph reasoning ability of Large Language Models (LLMs).
We revisit the ability of LLMs on three fundamental graph tasks: graph description translation, graph connectivity, and the shortest-path problem.
Our findings suggest that LLMs can fail to understand graph structures from text descriptions alone and exhibit varying performance across all three of these fundamental tasks (see the grading sketch below).
arXiv Detail & Related papers (2024-08-18T16:26:39Z)
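The connectivity and shortest-path tasks are mechanically gradable: a model's textual answer can be checked against ground truth computed directly from the edge list. A minimal grading sketch of my own (not the paper's harness), using plain breadth-first search:

```python
# Illustrative grader, not the paper's code: compute ground truth for
# connectivity and shortest-path questions with breadth-first search.
from collections import deque

def bfs_distance(edges, source, target):
    """Shortest-path length between source and target, or None if disconnected."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)  # treat the graph as undirected
    seen, queue = {source}, deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

edges = [(0, 1), (1, 2), (2, 3), (4, 5)]
print(bfs_distance(edges, 0, 3))          # shortest-path length: 3
print(bfs_distance(edges, 0, 4) is None)  # 0 and 4 are disconnected: True
```

An LLM's answers to both question types can then be compared against these computed values.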
- LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing [106.45895712717612]
Large language models (LLMs) have shown remarkable versatility in various generative tasks.
This study focuses on how LLMs can assist NLP researchers.
To our knowledge, this is the first work to provide such a comprehensive analysis.
arXiv Detail & Related papers (2024-06-24T01:30:22Z)
- LinkGPT: Teaching Large Language Models To Predict Missing Links [23.57145845001286]
Large Language Models (LLMs) have shown promising results on various language and vision tasks.
Recently, there has been growing interest in applying LLMs to graph-based tasks, particularly on Text-Attributed Graphs (TAGs).
arXiv Detail & Related papers (2024-06-07T04:54:36Z)
- When LLMs Meet Cunning Texts: A Fallacy Understanding Benchmark for Large Language Models [59.84769254832941]
We propose a FaLlacy Understanding Benchmark (FLUB) containing cunning texts that are easy for humans to understand but difficult for models to grasp.
Specifically, the cunning texts that FLUB focuses on mainly consist of tricky, humorous, and misleading texts collected from the real internet environment.
Based on FLUB, we investigate the performance of multiple representative and advanced LLMs.
arXiv Detail & Related papers (2024-02-16T22:12:53Z)
- ProCoT: Stimulating Critical Thinking and Writing of Students through Engagement with Large Language Models (LLMs) [0.7545833157486899]
We introduce a novel writing method called Probing Chain-of-Thought (ProCoT).
It potentially prevents students from using a Large Language Model (LLM) to cheat.
We conduct studies with ProCoT in two different courses with 65 students.
arXiv Detail & Related papers (2023-12-15T14:01:46Z)
- AlignedCoT: Prompting Large Language Models via Native-Speaking Demonstrations [52.43593893122206]
AlignedCoT is an in-context learning technique for invoking Large Language Models.
It achieves consistent and correct step-wise prompts in zero-shot scenarios.
We conduct experiments on mathematical reasoning and commonsense reasoning.
arXiv Detail & Related papers (2023-11-22T17:24:21Z)
- Exploring the Responses of Large Language Models to Beginner Programmers' Help Requests [1.8260333137469122]
We assess how well large language models (LLMs) identify issues in problematic code that students request help with.
We collected a sample of help requests and code from an online programming course.
arXiv Detail & Related papers (2023-06-09T07:19:43Z)
- StudentEval: A Benchmark of Student-Written Prompts for Large Language Models of Code [2.087827281461409]
StudentEval contains 1,749 prompts for 48 problems, written by 80 students who have only completed one semester of Python programming.
We analyze the prompts and find significant variation in students' prompting techniques; a sketch of how such prompts can be scored follows below.
arXiv Detail & Related papers (2023-06-07T16:03:55Z)
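StudentEval-style scoring asks whether the code an LLM generates from each student prompt passes the problem's tests. A simplified sketch of that loop (my illustration, not the benchmark's actual harness; `generate_code` is a hypothetical stand-in for a real model call):

```python
# Simplified StudentEval-style scoring loop (illustrative only).
def generate_code(prompt: str) -> str:
    # Hypothetical stand-in for an LLM call; a real harness queries a model.
    return "def add(a, b):\n    return a + b"

def passes_tests(code: str, tests: str) -> bool:
    """Exec the generated code, then the assert-based tests, in one namespace."""
    namespace: dict = {}
    try:
        exec(code, namespace)
        exec(tests, namespace)  # asserts raise AssertionError on failure
        return True
    except Exception:
        return False

def pass_rate(prompts: list[str], tests: str) -> float:
    """Fraction of student prompts whose generated code passes the tests."""
    results = [passes_tests(generate_code(p), tests) for p in prompts]
    return sum(results) / len(results)

student_prompts = [
    "write a function add that returns the sum of two numbers",
    "make add(a, b) give back a plus b",
]
print(pass_rate(student_prompts, "assert add(2, 3) == 5"))  # -> 1.0
```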
- Red Teaming Language Model Detectors with Language Models [114.36392560711022]
Large language models (LLMs) present significant safety and ethical risks if exploited by malicious users.
Recent works have proposed algorithms to detect LLM-generated text and protect LLMs.
We study two types of attack strategies: 1) replacing certain words in an LLM's output with their synonyms given the context (sketched below); 2) automatically searching for an instructional prompt to alter the writing style of the generation.
arXiv Detail & Related papers (2023-05-31T10:08:37Z)
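The first attack strategy, synonym substitution, is easy to prototype: swap selected words and check whether the detector's score drops. A toy sketch with a hand-rolled synonym table (the paper chooses synonyms with an LLM given the context; every name here is illustrative):

```python
# Toy sketch of the synonym-substitution attack (illustrative only; the
# paper selects context-appropriate synonyms with an LLM).
import random

# Hypothetical synonym table standing in for an LLM's suggestions.
SYNONYMS = {
    "significant": ["notable", "considerable"],
    "results": ["findings", "outcomes"],
    "shows": ["demonstrates", "indicates"],
}

def perturb(text: str, swap_prob: float = 0.5, seed: int = 0) -> str:
    """Randomly swap known words for synonyms to shift a detector's features."""
    rng = random.Random(seed)
    out = []
    for word in text.split():
        stripped = word.rstrip(".,!?")
        key = stripped.lower()
        if key in SYNONYMS and rng.random() < swap_prob:
            out.append(rng.choice(SYNONYMS[key]) + word[len(stripped):])
        else:
            out.append(word)
    return " ".join(out)

print(perturb("The model shows significant results."))  # a paraphrased variant
```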
This list is automatically generated from the titles and abstracts of the papers in this site.