"I Would Have Written My Code Differently'': Beginners Struggle to Understand LLM-Generated Code
- URL: http://arxiv.org/abs/2504.19037v1
- Date: Sat, 26 Apr 2025 22:12:16 GMT
- Title: "I Would Have Written My Code Differently'': Beginners Struggle to Understand LLM-Generated Code
- Authors: Yangtian Zi, Luisa Li, Arjun Guha, Carolyn Jane Anderson, Molly Q Feldman
- Abstract summary: This paper measures how well beginners comprehend code generated by large language models (LLMs). Key challenges include barriers for non-native English speakers, unfamiliarity with Python syntax, and automation bias. Our results show a low per-task success rate of 32.5%, with indiscriminate struggles across demographic populations.
- Score: 3.125508434341366
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) are being increasingly adopted for programming work. Prior work shows that while LLMs accelerate task completion for professional programmers, beginning programmers struggle to prompt models effectively. However, prompting is just half of the code generation process -- when code is generated, it must be read, evaluated, and integrated (or rejected). How accessible are these tasks for beginning programmers? This paper measures how well beginners comprehend LLM-generated code and explores the challenges students face in judging code correctness. We compare how well students understand natural language descriptions of functions and LLM-generated implementations, studying 32 CS1 students on 160 task instances. Our results show a low per-task success rate of 32.5%, with indiscriminate struggles across demographic populations. Key challenges include barriers for non-native English speakers, unfamiliarity with Python syntax, and automation bias. Our findings highlight the barrier that code comprehension presents to beginning programmers seeking to write code with LLMs.
Related papers
- How Accurately Do Large Language Models Understand Code? [4.817546726074033]
Large Language Models (LLMs) are increasingly used in post-development tasks such as code repair and testing.
Quantifying code comprehension is challenging due to its abstract nature and the lack of a standardized metric.
This paper presents the first large-scale empirical investigation into LLMs' ability to understand code.
arXiv Detail & Related papers (2025-04-06T05:59:29Z) - Substance Beats Style: Why Beginning Students Fail to Code with LLMs [3.4817709155395327]
Existing work shows that beginners struggle to prompt LLMs to solve text-to-code tasks.
This paper explores two competing hypotheses about the cause of student-LLM miscommunication.
arXiv Detail & Related papers (2024-10-15T20:36:30Z) - Assured LLM-Based Software Engineering [51.003878077888686]
This paper outlines the keynote given by Mark Harman at the International Workshop on Interpretability, Robustness, and Benchmarking in Neural Software Engineering on Monday 15th April 2024 in Lisbon, Portugal.
arXiv Detail & Related papers (2024-02-06T20:38:46Z) - Interactions with Prompt Problems: A New Way to Teach Programming with
Large Language Models [4.1599514827277355]
We propose a new way to teach programming with Prompt Problems.
Students receive a problem visually, indicating how input should be transformed to output, and must translate that to a prompt for an LLM to decipher.
A solution is considered correct when the code generated from the student's prompt passes all test cases.
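As a rough illustration of that grading loop (mine, not the paper's; `llm_generate`, the function name, and the test-case format are all assumptions), it might look like this in Python:

```python
# Hypothetical sketch: a Prompt Problem is "solved" when the code an LLM
# generates from the student's prompt passes all instructor test cases.
# `llm_generate` stands in for a real code-generation API call (assumption).

def llm_generate(prompt: str) -> str:
    raise NotImplementedError  # placeholder for the actual LLM call

def grade_prompt(student_prompt: str, func_name: str,
                 test_cases: list[tuple[tuple, object]]) -> bool:
    """Run the generated code and check every (args, expected) pair."""
    code = llm_generate(student_prompt)
    namespace: dict = {}
    try:
        exec(code, namespace)                 # define the generated function
        func = namespace[func_name]
        return all(func(*args) == expected for args, expected in test_cases)
    except Exception:
        return False                          # crashes or syntax errors fail

# Example test suite for a problem whose picture implies "double each number":
tests = [(([1, 2, 3],), [2, 4, 6]), (([],), [])]
```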
arXiv Detail & Related papers (2024-01-19T15:32:46Z) - Code Prompting Elicits Conditional Reasoning Abilities in Text+Code LLMs [65.2379940117181]
We introduce code prompting, a chain of prompts that transforms a natural language problem into code.
We find that code prompting yields a large performance boost for multiple LLMs.
Our analysis of GPT-3.5 reveals that the code formatting of the input problem is essential for performance improvement.
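For a sense of what such a transformation could look like (my own illustration, not an example from the paper), a conditional-reasoning question might be recast as code along these lines before being sent to the model:

```python
# Hypothetical example of "code prompting": the natural-language problem
# "You may apply for a permit if you are over 18 and a resident. Alice is 17
#  and a resident. Can she apply?" rewritten as code for the LLM to reason over.

age = 17
is_resident = True

def can_apply_for_permit(age: int, is_resident: bool) -> bool:
    # Rule from the problem statement: over 18 AND a resident.
    return age > 18 and is_resident

answer = can_apply_for_permit(age, is_resident)
print(answer)  # False -> Alice cannot apply
```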
arXiv Detail & Related papers (2024-01-18T15:32:24Z) - If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code
Empowers Large Language Models to Serve as Intelligent Agents [81.60906807941188]
Large language models (LLMs) are trained on a combination of natural language and formal language (code).
Code translates high-level goals into executable steps, featuring standard syntax, logical consistency, abstraction, and modularity.
arXiv Detail & Related papers (2024-01-01T16:51:20Z) - AlignedCoT: Prompting Large Language Models via Native-Speaking Demonstrations [52.43593893122206]
AlignedCoT is an in-context learning technique for invoking Large Language Models.
It achieves consistent and correct step-wise prompts in zero-shot scenarios.
We conduct experiments on mathematical reasoning and commonsense reasoning.
arXiv Detail & Related papers (2023-11-22T17:24:21Z) - CodeApex: A Bilingual Programming Evaluation Benchmark for Large
Language Models [43.655927559990616]
We propose CodeApex, a benchmark dataset focusing on the programming comprehension, code generation, and code correction abilities of LLMs.
We evaluate 12 widely used LLMs, including both general-purpose and specialized models.
GPT-4 exhibits the best programming capabilities, achieving approximate accuracies of 69%, 54%, and 66% on the three tasks, respectively.
arXiv Detail & Related papers (2023-09-05T04:12:01Z) - StudentEval: A Benchmark of Student-Written Prompts for Large Language
Models of Code [2.087827281461409]
StudentEval contains 1,749 prompts for 48 problems, written by 80 students who have only completed one semester of Python programming.
We analyze the prompts and find significant variation in students' prompting techniques.
arXiv Detail & Related papers (2023-06-07T16:03:55Z) - Learning to Plan with Natural Language [111.76828049344839]
Large Language Models (LLMs) have shown remarkable performance in various basic natural language tasks.
To complete complex tasks, however, a task plan is still needed to guide LLMs in generating specific solutions step by step.
We propose the Learning to Plan method, whose first (plan-learning) phase iteratively updates the task plan with new step-by-step solutions and behavioral instructions, obtained by prompting LLMs to derive them from training error feedback.
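A very rough sketch of the iterative update loop that summary describes (placeholders throughout; none of these function names or the control flow come from the paper):

```python
# Placeholder sketch: repeatedly solve training tasks with the current plan,
# collect error feedback, and ask an LLM to fold the feedback back into the plan.

def solve_with_plan(task, plan):            # placeholder: LLM solves task guided by plan
    ...

def collect_errors(task, solution):         # placeholder: compare against ground truth
    ...

def revise_plan(plan, errors):              # placeholder: LLM rewrites plan from feedback
    ...

def learn_task_plan(training_tasks, plan, num_rounds=3):
    for _ in range(num_rounds):
        for task in training_tasks:
            solution = solve_with_plan(task, plan)
            errors = collect_errors(task, solution)
            if errors:
                plan = revise_plan(plan, errors)
    return plan
```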
arXiv Detail & Related papers (2023-04-20T17:09:12Z) - PAL: Program-aided Language Models [112.94785609781503]
We present Program-Aided Language models (PAL), in which an LLM reads a natural language problem and generates a program as its intermediate reasoning steps.
PAL offloads the solution step to a programmatic runtime such as a Python interpreter.
We set new state-of-the-art results in all 12 benchmarks.
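For instance (an illustrative sketch, not one of the paper's actual prompts or outputs), a PAL-style generation for a simple word problem might look like this, with the interpreter doing the arithmetic:

```python
# Illustrative PAL-style trace (made up for this summary): the model writes
# Python as its reasoning steps, and the interpreter computes the final answer.
# Word problem: "Roger has 5 tennis balls. He buys 2 cans of 3 balls each.
#                How many tennis balls does he have now?"

balls_initial = 5
cans_bought = 2
balls_per_can = 3
balls_from_cans = cans_bought * balls_per_can
answer = balls_initial + balls_from_cans
print(answer)  # 11 -- the arithmetic is executed, not guessed by the model
```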
arXiv Detail & Related papers (2022-11-18T18:56:13Z) - Language Models of Code are Few-Shot Commonsense Learners [106.1531522893209]
Given a natural language input, the goal is to generate a graph such as an event graph or a reasoning graph.
Existing approaches serialize the output graph as a flat list of nodes and edges.
We show that when we instead frame structured commonsense reasoning tasks as code generation tasks, pre-trained LMs of code are better structured commonsense reasoners than LMs of natural language.
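As a rough illustration (my own, not the paper's exact serialization), an event graph can be expressed as code instead of a flat node/edge list, which plays to the strengths of code-pretrained models:

```python
# Hypothetical example of framing graph generation as code: the script
# "make coffee" becomes a small class whose method calls encode the edges,
# rather than a flat list of (node, edge) pairs.

class EventGraph:
    def __init__(self):
        self.edges: list[tuple[str, str]] = []

    def add_step(self, before: str, after: str) -> None:
        self.edges.append((before, after))   # edge: `before` happens before `after`

graph = EventGraph()
graph.add_step("boil water", "pour water over grounds")
graph.add_step("grind beans", "pour water over grounds")
graph.add_step("pour water over grounds", "drink coffee")
print(graph.edges)
```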
arXiv Detail & Related papers (2022-10-13T16:09:36Z)