Related papers: An Exploratory Eye Tracking Study on How Developers Classify and Debug Python Code in Different Paradigms

An Exploratory Eye Tracking Study on How Developers Classify and Debug Python Code in Different Paradigms

URL: http://arxiv.org/abs/2511.07612v1
Date: Wed, 12 Nov 2025 01:07:14 GMT
Title: An Exploratory Eye Tracking Study on How Developers Classify and Debug Python Code in Different Paradigms
Authors: Samuel W. Flint, Jigyasa Chauhan, Niloofar Mansoor, Bonita Sharif, Robert Dyer,
Abstract summary: This study investigates how developers classify the predominant paradigm in Python code.<n>We find confusion in labeling Functional and Procedural paradigms, but not Object-Oriented.<n>Changing the predominant paradigm did not affect the ability to debug the code, though developers did rate themselves with lower confidence for Functional code.
Score: 3.7780249574083804
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Modern programming languages, such as Python, support language features from several paradigms, such as object-oriented, procedural, and functional. Research has shown that code written in some paradigms can be harder to comprehend, but to date, no research has looked at which paradigm-specific language features impact comprehension. To this end, this study seeks to uncover which paradigm-specific features impactcomprehension and debugging of code or how multi-paradigm code might affect a developer's ability to do so. We present an exploratory empirical eye-tracking study to investigate 1) how developers classify the predominant paradigm in Python code and 2) how the paradigm affects their ability to debug Python code. The goal is to uncover if specific language features are looked at more often while classifying and debugging code with a predominant paradigm. Twenty-nine developers (primarily students) were recruited for the study and were each given four classification and four debugging tasks in Python. Eye movements were recorded during all the tasks. The results indicate confusion in labeling Functional and Procedural paradigms, but not Object-Oriented. The code with predominantly functional paradigms also took the longest to complete. Changing the predominant paradigm did not affect the ability to debug the code, though developers did rate themselves with lower confidence for Functional code. We report significant differences in reading patterns during debugging, especially in the Functional code. During classification, results show that developers do not necessarily read paradigm-relevant token types.

Related papers

Modeling Student Learning with 3.8 Million Program Traces [52.153493498021895]
We introduce a dataset of over 3.8 million programming reasoning traces from users of Pencil Code.<n>We find that models trained on real traces are stronger at modeling diverse student behavior.<n>We show that we can help students recover from mistakes by steering code generation models to identify a sequence of edits that will result in more correct code.
arXiv Detail & Related papers (2025-10-06T17:37:17Z)
Bugs in the Shadows: Static Detection of Faulty Python Refactorings [44.115219601924856]
Python's dynamic type system poses significant challenges for automated code transformations.<n>Our analysis uncovered 29 bugs across four types from a total of 1,152 attempts.<n>These results highlight the need to improve the robustness of current Python tools to ensure the correctness of automated code transformations.
arXiv Detail & Related papers (2025-07-01T18:03:56Z)
BugSpotter: Automated Generation of Code Debugging Exercises [22.204802715829615]
This paper introduces BugSpotter, a tool to generate buggy code from a problem description and verify the synthesized bugs via a test suite. Students interact with BugSpotter by designing failing test cases, where the buggy code's output differs from the expected result as defined by the problem specification.
arXiv Detail & Related papers (2024-11-21T16:56:33Z)
How Maintainable is Proficient Code? A Case Study of Three PyPI Libraries [3.0105723746073]
We investigate the risk level of proficient code inside a file. We identify several instances of high proficient code that was also high risk. We envision that the study should help developers identify scenarios where proficient code might be detrimental to future code maintenance activities.
arXiv Detail & Related papers (2024-10-08T04:45:11Z)
Towards Identifying Code Proficiency through the Analysis of Python Textbooks [7.381102801726683]
The aim is to gauge the level of proficiency a developer must have to understand a piece of source code. Prior attempts, which relied heavily on expert opinions and developer surveys, have led to considerable discrepancies. This paper presents a new approach to identifying Python competency levels through the systematic analysis of introductory Python programming textbooks.
arXiv Detail & Related papers (2024-08-05T06:37:10Z)
Language Agnostic Code Embeddings [61.84835551549612]
We focus on the cross-lingual capabilities of code embeddings across different programming languages. Code embeddings comprise two distinct components: one deeply tied to the nuances and syntax of a specific language, and the other remaining agnostic to these details. We show that when we isolate and eliminate this language-specific component, we witness significant improvements in downstream code retrieval tasks.
arXiv Detail & Related papers (2023-10-25T17:34:52Z)
Large Language Models of Code Fail at Completing Code with Potential Bugs [30.80172644795715]
We study the buggy-code completion problem inspired by real-time code suggestion. We find that the presence of potential bugs significantly degrades the generation performance of the high-performing Code-LLMs.
arXiv Detail & Related papers (2023-06-06T06:35:27Z)
CONCORD: Clone-aware Contrastive Learning for Source Code [64.51161487524436]
Self-supervised pre-training has gained traction for learning generic code representations valuable for many downstream SE tasks. We argue that it is also essential to factor in how developers code day-to-day for general-purpose representation learning. In particular, we propose CONCORD, a self-supervised, contrastive learning strategy to place benign clones closer in the representation space while moving deviants further apart.
arXiv Detail & Related papers (2023-06-05T20:39:08Z)
A Static Evaluation of Code Completion by Large Language Models [65.18008807383816]
Execution-based benchmarks have been proposed to evaluate functional correctness of model-generated code on simple programming problems. static analysis tools such as linters, which can detect errors without running the program, haven't been well explored for evaluating code generation models. We propose a static evaluation framework to quantify static errors in Python code completions, by leveraging Abstract Syntax Trees.
arXiv Detail & Related papers (2023-06-05T19:23:34Z)
Code Execution with Pre-trained Language Models [88.04688617516827]
Most pre-trained models for code intelligence ignore the execution trace and only rely on source code and syntactic structures. We develop a mutation-based data augmentation technique to create a large-scale and realistic Python dataset and task for code execution. We then present CodeExecutor, a Transformer model that leverages code execution pre-training and curriculum learning to enhance its semantic comprehension.
arXiv Detail & Related papers (2023-05-08T10:00:05Z)
ReACC: A Retrieval-Augmented Code Completion Framework [53.49707123661763]
We propose a retrieval-augmented code completion framework, leveraging both lexical copying and referring to code with similar semantics by retrieval. We evaluate our approach in the code completion task in Python and Java programming languages, achieving a state-of-the-art performance on CodeXGLUE benchmark.
arXiv Detail & Related papers (2022-03-15T08:25:08Z)

This list is automatically generated from the titles and abstracts of the papers in this site.