Source Code Comprehension: A Contemporary Definition and Conceptual Model for Empirical Investigation
- URL: http://arxiv.org/abs/2310.11301v1
- Date: Tue, 17 Oct 2023 14:23:46 GMT
- Title: Source Code Comprehension: A Contemporary Definition and Conceptual Model for Empirical Investigation
- Authors: Marvin Wyrich
- Abstract summary: The research community has not managed to define source code comprehension as a concept.
An implicit definition by task prevails, i.e., code comprehension is what the experimental tasks measure.
This paper constitutes a reference work that defines source code comprehension and presents a conceptual framework.
- Score: 5.139874302398955
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Be it in debugging, testing, code review or, more recently, pair programming
with AI assistance: in all these activities, software engineers need to
understand source code. Accordingly, plenty of research is taking place in the
field to find out, for example, what makes code easy to understand and which
tools can best support developers in their comprehension process. And while any
code comprehension researcher certainly has a rough idea of what they mean when
they mention a developer having a good understanding of a piece of code, to
date, the research community has not managed to define source code
comprehension as a concept. Instead, in primary research on code comprehension,
an implicit definition by task prevails, i.e., code comprehension is what the
experimental tasks measure. This approach has two negative consequences. First,
it makes it difficult to conduct secondary research. Currently, each code
comprehension primary study uses different comprehension tasks and measures,
and thus it is not clear whether different studies intend to measure the same
construct. Second, authors of a primary study run into the difficulty of
justifying their design decisions without a definition of what they attempt to
measure. The result is an operationalization of an insufficiently described
construct, which poses a threat to construct validity.
The task of defining code comprehension considering the theory of the past
fifty years is not an easy one. Nor is it a task that every author of a primary
study must accomplish on their own. Therefore, this paper constitutes a
reference work that defines source code comprehension and presents a conceptual
framework in which researchers can anchor their empirical code comprehension
research.
Related papers
- Understanding Code Understandability Improvements in Code Reviews [79.16476505761582]
We analyzed 2,401 code review comments from Java open-source projects on GitHub.
83.9% of suggestions for improvement were accepted and integrated, with fewer than 1% later reverted.
arXiv Detail & Related papers (2024-10-29T12:21:23Z)
- Towards Identifying Code Proficiency through the Analysis of Python Textbooks [7.381102801726683]
The aim is to gauge the level of proficiency a developer must have to understand a piece of source code.
Prior attempts, which relied heavily on expert opinions and developer surveys, have led to considerable discrepancies.
This paper presents a new approach to identifying Python competency levels through the systematic analysis of introductory Python programming textbooks.
arXiv Detail & Related papers (2024-08-05T06:37:10Z)
- Effective Large Language Model Debugging with Best-first Tree Search [27.68711322875045]
Large Language Models (LLMs) show promise in code generation tasks but cannot consistently spot and fix bugs.
We propose an algorithm that enables LLMs to debug their code via self-reflection and search, in which the model attempts to identify its previous mistakes.
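As a rough illustration of the search component, the sketch below runs best-first search over candidate programs ranked by how many tests they pass. The test harness and the `llm_propose_fix` stub are stand-ins invented here, not the paper's actual prompting or scoring.

```python
import heapq
import itertools

def run_tests(code: str, tests: list) -> int:
    """Count passing test cases for a candidate program defining f(x)."""
    passed = 0
    for inp, expected in tests:
        env = {}
        try:
            exec(code, env)              # define the candidate function
            if env["f"](inp) == expected:
                passed += 1
        except Exception:
            pass
    return passed

def llm_propose_fix(code: str, feedback: str) -> list:
    """Placeholder for a self-reflection LLM call returning revised programs."""
    return [code.replace("-", "+")]      # toy 'fix' for the demo below

def best_first_debug(code: str, tests: list, budget: int = 10) -> str:
    tie = itertools.count()              # tie-breaker so heapq never compares code strings
    frontier = [(-run_tests(code, tests), next(tie), code)]
    best = frontier[0]
    for _ in range(budget):
        if not frontier:
            break
        neg_score, _, cand = heapq.heappop(frontier)
        if -neg_score == len(tests):     # all tests pass: done
            return cand
        feedback = f"{-neg_score}/{len(tests)} tests passed"
        for child in llm_propose_fix(cand, feedback):
            entry = (-run_tests(child, tests), next(tie), child)
            heapq.heappush(frontier, entry)
            best = min(best, entry)      # remember the best candidate so far
    return best[2]

buggy = "def f(x):\n    return x - 1   # should be x + 1"
print(best_first_debug(buggy, tests=[(1, 2), (5, 6)]))
```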
arXiv Detail & Related papers (2024-07-26T19:26:00Z)
- How Far Have We Gone in Binary Code Understanding Using Large Language Models [51.527805834378974]
We propose a benchmark to evaluate the effectiveness of Large Language Models (LLMs) in binary code understanding.
Our evaluations reveal that existing LLMs can understand binary code to a certain extent, which in turn improves the efficiency of binary code analysis.
arXiv Detail & Related papers (2024-04-15T14:44:08Z)
- When Do Program-of-Thoughts Work for Reasoning? [51.2699797837818]
We propose the complexity-impacted reasoning score (CIRS) to measure the correlation between code and reasoning abilities.
Specifically, we use the abstract syntax tree to encode the structural information and calculate logical complexity.
Code will be integrated into the EasyInstruct framework at https://github.com/zjunlp/EasyInstruct.
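The exact CIRS formula is not given in this summary, but the AST-based idea can be illustrated with a toy measure that weights control-flow nodes by nesting depth; the node set and weights here are assumptions of ours, not the paper's:

```python
import ast

# Nodes treated as contributing logical structure (an illustrative choice).
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.With, ast.BoolOp)

def logical_complexity(source: str) -> float:
    """Toy structural score: each branch node costs 1, plus 0.5 per nesting level."""
    tree = ast.parse(source)

    def score(node: ast.AST, depth: int) -> float:
        s = 1.0 + 0.5 * depth if isinstance(node, BRANCH_NODES) else 0.0
        child_depth = depth + isinstance(node, BRANCH_NODES)
        return s + sum(score(c, child_depth) for c in ast.iter_child_nodes(node))

    return score(tree, 0)

snippet = """
for i in range(10):
    if i % 2 == 0:
        print(i)
"""
print(logical_complexity(snippet))  # deeper nesting -> higher score (2.5 here)
```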
arXiv Detail & Related papers (2023-08-29T17:22:39Z)
- Investigating the Impact of Vocabulary Difficulty and Code Naturalness on Program Comprehension [3.35803394416914]
This study aims to assess readability and understandability from the perspective of language acquisition.
We will conduct a statistical analysis to understand their correlations and analyze whether code naturalness and vocabulary difficulty can be used to improve the performance of readability and understandability prediction methods.
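One widely used proxy for code naturalness in this line of work is cross-entropy under a language model trained on a code corpus; the bigram model with add-one smoothing below is a simplification of ours, not the study's actual setup:

```python
import math
from collections import Counter
from io import BytesIO
from tokenize import tokenize, NAME, OP, NUMBER

def tokens(code: str) -> list:
    """Lexical tokens of a Python snippet (names, operators, numbers)."""
    return [t.string for t in tokenize(BytesIO(code.encode()).readline)
            if t.type in (NAME, OP, NUMBER)]

def bigram_cross_entropy(snippet: str, corpus: list) -> float:
    """Lower cross-entropy = more 'natural' code relative to the corpus."""
    uni, bi = Counter(), Counter()
    for code in corpus:
        toks = ["<s>"] + tokens(code)
        uni.update(toks)
        bi.update(zip(toks, toks[1:]))
    toks = ["<s>"] + tokens(snippet)
    V = len(uni) + 1
    # Add-one smoothing over bigram probabilities.
    logp = sum(math.log2((bi[(a, b)] + 1) / (uni[a] + V))
               for a, b in zip(toks, toks[1:]))
    return -logp / max(len(toks) - 1, 1)

corpus = ["for i in range(10): print(i)", "for j in range(5): print(j)"]
print(bigram_cross_entropy("for k in range(3): print(k)", corpus))
```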
arXiv Detail & Related papers (2023-08-25T15:15:00Z)
- Comparing Code Explanations Created by Students and Large Language Models [4.526618922750769]
Reasoning about code and explaining its purpose are fundamental skills for computer scientists.
The ability to describe at a high level of abstraction how code will behave over all possible inputs correlates strongly with code writing skills.
Existing pedagogical approaches that scaffold the ability to explain code, such as producing code explanations on demand, do not currently scale well to large classrooms.
arXiv Detail & Related papers (2023-04-08T06:52:54Z)
- Early Career Developers' Perceptions of Code Understandability. A Study of Complexity Metrics [7.5060856723794975]
Low understandability increases coding effort, and misinterpreted code impacts the entire development process.
Our work investigates whether McCabe's Cyclomatic Complexity or Cognitive Complexity can be a good predictor of developers' perceived code understandability.
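For reference, McCabe's metric counts decision points and adds one. A simplified AST-based approximation is sketched below; tools differ on the exact node set (e.g., whether each boolean operator counts), so this list is one reasonable choice rather than a canonical one:

```python
import ast

# Node types counted as decision points (an approximation; tools vary).
DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                  ast.BoolOp, ast.IfExp)

def cyclomatic_complexity(source: str) -> int:
    """McCabe-style complexity: number of decision points + 1."""
    tree = ast.parse(source)
    decisions = sum(isinstance(node, DECISION_NODES)
                    for node in ast.walk(tree))
    return decisions + 1

code = """
def classify(x):
    if x < 0:
        return "negative"
    for _ in range(3):
        if x % 2 == 0 and x > 10:
            return "big even"
    return "other"
"""
print(cyclomatic_complexity(code))  # 2 ifs + 1 for + 1 bool-op + 1 = 5
```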
arXiv Detail & Related papers (2023-03-14T09:11:10Z)
- Language Model Decoding as Likelihood-Utility Alignment [54.70547032876017]
We introduce a taxonomy that groups decoding strategies based on their implicit assumptions about how well the model's likelihood is aligned with the task-specific notion of utility.
Specifically, by analyzing the correlation between the likelihood and the utility of predictions across a diverse set of tasks, we provide the first empirical evidence supporting the proposed taxonomy.
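The kind of analysis this rests on can be illustrated in a few lines: correlate per-example sequence likelihoods with a task utility such as exact match. The numbers below are fabricated placeholders, not results from the paper:

```python
from statistics import correlation  # Pearson's r (Python 3.10+)

# (log-likelihood under the model, utility of the prediction) per example;
# the paper computes this across a diverse set of real tasks and models.
scored = [(-2.1, 1.0), (-5.4, 0.0), (-1.3, 1.0), (-7.8, 0.0), (-3.0, 1.0)]

likelihoods = [ll for ll, _ in scored]
utilities = [u for _, u in scored]

# High positive r suggests likelihood is aligned with utility, so
# likelihood-maximizing decoding (e.g., beam search) should work well;
# low or negative r motivates sampling or utility-aware decoding instead.
print(f"Pearson r = {correlation(likelihoods, utilities):.2f}")
```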
arXiv Detail & Related papers (2022-10-13T17:55:51Z)
- ReACC: A Retrieval-Augmented Code Completion Framework [53.49707123661763]
We propose a retrieval-augmented code completion framework that leverages both lexical copying and references to semantically similar code obtained by retrieval.
We evaluate our approach on the code completion task in the Python and Java programming languages, achieving state-of-the-art performance on the CodeXGLUE benchmark.
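The retrieve-then-complete idea reduces to: find a similar snippet, prepend it to the prompt, then generate. The toy lexical retriever and `complete` stub below are stand-ins of ours for ReACC's hybrid retriever and completion model:

```python
def jaccard(a: set, b: set) -> float:
    """Token-overlap similarity between two snippets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def retrieve(query: str, codebase: list) -> str:
    """Return the lexically most similar snippet from the codebase."""
    q = set(query.split())
    return max(codebase, key=lambda snippet: jaccard(q, set(snippet.split())))

def complete(prompt: str) -> str:
    """Placeholder for a code completion model conditioned on the prompt."""
    return "    return sum(xs) / len(xs)"

codebase = [
    "def mean(values):\n    return sum(values) / len(values)",
    "def read_file(path):\n    return open(path).read()",
]
unfinished = "def average(xs):"
context = retrieve(unfinished, codebase)          # similar code as extra context
print(complete(f"# similar code:\n{context}\n\n{unfinished}\n"))
```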
arXiv Detail & Related papers (2022-03-15T08:25:08Z)
- COSEA: Convolutional Code Search with Layer-wise Attention [90.35777733464354]
We propose a new deep learning architecture, COSEA, which leverages convolutional neural networks with layer-wise attention to capture the code's intrinsic structural logic.
COSEA can achieve significant improvements over state-of-the-art methods on code search tasks.
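A rough PyTorch sketch of the core idea, convolution over token embeddings with a separate attention pooling per layer, is shown below; the dimensions, layer count, and fusion of per-layer summaries are assumptions of ours, not COSEA's actual architecture:

```python
import torch
import torch.nn as nn

class ConvAttnEncoder(nn.Module):
    def __init__(self, vocab=1000, dim=64, layers=3):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.convs = nn.ModuleList(
            nn.Conv1d(dim, dim, kernel_size=3, padding=1) for _ in range(layers))
        # One attention scorer per conv layer ("layer-wise attention").
        self.attn = nn.ModuleList(nn.Linear(dim, 1) for _ in range(layers))

    def forward(self, token_ids):                  # (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)  # (batch, dim, seq)
        pooled = []
        for conv, attn in zip(self.convs, self.attn):
            x = torch.relu(conv(x))
            h = x.transpose(1, 2)                  # (batch, seq, dim)
            w = torch.softmax(attn(h), dim=1)      # attention over positions
            pooled.append((w * h).sum(dim=1))      # (batch, dim) per layer
        return torch.stack(pooled).mean(dim=0)     # fuse layer summaries

enc = ConvAttnEncoder()
code_vec = enc(torch.randint(0, 1000, (2, 50)))    # encode 2 token sequences
print(code_vec.shape)                              # torch.Size([2, 64])
```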
arXiv Detail & Related papers (2020-10-19T13:53:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.