Source Code Comprehension: A Contemporary Definition and Conceptual Model for Empirical Investigation
- URL: http://arxiv.org/abs/2310.11301v1
- Date: Tue, 17 Oct 2023 14:23:46 GMT
- Title: Source Code Comprehension: A Contemporary Definition and Conceptual Model for Empirical Investigation
- Authors: Marvin Wyrich
- Abstract summary: The research community has not managed to define source code comprehension as a concept.
An implicit definition by task prevails, i.e., code comprehension is what the experimental tasks measure.
This paper constitutes a reference work that defines source code comprehension and presents a conceptual framework.
- Score: 5.139874302398955
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Be it in debugging, testing, code review or, more recently, pair programming
with AI assistance: in all these activities, software engineers need to
understand source code. Accordingly, plenty of research is taking place in the
field to find out, for example, what makes code easy to understand and which
tools can best support developers in their comprehension process. And while any
code comprehension researcher certainly has a rough idea of what they mean when
they mention a developer having a good understanding of a piece of code, to
date, the research community has not managed to define source code
comprehension as a concept. Instead, in primary research on code comprehension,
an implicit definition by task prevails, i.e., code comprehension is what the
experimental tasks measure. This approach has two negative consequences. First,
it makes it difficult to conduct secondary research. Currently, each code
comprehension primary study uses different comprehension tasks and measures,
and thus it is not clear whether different studies intend to measure the same
construct. Second, authors of a primary study run into the difficulty of
justifying their design decisions without a definition of what they attempt to
measure. The result is an operationalization of an insufficiently described
construct, which poses a threat to construct validity.
The task of defining code comprehension considering the theory of the past
fifty years is not an easy one. Nor is it a task that every author of a primary
study must accomplish on their own. Therefore, this paper constitutes a
reference work that defines source code comprehension and presents a conceptual
framework in which researchers can anchor their empirical code comprehension
research.
Related papers
- Understanding Code Understandability Improvements in Code Reviews [79.16476505761582]
We analyzed 2,401 code review comments from Java open-source projects on GitHub.
83.9% of suggestions for improvement were accepted and integrated, with fewer than 1% later reverted.
arXiv Detail & Related papers (2024-10-29T12:21:23Z)
- Towards Identifying Code Proficiency through the Analysis of Python Textbooks [7.381102801726683]
The aim is to gauge the level of proficiency a developer must have to understand a piece of source code.
Prior attempts, which relied heavily on expert opinions and developer surveys, have led to considerable discrepancies.
This paper presents a new approach to identifying Python competency levels through the systematic analysis of introductory Python programming textbooks.
arXiv Detail & Related papers (2024-08-05T06:37:10Z)
- Effective Large Language Model Debugging with Best-first Tree Search [27.68711322875045]
Large Language Models (LLMs) show promise in code generation tasks but cannot consistently spot and fix bugs.
We propose an algorithm that enables LLMs to debug their code via self-reflection and search, in which the model attempts to identify its previous mistakes.
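As a rough illustration of the search component, the sketch below runs best-first search over candidate programs ranked by how many tests they pass. The test harness and the `llm_propose_fix` stub are stand-ins invented here, not the paper's actual prompting or scoring.

```python
import heapq
import itertools

def run_tests(code: str, tests: list) -> int:
    """Count passing test cases for a candidate program defining f(x)."""
    passed = 0
    for inp, expected in tests:
        env = {}
        try:
            exec(code, env)              # define the candidate function
            if env["f"](inp) == expected:
                passed += 1
        except Exception:
            pass
    return passed

def llm_propose_fix(code: str, feedback: str) -> list:
    """Placeholder for a self-reflection LLM call returning revised programs."""
    return [code.replace("-", "+")]      # toy 'fix' for the demo below

def best_first_debug(code: str, tests: list, budget: int = 10) -> str:
    tie = itertools.count()              # tie-breaker so heapq never compares code strings
    frontier = [(-run_tests(code, tests), next(tie), code)]
    best = frontier[0]
    for _ in range(budget):
        if not frontier:
            break
        neg_score, _, cand = heapq.heappop(frontier)
        if -neg_score == len(tests):     # all tests pass: done
            return cand
        feedback = f"{-neg_score}/{len(tests)} tests passed"
        for child in llm_propose_fix(cand, feedback):
            entry = (-run_tests(child, tests), next(tie), child)
            heapq.heappush(frontier, entry)
            best = min(best, entry)      # remember the best candidate so far
    return best[2]

buggy = "def f(x):\n    return x - 1   # should be x + 1"
print(best_first_debug(buggy, tests=[(1, 2), (5, 6)]))
```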
arXiv Detail & Related papers (2024-07-26T19:26:00Z)
- How Far Have We Gone in Binary Code Understanding Using Large Language Models [51.527805834378974]
We propose a benchmark to evaluate the effectiveness of Large Language Models (LLMs) in binary code understanding.
Our evaluations reveal that existing LLMs can understand binary code to a certain extent, which in turn improves the efficiency of binary code analysis.
arXiv Detail & Related papers (2024-04-15T14:44:08Z)
- When Do Program-of-Thoughts Work for Reasoning? [51.2699797837818]
We propose the complexity-impacted reasoning score (CIRS) to measure the correlation between code and reasoning abilities.
Specifically, we use the abstract syntax tree to encode the structural information and calculate logical complexity.
Code will be integrated into the EasyInstruct framework at https://github.com/zjunlp/EasyInstruct.
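The exact CIRS formula is not given in this summary, but the AST-based idea can be illustrated with a toy measure that weights control-flow nodes by nesting depth; the node set and weights here are assumptions of ours, not the paper's:

```python
import ast

# Nodes treated as contributing logical structure (an illustrative choice).
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.With, ast.BoolOp)

def logical_complexity(source: str) -> float:
    """Toy structural score: each branch node costs 1, plus 0.5 per nesting level."""
    tree = ast.parse(source)

    def score(node: ast.AST, depth: int) -> float:
        s = 1.0 + 0.5 * depth if isinstance(node, BRANCH_NODES) else 0.0
        child_depth = depth + isinstance(node, BRANCH_NODES)
        return s + sum(score(c, child_depth) for c in ast.iter_child_nodes(node))

    return score(tree, 0)

snippet = """
for i in range(10):
    if i % 2 == 0:
        print(i)
"""
print(logical_complexity(snippet))  # deeper nesting -> higher score (2.5 here)
```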
arXiv Detail & Related papers (2023-08-29T17:22:39Z)
- Investigating the Impact of Vocabulary Difficulty and Code Naturalness on Program Comprehension [3.35803394416914]
This study aims to assess readability and understandability from the perspective of language acquisition.
We will conduct a statistical analysis to understand their correlations and analyze whether code naturalness and vocabulary difficulty can be used to improve the performance of readability and understandability prediction methods.
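One widely used proxy for code naturalness in this line of work is cross-entropy under a language model trained on a code corpus; the bigram model with add-one smoothing below is a simplification of ours, not the study's actual setup:

```python
import math
from collections import Counter
from io import BytesIO
from tokenize import tokenize, NAME, OP, NUMBER

def tokens(code: str) -> list:
    """Lexical tokens of a Python snippet (names, operators, numbers)."""
    return [t.string for t in tokenize(BytesIO(code.encode()).readline)
            if t.type in (NAME, OP, NUMBER)]

def bigram_cross_entropy(snippet: str, corpus: list) -> float:
    """Lower cross-entropy = more 'natural' code relative to the corpus."""
    uni, bi = Counter(), Counter()
    for code in corpus:
        toks = ["<s>"] + tokens(code)
        uni.update(toks)
        bi.update(zip(toks, toks[1:]))
    toks = ["<s>"] + tokens(snippet)
    V = len(uni) + 1
    # Add-one smoothing over bigram probabilities.
    logp = sum(math.log2((bi[(a, b)] + 1) / (uni[a] + V))
               for a, b in zip(toks, toks[1:]))
    return -logp / max(len(toks) - 1, 1)

corpus = ["for i in range(10): print(i)", "for j in range(5): print(j)"]
print(bigram_cross_entropy("for k in range(3): print(k)", corpus))
```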
arXiv Detail & Related papers (2023-08-25T15:15:00Z)
- Comparing Code Explanations Created by Students and Large Language Models [4.526618922750769]
Reasoning about code and explaining its purpose are fundamental skills for computer scientists.
The ability to describe at a high level of abstraction how code will behave over all possible inputs correlates strongly with code writing skills.
Existing pedagogical approaches that scaffold the ability to explain code, such as producing code explanations on demand, do not currently scale well to large classrooms.
arXiv Detail & Related papers (2023-04-08T06:52:54Z)
- Early Career Developers' Perceptions of Code Understandability. A Study of Complexity Metrics [7.5060856723794975]
Low understandability increases coding effort, and misinterpreted code impacts the entire development process.
Our work investigates whether McCabe's Cyclomatic Complexity or Cognitive Complexity can be a good predictor of developers' perceived code understandability.
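For reference, McCabe's metric counts decision points and adds one. A simplified AST-based approximation is sketched below; tools differ on the exact node set (e.g., whether each boolean operator counts), so this list is one reasonable choice rather than a canonical one:

```python
import ast

# Node types counted as decision points (an approximation; tools vary).
DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                  ast.BoolOp, ast.IfExp)

def cyclomatic_complexity(source: str) -> int:
    """McCabe-style complexity: number of decision points + 1."""
    tree = ast.parse(source)
    decisions = sum(isinstance(node, DECISION_NODES)
                    for node in ast.walk(tree))
    return decisions + 1

code = """
def classify(x):
    if x < 0:
        return "negative"
    for _ in range(3):
        if x % 2 == 0 and x > 10:
            return "big even"
    return "other"
"""
print(cyclomatic_complexity(code))  # 2 ifs + 1 for + 1 bool-op + 1 = 5
```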
arXiv Detail & Related papers (2023-03-14T09:11:10Z)
- Language Model Decoding as Likelihood-Utility Alignment [54.70547032876017]
We introduce a taxonomy that groups decoding strategies based on their implicit assumptions about how well the model's likelihood is aligned with the task-specific notion of utility.
Specifically, by analyzing the correlation between the likelihood and the utility of predictions across a diverse set of tasks, we provide the first empirical evidence supporting the proposed taxonomy.
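The kind of analysis this rests on can be illustrated in a few lines: correlate per-example sequence likelihoods with a task utility such as exact match. The numbers below are fabricated placeholders, not results from the paper:

```python
from statistics import correlation  # Pearson's r (Python 3.10+)

# (log-likelihood under the model, utility of the prediction) per example;
# the paper computes this across a diverse set of real tasks and models.
scored = [(-2.1, 1.0), (-5.4, 0.0), (-1.3, 1.0), (-7.8, 0.0), (-3.0, 1.0)]

likelihoods = [ll for ll, _ in scored]
utilities = [u for _, u in scored]

# High positive r suggests likelihood is aligned with utility, so
# likelihood-maximizing decoding (e.g., beam search) should work well;
# low or negative r motivates sampling or utility-aware decoding instead.
print(f"Pearson r = {correlation(likelihoods, utilities):.2f}")
```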
arXiv Detail & Related papers (2022-10-13T17:55:51Z)
- ReACC: A Retrieval-Augmented Code Completion Framework [53.49707123661763]
We propose a retrieval-augmented code completion framework that leverages both lexical copying and references to semantically similar code obtained by retrieval.
We evaluate our approach on the code completion task in the Python and Java programming languages, achieving state-of-the-art performance on the CodeXGLUE benchmark.
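The retrieve-then-complete idea reduces to: find a similar snippet, prepend it to the prompt, then generate. The toy lexical retriever and `complete` stub below are stand-ins of ours for ReACC's hybrid retriever and completion model:

```python
def jaccard(a: set, b: set) -> float:
    """Token-overlap similarity between two snippets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def retrieve(query: str, codebase: list) -> str:
    """Return the lexically most similar snippet from the codebase."""
    q = set(query.split())
    return max(codebase, key=lambda snippet: jaccard(q, set(snippet.split())))

def complete(prompt: str) -> str:
    """Placeholder for a code completion model conditioned on the prompt."""
    return "    return sum(xs) / len(xs)"

codebase = [
    "def mean(values):\n    return sum(values) / len(values)",
    "def read_file(path):\n    return open(path).read()",
]
unfinished = "def average(xs):"
context = retrieve(unfinished, codebase)          # similar code as extra context
print(complete(f"# similar code:\n{context}\n\n{unfinished}\n"))
```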
arXiv Detail & Related papers (2022-03-15T08:25:08Z)
- COSEA: Convolutional Code Search with Layer-wise Attention [90.35777733464354]
We propose a new deep learning architecture, COSEA, which leverages convolutional neural networks with layer-wise attention to capture the code's intrinsic structural logic.
COSEA can achieve significant improvements over state-of-the-art methods on code search tasks.
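A rough PyTorch sketch of the core idea, convolution over token embeddings with a separate attention pooling per layer, is shown below; the dimensions, layer count, and fusion of per-layer summaries are assumptions of ours, not COSEA's actual architecture:

```python
import torch
import torch.nn as nn

class ConvAttnEncoder(nn.Module):
    def __init__(self, vocab=1000, dim=64, layers=3):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.convs = nn.ModuleList(
            nn.Conv1d(dim, dim, kernel_size=3, padding=1) for _ in range(layers))
        # One attention scorer per conv layer ("layer-wise attention").
        self.attn = nn.ModuleList(nn.Linear(dim, 1) for _ in range(layers))

    def forward(self, token_ids):                  # (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)  # (batch, dim, seq)
        pooled = []
        for conv, attn in zip(self.convs, self.attn):
            x = torch.relu(conv(x))
            h = x.transpose(1, 2)                  # (batch, seq, dim)
            w = torch.softmax(attn(h), dim=1)      # attention over positions
            pooled.append((w * h).sum(dim=1))      # (batch, dim) per layer
        return torch.stack(pooled).mean(dim=0)     # fuse layer summaries

enc = ConvAttnEncoder()
code_vec = enc(torch.randint(0, 1000, (2, 50)))    # encode 2 token sequences
print(code_vec.shape)                              # torch.Size([2, 64])
```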
arXiv Detail & Related papers (2020-10-19T13:53:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.