NRevisit: A Cognitive Behavioral Metric for Code Understandability Assessment
- URL: http://arxiv.org/abs/2504.18345v1
- Date: Fri, 25 Apr 2025 13:34:24 GMT
- Title: NRevisit: A Cognitive Behavioral Metric for Code Understandability Assessment
- Authors: Gao Hao, Haytham Hijazi, Júlio Medeiros, João Durães, Chan Tong Lam, Paulo de Carvalho, Henrique Madeira
- Abstract summary: This paper proposes a dynamic code understandability assessment method. It estimates a personalized code understandability score from the perspective of the specific programmer handling the code. It can be easily implemented using a simple, low-cost, and non-intrusive desktop eye tracker or even a standard computer camera.
- Score: 1.513554688029813
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Measuring code understandability is both highly relevant and exceptionally challenging. This paper proposes a dynamic code understandability assessment method, which estimates a personalized code understandability score from the perspective of the specific programmer handling the code. The method consists of dynamically dividing the code unit under development or review into code regions (invisible to the programmer) and using the number of revisits (NRevisit) to each region as the primary feature for estimating the code understandability score. This approach removes the uncertainty related to the concept of a "typical programmer" assumed by static software code complexity metrics and can be easily implemented using a simple, low-cost, and non-intrusive desktop eye tracker or even a standard computer camera. The metric was evaluated against cognitive load measured through electroencephalography (EEG) in a controlled experiment with 35 programmers. Results show a very high correlation, ranging from rs = 0.9067 to rs = 0.9860 (with p nearly 0), between the scores obtained with different alternatives of NRevisit and the ground truth represented by the EEG measurements of programmers' cognitive load, demonstrating the effectiveness of the approach in reflecting the cognitive effort required for code comprehension. The paper also discusses possible practical applications of NRevisit, including its use in the context of AI-generated code, which is already widely used today.
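To make the counting idea concrete, below is a minimal Python sketch. It assumes that code regions are contiguous line ranges, that the eye tracker yields a time-ordered sequence of fixated source lines, that a revisit is counted each time gaze re-enters a region it had previously left, and that the final score is simply the total revisit count; the region-splitting rules, fixation-to-line mapping, and score aggregation alternatives used in the paper may differ, and all names here are illustrative.

```python
# Minimal sketch of an NRevisit-style computation (illustrative only; the
# paper's exact region-splitting and aggregation rules may differ).

from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple

Region = Tuple[int, int]  # inclusive (first_line, last_line) of a code region


def region_of(line: int, regions: Sequence[Region]) -> int:
    """Return the index of the region containing `line`, or -1 if none."""
    for i, (start, end) in enumerate(regions):
        if start <= line <= end:
            return i
    return -1


def count_revisits(fixated_lines: Iterable[int],
                   regions: Sequence[Region]) -> Dict[int, int]:
    """Count how many times gaze re-enters each region after having left it.

    `fixated_lines` is the time-ordered sequence of source lines that the eye
    tracker maps fixations to; the first entry into a region is not a revisit.
    """
    revisits: Dict[int, int] = defaultdict(int)
    visited = set()
    current = None
    for line in fixated_lines:
        r = region_of(line, regions)
        if r == -1 or r == current:
            continue  # fixation outside any region, or still in the same one
        if r in visited:
            revisits[r] += 1  # re-entering a previously left region
        visited.add(r)
        current = r
    return dict(revisits)


if __name__ == "__main__":
    # Hypothetical 30-line code unit split into three 10-line regions.
    regions = [(1, 10), (11, 20), (21, 30)]
    gaze = [2, 3, 12, 13, 4, 25, 14, 5]  # simulated fixation-to-line mapping
    per_region = count_revisits(gaze, regions)
    score = sum(per_region.values())  # one simple aggregation alternative
    print(per_region, score)
```

In an evaluation like the one described above, such per-unit scores would then be correlated (e.g., via Spearman's rs) against a ground truth such as EEG-derived cognitive-load measurements.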
Related papers
- An Empirical Study on the Effectiveness of Large Language Models for Binary Code Understanding [50.17907898478795]
This work proposes a benchmark to evaluate the effectiveness of Large Language Models (LLMs) in real-world reverse engineering scenarios.
Our evaluations reveal that existing LLMs can understand binary code to a certain extent, thereby improving the efficiency of binary code analysis.
arXiv Detail & Related papers (2025-04-30T17:02:06Z) - The Code Barrier: What LLMs Actually Understand? [7.407441962359689]
This research uses code obfuscation as a structured testing framework to evaluate semantic understanding capabilities of language models. Findings show a statistically significant performance decline as obfuscation complexity increases. This research introduces a new evaluation approach for assessing code comprehension in language models.
arXiv Detail & Related papers (2025-04-14T14:11:26Z) - On Explaining (Large) Language Models For Code Using Global Code-Based Explanations [45.126233498200534]
Language Models for Code (LLM4Code) have significantly changed the landscape of software engineering (SE). We introduce code rationales (Code$Q$), a technique with rigorous mathematical underpinning, to identify subsets of tokens that can explain individual code predictions. Our evaluation demonstrates that Code$Q$ is a powerful interpretability method to explain how (less) meaningful input concepts (i.e., the natural language particle 'at') highly impact output generation.
arXiv Detail & Related papers (2025-03-21T01:00:45Z) - Bridging LLM-Generated Code and Requirements: Reverse Generation technique and SBC Metric for Developer Insights [0.0]
This paper introduces a novel scoring mechanism called the SBC score. It is based on a reverse generation technique that leverages the natural language generation capabilities of Large Language Models. Unlike direct code analysis, our approach reconstructs system requirements from AI-generated code and compares them with the original specifications.
arXiv Detail & Related papers (2025-02-11T01:12:11Z) - A Computational Method for Measuring "Open Codes" in Qualitative Analysis [47.358809793796624]
Open coding is an inductive qualitative process that identifies and interprets "open codes" from datasets.
We present a computational method to measure and identify potential biases from "open codes" systematically.
arXiv Detail & Related papers (2024-11-19T00:44:56Z) - How Far Have We Gone in Binary Code Understanding Using Large Language Models [51.527805834378974]
We propose a benchmark to evaluate the effectiveness of Large Language Models (LLMs) in binary code understanding.
Our evaluations reveal that existing LLMs can understand binary code to a certain extent, thereby improving the efficiency of binary code analysis.
arXiv Detail & Related papers (2024-04-15T14:44:08Z) - Automating the Correctness Assessment of AI-generated Code for Security Contexts [8.009107843106108]
We propose a fully automated method, named ACCA, to evaluate the correctness of AI-generated code for security purposes.
We use ACCA to assess four state-of-the-art models trained to generate security-oriented assembly code.
Our experiments show that our method outperforms the baseline solutions and assesses the correctness of the AI-generated code similar to the human-based evaluation.
arXiv Detail & Related papers (2023-10-28T22:28:32Z) - Investigating the Impact of Vocabulary Difficulty and Code Naturalness on Program Comprehension [3.35803394416914]
This study aims to assess readability and understandability from the perspective of language acquisition.
We will conduct a statistical analysis to understand their correlations and analyze whether code naturalness and vocabulary difficulty can be used to improve the performance of readability and understandability prediction methods.
arXiv Detail & Related papers (2023-08-25T15:15:00Z) - Interactive Code Generation via Test-Driven User-Intent Formalization [60.90035204567797]
Large language models (LLMs) produce code from informal natural language (NL) intent.
It is hard to define a notion of correctness since natural language can be ambiguous and lacks a formal semantics.
We describe a language-agnostic abstract algorithm and a concrete implementation TiCoder.
arXiv Detail & Related papers (2022-08-11T17:41:08Z) - Great Truths are Always Simple: A Rather Simple Knowledge Encoder for Enhancing the Commonsense Reasoning Capacity of Pre-Trained Models [89.98762327725112]
Commonsense reasoning in natural language is a desired ability of artificial intelligence systems.
For solving complex commonsense reasoning tasks, a typical solution is to enhance pre-trained language models (PTMs) with a knowledge-aware graph neural network (GNN) encoder.
Despite their effectiveness, these approaches are built on heavy architectures and cannot clearly explain how external knowledge resources improve the reasoning capacity of PTMs.
arXiv Detail & Related papers (2022-05-04T01:27:36Z) - Improving Compositionality of Neural Networks by Decoding Representations to Inputs [83.97012077202882]
We bridge the benefits of traditional and deep learning programs by jointly training a generative model to constrain neural network activations to "decode" back to inputs.
We demonstrate applications of decodable representations to out-of-distribution detection, adversarial examples, calibration, and fairness.
arXiv Detail & Related papers (2021-06-01T20:07:16Z) - CodeBLEU: a Method for Automatic Evaluation of Code Synthesis [57.87741831987889]
In the area of code synthesis, the commonly used evaluation metrics are BLEU and perfect accuracy.
We introduce a new automatic evaluation metric, dubbed CodeBLEU.
It absorbs the strength of BLEU in the n-gram match and further injects code syntax via abstract syntax trees (AST) and code semantics via data-flow; a rough sketch of this kind of weighted combination appears after this list.
arXiv Detail & Related papers (2020-09-22T03:10:49Z)
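As a rough illustration of the weighted-combination idea described in the CodeBLEU entry above, the Python sketch below blends four per-candidate component scores (n-gram match, weighted n-gram match, AST match, and data-flow match) linearly. The component values, weights, and function name are placeholders chosen for illustration, not outputs of the reference CodeBLEU implementation.

```python
# Illustrative weighted combination in the spirit of CodeBLEU (not the
# reference implementation; component scores here are placeholder values).

def codebleu_style_score(ngram: float, weighted_ngram: float,
                         ast_match: float, dataflow_match: float,
                         weights=(0.25, 0.25, 0.25, 0.25)) -> float:
    """Linearly combine four component scores, each assumed to lie in [0, 1]."""
    a, b, c, d = weights
    return a * ngram + b * weighted_ngram + c * ast_match + d * dataflow_match


if __name__ == "__main__":
    # Hypothetical component scores for one candidate/reference pair.
    print(codebleu_style_score(0.62, 0.58, 0.71, 0.66))
```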