The Mind Is a Powerful Place: How Showing Code Comprehensibility Metrics
Influences Code Understanding
- URL: http://arxiv.org/abs/2012.09590v2
- Date: Wed, 10 Feb 2021 12:52:32 GMT
- Title: The Mind Is a Powerful Place: How Showing Code Comprehensibility Metrics
Influences Code Understanding
- Authors: Marvin Wyrich, Andreas Preikschat, Daniel Graziotin, Stefan Wagner
- Abstract summary: We investigate whether a displayed metric value for source code comprehensibility anchors developers in their subjective rating of source code comprehensibility.
We found that the displayed value of a comprehensibility metric has a significant and large anchoring effect on a developer's code comprehensibility rating.
- Score: 10.644832702859484
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Static code analysis tools and integrated development environments present
developers with quality-related software metrics, some of which describe the
understandability of source code. Software metrics influence overarching
strategic decisions that impact the future of companies and the prioritization
of everyday software development tasks. Several software metrics, however, lack
validation: we simply choose to trust that they reflect what they are supposed to
measure. Some have even been shown not to measure the quality aspects
they intend to measure. Yet, they influence us through biases in our
cognitive-driven actions. In particular, they might anchor us in our decisions.
Whether the anchoring effect exists with software metrics has not been studied
yet. We conducted a randomized and double-blind experiment to investigate the
extent to which a displayed metric value for source code comprehensibility
anchors developers in their subjective rating of source code comprehensibility,
whether performance is affected by the anchoring effect when working on
comprehension tasks, and which individual characteristics might play a role in
the anchoring effect. We found that the displayed value of a comprehensibility
metric has a significant and large anchoring effect on a developer's code
comprehensibility rating. The effect does not seem to affect the time or
correctness when working on comprehension questions related to the code
snippets under study. Since the anchoring effect is one of the most robust
cognitive biases, and we have limited understanding of the consequences of the
demonstrated manipulation of developers by non-validated metrics, we call for
an increased awareness of the responsibility in code quality reporting and for
corresponding tools to be based on scientific evidence.
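For illustration, a minimal sketch of how such an anchoring comparison could be analyzed, assuming two randomized groups that saw a low versus a high displayed metric value before rating the same code snippet on an ordinal scale. The data, group sizes, and tests here are assumptions for the sketch, not the authors' actual materials or analysis pipeline:

```python
# Hypothetical anchoring analysis: participants are randomized to see a low or
# a high "comprehensibility" value for the same snippet, then rate its
# comprehensibility themselves. All numbers below are made up.
from scipy.stats import mannwhitneyu

low_anchor_ratings = [3, 4, 3, 2, 4, 3, 5, 4, 3, 4]    # group shown a low metric value
high_anchor_ratings = [6, 5, 7, 6, 5, 6, 7, 6, 5, 6]   # group shown a high metric value

u_stat, p_value = mannwhitneyu(low_anchor_ratings, high_anchor_ratings,
                               alternative="two-sided")

def cliffs_delta(a, b):
    """Non-parametric effect size suitable for ordinal ratings."""
    greater = sum(x > y for x in a for y in b)
    lesser = sum(x < y for x in a for y in b)
    return (greater - lesser) / (len(a) * len(b))

print(f"Mann-Whitney U = {u_stat:.1f}, p = {p_value:.4f}")
print(f"Cliff's delta (high vs. low anchor) = "
      f"{cliffs_delta(high_anchor_ratings, low_anchor_ratings):.2f}")
```

A large positive delta for the high-anchor group would be the kind of pattern consistent with the anchoring effect the abstract reports.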
Related papers
- Understanding Code Understandability Improvements in Code Reviews [79.16476505761582]
We analyzed 2,401 code review comments from Java open-source projects on GitHub.
83.9% of suggestions for improvement were accepted and integrated, with fewer than 1% later reverted.
arXiv Detail & Related papers (2024-10-29T12:21:23Z)
- Impact of the Availability of ChatGPT on Software Development: A Synthetic Difference in Differences Estimation using GitHub Data [49.1574468325115]
ChatGPT is an AI tool that enhances software production efficiency.
We estimate ChatGPT's effects on the number of git pushes, repositories, and unique developers per 100,000 people.
These results suggest that AI tools like ChatGPT can substantially boost developer productivity, though further analysis is needed to address potential downsides such as low quality code and privacy concerns.
arXiv Detail & Related papers (2024-06-16T19:11:15Z)
- Towards Understanding the Impact of Code Modifications on Software Quality Metrics [1.2277343096128712]
This study aims to assess and interpret the impact of code modifications on software quality metrics.
The underlying hypothesis posits that code modifications inducing similar changes in software quality metrics can be grouped into distinct clusters (a toy clustering sketch follows this entry).
The results reveal distinct clusters of code modifications, each accompanied by a concise description of their collective impact on software quality metrics.
arXiv Detail & Related papers (2024-04-05T08:41:18Z)
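The toy sketch referenced above, assuming each code modification is represented by the change it induces in a handful of quality metrics; the metric names, values, and cluster count are invented for illustration and are not the study's actual method:

```python
# Hypothetical illustration: group code modifications by the change they induce
# in a few software quality metrics, using k-means. Metric names and data are
# made up for the sketch.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Each row: deltas in (cyclomatic complexity, lines of code, coupling) for one commit.
metric_deltas = np.array([
    [ 5.0,  40.0,  2.0],   # change that degrades all three metrics
    [ 4.0,  35.0,  1.0],
    [-3.0, -20.0, -1.0],   # cleanup: metrics improve
    [-4.0, -25.0, -2.0],
    [ 0.0,   5.0,  0.0],   # small, largely metric-neutral change
    [ 0.5,   3.0,  0.0],
])

scaled = StandardScaler().fit_transform(metric_deltas)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scaled)
print(labels)  # modifications with a similar metric impact share a cluster label
```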
- Free Open Source Communities Sustainability: Does It Make a Difference in Software Quality? [2.981092370528753]
This study aims to empirically explore how the different aspects of sustainability impact software quality.
16 sustainability metrics across four categories were sampled and applied to a set of 217 OSS projects.
arXiv Detail & Related papers (2024-02-10T09:37:44Z)
- Investigating the Impact of Vocabulary Difficulty and Code Naturalness on Program Comprehension [3.35803394416914]
This study aims to assess readability and understandability from the perspective of language acquisition.
We will conduct a statistical analysis to understand their correlations and to examine whether code naturalness and vocabulary difficulty can improve the performance of readability and understandability prediction methods (a minimal correlation sketch follows this entry).
arXiv Detail & Related papers (2023-08-25T15:15:00Z)
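The minimal correlation sketch referenced above, assuming per-snippet scores for vocabulary difficulty, code naturalness, and an understandability rating are already available; the variable names and values are hypothetical:

```python
# Hypothetical illustration: correlate vocabulary difficulty and code
# naturalness with understandability ratings. All values are made up.
from scipy.stats import spearmanr

vocabulary_difficulty = [0.2, 0.5, 0.7, 0.3, 0.9, 0.4]   # per code snippet
code_naturalness      = [0.8, 0.6, 0.3, 0.7, 0.2, 0.6]   # e.g., derived from a language model
understandability     = [4.5, 3.8, 2.1, 4.0, 1.9, 3.5]   # mean participant rating

rho_vocab, p_vocab = spearmanr(vocabulary_difficulty, understandability)
rho_nat, p_nat = spearmanr(code_naturalness, understandability)
print(f"vocabulary difficulty vs. understandability: rho = {rho_vocab:.2f}, p = {p_vocab:.3f}")
print(f"code naturalness vs. understandability:      rho = {rho_nat:.2f}, p = {p_nat:.3f}")
```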
- CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing [139.77117915309023]
CRITIC allows large language models to validate and amend their own outputs in a manner similar to human interaction with tools.
Comprehensive evaluations involving free-form question answering, mathematical program synthesis, and toxicity reduction demonstrate that CRITIC consistently enhances the performance of LLMs.
arXiv Detail & Related papers (2023-05-19T15:19:44Z)
- Breaks and Code Quality: Investigating the Impact of Forgetting on Software Development. A Registered Report [15.438443553618896]
It is crucial to ensure that developers have a clear understanding of the codebase and can work efficiently and effectively even after long interruptions.
This registered report proposes an empirical study aimed at investigating the impact of the duration of developers' activity breaks on different code quality properties.
arXiv Detail & Related papers (2023-05-01T10:33:17Z)
- Generation Probabilities Are Not Enough: Uncertainty Highlighting in AI Code Completions [54.55334589363247]
We study whether conveying information about uncertainty enables programmers to more quickly and accurately produce code.
We find that highlighting tokens with the highest predicted likelihood of being edited leads to faster task completion and more targeted edits (a minimal sketch follows this entry).
arXiv Detail & Related papers (2023-02-14T18:43:34Z)
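The sketch referenced above: a minimal rendering of the highlighting idea, assuming a hypothetical per-token edit-likelihood score is already available from some model. The scores, threshold, and marker syntax are assumptions, not the paper's interface:

```python
# Hypothetical illustration: mark completion tokens whose predicted likelihood
# of being edited exceeds a threshold, drawing attention to the riskiest parts
# of a suggestion. The edit probabilities below are made up.
def highlight_uncertain_tokens(tokens, edit_probabilities, threshold=0.5):
    """Wrap likely-to-be-edited tokens in [[...]] markers for console rendering."""
    return " ".join(
        f"[[{token}]]" if p_edit >= threshold else token
        for token, p_edit in zip(tokens, edit_probabilities)
    )

completion = ["for", "i", "in", "range", "(", "n", ")", ":"]
edit_probs = [0.05, 0.10, 0.05, 0.20, 0.02, 0.75, 0.02, 0.01]  # hypothetical scores

print(highlight_uncertain_tokens(completion, edit_probs))
# -> for i in range ( [[n]] ) :
```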
- Uncertainty Quantification 360: A Holistic Toolkit for Quantifying and Communicating the Uncertainty of AI [49.64037266892634]
We describe an open source Python toolkit named Uncertainty Quantification 360 (UQ360) for the uncertainty quantification of AI models.
The goal of this toolkit is twofold: first, to provide a broad range of capabilities to streamline as well as foster the common practices of quantifying, evaluating, improving, and communicating uncertainty in the AI application development lifecycle; second, to encourage further exploration of UQ's connections to other pillars of trustworthy AI.
arXiv Detail & Related papers (2021-06-02T18:29:04Z)
- Influence Functions in Deep Learning Are Fragile [52.31375893260445]
Influence functions approximate the effect of training samples on test-time predictions.
Influence estimates are fairly accurate for shallow networks.
Hessian regularization is important to obtain high-quality influence estimates (a standard formulation is sketched after this entry).
arXiv Detail & Related papers (2020-06-25T18:25:59Z)
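For context, a standard formulation of influence functions is sketched below; λ is an assumed damping coefficient that plays the role of the Hessian regularization mentioned above:

```latex
% Influence of a training point z on the loss at a test point z_test,
% using a damped (regularized) Hessian H + \lambda I for numerical stability.
\mathcal{I}(z, z_{\mathrm{test}})
  \approx -\nabla_\theta L(z_{\mathrm{test}}, \hat{\theta})^{\top}
    \bigl( H_{\hat{\theta}} + \lambda I \bigr)^{-1}
    \nabla_\theta L(z, \hat{\theta}),
\qquad
H_{\hat{\theta}} = \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta^{2} L(z_i, \hat{\theta})
```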
This list is automatically generated from the titles and abstracts of the papers on this site.