Augmenting Diffs With Runtime Information
- URL: http://arxiv.org/abs/2212.11077v2
- Date: Fri, 30 Jun 2023 12:27:41 GMT
- Title: Augmenting Diffs With Runtime Information
- Authors: Khashayar Etemadi, Aman Sharma, Fernanda Madeiral and Martin Monperrus
- Abstract summary: Collector-Sahab is a tool that augments code diffs with runtime difference information.
We run Collector-Sahab on 584 code diffs for Defects4J bugs and find it successfully augments the code diff for 95% (555/584) of them.
- Score: 53.22981451758425
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Source code diffs are used on a daily basis as part of code review,
inspection, and auditing. To facilitate understanding, they are typically
accompanied by explanations that describe the essence of what is changed in the
program. As manually crafting high-quality explanations is a cumbersome task,
researchers have proposed automatic techniques to generate code diff
explanations. Existing explanation generation methods solely focus on static
analysis, i.e., they do not take advantage of runtime information to explain
code changes. In this paper, we propose Collector-Sahab, a novel tool that
augments code diffs with runtime difference information. Collector-Sahab
compares the program states of the original (old) and patched (new) versions of
a program to find unique variable values. Then, Collector-Sahab adds this novel
runtime information to the source code diff as shown, for instance, in code
reviewing systems. As an evaluation, we run Collector-Sahab on 584 code diffs
for Defects4J bugs and find it successfully augments the code diff for 95%
(555/584) of them. We also perform a user study and ask eight participants to
score the augmented code diffs generated by Collector-Sahab. Per this user
study, we conclude that developers find the idea of adding runtime data to code
diffs promising and useful. Overall, our experiments show the effectiveness and
usefulness of Collector-Sahab in augmenting code diffs with runtime difference
information. Publicly-available repository:
https://github.com/ASSERT-KTH/collector-sahab.
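
The abstract describes a three-step pipeline: run both program versions, compare their program states to find variable values unique to each version, and splice that runtime information into the diff. The following Python sketch illustrates the idea on toy data; it is a minimal sketch only, with hypothetical function names, state dictionaries, and annotation format. The real Collector-Sahab instruments executions of Java programs (Defects4J), not Python source.

```python
# Minimal sketch of diff augmentation with runtime differences.
# Not the actual Collector-Sahab implementation: the states below are
# hypothetical stand-ins for values collected during instrumented runs.
import difflib

def unique_values(old_state: dict, new_state: dict) -> dict:
    """Map each variable to its (old, new) values where the two versions differ."""
    diffs = {}
    for var in old_state.keys() | new_state.keys():
        old_val, new_val = old_state.get(var), new_state.get(var)
        if old_val != new_val:
            diffs[var] = (old_val, new_val)
    return diffs

def augment_diff(old_src: str, new_src: str, old_state: dict, new_state: dict):
    """Yield unified-diff lines, annotating added lines with runtime differences."""
    runtime = unique_values(old_state, new_state)
    for line in difflib.unified_diff(
        old_src.splitlines(), new_src.splitlines(), lineterm=""
    ):
        yield line
        # After each added line, report variable values unique to each version.
        if line.startswith("+") and not line.startswith("+++"):
            for var, (old_val, new_val) in runtime.items():
                yield f"    // runtime: {var} was {old_val!r}, is now {new_val!r}"

# Hypothetical program states captured while running the same test on both versions.
old_state = {"total": 0}
new_state = {"total": 6}
old_src = "total = 0\nfor x in xs:\n    pass\n"
new_src = "total = 0\nfor x in xs:\n    total += x\n"
print("\n".join(augment_diff(old_src, new_src, old_state, new_state)))
```

The sketch attaches annotations to added lines only, mirroring how an augmented diff would surface the runtime effect of a patch inside a code review view.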
Related papers
- Understanding Code Understandability Improvements in Code Reviews [79.16476505761582]
We analyzed 2,401 code review comments from Java open-source projects on GitHub.
83.9% of suggestions for improvement were accepted and integrated, with fewer than 1% later reverted.
arXiv Detail & Related papers (2024-10-29T12:21:23Z)
- Toward Interactive Optimization of Source Code Differences: An Empirical Study of Its Performance [1.313675711285772]
We propose an interactive approach to optimize source code differences (diffs).
Users can provide feedback on parts of a diff that are matched but should not be, or parts that should be matched but are not.
Results on 23 GitHub projects confirm that 92% of nonoptimal diffs can be addressed with fewer than four feedback actions in the ideal case.
arXiv Detail & Related papers (2024-09-20T15:43:55Z)
- CodeUpdateArena: Benchmarking Knowledge Editing on API Updates [77.81663273436375]
We present CodeUpdateArena, a benchmark for knowledge editing in the code domain.
An instance in our benchmark consists of a synthetic API function update paired with a program synthesis example.
Our benchmark covers updates of various types to 54 functions from seven diverse Python packages.
arXiv Detail & Related papers (2024-07-08T17:55:04Z)
- SparseCoder: Identifier-Aware Sparse Transformer for File-Level Code Summarization [51.67317895094664]
This paper studies file-level code summarization, which can assist programmers in understanding and maintaining large source code projects.
We propose SparseCoder, an identifier-aware sparse transformer for effectively handling long code sequences.
arXiv Detail & Related papers (2024-01-26T09:23:27Z)
- Gitor: Scalable Code Clone Detection by Building Global Sample Graph [11.041017540277558]
We propose Gitor to capture the underlying connections among different code samples.
Gitor achieves higher accuracy in code clone detection and fast execution times on inputs of various sizes.
arXiv Detail & Related papers (2023-11-15T08:48:50Z)
- DocChecker: Bootstrapping Code Large Language Model for Detecting and Resolving Code-Comment Inconsistencies [13.804337643709717]
DocChecker is a tool for detecting and correcting differences between code and its accompanying comments.
It is adept at identifying inconsistencies between code and comments, and it can also generate synthetic comments.
It achieves a new state-of-the-art result of 72.3% accuracy on the Inconsistency Code-Comment Detection task.
arXiv Detail & Related papers (2023-06-10T05:29:09Z)
- CONCORD: Clone-aware Contrastive Learning for Source Code [64.51161487524436]
Self-supervised pre-training has gained traction for learning generic code representations valuable for many downstream SE tasks.
We argue that it is also essential to factor in how developers code day-to-day for general-purpose representation learning.
In particular, we propose CONCORD, a self-supervised, contrastive learning strategy to place benign clones closer in the representation space while moving deviants further apart.
arXiv Detail & Related papers (2023-06-05T20:39:08Z)
- How is the speed of code review affected by activity, usage and code quality? [0.0]
This paper investigates how the speed of code review is affected by activity, usage, and code quality in the context of extensions.
The median time to merge is compared against several other variables, which are collected using a variety of manual methods and APIs.
arXiv Detail & Related papers (2023-05-09T21:11:17Z)
- RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation [96.75695811963242]
RepoCoder is a framework to streamline the repository-level code completion process.
It incorporates a similarity-based retriever and a pre-trained code language model.
It consistently outperforms the vanilla retrieval-augmented code completion approach.
arXiv Detail & Related papers (2023-03-22T13:54:46Z)
- Empirical Analysis on Effectiveness of NLP Methods for Predicting Code Smell [3.2973778921083357]
A code smell is a surface indicator of an inherent problem in the system.
We use three Extreme Learning Machine kernels over 629 packages to identify eight code smells.
Our findings indicate that the radial basis function kernel performs best of the three kernel methods, with a mean accuracy of 98.52%.
arXiv Detail & Related papers (2021-08-08T12:10:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.