Quantifying and Characterizing Clones of Self-Admitted Technical Debt in Build Systems
- URL: http://arxiv.org/abs/2402.08920v1
- Date: Wed, 14 Feb 2024 03:34:02 GMT
- Title: Quantifying and Characterizing Clones of Self-Admitted Technical Debt in Build Systems
- Authors: Tao Xiao, Zhili Zeng, Dong Wang, Hideaki Hata, Shane McIntosh, Kenichi Matsumoto
- Abstract summary: Self-Admitted Technical Debt (SATD) annotates development decisions that intentionally exchange long-term software artifact quality for short-term goals.
Recent work explores the existence of SATD clones (duplicate or near duplicate SATD comments) in source code.
We conduct a large-scale study on 50,608 SATD comments extracted from Autotools, CMake, Maven, and Ant build systems.
- Score: 10.81072153747528
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-Admitted Technical Debt (SATD) annotates development decisions that
intentionally exchange long-term software artifact quality for short-term
goals. Recent work explores the existence of SATD clones (duplicate or near
duplicate SATD comments) in source code. Cloning of SATD in build systems
(e.g., CMake and Maven) may propagate suboptimal design choices, threatening
qualities of the build system that stakeholders rely upon (e.g.,
maintainability, reliability, repeatability). Hence, we conduct a large-scale
study on 50,608 SATD comments extracted from Autotools, CMake, Maven, and Ant
build systems to investigate the prevalence of SATD clones and to characterize
their incidences. We observe that: (i) prior work suggests that 41-65% of SATD
comments in source code are clones, but in our studied build system context,
the rates range from 62% to 95%, suggesting that SATD clones are a more
prevalent phenomenon in build systems than in source code; (ii) statements
surrounding SATD clones are highly similar, with 76% of occurrences having
similarity scores greater than 0.8; (iii) a quarter of SATD clones are
introduced by the author of the original SATD statements; and (iv) among the
most commonly cloned SATD comments, external factors (e.g., platform and tool
configuration) are the most frequent locations, limitations in tools and
libraries are the most frequent causes, and developers often copy SATD comments
that describe issues to be fixed later. Our work presents the first step toward
systematically understanding SATD clones in build systems and opens up avenues
for future work, such as distinguishing different SATD clone behavior, as well
as designing an automated recommendation system for repaying SATD effectively
based on resolved clones.
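As a rough illustration of what "duplicate or near duplicate SATD comments" can mean in practice, the following Python sketch groups comments whose normalized text is highly similar and reports the share of comments that fall into a group of two or more. It is not the procedure used in the paper: the `difflib` ratio, the 0.9 threshold (unrelated to the 0.8 figure reported above for surrounding statements), the greedy grouping, and the helper names `normalize`, `group_clones`, and `clone_rate` are all assumptions made for this example.

```python
from difflib import SequenceMatcher


def normalize(comment: str) -> str:
    """Strip common comment markers, lowercase, and collapse whitespace."""
    text = comment.strip().strip("#/<!->* ").lower()
    return " ".join(text.split())


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (assumed metric, not the paper's)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def group_clones(comments, threshold=0.9):
    """Greedily put each comment into the first group whose first member
    is at least `threshold` similar; otherwise start a new group."""
    groups = []
    for comment in comments:
        for group in groups:
            if similarity(comment, group[0]) >= threshold:
                group.append(comment)
                break
        else:
            groups.append([comment])
    return groups


def clone_rate(groups) -> float:
    """Share of comments that belong to a group with two or more members."""
    total = sum(len(g) for g in groups)
    cloned = sum(len(g) for g in groups if len(g) > 1)
    return cloned / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical build-file comments, invented for this example.
    sample = [
        "# TODO: remove this workaround once the compiler bug is fixed",
        "# TODO: remove this workaround once the compiler bug is fixed.",
        "# FIXME: hardcoded path for Windows",
    ]
    groups = group_clones(sample)
    print(len(groups), clone_rate(groups))
```

Greedy grouping against the first member of each group keeps the sketch short. A real study would need a more careful clone definition and a validated threshold.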
Related papers
- Deep Learning and Data Augmentation for Detecting Self-Admitted Technical Debt [6.004718679054704]
Self-Admitted Technical Debt (SATD) refers to circumstances where developers use textual artifacts to explain why the existing implementation is not optimal.
We build on earlier research by utilizing BiLSTM architecture for the binary identification of SATD and BERT architecture for categorizing different types of SATD.
We introduce a two-step approach to identify and categorize SATD across various datasets derived from different artifacts.
arXiv Detail & Related papers (2024-10-21T09:22:16Z)
- An Exploratory Study of the Relationship between SATD and Other Software Development Activities [13.026170714454071]
Self-Admitted Technical Debt (SATD) is a specific type of Technical Debt that involves documenting code to remind developers of its debt.
Previous research has explored various aspects of SATD, including methods, distribution, and its impact on software quality.
This study investigates the relationship between removing and adding SATD and activities such as bug fixing, adding new features, and testing.
arXiv Detail & Related papers (2024-04-02T13:45:42Z)
- AdaCCD: Adaptive Semantic Contrasts Discovery Based Cross Lingual Adaptation for Code Clone Detection [69.79627042058048]
AdaCCD is a novel cross-lingual adaptation method that can detect cloned codes in a new language without annotations in that language.
We evaluate the cross-lingual adaptation results of AdaCCD by constructing a multilingual code clone detection benchmark consisting of 5 programming languages.
arXiv Detail & Related papers (2023-11-13T12:20:48Z)
- Who Made This Copy? An Empirical Analysis of Code Clone Authorship [1.1512593234650217]
We analyzed the authorship of code clones at the line-level granularity for Java files in 153 Apache projects stored on GitHub.
We found that there are a substantial number of clone lines across all projects.
One-third of clone sets are primarily contributed to by multiple leading authors.
arXiv Detail & Related papers (2023-09-03T08:24:32Z)
- On the Security Blind Spots of Software Composition Analysis [46.1389163921338]
We present a novel approach to detect vulnerable clones in the Maven repository.
We retrieve over 53k potential vulnerable clones from Maven Central.
We detect 727 confirmed vulnerable clones and synthesize a testable proof-of-vulnerability project for each of those.
arXiv Detail & Related papers (2023-06-08T20:14:46Z)
- CONCORD: Clone-aware Contrastive Learning for Source Code [64.51161487524436]
Self-supervised pre-training has gained traction for learning generic code representations valuable for many downstream SE tasks.
We argue that it is also essential to factor in how developers code day-to-day for general-purpose representation learning.
In particular, we propose CONCORD, a self-supervised, contrastive learning strategy to place benign clones closer in the representation space while moving deviants further apart.
arXiv Detail & Related papers (2023-06-05T20:39:08Z)
- Estimating the hardness of SAT encodings for Logical Equivalence Checking of Boolean circuits [58.83758257568434]
We show that the hardness of SAT encodings for LEC instances can be estimated w.r.t. some SAT partitioning.
The paper proposes several methods for constructing partitionings, which, when used in practice, allow one to estimate the hardness of SAT encodings for LEC with good accuracy.
arXiv Detail & Related papers (2022-10-04T09:19:13Z)
- Automatic Identification of Self-Admitted Technical Debt from Four Different Sources [3.446864074238136]
Technical debt refers to taking shortcuts to achieve short-term goals while sacrificing the long-term maintainability and evolvability of software systems.
Previous work has focused on identifying SATD from source code comments and issue trackers.
We propose and evaluate an approach for automated SATD identification that integrates four sources: source code comments, commit messages, pull requests, and issue tracking systems.
arXiv Detail & Related papers (2022-02-04T20:59:25Z)
- Identifying Self-Admitted Technical Debt in Issue Tracking Systems using Machine Learning [3.446864074238136]
Technical debt is a metaphor for sub-optimal solutions implemented for short-term benefits.
Most work on identifying Self-Admitted Technical Debt focuses on source code comments.
We propose and optimize an approach for automatically identifying SATD in issue tracking systems using machine learning.
arXiv Detail & Related papers (2022-02-04T15:15:13Z)
- Transformer-based Machine Learning for Fast SAT Solvers and Logic Synthesis [63.53283025435107]
CNF-based SAT and MaxSAT solvers are central to logic synthesis and verification systems.
In this work, we propose a one-shot model derived from the Transformer architecture to solve the MaxSAT problem.
arXiv Detail & Related papers (2021-07-15T04:47:35Z)
- Semantic Clone Detection via Probabilistic Software Modeling [69.43451204725324]
This article contributes a semantic clone detection approach that detects clones that have 0% syntactic similarity.
We present SCD-PSM as a stable and precise solution to semantic clone detection.
arXiv Detail & Related papers (2020-08-11T17:54:20Z)