Quantifying and Characterizing Clones of Self-Admitted Technical Debt in Build Systems
- URL: http://arxiv.org/abs/2402.08920v1
- Date: Wed, 14 Feb 2024 03:34:02 GMT
- Title: Quantifying and Characterizing Clones of Self-Admitted Technical Debt in Build Systems
- Authors: Tao Xiao, Zhili Zeng, Dong Wang, Hideaki Hata, Shane McIntosh, Kenichi Matsumoto
- Abstract summary: Self-Admitted Technical Debt (SATD) annotates development decisions that intentionally exchange long-term software artifact quality for short-term goals.
Recent work explores the existence of SATD clones (duplicate or near duplicate SATD comments) in source code.
We conduct a large-scale study on 50,608 SATD comments extracted from Autotools, CMake, Maven, and Ant build systems.
- Score: 10.81072153747528
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-Admitted Technical Debt (SATD) annotates development decisions that
intentionally exchange long-term software artifact quality for short-term
goals. Recent work explores the existence of SATD clones (duplicate or near
duplicate SATD comments) in source code. Cloning of SATD in build systems
(e.g., CMake and Maven) may propagate suboptimal design choices, threatening
qualities of the build system that stakeholders rely upon (e.g.,
maintainability, reliability, repeatability). Hence, we conduct a large-scale
study on 50,608 SATD comments extracted from Autotools, CMake, Maven, and Ant
build systems to investigate the prevalence of SATD clones and to characterize
their incidences. We observe that: (i) prior work suggests that 41-65% of SATD
comments in source code are clones, but in our studied build system context,
the rates range from 62% to 95%, suggesting that SATD clones are a more
prevalent phenomenon in build systems than in source code; (ii) statements
surrounding SATD clones are highly similar, with 76% of occurrences having
similarity scores greater than 0.8; (iii) a quarter of SATD clones are
introduced by the author of the original SATD statements; and (iv) among the
most commonly cloned SATD comments, external factors (e.g., platform and tool
configuration) are the most frequent locations, limitations in tools and
libraries are the most frequent causes, and developers often copy SATD comments
that describe issues to be fixed later. Our work presents the first step toward
systematically understanding SATD clones in build systems and opens up avenues
for future work, such as distinguishing different SATD clone behavior, as well
as designing an automated recommendation system for repaying SATD effectively
based on resolved clones.
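The abstract's two headline measurements, the share of SATD comments that are clones and the similarity of the statements surrounding clone pairs, can be made concrete with a small sketch. It assumes that a clone is a comment whose normalized text exactly matches another comment, and it uses Python's difflib.SequenceMatcher ratio as a stand-in similarity score. Neither choice is claimed to be the authors' detection method or similarity metric; the normalization rules and the helper names (normalize, clone_rate, context_similarity, share_above) are hypothetical.

```python
import re
from collections import Counter
from difflib import SequenceMatcher


# Hypothetical normalization (an assumption, not the paper's method):
# lowercase, drop a leading comment marker used in the studied build systems
# (# in CMake/Autotools, dnl in Autoconf's m4, <!-- --> in Maven/Ant XML),
# and collapse runs of whitespace.
def normalize(comment: str) -> str:
    text = comment.lower()
    text = re.sub(r"^\s*(#|dnl\b|<!--)", "", text)
    text = re.sub(r"-->\s*$", "", text)
    return " ".join(text.split())


def clone_rate(comments: list[str]) -> float:
    """Share of comments whose normalized text occurs at least twice.

    Exact matching after normalization only; near-duplicate clones
    would need a fuzzier grouping step.
    """
    if not comments:
        return 0.0
    keys = [normalize(c) for c in comments]
    counts = Counter(keys)
    return sum(counts[k] > 1 for k in keys) / len(keys)


def context_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between statements surrounding two clones.

    difflib's ratio() is a stand-in metric chosen for illustration.
    """
    return SequenceMatcher(None, a, b).ratio()


def share_above(pairs: list[tuple[str, str]], threshold: float = 0.8) -> float:
    """Share of clone pairs whose context similarity is strictly above threshold."""
    if not pairs:
        return 0.0
    return sum(context_similarity(a, b) > threshold for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    demo = [
        "# FIXME: workaround for broken toolchain",
        "#  fixme: workaround for broken   toolchain",
        "<!-- TODO: remove once plugin is fixed -->",
        "dnl XXX hack until upstream fix",
    ]
    print(clone_rate(demo))  # prints 0.5
```

Because grouping here is exact after normalization, this sketch would likely report a lower clone rate than a scheme that also admits near duplicates, as the study's definition of SATD clones does.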
Related papers
- NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents [79.29376673236142]
Existing benchmarks fail to rigorously evaluate the long-horizon capabilities required to build complete software systems.
We present NL2Repo Bench, a benchmark explicitly designed to evaluate the long-horizon repository generation ability of coding agents.
Date: 2025-12-14T15:12:13Z
- Hidden in Plain Sight: Where Developers Confess Self-Admitted Technical Debt [3.0178994719454564]
Self-Admitted Technical Debt (SATD) is crucial for proactive software maintenance.
Previous research has primarily targeted detecting and prioritizing SATD, with little focus on the source code afflicted with SATD.
We leverage the extensive SATD dataset PENTACET, containing code comments from over 9000 Java Open Source Software (OSS) repositories.
We quantitatively infer where SATD most commonly occurs and which code constructs/statements it most frequently affects.
Date: 2025-11-03T12:47:19Z
- A First Look at the Self-Admitted Technical Debt in Test Code: Taxonomy and Detection [7.475625941772781]
Self-admitted technical debt (SATD) refers to comments in which developers explicitly acknowledge code issues, workarounds, or suboptimal solutions.
This study investigates SATD in test code by manually analyzing 50,000 comments randomly sampled from 1.6 million comments across 1,000 open-source Java projects.
Date: 2025-10-25T19:09:18Z
- Understanding Self-Admitted Technical Debt in Test Code: An Empirical Study [2.1295493440485513]
Developers explicitly document technical debt in code comments, referred to as Self-Admitted Technical Debt (SATD).
This study aims to disclose the nature of SATD in the test code by examining its distribution and types.
Our study also presents comprehensive categories of SATD types in the test code, and machine learning models are developed to automatically classify SATD comments.
Date: 2025-10-25T11:00:48Z
- Deep Learning and Data Augmentation for Detecting Self-Admitted Technical Debt [6.004718679054704]
Self-Admitted Technical Debt (SATD) refers to circumstances where developers use textual artifacts to explain why the existing implementation is not optimal.
We build on earlier research by using a BiLSTM architecture for the binary identification of SATD and a BERT architecture for categorizing different types of SATD.
We introduce a two-step approach to identify and categorize SATD across various datasets derived from different artifacts.
Date: 2024-10-21T09:22:16Z
- An Exploratory Study of the Relationship between SATD and Other Software Development Activities [13.026170714454071]
Self-Admitted Technical Debt (SATD) is a specific type of Technical Debt that involves documenting code to remind developers of its debt.
Previous research has explored various aspects of SATD, including methods, distribution, and its impact on software quality.
This study investigates the relationship between removing and adding SATD and activities such as bug fixing, adding new features, and testing.
Date: 2024-04-02T13:45:42Z
- AdaCCD: Adaptive Semantic Contrasts Discovery Based Cross Lingual Adaptation for Code Clone Detection [69.79627042058048]
AdaCCD is a novel cross-lingual adaptation method that can detect cloned code in a new language without annotations in that language.
We evaluate the cross-lingual adaptation results of AdaCCD by constructing a multilingual code clone detection benchmark consisting of 5 programming languages.
Date: 2023-11-13T12:20:48Z
- Who Made This Copy? An Empirical Analysis of Code Clone Authorship [1.1512593234650217]
We analyzed the authorship of code clones at the line-level granularity for Java files in 153 Apache projects stored on GitHub.
We found that there are a substantial number of clone lines across all projects.
One-third of clone sets are primarily contributed to by multiple leading authors.
Date: 2023-09-03T08:24:32Z
- On the Security Blind Spots of Software Composition Analysis [46.1389163921338]
We present a novel approach to detect vulnerable clones in the Maven repository.
We retrieve over 53k potential vulnerable clones from Maven Central.
We detect 727 confirmed vulnerable clones and synthesize a testable proof-of-vulnerability project for each of those.
Date: 2023-06-08T20:14:46Z
- CONCORD: Clone-aware Contrastive Learning for Source Code [64.51161487524436]
Self-supervised pre-training has gained traction for learning generic code representations valuable for many downstream SE tasks.
We argue that it is also essential to factor in how developers code day-to-day for general-purpose representation learning.
In particular, we propose CONCORD, a self-supervised, contrastive learning strategy to place benign clones closer in the representation space while moving deviants further apart.
Date: 2023-06-05T20:39:08Z
- Estimating the hardness of SAT encodings for Logical Equivalence Checking of Boolean circuits [58.83758257568434]
We show that the hardness of SAT encodings for LEC instances can be estimated with respect to some SAT partitioning.
The paper proposes several methods for constructing partitionings, which, when used in practice, allow one to estimate the hardness of SAT encodings for LEC with good accuracy.
Date: 2022-10-04T09:19:13Z
- Automatic Identification of Self-Admitted Technical Debt from Four Different Sources [3.446864074238136]
Technical debt refers to taking shortcuts to achieve short-term goals while sacrificing the long-term maintainability and evolvability of software systems.
Previous work has focused on identifying SATD from source code comments and issue trackers.
We propose and evaluate an approach for automated SATD identification that integrates four sources: source code comments, commit messages, pull requests, and issue tracking systems.
Date: 2022-02-04T20:59:25Z
- Identifying Self-Admitted Technical Debt in Issue Tracking Systems using Machine Learning [3.446864074238136]
Technical debt is a metaphor for sub-optimal solutions implemented for short-term benefits.
Most work on identifying Self-Admitted Technical Debt focuses on source code comments.
We propose and optimize an approach for automatically identifying SATD in issue tracking systems using machine learning.
Date: 2022-02-04T15:15:13Z
- Transformer-based Machine Learning for Fast SAT Solvers and Logic Synthesis [63.53283025435107]
CNF-based SAT and MaxSAT solvers are central to logic synthesis and verification systems.
In this work, we propose a one-shot model derived from the Transformer architecture to solve the MaxSAT problem.
Date: 2021-07-15T04:47:35Z
- Semantic Clone Detection via Probabilistic Software Modeling [69.43451204725324]
This article contributes a semantic clone detection approach that detects clones that have 0% syntactic similarity.
We present SCD-PSM as a stable and precise solution to semantic clone detection.
Date: 2020-08-11T17:54:20Z