Identifying Self-Admitted Technical Debt in Issue Tracking Systems using
Machine Learning
- URL: http://arxiv.org/abs/2202.02180v1
- Date: Fri, 4 Feb 2022 15:15:13 GMT
- Title: Identifying Self-Admitted Technical Debt in Issue Tracking Systems using
Machine Learning
- Authors: Yikun Li, Mohamed Soliman, Paris Avgeriou
- Abstract summary: Technical debt is a metaphor for sub-optimal solutions implemented for short-term benefits.
Most work on identifying Self-Admitted Technical Debt focuses on source code comments.
We propose and optimize an approach for automatically identifying SATD in issue tracking systems using machine learning.
- Score: 3.446864074238136
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Technical debt is a metaphor indicating sub-optimal solutions implemented for
short-term benefits by sacrificing the long-term maintainability and
evolvability of software. A special type of technical debt is explicitly
admitted by software engineers (e.g. using a TODO comment); this is called
Self-Admitted Technical Debt or SATD. Most work on automatically identifying
SATD focuses on source code comments. In addition to source code comments,
issue tracking systems have shown to be another rich source of SATD, but there
are no approaches specifically for automatically identifying SATD in issues. In
this paper, we first create a training dataset by collecting and manually
analyzing 4,200 issues (that break down to 23,180 sections of issues) from
seven open-source projects (i.e., Camel, Chromium, Gerrit, Hadoop, HBase,
Impala, and Thrift) using two popular issue tracking systems (i.e., Jira and
Google Monorail). We then propose and optimize an approach for automatically
identifying SATD in issue tracking systems using machine learning. Our findings
indicate that: 1) our approach outperforms baseline approaches by a wide margin
with regard to the F1-score; 2) transferring knowledge from suitable datasets
can improve the predictive performance of our approach; 3) extracted SATD
keywords are intuitive and potentially indicating types and indicators of SATD;
4) projects using different issue tracking systems have less common SATD
keywords compared to projects using the same issue tracking system; 5) a small
amount of training data is needed to achieve good accuracy.
Related papers
- Deep Learning and Data Augmentation for Detecting Self-Admitted Technical Debt [6.004718679054704]
Self-Admitted Technical Debt (SATD) refers to circumstances where developers use textual artifacts to explain why the existing implementation is not optimal.
We build on earlier research by utilizing BiLSTM architecture for the binary identification of SATD and BERT architecture for categorizing different types of SATD.
We introduce a two-step approach to identify and categorize SATD across various datasets derived from different artifacts.
arXiv Detail & Related papers (2024-10-21T09:22:16Z) - A Taxonomy of Self-Admitted Technical Debt in Deep Learning Systems [13.90991624629898]
This paper empirically analyzes the presence of Self-Admitted Technical Debt (SATD) in Deep Learning systems.
We derived a taxonomy of DL-specific SATD through open coding, featuring seven categories and 41 leaves.
arXiv Detail & Related papers (2024-09-18T09:21:10Z) - Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph [83.90988015005934]
Uncertainty quantification (UQ) is a critical component of machine learning (ML) applications.
We introduce a novel benchmark that implements a collection of state-of-the-art UQ baselines.
We conduct a large-scale empirical investigation of UQ and normalization techniques across nine tasks, and identify the most promising approaches.
arXiv Detail & Related papers (2024-06-21T20:06:31Z) - SATDAUG -- A Balanced and Augmented Dataset for Detecting Self-Admitted
Technical Debt [6.699060157800401]
Self-admitted technical debt (SATD) refers to a form of technical debt in which developers explicitly acknowledge and document the existence of technical shortcuts.
We share the textitSATDAUG dataset, an augmented version of existing SATD datasets, including source code comments, issue tracker, pull requests, and commit messages.
arXiv Detail & Related papers (2024-03-12T14:33:53Z) - Towards Automatically Addressing Self-Admitted Technical Debt: How Far
Are We? [17.128428286986573]
This paper empirically investigates the extent to which technical debt can be automatically paid back by neural-based generative models.
We start by extracting a dateset of 5,039 Self-Admitted Technical Debt (SATD) removals from 595 open-source projects.
We use this dataset to experiment with seven different generative deep learning (DL) model configurations.
arXiv Detail & Related papers (2023-08-17T12:27:32Z) - Machine Learning Methods in Solving the Boolean Satisfiability Problem [72.21206588430645]
The paper reviews the recent literature on solving the Boolean satisfiability problem (SAT) with machine learning techniques.
We examine the evolving ML-SAT solvers from naive classifiers with handcrafted features to the emerging end-to-end SAT solvers such as NeuroSAT.
arXiv Detail & Related papers (2022-03-02T05:14:12Z) - Automatic Identification of Self-Admitted Technical Debt from Four
Different Sources [3.446864074238136]
Technical debt refers to taking shortcuts to achieve short-term goals while sacrificing the long-term maintainability and evolvability of software systems.
Previous work has focused on identifying SATD from source code comments and issue trackers.
We propose and evaluate an approach for automated SATD identification that integrates four sources: source code comments, commit messages, pull requests, and issue tracking systems.
arXiv Detail & Related papers (2022-02-04T20:59:25Z) - Benchmarking high-fidelity pedestrian tracking systems for research,
real-time monitoring and crowd control [55.41644538483948]
High-fidelity pedestrian tracking in real-life conditions has been an important tool in fundamental crowd dynamics research.
As this technology advances, it is becoming increasingly useful also in society.
To successfully employ pedestrian tracking techniques in research and technology, it is crucial to validate and benchmark them for accuracy.
We present and discuss a benchmark suite, towards an open standard in the community, for privacy-respectful pedestrian tracking techniques.
arXiv Detail & Related papers (2021-08-26T11:45:26Z) - Transformer-based Machine Learning for Fast SAT Solvers and Logic
Synthesis [63.53283025435107]
CNF-based SAT and MaxSAT solvers are central to logic synthesis and verification systems.
In this work, we propose a one-shot model derived from the Transformer architecture to solve the MaxSAT problem.
arXiv Detail & Related papers (2021-07-15T04:47:35Z) - Conditioned Text Generation with Transfer for Closed-Domain Dialogue
Systems [65.48663492703557]
We show how to optimally train and control the generation of intent-specific sentences using a conditional variational autoencoder.
We introduce a new protocol called query transfer that allows to leverage a large unlabelled dataset.
arXiv Detail & Related papers (2020-11-03T14:06:10Z) - FairMOT: On the Fairness of Detection and Re-Identification in Multiple
Object Tracking [92.48078680697311]
Multi-object tracking (MOT) is an important problem in computer vision.
We present a simple yet effective approach termed as FairMOT based on the anchor-free object detection architecture CenterNet.
The approach achieves high accuracy for both detection and tracking.
arXiv Detail & Related papers (2020-04-04T08:18:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.