Related papers: Identifying Self-Admitted Technical Debt in Issue Tracking Systems using Machine Learning

Identifying Self-Admitted Technical Debt in Issue Tracking Systems using Machine Learning

URL: http://arxiv.org/abs/2202.02180v1
Date: Fri, 4 Feb 2022 15:15:13 GMT
Title: Identifying Self-Admitted Technical Debt in Issue Tracking Systems using Machine Learning
Authors: Yikun Li, Mohamed Soliman, Paris Avgeriou
Abstract summary: Technical debt is a metaphor for sub-optimal solutions implemented for short-term benefits. Most work on identifying Self-Admitted Technical Debt focuses on source code comments. We propose and optimize an approach for automatically identifying SATD in issue tracking systems using machine learning.
Score: 3.446864074238136
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Technical debt is a metaphor indicating sub-optimal solutions implemented for short-term benefits by sacrificing the long-term maintainability and evolvability of software. A special type of technical debt is explicitly admitted by software engineers (e.g. using a TODO comment); this is called Self-Admitted Technical Debt or SATD. Most work on automatically identifying SATD focuses on source code comments. In addition to source code comments, issue tracking systems have shown to be another rich source of SATD, but there are no approaches specifically for automatically identifying SATD in issues. In this paper, we first create a training dataset by collecting and manually analyzing 4,200 issues (that break down to 23,180 sections of issues) from seven open-source projects (i.e., Camel, Chromium, Gerrit, Hadoop, HBase, Impala, and Thrift) using two popular issue tracking systems (i.e., Jira and Google Monorail). We then propose and optimize an approach for automatically identifying SATD in issue tracking systems using machine learning. Our findings indicate that: 1) our approach outperforms baseline approaches by a wide margin with regard to the F1-score; 2) transferring knowledge from suitable datasets can improve the predictive performance of our approach; 3) extracted SATD keywords are intuitive and potentially indicating types and indicators of SATD; 4) projects using different issue tracking systems have less common SATD keywords compared to projects using the same issue tracking system; 5) a small amount of training data is needed to achieve good accuracy.

Related papers

Aligning Multimodal LLM with Human Preference: A Survey [62.89722942008262]
Large language models (LLMs) can handle a wide variety of general tasks with simple prompts, without the need for task-specific training. Multimodal Large Language Models (MLLMs) have demonstrated impressive potential in tackling complex tasks involving visual, auditory, and textual data. However, critical issues related to truthfulness, safety, o1-like reasoning, and alignment with human preference remain insufficiently addressed.
arXiv Detail & Related papers (2025-03-18T17:59:56Z)
Deep Learning and Data Augmentation for Detecting Self-Admitted Technical Debt [6.004718679054704]
Self-Admitted Technical Debt (SATD) refers to circumstances where developers use textual artifacts to explain why the existing implementation is not optimal. We build on earlier research by utilizing BiLSTM architecture for the binary identification of SATD and BERT architecture for categorizing different types of SATD. We introduce a two-step approach to identify and categorize SATD across various datasets derived from different artifacts.
arXiv Detail & Related papers (2024-10-21T09:22:16Z)
A Taxonomy of Self-Admitted Technical Debt in Deep Learning Systems [13.90991624629898]
This paper empirically analyzes the presence of Self-Admitted Technical Debt (SATD) in Deep Learning systems. We derived a taxonomy of DL-specific SATD through open coding, featuring seven categories and 41 leaves.
arXiv Detail & Related papers (2024-09-18T09:21:10Z)
AutoBencher: Towards Declarative Benchmark Construction [74.54640925146289]
We use AutoBencher to create datasets for math, multilinguality, knowledge, and safety. The scalability of AutoBencher allows it to test fine-grained categories knowledge, creating datasets that elicit 22% more model errors (i.e., difficulty) than existing benchmarks.
arXiv Detail & Related papers (2024-07-11T10:03:47Z)
Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph [83.90988015005934]
Uncertainty quantification (UQ) is a critical component of machine learning (ML) applications. We introduce a novel benchmark that implements a collection of state-of-the-art UQ baselines. We conduct a large-scale empirical investigation of UQ and normalization techniques across nine tasks, and identify the most promising approaches.
arXiv Detail & Related papers (2024-06-21T20:06:31Z)
SATDAUG -- A Balanced and Augmented Dataset for Detecting Self-Admitted Technical Debt [6.699060157800401]
Self-admitted technical debt (SATD) refers to a form of technical debt in which developers explicitly acknowledge and document the existence of technical shortcuts. We share the textitSATDAUG dataset, an augmented version of existing SATD datasets, including source code comments, issue tracker, pull requests, and commit messages.
arXiv Detail & Related papers (2024-03-12T14:33:53Z)
Towards Automatically Addressing Self-Admitted Technical Debt: How Far Are We? [17.128428286986573]
This paper empirically investigates the extent to which technical debt can be automatically paid back by neural-based generative models. We start by extracting a dateset of 5,039 Self-Admitted Technical Debt (SATD) removals from 595 open-source projects. We use this dataset to experiment with seven different generative deep learning (DL) model configurations.
arXiv Detail & Related papers (2023-08-17T12:27:32Z)
Machine Learning Methods in Solving the Boolean Satisfiability Problem [72.21206588430645]
The paper reviews the recent literature on solving the Boolean satisfiability problem (SAT) with machine learning techniques. We examine the evolving ML-SAT solvers from naive classifiers with handcrafted features to the emerging end-to-end SAT solvers such as NeuroSAT.
arXiv Detail & Related papers (2022-03-02T05:14:12Z)
Automatic Identification of Self-Admitted Technical Debt from Four Different Sources [3.446864074238136]
Technical debt refers to taking shortcuts to achieve short-term goals while sacrificing the long-term maintainability and evolvability of software systems. Previous work has focused on identifying SATD from source code comments and issue trackers. We propose and evaluate an approach for automated SATD identification that integrates four sources: source code comments, commit messages, pull requests, and issue tracking systems.
arXiv Detail & Related papers (2022-02-04T20:59:25Z)
Benchmarking high-fidelity pedestrian tracking systems for research, real-time monitoring and crowd control [55.41644538483948]
High-fidelity pedestrian tracking in real-life conditions has been an important tool in fundamental crowd dynamics research. As this technology advances, it is becoming increasingly useful also in society. To successfully employ pedestrian tracking techniques in research and technology, it is crucial to validate and benchmark them for accuracy. We present and discuss a benchmark suite, towards an open standard in the community, for privacy-respectful pedestrian tracking techniques.
arXiv Detail & Related papers (2021-08-26T11:45:26Z)
Transformer-based Machine Learning for Fast SAT Solvers and Logic Synthesis [63.53283025435107]
CNF-based SAT and MaxSAT solvers are central to logic synthesis and verification systems. In this work, we propose a one-shot model derived from the Transformer architecture to solve the MaxSAT problem.
arXiv Detail & Related papers (2021-07-15T04:47:35Z)
Conditioned Text Generation with Transfer for Closed-Domain Dialogue Systems [65.48663492703557]
We show how to optimally train and control the generation of intent-specific sentences using a conditional variational autoencoder. We introduce a new protocol called query transfer that allows to leverage a large unlabelled dataset.
arXiv Detail & Related papers (2020-11-03T14:06:10Z)
FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking [92.48078680697311]
Multi-object tracking (MOT) is an important problem in computer vision. We present a simple yet effective approach termed as FairMOT based on the anchor-free object detection architecture CenterNet. The approach achieves high accuracy for both detection and tracking.
arXiv Detail & Related papers (2020-04-04T08:18:00Z)

This list is automatically generated from the titles and abstracts of the papers in this site.