Related papers: Automatic Identification of Self-Admitted Technical Debt from Four Different Sources

Automatic Identification of Self-Admitted Technical Debt from Four Different Sources

URL: http://arxiv.org/abs/2202.02387v5
Date: Fri, 21 Apr 2023 11:39:33 GMT
Title: Automatic Identification of Self-Admitted Technical Debt from Four Different Sources
Authors: Yikun Li, Mohamed Soliman, Paris Avgeriou
Abstract summary: Technical debt refers to taking shortcuts to achieve short-term goals while sacrificing the long-term maintainability and evolvability of software systems. Previous work has focused on identifying SATD from source code comments and issue trackers. We propose and evaluate an approach for automated SATD identification that integrates four sources: source code comments, commit messages, pull requests, and issue tracking systems.
Score: 3.446864074238136
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Technical debt refers to taking shortcuts to achieve short-term goals while sacrificing the long-term maintainability and evolvability of software systems. A large part of technical debt is explicitly reported by the developers themselves; this is commonly referred to as Self-Admitted Technical Debt or SATD. Previous work has focused on identifying SATD from source code comments and issue trackers. However, there are no approaches available for automatically identifying SATD from other sources such as commit messages and pull requests, or by combining multiple sources. Therefore, we propose and evaluate an approach for automated SATD identification that integrates four sources: source code comments, commit messages, pull requests, and issue tracking systems. Our findings show that our approach outperforms baseline approaches and achieves an average F1-score of 0.611 when detecting four types of SATD (i.e., code/design debt, requirement debt, documentation debt, and test debt) from the four aforementioned sources. Thereafter, we analyze 23.6M code comments, 1.3M commit messages, 3.7M issue sections, and 1.7M pull request sections to characterize SATD in 103 open-source projects. Furthermore, we investigate the SATD keywords and relations between SATD in different sources. The findings indicate, among others, that: 1) SATD is evenly spread among all sources; 2) issues and pull requests are the two most similar sources regarding the number of shared SATD keywords, followed by commit messages, and then followed by code comments; 3) there are four kinds of relations between SATD items in the different sources.

Related papers

Negativity in Self-Admitted Technical Debt: How Sentiment Influences Prioritization [50.07057212504773]
Self-Admitted Technical Debt, or SATD, is a self-admission of technical debt present in a software system. About a quarter of descriptions of SATD in software systems express some form of negativity or negative emotions. Our study shows how developers actively use negativity in SATD to determine how urgently a particular instance of TD should be addressed.
arXiv Detail & Related papers (2025-01-02T05:33:43Z)
Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph [83.90988015005934]
Uncertainty quantification (UQ) is a critical component of machine learning (ML) applications. We introduce a novel benchmark that implements a collection of state-of-the-art UQ baselines. We conduct a large-scale empirical investigation of UQ and normalization techniques across nine tasks, and identify the most promising approaches.
arXiv Detail & Related papers (2024-06-21T20:06:31Z)
An Exploratory Study of the Relationship between SATD and Other Software Development Activities [13.026170714454071]
Self-Admitted Technical Debt (SATD) is a specific type of Technical Debt that involves documenting code to remind developers of its debt. Previous research has explored various aspects of SATD, including methods, distribution, and its impact on software quality. This study investigates the relationship between removing and adding SATD and activities such as bug fixing, adding new features, and testing.
arXiv Detail & Related papers (2024-04-02T13:45:42Z)
SATDAUG -- A Balanced and Augmented Dataset for Detecting Self-Admitted Technical Debt [6.699060157800401]
Self-admitted technical debt (SATD) refers to a form of technical debt in which developers explicitly acknowledge and document the existence of technical shortcuts. We share the textitSATDAUG dataset, an augmented version of existing SATD datasets, including source code comments, issue tracker, pull requests, and commit messages.
arXiv Detail & Related papers (2024-03-12T14:33:53Z)
Evaluating Verifiability in Generative Search Engines [70.59477647085387]
Generative search engines directly generate responses to user queries, along with in-line citations. We conduct human evaluation to audit four popular generative search engines. We find that responses from existing generative search engines are fluent and appear informative, but frequently contain unsupported statements and inaccurate citations.
arXiv Detail & Related papers (2023-04-19T17:56:12Z)
PENTACET data -- 23 Million Contextual Code Comments and 250,000 SATD comments [3.6095388702618414]
Most Self-Admitted Technical Debt (SATD) research uses explicit SATD features such as 'TODO' and 'FIXME' for SATD detection. This work addresses this gap through PENTACET (or 5C dataset) data. The outcome is a dataset with 23 million code comments, preceding and succeeding source code context for each comment, and more than 250,000 comments labeled as SATD.
arXiv Detail & Related papers (2023-03-24T14:42:42Z)
Automatically Identifying Relations Between Self-Admitted Technical Debt Across Different Sources [3.446864074238136]
Self-Admitted Technical Debt or SATD can be found in various sources, such as source code comments, commit messages, issue tracking systems, and pull requests. Previous research has established the existence of relations between SATD items in different sources. We propose and evaluate approaches for automatically identifying SATD relations across different sources.
arXiv Detail & Related papers (2023-03-13T13:03:55Z)
Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions [50.114651561111245]
We propose IRCoT, a new approach for multi-step question answering. It interleaves retrieval with steps in a CoT, guiding the retrieval with CoT and in turn using retrieved results to improve CoT.
arXiv Detail & Related papers (2022-12-20T18:26:34Z)
Identifying Self-Admitted Technical Debt in Issue Tracking Systems using Machine Learning [3.446864074238136]
Technical debt is a metaphor for sub-optimal solutions implemented for short-term benefits. Most work on identifying Self-Admitted Technical Debt focuses on source code comments. We propose and optimize an approach for automatically identifying SATD in issue tracking systems using machine learning.
arXiv Detail & Related papers (2022-02-04T15:15:13Z)
S3M: Siamese Stack (Trace) Similarity Measure [55.58269472099399]
We present S3M -- the first approach to computing stack trace similarity based on deep learning. It is based on a biLSTM encoder and a fully-connected classifier to compute similarity. Our experiments demonstrate the superiority of our approach over the state-of-the-art on both open-sourced data and a private JetBrains dataset.
arXiv Detail & Related papers (2021-03-18T21:10:41Z)
GMOT-40: A Benchmark for Generic Multiple Object Tracking [65.80411267046786]
We make contributions to boost the study of Generic Multiple Object Tracking (GMOT) in three aspects. First, we construct the first public GMOT dataset, dubbed GMOT-40, which contains 40 carefully annotated sequences evenly distributed among 10 object categories. Second, by noting the lack of devoted tracking algorithms, we have designed a series of baseline GMOT algorithms. Third, we perform a thorough evaluation on GMOT-40, involving popular MOT algorithms and the proposed baselines.
arXiv Detail & Related papers (2020-11-24T02:51:46Z)
Open Question Answering over Tables and Text [55.8412170633547]
In open question answering (QA), the answer to a question is produced by retrieving and then analyzing documents that might contain answers to the question. Most open QA systems have considered only retrieving information from unstructured text. We present a new large-scale dataset Open Table-and-Text Question Answering (OTT-QA) to evaluate performance on this task.
arXiv Detail & Related papers (2020-10-20T16:48:14Z)

This list is automatically generated from the titles and abstracts of the papers in this site.