Related papers: Deep Learning and Data Augmentation for Detecting Self-Admitted Technical Debt

URL: http://arxiv.org/abs/2410.15804v1
Date: Mon, 21 Oct 2024 09:22:16 GMT
Title: Deep Learning and Data Augmentation for Detecting Self-Admitted Technical Debt
Authors: Edi Sutoyo, Paris Avgeriou, Andrea Capiluppi
Abstract summary: Self-Admitted Technical Debt (SATD) refers to circumstances where developers use textual artifacts to explain why the existing implementation is not optimal. We build on earlier research by utilizing BiLSTM architecture for the binary identification of SATD and BERT architecture for categorizing different types of SATD. We introduce a two-step approach to identify and categorize SATD across various datasets derived from different artifacts.
Score: 6.004718679054704
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Self-Admitted Technical Debt (SATD) refers to circumstances where developers use textual artifacts to explain why the existing implementation is not optimal. Past research in detecting SATD has focused on either identifying SATD (classifying SATD items as SATD or not) or categorizing SATD (labeling instances as SATD that pertain to requirement, design, code, test debt, etc.). However, the performance of these approaches remains suboptimal, particularly for specific types of SATD, such as test and requirement debt, primarily due to extremely imbalanced datasets. To address these challenges, we build on earlier research by utilizing BiLSTM architecture for the binary identification of SATD and BERT architecture for categorizing different types of SATD. Despite their effectiveness, both architectures struggle with imbalanced data. Therefore, we employ a large language model data augmentation strategy to mitigate this issue. Furthermore, we introduce a two-step approach to identify and categorize SATD across various datasets derived from different artifacts. Our contributions include providing a balanced dataset for future SATD researchers and demonstrating that our approach significantly improves SATD identification and categorization performance compared to baseline methods.

Related papers

ACE: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling [53.97609687516371]
We propose a pioneering generAtive Cross-modal rEtrieval framework (ACE) for end-to-end cross-modal retrieval. ACE achieves state-of-the-art performance in cross-modal retrieval and outperforms the strong baselines on Recall@1 by 15.27% on average.
arXiv Detail & Related papers (2024-06-25T12:47:04Z)
An Exploratory Study of the Relationship between SATD and Other Software Development Activities [13.026170714454071]
Self-Admitted Technical Debt (SATD) is a specific type of Technical Debt that involves documenting code to remind developers of its debt. Previous research has explored various aspects of SATD, including methods, distribution, and its impact on software quality. This study investigates the relationship between removing and adding SATD and activities such as bug fixing, adding new features, and testing.
arXiv Detail & Related papers (2024-04-02T13:45:42Z)
SATDAUG -- A Balanced and Augmented Dataset for Detecting Self-Admitted Technical Debt [6.699060157800401]
Self-admitted technical debt (SATD) refers to a form of technical debt in which developers explicitly acknowledge and document the existence of technical shortcuts. We share the SATDAUG dataset, an augmented version of existing SATD datasets, including source code comments, issue tracker, pull requests, and commit messages.
arXiv Detail & Related papers (2024-03-12T14:33:53Z)
A new algorithm for Subgroup Set Discovery based on Information Gain [58.720142291102135]
Information Gained Subgroup Discovery (IGSD) is a new SD algorithm for pattern discovery. We compare IGSD with two state-of-the-art SD algorithms: FSSD and SSD++. IGSD provides better OR values than FSSD and SSD++, indicating a stronger dependence between patterns and targets.
arXiv Detail & Related papers (2023-07-26T21:42:34Z)
On Pitfalls of Test-Time Adaptation [82.8392232222119]
Test-Time Adaptation (TTA) has emerged as a promising approach for tackling the robustness challenge under distribution shifts. We present TTAB, a test-time adaptation benchmark that encompasses ten state-of-the-art algorithms, a diverse array of distribution shifts, and two evaluation protocols.
arXiv Detail & Related papers (2023-06-06T09:35:29Z)
W2SAT: Learning to generate SAT instances from Weighted Literal Incidence Graphs [11.139131079925113]
W2SAT is a framework to generate SAT formulas by learning intrinsic structures and properties from given real-world/industrial instances. We introduce a novel SAT representation called Weighted Literal Incidence Graph (WLIG), which exhibits strong representation ability and generalizability. Decoding from WLIG into SAT problems is then modeled as finding overlapping cliques with a novel hill-climbing optimization method.
arXiv Detail & Related papers (2023-02-01T06:30:41Z)
Estimating the hardness of SAT encodings for Logical Equivalence Checking of Boolean circuits [58.83758257568434]
We show that the hardness of SAT encodings for LEC instances can be estimated w.r.t. some SAT partitioning. The paper proposes several methods for constructing partitionings, which, when used in practice, allow one to estimate the hardness of SAT encodings for LEC with good accuracy.
arXiv Detail & Related papers (2022-10-04T09:19:13Z)
Robust Continual Test-time Adaptation: Instance-aware BN and Prediction-balanced Memory [58.72445309519892]
We present a new test-time adaptation scheme that is robust against non-i.i.d. test data streams. Our novelty is mainly two-fold: (a) Instance-Aware Batch Normalization (IABN) that corrects normalization for out-of-distribution samples, and (b) Prediction-balanced Reservoir Sampling (PBRS) that simulates an i.i.d. data stream from a non-i.i.d. stream in a class-balanced manner.
arXiv Detail & Related papers (2022-08-10T03:05:46Z)
DeepSAT: An EDA-Driven Learning Framework for SAT [9.111341161918375]
We present DeepSAT, a novel end-to-end learning framework for the Boolean satisfiability (SAT) problem. DeepSAT achieves significant accuracy improvements over state-of-the-art learning-based SAT solutions.
arXiv Detail & Related papers (2022-05-27T03:20:42Z)
Identifying Self-Admitted Technical Debt in Issue Tracking Systems using Machine Learning [3.446864074238136]
Technical debt is a metaphor for sub-optimal solutions implemented for short-term benefits. Most work on identifying Self-Admitted Technical Debt focuses on source code comments. We propose and optimize an approach for automatically identifying SATD in issue tracking systems using machine learning.
arXiv Detail & Related papers (2022-02-04T15:15:13Z)
Transformer-based Machine Learning for Fast SAT Solvers and Logic Synthesis [63.53283025435107]
CNF-based SAT and MaxSAT solvers are central to logic synthesis and verification systems. In this work, we propose a one-shot model derived from the Transformer architecture to solve the MaxSAT problem.
arXiv Detail & Related papers (2021-07-15T04:47:35Z)

This list is automatically generated from the titles and abstracts of the papers on this site.