Automated Approaches to Detect Self-Admitted Technical Debt: A
Systematic Literature Review
- URL: http://arxiv.org/abs/2312.15020v2
- Date: Tue, 12 Mar 2024 07:12:38 GMT
- Title: Automated Approaches to Detect Self-Admitted Technical Debt: A
Systematic Literature Review
- Authors: Edi Sutoyo, Andrea Capiluppi
- Abstract summary: Self-admitted technical debt (SATD) refers to instances where developers explicitly acknowledge suboptimal code quality or design flaws.
This systematic literature review proposes a taxonomy of feature extraction techniques and ML/DL algorithms used in technical debt detection.
- Score: 6.699060157800401
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Technical debt is a pervasive issue in software development, often arising
from trade-offs made during development, which can impede software
maintainability and hinder future development efforts. Self-admitted technical
debt (SATD) refers to instances where developers explicitly acknowledge
suboptimal code quality or design flaws in the codebase. Automated detection of
SATD has emerged as a critical area of research, aiming to assist developers in
identifying and addressing technical debt efficiently. However, the enormous
variety of NLP feature extraction approaches and algorithms employed in the
literature often hinders researchers trying to improve detection performance.
In light of this, this systematic literature review proposes a taxonomy of the
feature extraction techniques and ML/DL algorithms used in technical debt
detection: its objective is to compare and benchmark their performance in the
examined studies. We selected 53 articles that passed the quality evaluation of
the systematic review. We then investigated in depth which feature extraction
techniques and algorithms were employed to identify technical debt in each
software development activity. All approaches proposed in the analyzed studies
were grouped into NLP, NLP+ML, and NLP+DL, which allows us to compare their
performance along three lines. Overall, the NLP+DL group consistently
outperforms the others in precision and F1-score for all projects, and in all
but one project for the recall metric. Regarding feature extraction techniques,
PTE consistently achieves the highest precision, recall, and F1-score for each
project analyzed. Furthermore, TD types have been mapped to software
development activities; this mapping served to determine the best-performing
feature extraction techniques and algorithms for each development activity.
Finally, based on the review
results, we also identify implications that could be of concern to researchers
and practitioners.
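To make the grouping concrete, the sketch below shows what a minimal detector from the NLP+ML family might look like: TF-IDF feature extraction over code comments feeding a linear classifier. It is an illustrative baseline assuming scikit-learn, not a pipeline taken from any of the reviewed studies; the comment strings and labels are hypothetical.

```python
# Minimal NLP+ML SATD detection sketch (hypothetical data; scikit-learn assumed).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical code comments labeled as SATD (1) or not (0).
comments = [
    "TODO: this is a temporary hack, refactor later",
    "FIXME: workaround for race condition, needs a proper fix",
    "compute the checksum of the input buffer",
    "iterate over all registered handlers",
]
labels = [1, 1, 0, 0]

# TF-IDF feature extraction followed by a linear classifier.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(comments, labels)

print(model.predict(["quick and dirty solution, clean this up"]))  # -> [1]
```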
Related papers
- Machine Learning Pipeline for Software Engineering: A Systematic Literature Review [0.0]
This systematic literature review examines state-of-the-art Machine Learning pipelines designed for software engineering (SE).
Our findings show that robust preprocessing, such as SMOTE for data balancing, improves model reliability.
Ensemble methods like Random Forest and Gradient Boosting dominate performance across tasks.
New metrics like Best Arithmetic Mean (BAM) are emerging in niche applications.
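As an illustration of the preprocessing finding above, here is a minimal sketch of a pipeline combining SMOTE balancing with a Random Forest, assuming the imbalanced-learn and scikit-learn libraries; the synthetic dataset is hypothetical.

```python
# Sketch: SMOTE-balanced Random Forest pipeline (imbalanced-learn + scikit-learn assumed).
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline  # applies SMOTE only during fitting, not scoring
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical imbalanced defect dataset: roughly 5% positive class.
X, y = make_classification(n_samples=1000, weights=[0.95], random_state=0)

pipe = Pipeline([
    ("smote", SMOTE(random_state=0)),            # oversample the minority class
    ("rf", RandomForestClassifier(random_state=0)),
])
print(cross_val_score(pipe, X, y, scoring="f1", cv=5).mean())
```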
arXiv Detail & Related papers (2025-07-31T15:37:30Z) - T2I-Eval-R1: Reinforcement Learning-Driven Reasoning for Interpretable Text-to-Image Evaluation [60.620408007636016]
We propose T2I-Eval-R1, a novel reinforcement learning framework that trains open-source MLLMs using only coarse-grained quality scores.
Our approach integrates Group Relative Policy Optimization into the instruction-tuning process, enabling models to generate both scalar scores and interpretable reasoning chains.
arXiv Detail & Related papers (2025-05-23T13:44:59Z) - Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute [61.00662702026523]
We propose a unified Test-Time Compute (TTC) scaling framework that leverages increased inference-time compute instead of larger models.
Our framework incorporates two complementary strategies: internal TTC and external TTC.
We demonstrate that our 32B model achieves a 46% issue resolution rate, surpassing significantly larger models such as DeepSeek R1 671B and OpenAI o1.
arXiv Detail & Related papers (2025-03-31T07:31:32Z) - Offline Model-Based Optimization: Comprehensive Review [61.91350077539443]
Offline optimization is a fundamental challenge in science and engineering, where the goal is to optimize black-box functions using only offline datasets.
Recent advances in model-based optimization have harnessed the generalization capabilities of deep neural networks to develop offline-specific surrogate and generative models.
Despite its growing impact in accelerating scientific discovery, the field lacks a comprehensive review.
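A common pattern in this line of work is to fit a surrogate model on the offline dataset and then ascend its gradient to propose new designs. The sketch below illustrates that generic pattern in PyTorch under hypothetical data; it is not a specific method from the survey.

```python
# Sketch: offline model-based optimization via a learned surrogate (PyTorch assumed).
import torch

# Hypothetical offline dataset: designs x and black-box scores y (no new queries allowed).
X = torch.randn(256, 8)
y = -(X ** 2).sum(dim=1, keepdim=True)  # stand-in for an unknown objective

# 1) Fit a neural surrogate f_theta(x) ~ y on the offline data.
surrogate = torch.nn.Sequential(torch.nn.Linear(8, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1))
opt = torch.optim.Adam(surrogate.parameters(), lr=1e-3)
for _ in range(500):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(surrogate(X), y)
    loss.backward()
    opt.step()

# 2) Gradient-ascend the surrogate from the best offline design to propose a candidate.
x = X[y.argmax()].clone().requires_grad_(True)
ascent = torch.optim.Adam([x], lr=0.05)
for _ in range(100):
    ascent.zero_grad()
    (-surrogate(x)).backward()  # maximize the surrogate's prediction
    ascent.step()
print("proposed design:", x.detach())
```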
arXiv Detail & Related papers (2025-03-21T16:35:02Z) - What Really Matters for Learning-based LiDAR-Camera Calibration [50.2608502974106]
This paper revisits the development of learning-based LiDAR-Camera calibration.
We identify the critical limitations of regression-based methods with the widely used data generation pipeline.
We also investigate how the input data format and preprocessing operations impact network performance.
arXiv Detail & Related papers (2025-01-28T14:12:32Z) - Leveraging Conversational Generative AI for Anomaly Detection in Digital Substations [0.0]
The research employs advanced performance metrics to conduct a comparative assessment between the proposed AD and HITL-based AD frameworks.
This approach presents a promising solution for enhancing the reliability of power system operations in the face of evolving cybersecurity challenges.
arXiv Detail & Related papers (2024-11-09T18:38:35Z) - SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models [54.78329741186446]
We propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation.
Experiments across both in-domain and out-of-domain benchmarks in English and Chinese demonstrate the effectiveness of the proposed paradigm.
arXiv Detail & Related papers (2024-08-28T06:33:03Z) - MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs [55.20845457594977]
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making.
We present MR-Ben, a process-based benchmark that demands meta-reasoning skill.
Our meta-reasoning paradigm is especially suited for system-2 slow thinking.
arXiv Detail & Related papers (2024-06-20T03:50:23Z) - State-Space Modeling in Long Sequence Processing: A Survey on Recurrence in the Transformer Era [59.279784235147254]
This survey provides an in-depth summary of the latest approaches that are based on recurrent models for sequential data processing.
The emerging picture suggests there is room for novel routes, constituted by learning algorithms that depart from standard Backpropagation Through Time.
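For orientation, the core recurrence these models share is a linear state-space update, h_t = A h_{t-1} + B x_t with output y_t = C h_t. A minimal PyTorch sketch with hypothetical dimensions and initialization:

```python
# Sketch: the linear state-space recurrence underlying SSM-style sequence models.
import torch

d_state, d_in, seq_len = 16, 4, 32
A = torch.randn(d_state, d_state) * 0.1  # state transition matrix
B = torch.randn(d_state, d_in)           # input projection
C = torch.randn(d_in, d_state)           # output projection

x = torch.randn(seq_len, d_in)           # hypothetical input sequence
h = torch.zeros(d_state)
ys = []
for t in range(seq_len):                 # recurrent scan: h_t = A h_{t-1} + B x_t
    h = A @ h + B @ x[t]
    ys.append(C @ h)                     # y_t = C h_t
y = torch.stack(ys)
print(y.shape)  # (32, 4)
```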
arXiv Detail & Related papers (2024-06-13T12:51:22Z) - A Novel Generative AI-Based Framework for Anomaly Detection in Multicast Messages in Smart Grid Communications [0.0]
Cybersecurity breaches in digital substations pose significant challenges to the stability and reliability of power system operations.
This paper proposes a task-oriented dialogue system for anomaly detection (AD) in datasets of multicast messages.
It has a lower potential for error and better scalability and adaptability than a process that relies on cybersecurity guidelines recommended by humans.
arXiv Detail & Related papers (2024-06-08T13:28:50Z) - QualEval: Qualitative Evaluation for Model Improvement [82.73561470966658]
We propose QualEval, which augments quantitative scalar metrics with automated qualitative evaluation as a vehicle for model improvement.
QualEval uses a powerful LLM reasoner and our novel flexible linear programming solver to generate human-readable insights.
We demonstrate that leveraging its insights, for example, improves the absolute performance of the Llama 2 model by up to 15% points.
arXiv Detail & Related papers (2023-11-06T00:21:44Z) - Uncertainty Estimation of Transformers' Predictions via Topological Analysis of the Attention Matrices [3.1466086042810884]
Transformer-based language models have set new benchmarks across a wide range of NLP tasks, yet reliably estimating the uncertainty of their predictions remains a significant challenge.
We tackle these limitations by harnessing the geometry of attention maps across multiple heads and layers to assess model confidence.
Our method significantly outperforms existing uncertainty estimation techniques on benchmarks for acceptability judgments and artificial text detection.
arXiv Detail & Related papers (2023-08-22T09:17:45Z) - The Devil is in the Errors: Leveraging Large Language Models for
Fine-grained Machine Translation Evaluation [93.01964988474755]
AutoMQM is a prompting technique which asks large language models to identify and categorize errors in translations.
We study the impact of labeled data through in-context learning and finetuning.
We then evaluate AutoMQM with PaLM-2 models, and we find that it improves performance compared to just prompting for scores.
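The gist of AutoMQM is a prompt that asks the model to enumerate and categorize translation errors rather than emit a single score. A hypothetical prompt of that shape is sketched below; the exact wording and error categories used in the paper may differ.

```python
# Sketch of an AutoMQM-style error-annotation prompt (wording is hypothetical).
PROMPT = """You are an expert translator. Identify all errors in the translation
of the source text. For each error, report the error span, a category
(accuracy, fluency, terminology, style), and a severity (minor or major).

Source: {source}
Translation: {translation}
Errors:"""

print(PROMPT.format(source="Das Wetter ist heute schoen.",
                    translation="The weather is nice yesterday."))
```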
arXiv Detail & Related papers (2023-08-14T17:17:21Z) - Deep Transfer Learning for Automatic Speech Recognition: Towards Better
Generalization [3.6393183544320236]
Speech recognition based on deep learning (DL) has become an important challenge.
It requires large-scale training datasets and high computational and storage resources.
Deep transfer learning (DTL) has been introduced to overcome these issues.
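One standard DTL recipe this entry alludes to is freezing a pretrained encoder and fine-tuning only a small task head. A generic PyTorch sketch under hypothetical dimensions, with a stand-in module in place of a real pretrained acoustic model:

```python
# Sketch: a common deep transfer learning recipe (freeze encoder, fine-tune head).
import torch

# Stand-in for a pretrained acoustic encoder (hypothetical; any torch.nn.Module works).
encoder = torch.nn.Sequential(torch.nn.Linear(80, 256), torch.nn.ReLU())
for p in encoder.parameters():
    p.requires_grad = False               # keep pretrained weights fixed

head = torch.nn.Linear(256, 10)           # small task-specific classifier
opt = torch.optim.Adam(head.parameters(), lr=1e-3)

features = torch.randn(32, 80)            # hypothetical batch of acoustic features
targets = torch.randint(0, 10, (32,))
logits = head(encoder(features))
loss = torch.nn.functional.cross_entropy(logits, targets)
loss.backward()                           # gradients flow only into the head
opt.step()
```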
arXiv Detail & Related papers (2023-04-27T21:08:05Z) - Measuring Improvement of F$_1$-Scores in Detection of Self-Admitted
Technical Debt [5.750379648650073]
We improve SATD detection with a novel approach that leverages the Bidirectional Representations from Transformers (BERT) architecture.
We find that our trained BERT model improves over the best performance of all previous methods in 19 of the 20 projects in cross-project scenarios.
Future research will look into ways to diversify SATD datasets in order to maximize the latent power in large BERT models.
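For orientation, fine-tuning BERT as a binary SATD classifier typically looks like the sketch below, assuming the Hugging Face transformers library; the comment strings are hypothetical and the paper's exact training setup may differ.

```python
# Sketch: BERT as a binary SATD classifier (Hugging Face transformers assumed).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

comments = ["TODO: temporary hack, refactor later", "update the user cache"]
labels = torch.tensor([1, 0])             # 1 = SATD, 0 = not SATD (hypothetical)

batch = tokenizer(comments, padding=True, truncation=True, return_tensors="pt")
out = model(**batch, labels=labels)       # returns loss and logits
out.loss.backward()                       # one fine-tuning step would follow
print(out.logits.argmax(dim=-1))
```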
arXiv Detail & Related papers (2023-03-16T19:47:38Z) - On the Reliability and Explainability of Language Models for Program
Generation [15.569926313298337]
We study the capabilities and limitations of automated program generation approaches.
We employ advanced explainable AI approaches to highlight the tokens that significantly contribute to the code transformation.
Our analysis reveals that, in various experimental scenarios, language models can recognize code grammar and structural information, but they exhibit limited robustness to changes in input sequences.
arXiv Detail & Related papers (2023-02-19T14:59:52Z) - Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
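The transductive update described above can be written compactly: each class prototype is refined with confidence-weighted query embeddings. A minimal PyTorch sketch with hypothetical embeddings follows; the paper meta-learns the confidence weights, while a fixed softmax over distances stands in here.

```python
# Sketch: confidence-weighted transductive prototype refinement (hypothetical embeddings).
import torch

n_way, k_shot, n_query, dim = 5, 5, 15, 64
support = torch.randn(n_way, k_shot, dim)   # labeled support embeddings
queries = torch.randn(n_query, dim)         # unlabeled query embeddings

protos = support.mean(dim=1)                # initial per-class prototypes

# Confidence of each query for each class from (negative) squared distances.
# The paper meta-learns this weighting; a plain softmax stands in here.
dists = torch.cdist(queries, protos) ** 2
conf = torch.softmax(-dists, dim=1)         # (n_query, n_way)

# Refine prototypes with confidence-weighted query embeddings.
weighted = conf.t() @ queries               # (n_way, dim)
protos = (protos * k_shot + weighted) / (k_shot + conf.sum(dim=0, keepdim=True).t())
print(protos.shape)  # (5, 64)
```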
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.