Related papers: Investigating Reproducibility in Deep Learning-Based Software Fault Prediction

URL: http://arxiv.org/abs/2402.05645v1
Date: Thu, 8 Feb 2024 13:00:18 GMT
Title: Investigating Reproducibility in Deep Learning-Based Software Fault Prediction
Authors: Adil Mukhtar, Dietmar Jannach, Franz Wotawa
Abstract summary: With the rapid adoption of increasingly complex machine learning models, it becomes more and more difficult for scholars to reproduce the results that are reported in the literature. This is in particular the case when the applied deep learning models and the evaluation methodology are not properly documented and when code and data are not shared. We have conducted a systematic review of the current literature and examined the level of reproducibility of 56 research articles that were published between 2019 and 2022 in top-tier software engineering conferences.
Score: 16.25827159504845
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Over the past few years, deep learning methods have been applied for a wide range of Software Engineering (SE) tasks, including in particular for the important task of automatically predicting and localizing faults in software. With the rapid adoption of increasingly complex machine learning models, it however becomes more and more difficult for scholars to reproduce the results that are reported in the literature. This is in particular the case when the applied deep learning models and the evaluation methodology are not properly documented and when code and data are not shared. Given some recent -- and very worrying -- findings regarding reproducibility and progress in other areas of applied machine learning, the goal of this work is to analyze to what extent the field of software engineering, in particular in the area of software fault prediction, is plagued by similar problems. We have therefore conducted a systematic review of the current literature and examined the level of reproducibility of 56 research articles that were published between 2019 and 2022 in top-tier software engineering conferences. Our analysis revealed that scholars are apparently largely aware of the reproducibility problem, and about two thirds of the papers provide code for their proposed deep learning models. However, it turned out that in the vast majority of cases, crucial elements for reproducibility are missing, such as the code of the compared baselines, code for data pre-processing or code for hyperparameter tuning. In these cases, it therefore remains challenging to exactly reproduce the results in the current research literature. Overall, our meta-analysis therefore calls for improved research practices to ensure the reproducibility of machine-learning based research.

Related papers

A Systematic Literature Review on Detecting Software Vulnerabilities with Large Language Models [2.518519330408713]
Large Language Models (LLMs) in software engineering have sparked interest in their use for software vulnerability detection. The rapid development of this field has resulted in a fragmented research landscape. This fragmentation makes it difficult to obtain a clear overview of the state-of-the-art or compare and categorize studies meaningfully.
arXiv Detail & Related papers (2025-07-30T13:17:16Z)
Metamorphic Testing of Deep Code Models: A Systematic Literature Review [9.09091334696889]
Large language models and deep learning models designed for code intelligence have revolutionized the software engineering field. These models can process source code and software artifacts with high accuracy in tasks such as code completion, defect detection, and code summarization. Robustness remains a critical quality attribute for deep-code models as they may produce different results under varied and adversarial conditions.
arXiv Detail & Related papers (2025-07-30T12:25:30Z)
Improving the Reproducibility of Deep Learning Software: An Initial Investigation through a Case Study Analysis [3.334697938650665]
There have been increasing concerns about reproducing the results of deep learning methods. More than 70% of researchers failed to reproduce other researchers' experiments and over 50% failed to reproduce their own experiments. This paper presents a systematic approach to analyzing and improving deep learning models.
arXiv Detail & Related papers (2025-05-06T04:20:15Z)
Automated Unit Test Case Generation: A Systematic Literature Review [2.273531916003657]
This review aims to consolidate existing knowledge with regard to the evolutionary approaches as well as their improvements and resulting limitations. We will explore the main test criteria that are used in these algorithms alongside the challenges currently faced in the field related to readability, mocking and more.
arXiv Detail & Related papers (2025-04-29T01:50:06Z)
Machine learning applications in archaeological practices: a review [0.0]
We reviewed 135 articles published between 1997 and 2022. Automatic structure detection and artefact classification were the most represented tasks. We observed, in some cases, poorly defined requirements and caveats of the machine learning methods used.
arXiv Detail & Related papers (2025-01-07T14:50:05Z)
State-Space Modeling in Long Sequence Processing: A Survey on Recurrence in the Transformer Era [59.279784235147254]
This survey provides an in-depth summary of the latest approaches that are based on recurrent models for sequential data processing. The emerging picture suggests that there is room for thinking of novel routes, constituted by learning algorithms which depart from the standard Backpropagation Through Time.
arXiv Detail & Related papers (2024-06-13T12:51:22Z)
Resilience of Deep Learning applications: a systematic literature review of analysis and hardening techniques [3.265458968159693]
The review is based on 220 scientific articles published between January 2019 and March 2024. The authors adopt a classifying framework to interpret and highlight research similarities and peculiarities.
arXiv Detail & Related papers (2023-09-27T19:22:19Z)
On building machine learning pipelines for Android malware detection: a procedural survey of practices, challenges and opportunities [4.8460847676785175]
As the smartphone market leader, Android has been a prominent target for malware attacks. For market holders and researchers, in particular, the large number of samples has made manual malware detection unfeasible. While some of the proposed approaches achieve high performance, rapidly evolving Android malware has made them unable to maintain their accuracy over time.
arXiv Detail & Related papers (2023-06-12T13:52:28Z)
Deep networks for system identification: a Survey [56.34005280792013]
System identification learns mathematical descriptions of dynamic systems from input-output data. The main aim of the identified model is to predict new data from previous observations. We discuss architectures commonly adopted in the literature, like feedforward, convolutional, and recurrent networks.
arXiv Detail & Related papers (2023-01-30T12:38:31Z)
Research Trends and Applications of Data Augmentation Algorithms [77.34726150561087]
We identify the main areas of application of data augmentation algorithms, the types of algorithms used, significant research trends, their progression over time and research gaps in data augmentation literature. We expect readers to understand the potential of data augmentation, as well as identify future research directions and open questions within data augmentation research.
arXiv Detail & Related papers (2022-07-18T11:38:32Z)
Deep Learning for Anomaly Detection in Log Data: A Survey [3.508620069426877]
Self-learning anomaly detection techniques capture patterns in log data and report unexpected log event occurrences. Deep learning neural networks for this purpose have been presented. There exist many different architectures for deep learning and it is non-trivial to encode raw and unstructured log data.
arXiv Detail & Related papers (2022-07-08T10:58:28Z)
A Survey on Machine Learning Techniques for Source Code Analysis [14.129976741300029]
We aim to summarize the current knowledge in the area of applied machine learning for source code analysis. To do so, we carried out an extensive literature search and identified 364 primary studies published between 2002 and 2021.
arXiv Detail & Related papers (2021-10-18T20:13:38Z)
Ten Quick Tips for Deep Learning in Biology [116.78436313026478]
Machine learning is concerned with the development and applications of algorithms that can recognize patterns in data and use them for predictive modeling. Deep learning has become its own subfield of machine learning. In the context of biological research, deep learning has been increasingly used to derive novel insights from high-dimensional biological data.
arXiv Detail & Related papers (2021-05-29T21:02:44Z)
Knowledge as Invariance -- History and Perspectives of Knowledge-augmented Machine Learning [69.99522650448213]
Research in machine learning is at a turning point. Research interests are shifting away from increasing the performance of highly parameterized models to exceedingly specific tasks. This white paper provides an introduction and discussion of this emerging field in machine learning research.
arXiv Detail & Related papers (2020-12-21T15:07:19Z)
Automatic Feasibility Study via Data Quality Analysis for ML: A Case-Study on Label Noise [21.491392581672198]
We present Snoopy, with the goal of supporting data scientists and machine learning engineers performing a systematic and theoretically founded feasibility study. We approach this problem by estimating the irreducible error of the underlying task, also known as the Bayes error rate (BER). We demonstrate in end-to-end experiments how users are able to save substantial labeling time and monetary efforts.
arXiv Detail & Related papers (2020-10-16T14:21:19Z)
Bayesian active learning for production, a systematic study and a reusable library [85.32971950095742]
In this paper, we analyse the main drawbacks of current active learning techniques. We do a systematic study on the effects of the most common issues of real-world datasets on the deep active learning process. We derive two techniques that can speed up the active learning loop such as partial uncertainty sampling and larger query size.
arXiv Detail & Related papers (2020-06-17T14:51:11Z)

This list is automatically generated from the titles and abstracts of the papers on this site.