On the Replicability and Reproducibility of Deep Learning in Software
Engineering
- URL: http://arxiv.org/abs/2006.14244v1
- Date: Thu, 25 Jun 2020 08:20:10 GMT
- Title: On the Replicability and Reproducibility of Deep Learning in Software
Engineering
- Authors: Chao Liu, Cuiyun Gao, Xin Xia, David Lo, John Grundy, Xiaohu Yang
- Abstract summary: Deep learning (DL) techniques have gained significant popularity among software engineering (SE) researchers in recent years.
Many DL studies have reported substantial advantages over other state-of-the-art models in effectiveness.
They often ignore two factors: (1) replicability - whether the reported experimental result can be approximately reproduced with high probability using the same DL model and the same data; and (2) reproducibility - whether the reported experimental findings can be reproduced by new experiments with the same experimental protocol and DL model, but different sampled real-world data.
- Score: 16.828220584270507
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning (DL) techniques have gained significant popularity among
software engineering (SE) researchers in recent years. This is because they can
often solve many SE challenges without enormous manual feature engineering
effort and complex domain knowledge. Although many DL studies have reported
substantial advantages over other state-of-the-art models in effectiveness,
they often ignore two factors: (1) replicability - whether the reported
experimental result can be approximately reproduced with high probability using
the same DL model and the same data; and (2) reproducibility - whether the
reported experimental findings can be reproduced by new experiments with the
same experimental protocol and DL model, but different sampled real-world data.
Unlike traditional machine learning (ML) models, DL studies commonly overlook
these two factors and declare them as minor threats or leave them for future
work. This is mainly due to high model complexity with many manually set
parameters and the time-consuming optimization process. In this study, we
conducted a literature review on 93 DL studies recently published in twenty SE
journals or conferences. Our statistics show the urgency of investigating these
two factors in SE. Moreover, we re-ran four representative DL models in SE.
Experimental results show the importance of replicability and reproducibility,
where the reported performance of a DL model could not be replicated due to an
unstable optimization process. Reproducibility could be substantially
compromised if model training does not converge, or if performance is
sensitive to the size of the vocabulary and testing data. It is therefore urgent
for the SE community to provide long-lasting links to replication packages,
enhance the stability and convergence of DL-based solutions, and avoid
performance sensitivity across differently sampled data.
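The paper's two factors can be illustrated with a toy experiment. The model, data generator, and scoring function below are hypothetical stand-ins for a real DL pipeline, not the paper's actual benchmarks: replicability asks whether repeated runs on the same data agree, while reproducibility asks whether the finding survives on freshly sampled data.

```python
import random
import statistics

def sample_data(n, seed):
    """Draw a fresh 'real-world' sample: x with noisy label y ~ 2x."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1, 1)
        data.append((x, 2 * x + rng.gauss(0, 0.1)))
    return data

def train_and_score(data, seed):
    """Toy stand-in for a DL training run: a noisy stochastic
    optimization whose outcome depends on both the data and the seed."""
    rng = random.Random(seed)
    weight = 0.0
    for x, y in data:
        # noisy gradient step (mimics stochastic optimization)
        weight += 0.1 * (y - weight * x) * x + rng.gauss(0, 0.01)
    # 'accuracy': fraction of points whose predicted sign matches the label
    return sum(1 for x, y in data if (weight * x >= 0) == (y >= 0)) / len(data)

data = sample_data(200, seed=0)

# Replicability: same model, same data, repeated runs (only the seed varies).
same_data_scores = [train_and_score(data, seed=s) for s in range(5)]

# Reproducibility: same protocol, but freshly sampled data each time.
new_data_scores = [train_and_score(sample_data(200, seed=s), seed=s)
                   for s in range(1, 6)]

print("replicability spread:", statistics.pstdev(same_data_scores))
print("reproducibility spread:", statistics.pstdev(new_data_scores))
```

A small spread in the first list indicates the result replicates; a small spread in the second indicates it reproduces across samples. In a real study the spread would come from weight initialization, data shuffling, and hardware non-determinism rather than an injected Gaussian term.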
Related papers
- Self-Supervised Learning for Time Series: A Review & Critique of FITS [0.0]
The recently proposed FITS model claims competitive performance with significantly reduced parameter counts.
By training a one-layer neural network in the complex frequency domain, we are able to replicate these results.
Our experiments reveal that FITS especially excels at capturing periodic and seasonal patterns, but struggles with trending, non-periodic, or random-resembling behavior.
arXiv Detail & Related papers (2024-10-23T23:03:09Z)
- Investigating the Robustness of Counterfactual Learning to Rank Models: A Reproducibility Study [61.64685376882383]
Counterfactual learning to rank (CLTR) has attracted extensive attention in the IR community for its ability to leverage massive logged user interaction data to train ranking models.
This paper investigates the robustness of existing CLTR models in complex and diverse situations.
We find that the DLA models and IPS-DCM show better robustness under various simulation settings than IPS-PBM and PRS with offline propensity estimation.
arXiv Detail & Related papers (2024-04-04T10:54:38Z)
- Zero-shot Retrieval: Augmenting Pre-trained Models with Search Engines [83.65380507372483]
Large pre-trained models can dramatically reduce the amount of task-specific data required to solve a problem, but they often fail to capture domain-specific nuances out of the box.
This paper shows how to leverage recent advances in NLP and multi-modal learning to augment a pre-trained model with search engine retrieval.
arXiv Detail & Related papers (2023-11-29T05:33:28Z)
- Robustness and Generalization Performance of Deep Learning Models on Cyber-Physical Systems: A Comparative Study [71.84852429039881]
Investigation focuses on the models' ability to handle a range of perturbations, such as sensor faults and noise.
We test the generalization and transfer learning capabilities of these models by exposing them to out-of-distribution (OOD) samples.
arXiv Detail & Related papers (2023-06-13T12:43:59Z)
- Robust Learning with Progressive Data Expansion Against Spurious Correlation [65.83104529677234]
We study the learning process of a two-layer nonlinear convolutional neural network in the presence of spurious features.
Our analysis suggests that imbalanced data groups and easily learnable spurious features can lead to the dominance of spurious features during the learning process.
We propose a new training algorithm called PDE that efficiently enhances the model's robustness for a better worst-group performance.
arXiv Detail & Related papers (2023-06-08T05:44:06Z)
- An Empirical Study of Deep Learning Models for Vulnerability Detection [4.243592852049963]
We surveyed and reproduced 9 state-of-the-art deep learning models on 2 widely used vulnerability detection datasets.
We investigated model capabilities, training data, and model interpretation.
Our findings can help better understand model results, provide guidance on preparing training data, and improve the robustness of the models.
arXiv Detail & Related papers (2022-12-15T19:49:34Z)
- Towards Training Reproducible Deep Learning Models [26.547756923322126]
Deep Learning (DL) models are challenging to reproduce due to issues like randomness in the software and non-determinism in the hardware.
This paper proposes a systematic approach to training reproducible DL models.
Case study results show our approach can successfully reproduce six open source and one commercial DL models.
arXiv Detail & Related papers (2022-02-04T18:14:39Z)
- Learning Mixtures of Linear Dynamical Systems [94.49754087817931]
We develop a two-stage meta-algorithm to efficiently recover each ground-truth LDS model up to error $\tilde{O}(\sqrt{d/T})$.
We validate our theoretical studies with numerical experiments, confirming the efficacy of the proposed algorithm.
arXiv Detail & Related papers (2022-01-26T22:26:01Z)
- Synthesizing Irreproducibility in Deep Networks [2.28438857884398]
Modern day deep networks suffer from irreproducibility (also referred to as nondeterminism or underspecification).
We show that even with a single nonlinearity and for very simple data and models, irreproducibility occurs.
Model complexity and the choice of nonlinearity also play significant roles in making deep models irreproducible.
arXiv Detail & Related papers (2021-02-21T21:51:28Z)
- Modeling Shared Responses in Neuroimaging Studies through MultiView ICA [94.31804763196116]
Group studies involving large cohorts of subjects are important to draw general conclusions about brain functional organization.
We propose a novel MultiView Independent Component Analysis model for group studies, where data from each subject are modeled as a linear combination of shared independent sources plus noise.
We demonstrate the usefulness of our approach first on fMRI data, where our model demonstrates improved sensitivity in identifying common sources among subjects.
arXiv Detail & Related papers (2020-06-11T17:29:53Z)
- A Close Look at Deep Learning with Small Data [0.0]
We show that model complexity is a critical factor when only a few samples per class are available.
We also show that even standard data augmentation can boost recognition performance by large margins.
arXiv Detail & Related papers (2020-03-28T17:11:29Z)
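Several entries above (e.g. "Towards Training Reproducible Deep Learning Models" and "Synthesizing Irreproducibility in Deep Networks") trace irreproducibility to software-level randomness. Below is a minimal sketch of pinning the standard-library randomness sources, assuming a plain Python environment; real DL frameworks expose analogous seeds and deterministic-kernel flags not shown here, and hardware-level non-determinism cannot be removed by seeding alone.

```python
import os
import random

def pin_software_randomness(seed: int) -> None:
    """Pin the stdlib-level randomness sources. DL frameworks add
    analogous knobs (framework seeds, deterministic-kernel flags)
    that a full replication package would also have to pin."""
    random.seed(seed)                         # Python's global RNG
    os.environ["PYTHONHASHSEED"] = str(seed)  # hash randomization (takes effect in subprocesses)

pin_software_randomness(1234)
run_a = [random.random() for _ in range(3)]
pin_software_randomness(1234)
run_b = [random.random() for _ in range(3)]
print(run_a == run_b)  # re-seeding makes the draws identical
```

Prints `True`: with software randomness pinned, repeated runs draw identical values, which is the precondition for the replicability checks these papers perform.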
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.