Pitfalls in Experiments with DNN4SE: An Analysis of the State of the
Practice
- URL: http://arxiv.org/abs/2305.11556v1
- Date: Fri, 19 May 2023 09:55:48 GMT
- Title: Pitfalls in Experiments with DNN4SE: An Analysis of the State of the
Practice
- Authors: Sira Vegas, Sebastian Elbaum
- Abstract summary: We conduct a mapping study, examining 194 experiments with techniques that rely on deep neural networks appearing in 55 papers published in premier software engineering venues.
Our study reveals that most of the experiments, including those that have received ACM artifact badges, have fundamental limitations that raise doubts about the reliability of their findings.
- Score: 0.7614628596146599
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Software engineering techniques are increasingly relying on deep learning
approaches to support many software engineering tasks, from bug triaging to
code generation. To assess the efficacy of such techniques researchers
typically perform controlled experiments. Conducting these experiments,
however, is particularly challenging given the complexity of the space of
variables involved, from specialized and intricate architectures and algorithms
to a large number of training hyper-parameters and choices of evolving
datasets, all compounded by how rapidly the machine learning technology is
advancing, and the inherent sources of randomness in the training process. In
this work we conduct a mapping study, examining 194 experiments with techniques
that rely on deep neural networks appearing in 55 papers published in premier
software engineering venues to provide a characterization of the
state of the practice, pinpointing experiments' common trends and pitfalls. Our
study reveals that most of the experiments, including those that have received
ACM artifact badges, have fundamental limitations that raise doubts about the
reliability of their findings. More specifically, we find: weak analyses to
determine that there is a true relationship between independent and dependent
variables (87% of the experiments); limited control over the space of DNN
relevant variables, which can render a relationship between dependent variables
and treatments that may not be causal but rather correlational (100% of the
experiments); and lack of specificity in terms of what are the DNN variables
and their values utilized in the experiments (86% of the experiments) to define
the treatments being applied, which makes it unclear whether the techniques
designed are the ones being assessed, or how the sources of extraneous
variation are controlled. We provide some practical recommendations to address
these limitations.
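The paper's practical recommendations are spelled out in the full text; as a rough, hedged illustration of the kind of analysis such recommendations typically call for, the sketch below repeats each treatment over several random seeds, records the DNN-relevant configuration that defines the treatment, and compares treatments with a statistical test and an effect size instead of a single run. All names, values, and scores are hypothetical stand-ins, not data from the study.

```python
# Minimal sketch (hypothetical values throughout): run each treatment over
# several seeds, log the full DNN configuration, and compare distributions.
import numpy as np
from scipy.stats import mannwhitneyu

config = {
    "architecture": "2-layer GRU",   # every DNN-relevant choice is recorded
    "learning_rate": 1e-3,
    "batch_size": 32,
    "epochs": 20,
    "dataset_version": "v1.2",
    "seeds": list(range(10)),
}

# Stand-in per-seed scores; in a real experiment each value comes from
# re-training the technique (or the baseline) with one of the seeds above.
rng = np.random.default_rng(0)
technique = rng.normal(loc=0.82, scale=0.02, size=len(config["seeds"]))
baseline = rng.normal(loc=0.79, scale=0.02, size=len(config["seeds"]))

stat, p_value = mannwhitneyu(technique, baseline, alternative="two-sided")
# Vargha-Delaney A12: probability that a technique run beats a baseline run.
a12 = float(np.mean(technique[:, None] > baseline[None, :]))

print(f"config: {config}")
print(f"technique: {technique.mean():.3f} +/- {technique.std():.3f}")
print(f"baseline:  {baseline.mean():.3f} +/- {baseline.std():.3f}")
print(f"Mann-Whitney U p = {p_value:.4f}, A12 = {a12:.2f}")
```

Reporting the configuration alongside per-seed score distributions speaks to the specificity and control concerns the abstract raises, while the test and effect size strengthen the claim of a true relationship between treatment and outcome.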
Related papers
- MLXP: A Framework for Conducting Replicable Experiments in Python [63.37350735954699]
We propose MLXP, an open-source, simple, and lightweight experiment management tool based on Python.
It streamlines the experimental process with minimal practitioner overhead while ensuring a high level of reproducibility.
arXiv Detail & Related papers (2024-02-21T14:22:20Z)
- Adaptive Instrument Design for Indirect Experiments [48.815194906471405]
Unlike RCTs, indirect experiments estimate treatment effects by leveraging conditional instrumental variables.
In this paper we take the initial steps towards enhancing sample efficiency for indirect experiments by adaptively designing a data collection policy.
Our main contribution is a practical computational procedure that utilizes influence functions to search for an optimal data collection policy.
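For readers unfamiliar with instrumental variables, here is a minimal two-stage least squares sketch on synthetic data showing how an instrument recovers a treatment effect that naive regression misestimates under confounding; it illustrates the standard (non-adaptive) estimator this entry builds on, not the paper's adaptive data-collection procedure.

```python
# Two-stage least squares (2SLS) on synthetic data: the instrument z shifts the
# treatment t but affects the outcome y only through t, so regressing y on the
# z-predicted part of t recovers the true effect despite the hidden confounder.
import numpy as np

rng = np.random.default_rng(0)
n = 5000
u = rng.normal(size=n)                                   # unobserved confounder
z = rng.normal(size=n)                                   # instrument
t = 0.8 * z + 0.5 * u + rng.normal(scale=0.5, size=n)    # treatment
y = 2.0 * t + 1.5 * u + rng.normal(scale=0.5, size=n)    # outcome; true effect = 2.0

def ols(x, target):
    """Least-squares intercept and slope of target on x."""
    X = np.column_stack([np.ones(len(x)), x])
    return np.linalg.lstsq(X, target, rcond=None)[0]

naive_slope = ols(t, y)[1]              # biased upward by the confounder u
a, b = ols(z, t)                        # stage 1: predict t from z
iv_slope = ols(a + b * z, y)[1]         # stage 2: regress y on predicted t

print(f"naive OLS estimate: {naive_slope:.2f}")   # noticeably above 2.0
print(f"2SLS (IV) estimate: {iv_slope:.2f}")      # close to 2.0
```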
arXiv Detail & Related papers (2023-12-05T02:38:04Z)
- Machine learning enabled experimental design and parameter estimation for ultrafast spin dynamics [54.172707311728885]
We introduce a methodology that combines machine learning with Bayesian optimal experimental design (BOED).
Our method employs a neural network model for large-scale spin dynamics simulations for precise distribution and utility calculations in BOED.
Our numerical benchmarks demonstrate the superior performance of our method in guiding XPFS experiments, predicting model parameters, and yielding more informative measurements within limited experimental time.
arXiv Detail & Related papers (2023-06-03T06:19:20Z)
- Experts in the Loop: Conditional Variable Selection for Accelerating Post-Silicon Analysis Based on Deep Learning [6.6357750579293935]
Post-silicon validation is one of the most critical processes in semiconductor manufacturing.
This work aims to design a novel conditional variable selection approach while keeping experts in the loop.
arXiv Detail & Related papers (2022-09-30T06:12:12Z)
- Lessons Learned from Data-Driven Building Control Experiments: Contrasting Gaussian Process-based MPC, Bilevel DeePC, and Deep Reinforcement Learning [0.0]
This manuscript offers the perspective of experimentalists on a number of modern data-driven techniques.
These techniques are compared in terms of data requirements, ease of use, computational burden, and robustness in the context of real-world applications.
arXiv Detail & Related papers (2022-05-31T11:40:22Z)
- Do Deep Neural Networks Always Perform Better When Eating More Data? [82.6459747000664]
We design experiments under Independent and Identically Distributed (IID) and Out-of-Distribution (OOD) settings.
Under the IID condition, the amount of information determines the effectiveness of each sample, while the contribution of samples and the difference between classes determine the amount of class information.
Under the OOD condition, the cross-domain degree of samples determines their contributions, and the bias-fitting caused by irrelevant elements is a significant factor in the cross-domain setting.
arXiv Detail & Related papers (2022-05-30T15:40:33Z)
- Reinforcement Learning based Sequential Batch-sampling for Bayesian Optimal Experimental Design [1.6249267147413522]
Sequential design of experiments (SDOE) is a popular suite of methods that has yielded promising results in recent years.
In this work, we aim to extend the SDOE strategy, to query the experiment or computer code at a batch of inputs.
A unique capability of the proposed methodology is its ability to be applied to multiple tasks, for example optimization of a function, once it is trained.
arXiv Detail & Related papers (2021-12-21T02:25:23Z)
- Constrained multi-objective optimization of process design parameters in settings with scarce data: an application to adhesive bonding [48.7576911714538]
Finding the optimal process parameters for an adhesive bonding process is challenging.
When experimental data are scarce, traditional evolutionary approaches (such as genetic algorithms) are ill-suited to solve the problem.
In this research, we successfully applied specific machine learning techniques to emulate the objective and constraint functions.
arXiv Detail & Related papers (2021-12-16T10:14:39Z) - SurvITE: Learning Heterogeneous Treatment Effects from Time-to-Event
Data [83.50281440043241]
We study the problem of inferring heterogeneous treatment effects from time-to-event data.
We propose a novel deep learning method for treatment-specific hazard estimation based on balancing representations.
arXiv Detail & Related papers (2021-10-26T20:13:17Z)
- Autonomous Materials Discovery Driven by Gaussian Process Regression with Inhomogeneous Measurement Noise and Anisotropic Kernels [1.976226676686868]
A majority of experimental disciplines face the challenge of exploring large and high-dimensional parameter spaces in search of new scientific discoveries.
Recent advances have led to an increase in efficiency of materials discovery by increasingly automating the exploration processes.
Gaussian process regression (GPR) techniques have emerged as the method of choice for steering many classes of experiments.
arXiv Detail & Related papers (2020-06-03T19:18:47Z)
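As a hedged illustration of how GPR can steer such an exploration loop, the sketch below fits a surrogate to the measurements gathered so far and picks the next setting where the model is most uncertain. The toy 1-D objective and kernel choices are assumptions made for illustration; the paper's specific treatment of inhomogeneous measurement noise and anisotropic kernels is not modeled here.

```python
# GPR-steered acquisition loop (toy 1-D objective standing in for a real
# instrument): fit a surrogate to the measurements so far, then measure next
# wherever the surrogate's predictive uncertainty is largest.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def measure(x):
    """Stand-in for performing one (noisy) experimental measurement."""
    return float(np.sin(3 * x) + 0.05 * np.random.default_rng(int(1000 * x)).normal())

candidates = np.linspace(0.0, 2.0, 200).reshape(-1, 1)   # settings we could try
X = [[0.1], [1.0], [1.9]]                                 # initial measurements
y = [measure(p[0]) for p in X]

kernel = RBF(length_scale=0.3) + WhiteKernel(noise_level=0.01)
for _ in range(10):                                       # autonomous loop
    gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)
    _, std = gpr.predict(candidates, return_std=True)
    x_next = float(candidates[int(np.argmax(std))][0])    # most uncertain setting
    X.append([x_next])
    y.append(measure(x_next))

print(f"measured {len(X)} settings; last acquisition at x = {x_next:.3f}")
```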