Hazards in Deep Learning Testing: Prevalence, Impact and Recommendations
- URL: http://arxiv.org/abs/2309.05381v1
- Date: Mon, 11 Sep 2023 11:05:34 GMT
- Title: Hazards in Deep Learning Testing: Prevalence, Impact and Recommendations
- Authors: Salah Ghamizi, Maxime Cordy, Yuejun Guo, Mike Papadakis, and Yves
Le Traon
- Abstract summary: We identify 10 commonly adopted empirical evaluation hazards that may significantly impact experimental results.
Our findings indicate that all 10 hazards have the potential to invalidate experimental findings.
We propose a set of 10 good empirical practices that can mitigate the impact of these hazards.
- Score: 17.824339932321788
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Much research on Machine Learning testing relies on empirical studies that
evaluate and show their potential. However, in this context empirical results
are sensitive to a number of parameters that can adversely impact the results
of the experiments and potentially lead to wrong conclusions (Type I errors,
i.e., incorrectly rejecting the null hypothesis). To this end, we survey the
related literature and identify 10 commonly adopted empirical evaluation
hazards that may significantly impact experimental results. We then perform a
sensitivity analysis on 30 influential studies that were published in top-tier
SE venues, against our hazard set and demonstrate their criticality. Our
findings indicate that all 10 hazards we identify have the potential to
invalidate experimental findings, such as those made by the related literature,
and should be handled properly. Going a step further, we propose a set of
10 good empirical practices that can mitigate the impact of these hazards. We
believe our work forms a first step towards raising awareness of the common
pitfalls and good practices within the software engineering community, and we
hope it contributes towards setting clear expectations for empirical research
in the field of deep learning testing.
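The Type I errors the abstract warns about can be made concrete with a small simulation. The sketch below is illustrative and not from the paper: it assumes one plausible hazard, cherry-picking random seeds, and shows how reporting a technique's best run against a competitor's average fabricates a difference even when both techniques are identical (i.e., the null hypothesis is true). All names (`run_experiment`, `true_rate`) are hypothetical.

```python
import random
import statistics

def run_experiment(seed, true_rate=0.7, n=100):
    """Simulate one evaluation run: n test cases, each detected with
    probability true_rate. Returns the observed detection rate."""
    rng = random.Random(seed)
    return sum(rng.random() < true_rate for _ in range(n)) / n

# Both "techniques" have the same true effectiveness, so any observed
# gap is pure noise (the null hypothesis holds by construction).
seeds = range(30)
scores_a = [run_experiment(s) for s in seeds]
scores_b = [run_experiment(s + 1000) for s in seeds]

# Fair comparison: average both techniques over all seeds.
fair_gap = statistics.mean(scores_a) - statistics.mean(scores_b)

# Hazardous comparison: report technique A's best seed against B's average,
# which systematically inflates the apparent advantage of A.
cherry_gap = max(scores_a) - statistics.mean(scores_b)

print(f"fair gap:   {fair_gap:+.3f}")
print(f"cherry gap: {cherry_gap:+.3f}")
```

Running this, the fair gap hovers near zero while the cherry-picked gap is several points in A's favor, illustrating why the paper's sensitivity analysis treats seed and configuration selection as a hazard to control.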
Related papers
- Contexts Matter: An Empirical Study on Contextual Influence in Fairness Testing for Deep Learning Systems [3.077531983369872]
We aim to understand how varying contexts affect fairness testing outcomes.
Our results show that different context types and settings generally lead to a significant impact on the testing.
arXiv Detail & Related papers (2024-08-12T12:36:06Z) - Design Principles for Falsifiable, Replicable and Reproducible Empirical ML Research [2.3265565167163906]
Empirical research plays a fundamental role in the machine learning domain.
We propose a model for the empirical research process, accompanied by guidelines to uphold the validity of empirical research.
arXiv Detail & Related papers (2024-05-28T11:37:59Z) - Smoke and Mirrors in Causal Downstream Tasks [59.90654397037007]
This paper looks at the causal inference task of treatment effect estimation, where the outcome of interest is recorded in high-dimensional observations.
We compare 6,480 models fine-tuned from state-of-the-art visual backbones, and find that the sampling and modeling choices significantly affect the accuracy of the causal estimate.
Our results suggest that future benchmarks should carefully consider real downstream scientific questions, especially causal ones.
arXiv Detail & Related papers (2024-05-27T13:26:34Z) - On (Mis)perceptions of testing effectiveness: an empirical study [1.8026347864255505]
This research aims to discover how well the perceptions of the defect detection effectiveness of different techniques match their real effectiveness in the absence of prior experience.
In the original study, we conducted a controlled experiment with students applying two testing techniques and a code review technique.
At the end of the experiment, participants took a survey on which technique they perceived to be most effective.
The results of the replicated study confirm the findings of the original study and suggest that participants' perceptions may be based not on their opinions about technique complexity or their preferences, but on how well they think they applied the techniques.
arXiv Detail & Related papers (2024-02-11T14:50:01Z) - A Double Machine Learning Approach to Combining Experimental and Observational Data [59.29868677652324]
We propose a double machine learning approach to combine experimental and observational studies.
Our framework tests for violations of external validity and ignorability under milder assumptions.
arXiv Detail & Related papers (2023-07-04T02:53:11Z) - Empirical Estimates on Hand Manipulation are Recoverable: A Step Towards
Individualized and Explainable Robotic Support in Everyday Activities [80.37857025201036]
A key challenge for robotic systems is to figure out the behavior of another agent.
Drawing correct inferences is especially challenging when (confounding) factors are not controlled experimentally.
We propose equipping robots with the necessary tools to conduct observational studies on people.
arXiv Detail & Related papers (2022-01-27T22:15:56Z) - SurvITE: Learning Heterogeneous Treatment Effects from Time-to-Event
Data [83.50281440043241]
We study the problem of inferring heterogeneous treatment effects from time-to-event data.
We propose a novel deep learning method for treatment-specific hazard estimation based on balancing representations.
arXiv Detail & Related papers (2021-10-26T20:13:17Z) - An introduction to causal reasoning in health analytics [2.199093822766999]
We highlight some of the drawbacks that may arise in traditional machine learning and statistical approaches to analyzing observational data.
We will demonstrate the applications of causal inference in tackling some common machine learning issues.
arXiv Detail & Related papers (2021-05-10T20:25:56Z) - Enabling Counterfactual Survival Analysis with Balanced Representations [64.17342727357618]
Survival data are frequently encountered across diverse medical applications, e.g., drug development, risk profiling, and clinical trials.
We propose a theoretically grounded unified framework for counterfactual inference applicable to survival outcomes.
arXiv Detail & Related papers (2020-06-14T01:15:00Z) - A Survey on Causal Inference [64.45536158710014]
Causal inference is a critical research topic across many domains, such as statistics, computer science, education, public policy and economics.
Various causal effect estimation methods for observational data have sprung up.
arXiv Detail & Related papers (2020-02-05T21:35:29Z)