Design Principles for Falsifiable, Replicable and Reproducible Empirical ML Research
- URL: http://arxiv.org/abs/2405.18077v1
- Date: Tue, 28 May 2024 11:37:59 GMT
- Title: Design Principles for Falsifiable, Replicable and Reproducible Empirical ML Research
- Authors: Daniel Vranješ, Oliver Niggemann
- Abstract summary: Empirical research plays a fundamental role in the machine learning domain.
We propose a model for the empirical research process, accompanied by guidelines to uphold the validity of empirical research.
- Score: 2.3265565167163906
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Empirical research plays a fundamental role in the machine learning domain. At the heart of impactful empirical research lies the development of clear research hypotheses, which then shape the design of experiments. The execution of experiments must be carried out with precision to ensure reliable results, followed by statistical analysis to interpret these outcomes. This process is key to either supporting or refuting the initial hypotheses. Despite its importance, there is high variability in research practices across the machine learning community and no uniform understanding of quality criteria for empirical research. To address this gap, we propose a model for the empirical research process, accompanied by guidelines to uphold the validity of empirical research. By embracing these recommendations, researchers can achieve greater consistency, enhanced reliability, and increased impact.
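To make the hypothesis-testing step of this process concrete, here is a minimal sketch (not taken from the paper; the metric values, the number of runs, and the choice of a Wilcoxon signed-rank test are assumptions for illustration) of how a pre-specified hypothesis about two models might be evaluated over paired runs:

```python
# A minimal, hypothetical sketch of the statistical-analysis step described
# in the abstract: testing whether a proposed model outperforms a baseline
# across paired runs. All numbers and the choice of test are illustrative.
import numpy as np
from scipy import stats

# Hypothetical per-seed test accuracies for the baseline and proposed model.
baseline = np.array([0.812, 0.807, 0.815, 0.809, 0.811, 0.808, 0.814, 0.810])
proposed = np.array([0.823, 0.817, 0.826, 0.815, 0.821, 0.819, 0.824, 0.818])

# H0: no systematic difference between paired results; H1: there is one.
statistic, p_value = stats.wilcoxon(proposed, baseline)

alpha = 0.05  # significance level fixed before the experiment is run
print(f"Wilcoxon statistic = {statistic:.3f}, p = {p_value:.4f}")
if p_value < alpha:
    print("Reject H0: the observed difference is unlikely under the null.")
else:
    print("Fail to reject H0: the data do not refute the null hypothesis.")
```

The point of the sketch is that the hypothesis, the test, and the significance level are fixed before the results are inspected, so the outcome can genuinely support or refute the stated hypothesis.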
Related papers
- Causal Lifting of Neural Representations: Zero-Shot Generalization for Causal Inferences [56.23412698865433]
We focus on causal inferences on a target experiment with unlabeled factual outcomes, retrieved by a predictive model fine-tuned on a similar, labeled experiment.
First, we show that factual outcome estimation via Empirical Risk Minimization (ERM) may fail to yield valid causal inferences on the target population.
We propose Deconfounded Empirical Risk Minimization (DERM), a new simple learning procedure minimizing the risk over a fictitious target population.
arXiv Detail & Related papers (2025-02-10T10:52:17Z)
- Causality can systematically address the monsters under the bench(marks) [64.36592889550431]
Benchmarks are plagued by various biases, artifacts, or leakage.
Models may behave unreliably due to poorly explored failure modes.
Causality offers an ideal framework to systematically address these challenges.
arXiv Detail & Related papers (2025-02-07T17:01:37Z)
- Causal Representation Learning from Multimodal Biological Observations [57.00712157758845]
We aim to develop flexible identification conditions for multimodal data.
We establish identifiability guarantees for each latent component, extending the subspace identification results from prior work.
Our key theoretical ingredient is the structural sparsity of the causal connections among distinct modalities.
arXiv Detail & Related papers (2024-11-10T16:40:27Z)
- Smoke and Mirrors in Causal Downstream Tasks [59.90654397037007]
This paper looks at the causal inference task of treatment effect estimation, where the outcome of interest is recorded in high-dimensional observations.
We compare 6,480 models fine-tuned from state-of-the-art visual backbones, and find that the sampling and modeling choices significantly affect the accuracy of the causal estimate.
Our results suggest that future benchmarks should carefully consider real downstream scientific questions, especially causal ones.
arXiv Detail & Related papers (2024-05-27T13:26:34Z)
- Hazards in Deep Learning Testing: Prevalence, Impact and Recommendations [17.824339932321788]
We identify 10 commonly adopted empirical evaluation hazards that may significantly impact experimental results.
Our findings indicate that all 10 hazards have the potential to invalidate experimental findings.
We propose a set of 10 good empirical practices that can mitigate the impact of these hazards.
arXiv Detail & Related papers (2023-09-11T11:05:34Z)
- A Double Machine Learning Approach to Combining Experimental and Observational Data [59.29868677652324]
We propose a double machine learning approach to combine experimental and observational studies.
Our framework tests for violations of external validity and ignorability under milder assumptions.
arXiv Detail & Related papers (2023-07-04T02:53:11Z)
- Sources of Irreproducibility in Machine Learning: A Review [3.905855359082687]
There exists no theoretical framework that relates experiment design choices to potential effects on the conclusions.
The objective of this paper is to develop a framework that enables applied data science practitioners and researchers to understand which experiment design choices can lead to false findings.
arXiv Detail & Related papers (2022-04-15T18:26:03Z)
- Simulation as Experiment: An Empirical Critique of Simulation Research on Recommender Systems [4.006331916849688]
We argue that simulation studies of recommender system (RS) evolution are conceptually similar to empirical experimental approaches.
By adopting standards and practices common in empirical disciplines, simulation researchers can mitigate many of the methodological weaknesses of current simulation studies.
arXiv Detail & Related papers (2021-07-29T21:05:01Z)
- Robust multi-stage model-based design of optimal experiments for nonlinear estimation [0.0]
We study approaches to robust model-based design of experiments in the context of maximum-likelihood estimation.
We propose a novel methodology based on multi-stage robust optimization.
arXiv Detail & Related papers (2020-11-11T19:50:31Z)
- A Survey on Causal Inference [64.45536158710014]
Causal inference is a critical research topic across many domains, such as statistics, computer science, education, public policy and economics.
Various causal effect estimation methods for observational data have been developed.
arXiv Detail & Related papers (2020-02-05T21:35:29Z)