Design Principles for Falsifiable, Replicable and Reproducible Empirical ML Research
- URL: http://arxiv.org/abs/2405.18077v1
- Date: Tue, 28 May 2024 11:37:59 GMT
- Title: Design Principles for Falsifiable, Replicable and Reproducible Empirical ML Research
- Authors: Daniel Vranješ, Oliver Niggemann
- Abstract summary: Empirical research plays a fundamental role in the machine learning domain.
We propose a model for the empirical research process, accompanied by guidelines to uphold the validity of empirical research.
- Score: 2.3265565167163906
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Empirical research plays a fundamental role in the machine learning domain. At the heart of impactful empirical research lies the development of clear research hypotheses, which then shape the design of experiments. The execution of experiments must be carried out with precision to ensure reliable results, followed by statistical analysis to interpret these outcomes. This process is key to either supporting or refuting initial hypotheses. Despite its importance, there is a high variability in research practices across the machine learning community and no uniform understanding of quality criteria for empirical research. To address this gap, we propose a model for the empirical research process, accompanied by guidelines to uphold the validity of empirical research. By embracing these recommendations, greater consistency, enhanced reliability and increased impact can be achieved.
Related papers
- Causal Representation Learning from Multimodal Biological Observations [57.00712157758845]
We aim to develop flexible identification conditions for multimodal data.
We establish identifiability guarantees for each latent component, extending the subspace identification results from prior work.
Our key theoretical ingredient is the structural sparsity of the causal connections among distinct modalities.
arXiv Detail & Related papers (2024-11-10T16:40:27Z) - Smoke and Mirrors in Causal Downstream Tasks [59.90654397037007]
This paper looks at the causal inference task of treatment effect estimation, where the outcome of interest is recorded in high-dimensional observations.
We compare 6 480 models fine-tuned from state-of-the-art visual backbones, and find that the sampling and modeling choices significantly affect the accuracy of the causal estimate.
Our results suggest that future benchmarks should carefully consider real downstream scientific questions, especially causal ones.
arXiv Detail & Related papers (2024-05-27T13:26:34Z) - LLM and Simulation as Bilevel Optimizers: A New Paradigm to Advance Physical Scientific Discovery [141.39722070734737]
We propose to enhance the knowledge-driven, abstract reasoning abilities of Large Language Models with the computational strength of simulations.
We introduce Scientific Generative Agent (SGA), a bilevel optimization framework.
We conduct experiments to demonstrate our framework's efficacy in law discovery and molecular design.
arXiv Detail & Related papers (2024-05-16T03:04:10Z) - Hazards in Deep Learning Testing: Prevalence, Impact and Recommendations [17.824339932321788]
We identify 10 commonly adopted empirical evaluation hazards that may significantly impact experimental results.
Our findings indicate that all 10 hazards have the potential to invalidate experimental findings.
We propose a set of 10 good empirical practices that has the potential to mitigate the impact of the hazards.
arXiv Detail & Related papers (2023-09-11T11:05:34Z) - A Double Machine Learning Approach to Combining Experimental and Observational Data [59.29868677652324]
We propose a double machine learning approach to combine experimental and observational studies.
Our framework tests for violations of external validity and ignorability under milder assumptions.
arXiv Detail & Related papers (2023-07-04T02:53:11Z) - CausalBench: A Large-scale Benchmark for Network Inference from Single-cell Perturbation Data [61.088705993848606]
We introduce CausalBench, a benchmark suite for evaluating causal inference methods on real-world interventional data.
CausalBench incorporates biologically-motivated performance metrics, including new distribution-based interventional metrics.
arXiv Detail & Related papers (2022-10-31T13:04:07Z) - Sources of Irreproducibility in Machine Learning: A Review [3.905855359082687]
There exists no theoretical framework that relates experiment design choices to potential effects on the conclusions.
The objective of this paper is to develop a framework that enables applied data science practitioners and researchers to understand which experiment design choices can lead to false findings.
arXiv Detail & Related papers (2022-04-15T18:26:03Z) - Simulation as Experiment: An Empirical Critique of Simulation Research on Recommender Systems [4.006331916849688]
We argue that simulation studies of recommender system (RS) evolution are conceptually similar to empirical experimental approaches.
By adopting standards and practices common in empirical disciplines, simulation researchers can mitigate many of these weaknesses.
arXiv Detail & Related papers (2021-07-29T21:05:01Z) - Robust multi-stage model-based design of optimal experiments for nonlinear estimation [0.0]
We study approaches to robust model-based design of experiments in the context of maximum-likelihood estimation.
We propose a novel methodology based on multi-stage robust optimization.
arXiv Detail & Related papers (2020-11-11T19:50:31Z) - A Survey on Causal Inference [64.45536158710014]
Causal inference is a critical research topic across many domains, such as statistics, computer science, education, public policy and economics.
Various causal effect estimation methods for observational data have sprung up.
arXiv Detail & Related papers (2020-02-05T21:35:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.