Towards Inferential Reproducibility of Machine Learning Research
- URL: http://arxiv.org/abs/2302.04054v7
- Date: Mon, 9 Oct 2023 15:05:36 GMT
- Title: Towards Inferential Reproducibility of Machine Learning Research
- Authors: Michael Hagmann, Philipp Meier and Stefan Riezler
- Abstract summary: Several sources of nondeterminism can be regarded as measurement noise.
Current tendencies to remove noise in order to enforce neglect of research results inherent nondeterminism at the implementation level.
We propose to incorporate several sources of variance, including their interaction with data properties, into an analysis of significance and reliability of machine learning evaluation.
- Score: 16.223631948455797
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reliability of machine learning evaluation -- the consistency of observed
evaluation scores across replicated model training runs -- is affected by
several sources of nondeterminism which can be regarded as measurement noise.
Current tendencies to remove noise in order to enforce reproducibility of
research results neglect inherent nondeterminism at the implementation level
and disregard crucial interaction effects between algorithmic noise factors and
data properties. This limits the scope of conclusions that can be drawn from
such experiments. Instead of removing noise, we propose to incorporate several
sources of variance, including their interaction with data properties, into an
analysis of significance and reliability of machine learning evaluation, with
the aim to draw inferences beyond particular instances of trained models. We
show how to use linear mixed effects models (LMEMs) to analyze performance
evaluation scores, and to conduct statistical inference with a generalized
likelihood ratio test (GLRT). This allows us to incorporate arbitrary sources
of noise like meta-parameter variations into statistical significance testing,
and to assess performance differences conditional on data properties.
Furthermore, a variance component analysis (VCA) enables the analysis of the
contribution of noise sources to overall variance and the computation of a
reliability coefficient by the ratio of substantial to total variance.
Related papers
- Explainability of Machine Learning Models under Missing Data [2.880748930766428]
Missing data is a prevalent issue that can significantly impair model performance and interpretability.
This paper briefly summarizes the development of the field of missing data and investigates the effects of various imputation methods on the calculation of Shapley values.
arXiv Detail & Related papers (2024-06-29T11:31:09Z) - Towards stable real-world equation discovery with assessing
differentiating quality influence [52.2980614912553]
We propose alternatives to the commonly used finite differences-based method.
We evaluate these methods in terms of applicability to problems, similar to the real ones, and their ability to ensure the convergence of equation discovery algorithms.
arXiv Detail & Related papers (2023-11-09T23:32:06Z) - Assessing the overall and partial causal well-specification of nonlinear additive noise models [4.13592995550836]
We aim to identify predictor variables for which we can infer the causal effect even in cases of such misspecifications.
We propose an algorithm for finite sample data, discuss its properties, and illustrate its performance on simulated and real data.
arXiv Detail & Related papers (2023-10-25T09:44:16Z) - Improving the Robustness of Summarization Models by Detecting and
Removing Input Noise [50.27105057899601]
We present a large empirical study quantifying the sometimes severe loss in performance from different types of input noise for a range of datasets and model sizes.
We propose a light-weight method for detecting and removing such noise in the input during model inference without requiring any training, auxiliary models, or even prior knowledge of the type of noise.
arXiv Detail & Related papers (2022-12-20T00:33:11Z) - Spatio-temporally separable non-linear latent factor learning: an
application to somatomotor cortex fMRI data [0.0]
Models of fMRI data that can perform whole-brain discovery of latent factors are understudied.
New methods for efficient spatial weight-sharing are critical to deal with the high dimensionality of the data and the presence of noise.
Our approach is evaluated on data with multiple motor sub-tasks to assess whether the model captures disentangled latent factors that correspond to each sub-task.
arXiv Detail & Related papers (2022-05-26T21:30:22Z) - Treatment Learning Causal Transformer for Noisy Image Classification [62.639851972495094]
In this work, we incorporate this binary information of "existence of noise" as treatment into image classification tasks to improve prediction accuracy.
Motivated from causal variational inference, we propose a transformer-based architecture, that uses a latent generative model to estimate robust feature representations for noise image classification.
We also create new noisy image datasets incorporating a wide range of noise factors for performance benchmarking.
arXiv Detail & Related papers (2022-03-29T13:07:53Z) - Equivariance Allows Handling Multiple Nuisance Variables When Analyzing
Pooled Neuroimaging Datasets [53.34152466646884]
In this paper, we show how bringing recent results on equivariant representation learning instantiated on structured spaces together with simple use of classical results on causal inference provides an effective practical solution.
We demonstrate how our model allows dealing with more than one nuisance variable under some assumptions and can enable analysis of pooled scientific datasets in scenarios that would otherwise entail removing a large portion of the samples.
arXiv Detail & Related papers (2022-03-29T04:54:06Z) - Harmonization with Flow-based Causal Inference [12.739380441313022]
This paper presents a normalizing-flow-based method to perform counterfactual inference upon a structural causal model (SCM) to harmonize medical data.
We evaluate on multiple, large, real-world medical datasets to observe that this method leads to better cross-domain generalization compared to state-of-the-art algorithms.
arXiv Detail & Related papers (2021-06-12T19:57:35Z) - Adaptive Multi-View ICA: Estimation of noise levels for optimal
inference [65.94843987207445]
Adaptive multiView ICA (AVICA) is a noisy ICA model where each view is a linear mixture of shared independent sources with additive noise on the sources.
On synthetic data, AVICA yields better sources estimates than other group ICA methods thanks to its explicit MMSE estimator.
On real magnetoencephalograpy (MEG) data, we provide evidence that the decomposition is less sensitive to sampling noise and that the noise variance estimates are biologically plausible.
arXiv Detail & Related papers (2021-02-22T13:10:12Z) - Uncertainty Quantification in Extreme Learning Machine: Analytical
Developments, Variance Estimates and Confidence Intervals [0.0]
Uncertainty quantification is crucial to assess prediction quality of a machine learning model.
Most methods proposed in the literature make strong assumptions on the data, ignore the randomness of input weights or neglect the bias contribution in confidence interval estimations.
This paper presents novel estimations that overcome these constraints and improve the understanding of ELM variability.
arXiv Detail & Related papers (2020-11-03T13:45:59Z) - Estimating Structural Target Functions using Machine Learning and
Influence Functions [103.47897241856603]
We propose a new framework for statistical machine learning of target functions arising as identifiable functionals from statistical models.
This framework is problem- and model-agnostic and can be used to estimate a broad variety of target parameters of interest in applied statistics.
We put particular focus on so-called coarsening at random/doubly robust problems with partially unobserved information.
arXiv Detail & Related papers (2020-08-14T16:48:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.