Related papers: Towards Inferential Reproducibility of Machine Learning Research

URL: http://arxiv.org/abs/2302.04054v7
Date: Mon, 9 Oct 2023 15:05:36 GMT
Title: Towards Inferential Reproducibility of Machine Learning Research
Authors: Michael Hagmann, Philipp Meier and Stefan Riezler
Abstract summary: Several sources of nondeterminism can be regarded as measurement noise. Current tendencies to remove noise in order to enforce reproducibility of research results neglect inherent nondeterminism at the implementation level. We propose to incorporate several sources of variance, including their interaction with data properties, into an analysis of significance and reliability of machine learning evaluation.
Score: 16.223631948455797
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Reliability of machine learning evaluation -- the consistency of observed evaluation scores across replicated model training runs -- is affected by several sources of nondeterminism which can be regarded as measurement noise. Current tendencies to remove noise in order to enforce reproducibility of research results neglect inherent nondeterminism at the implementation level and disregard crucial interaction effects between algorithmic noise factors and data properties. This limits the scope of conclusions that can be drawn from such experiments. Instead of removing noise, we propose to incorporate several sources of variance, including their interaction with data properties, into an analysis of significance and reliability of machine learning evaluation, with the aim to draw inferences beyond particular instances of trained models. We show how to use linear mixed effects models (LMEMs) to analyze performance evaluation scores, and to conduct statistical inference with a generalized likelihood ratio test (GLRT). This allows us to incorporate arbitrary sources of noise like meta-parameter variations into statistical significance testing, and to assess performance differences conditional on data properties. Furthermore, a variance component analysis (VCA) enables the analysis of the contribution of noise sources to overall variance and the computation of a reliability coefficient by the ratio of substantial to total variance.
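The two quantities named in the abstract, a reliability coefficient (substantial variance over total variance) and a GLRT p-value, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a balanced one-way design (hypothetical layout: rows are test items, the substantial factor; columns are replicated training runs, the noise factor) and estimates the variance components with the classical ANOVA method of moments instead of fitting an LMEM. The GLRT helper assumes the maximized log-likelihoods of two nested models differing by one parameter are already available and uses the asymptotic chi-square(1) reference distribution. All function names are illustrative.

```python
import math
from statistics import mean


def variance_components(scores):
    """Method-of-moments variance components for a balanced one-way design.

    scores[i][j]: evaluation score of test item i under training run j
    (hypothetical layout). Returns (sigma2_items, sigma2_noise).
    """
    n = len(scores)                       # number of test items
    k = len(scores[0]) if scores else 0   # runs per item
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a balanced design with >= 2 items and >= 2 runs")
    grand = mean(x for row in scores for x in row)
    item_means = [mean(row) for row in scores]
    # Between-item mean square: spread of item means around the grand mean.
    msb = k * sum((m - grand) ** 2 for m in item_means) / (n - 1)
    # Within-item mean square: run-to-run spread, treated as noise.
    msw = sum((x - m) ** 2 for row, m in zip(scores, item_means) for x in row) / (n * (k - 1))
    # Expected MSB = k * sigma2_items + sigma2_noise; truncate negative estimates at zero.
    sigma2_items = max((msb - msw) / k, 0.0)
    return sigma2_items, msw


def reliability(scores):
    """Ratio of substantial (item) variance to total variance."""
    s_items, s_noise = variance_components(scores)
    total = s_items + s_noise
    return s_items / total if total > 0 else float("nan")


def lrt_pvalue_df1(loglik_null, loglik_alt):
    """Likelihood ratio statistic and asymptotic p-value for nested models
    that differ by one parameter.

    Survival function of chi-square(1): P(X > t) = erfc(sqrt(t / 2)).
    """
    stat = max(2.0 * (loglik_alt - loglik_null), 0.0)
    return stat, math.erfc(math.sqrt(stat / 2.0))


if __name__ == "__main__":
    # Three test items, three training runs each (made-up numbers).
    scores = [[0.80, 0.82, 0.81], [0.60, 0.62, 0.61], [0.70, 0.72, 0.71]]
    print(round(reliability(scores), 3))  # 0.99
    # Log-likelihoods of a null and an alternative model (made-up numbers).
    print(lrt_pvalue_df1(-10.0, -8.05))
```

In this toy data the run-to-run noise is small relative to the differences between items, so the coefficient is close to 1. A coefficient near 0 would mean that replicated training runs disagree about as much as different test items do.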

Related papers

Data Fusion for Partial Identification of Causal Effects [62.56890808004615]
We propose a novel partial identification framework that enables researchers to answer key questions: Is the causal effect positive or negative? How severe must assumption violations be to overturn this conclusion? We apply our framework to the Project STAR study, which investigates the effect of classroom size on students' third-grade standardized test performance.
arXiv Detail & Related papers (2025-05-30T07:13:01Z)
Interpretable Credit Default Prediction with Ensemble Learning and SHAP [3.948008559977866]
This study focuses on the problem of credit default prediction, builds a modeling framework based on machine learning, and conducts comparative experiments on a variety of mainstream classification algorithms. The results show that the ensemble learning method has obvious advantages in predictive performance, especially in dealing with complex nonlinear relationships between features and data imbalance problems. The external credit score variable plays a dominant role in model decision making, which helps to improve the model's interpretability and practical application value.
arXiv Detail & Related papers (2025-05-27T07:23:22Z)
An extensive simulation study evaluating the interaction of resampling techniques across multiple causal discovery contexts [2.0946534289186842]
We present theoretical results proving that certain resampling methods emulate the assignment of specific values to algorithm tuning parameters. We also report the results of extensive simulation experiments, which verify the theoretical result and provide substantial data.
arXiv Detail & Related papers (2025-03-19T17:18:18Z)
Testing and Improving the Robustness of Amortized Bayesian Inference for Cognitive Models [0.5223954072121659]
Contaminant observations and outliers often cause problems when estimating the parameters of cognitive models. In this study, we test and improve the robustness of parameter estimation using amortized Bayesian inference. The proposed method is straightforward and practical to implement and has broad applicability in fields where outlier detection or removal is challenging.
arXiv Detail & Related papers (2024-12-29T21:22:24Z)
Explainability of Machine Learning Models under Missing Data [2.880748930766428]
Missing data is a prevalent issue that can significantly impair model performance and interpretability. This paper briefly summarizes the development of the field of missing data and investigates the effects of various imputation methods on the calculation of Shapley values.
arXiv Detail & Related papers (2024-06-29T11:31:09Z)
Towards stable real-world equation discovery with assessing differentiating quality influence [52.2980614912553]
We propose alternatives to the commonly used finite-difference-based method. We evaluate these methods in terms of their applicability to problems similar to real-world ones and their ability to ensure the convergence of equation discovery algorithms.
arXiv Detail & Related papers (2023-11-09T23:32:06Z)
Assessing the overall and partial causal well-specification of nonlinear additive noise models [4.13592995550836]
We aim to identify predictor variables for which we can infer the causal effect even in cases of such misspecifications. We propose an algorithm for finite sample data, discuss its properties, and illustrate its performance on simulated and real data.
arXiv Detail & Related papers (2023-10-25T09:44:16Z)
Improving the Robustness of Summarization Models by Detecting and Removing Input Noise [50.27105057899601]
We present a large empirical study quantifying the sometimes severe loss in performance from different types of input noise for a range of datasets and model sizes. We propose a light-weight method for detecting and removing such noise in the input during model inference without requiring any training, auxiliary models, or even prior knowledge of the type of noise.
arXiv Detail & Related papers (2022-12-20T00:33:11Z)
Spatio-temporally separable non-linear latent factor learning: an application to somatomotor cortex fMRI data [0.0]
Models of fMRI data that can perform whole-brain discovery of latent factors are understudied. New methods for efficient spatial weight-sharing are critical to deal with the high dimensionality of the data and the presence of noise. Our approach is evaluated on data with multiple motor sub-tasks to assess whether the model captures disentangled latent factors that correspond to each sub-task.
arXiv Detail & Related papers (2022-05-26T21:30:22Z)
Treatment Learning Causal Transformer for Noisy Image Classification [62.639851972495094]
In this work, we incorporate this binary information of "existence of noise" as treatment into image classification tasks to improve prediction accuracy. Motivated by causal variational inference, we propose a transformer-based architecture that uses a latent generative model to estimate robust feature representations for noisy image classification. We also create new noisy image datasets incorporating a wide range of noise factors for performance benchmarking.
arXiv Detail & Related papers (2022-03-29T13:07:53Z)
Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets [53.34152466646884]
In this paper, we show how bringing recent results on equivariant representation learning instantiated on structured spaces together with simple use of classical results on causal inference provides an effective practical solution. We demonstrate how our model allows dealing with more than one nuisance variable under some assumptions and can enable analysis of pooled scientific datasets in scenarios that would otherwise entail removing a large portion of the samples.
arXiv Detail & Related papers (2022-03-29T04:54:06Z)
Harmonization with Flow-based Causal Inference [12.739380441313022]
This paper presents a normalizing-flow-based method to perform counterfactual inference upon a structural causal model (SCM) to harmonize medical data. We evaluate on multiple large, real-world medical datasets and observe that this method leads to better cross-domain generalization than state-of-the-art algorithms.
arXiv Detail & Related papers (2021-06-12T19:57:35Z)
Adaptive Multi-View ICA: Estimation of noise levels for optimal inference [65.94843987207445]
Adaptive Multi-View ICA (AVICA) is a noisy ICA model where each view is a linear mixture of shared independent sources with additive noise on the sources. On synthetic data, AVICA yields better source estimates than other group ICA methods thanks to its explicit MMSE estimator. On real magnetoencephalography (MEG) data, we provide evidence that the decomposition is less sensitive to sampling noise and that the noise variance estimates are biologically plausible.
arXiv Detail & Related papers (2021-02-22T13:10:12Z)
Uncertainty Quantification in Extreme Learning Machine: Analytical Developments, Variance Estimates and Confidence Intervals [0.0]
Uncertainty quantification is crucial to assess the prediction quality of a machine learning model. Most methods proposed in the literature make strong assumptions about the data, ignore the randomness of input weights, or neglect the bias contribution in confidence interval estimations. This paper presents novel estimations that overcome these constraints and improve the understanding of ELM variability.
arXiv Detail & Related papers (2020-11-03T13:45:59Z)
Estimating Structural Target Functions using Machine Learning and Influence Functions [103.47897241856603]
We propose a new framework for statistical machine learning of target functions arising as identifiable functionals from statistical models. This framework is problem- and model-agnostic and can be used to estimate a broad variety of target parameters of interest in applied statistics. We put particular focus on so-called coarsening at random/doubly robust problems with partially unobserved information.
arXiv Detail & Related papers (2020-08-14T16:48:29Z)

This list is automatically generated from the titles and abstracts of the papers on this site.