Quantifying Inherent Randomness in Machine Learning Algorithms
- URL: http://arxiv.org/abs/2206.12353v1
- Date: Fri, 24 Jun 2022 15:49:52 GMT
- Title: Quantifying Inherent Randomness in Machine Learning Algorithms
- Authors: Soham Raste, Rahul Singh, Joel Vaughan, and Vijayan N. Nair
- Abstract summary: This paper uses an empirical study to examine the effects of randomness in model training and randomness in the partitioning of a dataset into training and test subsets.
We quantify and compare the magnitude of the variation in predictive performance for the following ML algorithms: Random Forests (RFs), Gradient Boosting Machines (GBMs), and Feedforward Neural Networks (FFNNs).
- Score: 7.591218883378448
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most machine learning (ML) algorithms have several stochastic elements, and
their performances are affected by these sources of randomness. This paper uses
an empirical study to systematically examine the effects of two sources:
randomness in model training and randomness in the partitioning of a dataset
into training and test subsets. We quantify and compare the magnitude of the
variation in predictive performance for the following ML algorithms: Random
Forests (RFs), Gradient Boosting Machines (GBMs), and Feedforward Neural
Networks (FFNNs). Among the different algorithms, randomness in model training
causes larger variation for FFNNs compared to tree-based methods. This is to be
expected as FFNNs have more stochastic elements that are part of their model
initialization and training. We also found that random splitting of datasets
leads to higher variation compared to the inherent randomness from model
training. The variation from data splitting can be a major issue if the
original dataset has considerable heterogeneity.
Keywords: Model Training, Reproducibility, Variation
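The two sources of variation the abstract contrasts can be illustrated with a minimal, hypothetical numpy sketch (not the authors' code; random-feature ridge regression stands in for a model whose training is stochastic, such as an FFNN with random weight initialization):

```python
import numpy as np

def make_data(n=500, d=5):
    # Synthetic regression data; only illustrative, not a dataset from the paper.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(n, d))
    y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.1 * rng.normal(size=n)
    return X, y

def fit_predict(Xtr, ytr, Xte, seed, n_feat=100):
    # Random-feature ridge regression: the random projection plays the role
    # of model-training randomness (e.g., FFNN weight initialization).
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(Xtr.shape[1], n_feat))
    b = rng.uniform(0, 2 * np.pi, size=n_feat)
    Ztr, Zte = np.cos(Xtr @ W + b), np.cos(Xte @ W + b)
    coef = np.linalg.solve(Ztr.T @ Ztr + 1e-2 * np.eye(n_feat), Ztr.T @ ytr)
    return Zte @ coef

def split(X, y, seed, frac=0.8):
    # Random partitioning into training and test subsets.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    cut = int(frac * len(y))
    return X[idx[:cut]], y[idx[:cut]], X[idx[cut:]], y[idx[cut:]]

X, y = make_data()

# Source 1: vary the training seed, keep the data split fixed.
Xtr, ytr, Xte, yte = split(X, y, seed=0)
mse_train_seed = [np.mean((fit_predict(Xtr, ytr, Xte, s) - yte) ** 2)
                  for s in range(20)]

# Source 2: vary the split seed, keep the training seed fixed.
mse_split_seed = []
for s in range(20):
    Xtr, ytr, Xte, yte = split(X, y, seed=s)
    mse_split_seed.append(np.mean((fit_predict(Xtr, ytr, Xte, seed=0) - yte) ** 2))

print(f"training-seed std of test MSE: {np.std(mse_train_seed):.4f}")
print(f"split-seed std of test MSE:    {np.std(mse_split_seed):.4f}")
```

Comparing the two standard deviations mirrors the paper's experimental design; on heterogeneous data the split-seed variation would typically dominate, as the abstract reports.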
Related papers
- Derandomizing Multi-Distribution Learning [28.514129340758938]
Multi-distribution learning involves learning a single predictor that works well across multiple data distributions.
Recent research has shown near-optimal sample complexity achieved with oracle efficient algorithms.
This raises the question: can these algorithms be derandomized to produce a deterministic predictor for multiple distributions?
arXiv Detail & Related papers (2024-09-26T06:28:56Z)
- Rolling the dice for better deep learning performance: A study of randomness techniques in deep neural networks [4.643954670642798]
This paper investigates how various randomization techniques are used in Deep Neural Networks (DNNs)
It categorizes techniques into four types: adding noise to the loss function, masking random gradient updates, data augmentation and weight generalization.
The complete implementation and dataset are available on GitHub.
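One of the four families the survey names, masking random gradient updates, can be illustrated with a hypothetical toy sketch (not taken from the paper's GitHub implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear regression trained by SGD with random gradient masking:
# at each step a random subset of gradient coordinates is zeroed out.
X = rng.normal(size=(200, 10))
w_true = rng.normal(size=10)
y = X @ w_true + 0.01 * rng.normal(size=200)

w = np.zeros(10)
lr, keep_prob = 0.05, 0.7
for step in range(500):
    i = rng.integers(len(y))
    grad = (X[i] @ w - y[i]) * X[i]       # per-sample squared-error gradient
    mask = rng.random(10) < keep_prob     # randomly drop gradient coordinates
    w -= lr * grad * mask

mse = np.mean((X @ w - y) ** 2)
print(f"final training MSE: {mse:.4f}")
```

The masking injects noise into optimization much like dropout does for activations; the other families the summary lists (loss noise, data augmentation) would be sketched analogously.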
arXiv Detail & Related papers (2024-04-05T10:02:32Z) - Structured Radial Basis Function Network: Modelling Diversity for
Multiple Hypotheses Prediction [51.82628081279621]
Multi-modal regression is important in forecasting nonstationary processes or with a complex mixture of distributions.
A Structured Radial Basis Function Network is presented as an ensemble of multiple hypotheses predictors for regression problems.
It is proved that this structured model can efficiently interpolate this tessellation and approximate the multiple hypotheses target distribution.
arXiv Detail & Related papers (2023-09-02T01:27:53Z) - Improving Out-of-Distribution Robustness of Classifiers via Generative
Interpolation [56.620403243640396]
Deep neural networks achieve superior performance for learning from independent and identically distributed (i.i.d.) data.
However, their performance deteriorates significantly when handling out-of-distribution (OoD) data.
We develop a simple yet effective method called Generative Interpolation to fuse generative models trained from multiple domains for synthesizing diverse OoD samples.
arXiv Detail & Related papers (2023-07-23T03:53:53Z) - Learning Likelihood Ratios with Neural Network Classifiers [0.12277343096128711]
Approximations of the likelihood ratio may be computed using clever parametrizations of neural network-based classifiers.
We present a series of empirical studies detailing the performance of several common loss functionals and parametrizations of the classifier output.
arXiv Detail & Related papers (2023-05-17T18:11:38Z) - Learning from aggregated data with a maximum entropy model [73.63512438583375]
We show how a new model, similar to a logistic regression, may be learned from aggregated data only by approximating the unobserved feature distribution with a maximum entropy hypothesis.
We present empirical evidence on several public datasets that the model learned this way can achieve performances comparable to those of a logistic model trained with the full unaggregated data.
arXiv Detail & Related papers (2022-10-05T09:17:27Z) - Bayesian predictive modeling of multi-source multi-way data [0.0]
We consider molecular data from multiple 'omics sources as predictors of early-life iron deficiency (ID) in a rhesus monkey model.
We use a linear model with a low-rank structure on the coefficients to capture multi-way dependence.
We show that our model performs as expected in terms of misclassification rates and correlation of estimated coefficients with true coefficients.
arXiv Detail & Related papers (2022-08-05T21:58:23Z) - Performance and Interpretability Comparisons of Supervised Machine
Learning Algorithms: An Empirical Study [3.7881729884531805]
The paper is organized in a findings-based manner, with each section providing general conclusions.
Overall, XGB and FFNNs were competitive, with FFNNs showing better performance in smooth models.
RF did not perform well in general, confirming the findings in the literature.
arXiv Detail & Related papers (2022-04-27T12:04:33Z) - Equivariance Allows Handling Multiple Nuisance Variables When Analyzing
Pooled Neuroimaging Datasets [53.34152466646884]
In this paper, we show how bringing recent results on equivariant representation learning instantiated on structured spaces together with simple use of classical results on causal inference provides an effective practical solution.
We demonstrate how our model allows dealing with more than one nuisance variable under some assumptions and can enable analysis of pooled scientific datasets in scenarios that would otherwise entail removing a large portion of the samples.
arXiv Detail & Related papers (2022-03-29T04:54:06Z) - Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation, so it can be integrated with neural networks seamlessly.
arXiv Detail & Related papers (2021-12-07T11:26:41Z) - Learning Gaussian Graphical Models via Multiplicative Weights [54.252053139374205]
We adapt an algorithm of Klivans and Meka based on the method of multiplicative weight updates.
The algorithm enjoys a sample complexity bound that is qualitatively similar to others in the literature.
It has a low runtime $O(mp^2)$ in the case of $m$ samples and $p$ nodes, and can trivially be implemented in an online manner.
arXiv Detail & Related papers (2020-02-20T10:50:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.