Randomness control and reproducibility study of random forest algorithm in R and Python
- URL: http://arxiv.org/abs/2408.12184v1
- Date: Thu, 22 Aug 2024 07:59:49 GMT
- Title: Randomness control and reproducibility study of random forest algorithm in R and Python
- Authors: Louisa Camadini, Yanis Bouzid, Maeva Merlet, Léopold Carron,
- Abstract summary: We will discuss the strategy of integrating random forest into ocular tolerance assessment for toxicologists.
We will compare four packages: randomForest and Ranger (R packages), adapted in Python via the SKRanger package, and the widely used Scikit-Learn with the RandomForestClassifier() function.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When it comes to the safety of cosmetic products, compliance with regulatory standards is crucial to guarantee consumer protection against the risks of skin irritation. Toxicologists must therefore be fully conversant with all risks. This applies not only to their day-to-day work, but also to all the algorithms they integrate into their routines. Recognizing this, ensuring the reproducibility of algorithms becomes one of the most crucial aspects to address. However, how can we prove the robustness of an algorithm such as the random forest, that relies heavily on randomness? In this report, we will discuss the strategy of integrating random forest into ocular tolerance assessment for toxicologists. We will compare four packages: randomForest and Ranger (R packages), adapted in Python via the SKRanger package, and the widely used Scikit-Learn with the RandomForestClassifier() function. Our goal is to investigate the parameters and sources of randomness affecting the outcomes of Random Forest algorithms. By setting comparable parameters and using the same Pseudo-Random Number Generator (PRNG), we expect to reproduce results consistently across the various available implementations of the random forest algorithm. Nevertheless, this exploration will unveil hidden layers of randomness and guide our understanding of the critical parameters necessary to ensure reproducibility across all four implementations of the random forest algorithm.
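As a rough illustration of the abstract's point about seeded randomness, the sketch below uses only Python's standard library. It is not taken from the paper and does not call any of the four packages. The function name `draw_tree_randomness` and its parameters (`n_samples`, `n_features`, `mtry`, `n_trees`) are hypothetical, and drawing one feature subset per tree is a simplification (random forests usually redraw candidate features at each split). It shows that two runs sharing a seed reproduce the same bootstrap rows and feature subsets.

```python
import random


def draw_tree_randomness(seed, n_samples=10, n_features=6, mtry=2, n_trees=3):
    """Return, for each tree, its bootstrap row indices and candidate features.

    Hypothetical helper: it only mimics two common sources of randomness in a
    random forest (row bootstrapping and feature subsampling).
    """
    # A dedicated generator keeps the draws independent of global random state.
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap: sample row indices with replacement.
        bootstrap = [rng.randrange(n_samples) for _ in range(n_samples)]
        # Feature subsampling: pick mtry distinct features without replacement.
        features = sorted(rng.sample(range(n_features), mtry))
        forest.append((bootstrap, features))
    return forest


run_a = draw_tree_randomness(seed=42)
run_b = draw_tree_randomness(seed=42)
run_c = draw_tree_randomness(seed=7)

print("same seed, same draws:", run_a == run_b)
print("different seed, same draws:", run_a == run_c)
```

Matching seeds are only part of the picture: two implementations may consume random numbers in a different order or use a different generator, so identical seeds need not yield identical forests across packages.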
Related papers
- DynFrs: An Efficient Framework for Machine Unlearning in Random Forest [2.315324942451179]
DynFrs is a framework designed to enable efficient machine unlearning in Random Forests.
In experiments, applying DynFrs on Extremely Randomized Trees yields substantial improvements.
arXiv Detail & Related papers (2024-10-02T14:20:30Z) - Efficient Quality Estimation of True Random Bit-streams [5.441027708840589]
This paper reports the implementation and characterization of an on-line procedure for the detection of anomalies in a true random bit stream.
The experimental validation of the approach is performed upon the bit streams generated by a quantum, silicon-based entropy source.
arXiv Detail & Related papers (2024-09-09T12:09:17Z) - Grafting: Making Random Forests Consistent [0.0]
Little is known about the theory of Random Forests.
A major unanswered question is whether, or when, the Random Forest algorithm is consistent.
arXiv Detail & Related papers (2024-03-09T21:29:25Z) - Batch Bayesian Optimization for Replicable Experimental Design [56.64902148159355]
Many real-world design problems evaluate multiple experimental conditions in parallel and replicate each condition multiple times due to large and heteroscedastic observation noise.
We propose the Batch Thompson Sampling for Replicable Experimental Design framework, which encompasses three algorithms.
We show the effectiveness of our algorithms in two practical real-world applications: precision agriculture and AutoML.
arXiv Detail & Related papers (2023-11-02T12:46:03Z) - We need to talk about random seeds [16.33770822558325]
This opinion piece argues that there are some safe uses for random seeds.
An analysis of 85 recent publications from the ACL Anthology finds that more than 50% contain risky uses of random seeds.
arXiv Detail & Related papers (2022-10-24T16:48:45Z) - Testing randomness of series generated in Bell's experiment [62.997667081978825]
We use a toy fiber optic based setup to generate binary series, and evaluate their level of randomness according to Ville principle.
Series are tested with a battery of standard statistical indicators: Hurst exponent, Kolmogorov complexity, minimum entropy, Takens' dimension of embedding, and the Augmented Dickey-Fuller and Kwiatkowski-Phillips-Schmidt-Shin tests to check stationarity.
The level of randomness of series obtained by applying Toeplitz extractor to rejected series is found to be indistinguishable from the level of non-rejected raw ones.
arXiv Detail & Related papers (2022-08-31T17:39:29Z) - A Unifying Theory of Thompson Sampling for Continuous Risk-Averse Bandits [91.3755431537592]
This paper unifies the analysis of risk-averse Thompson sampling algorithms for the multi-armed bandit problem.
Using the contraction principle in the theory of large deviations, we prove novel concentration bounds for continuous risk functionals.
We show that a wide class of risk functionals as well as "nice" functions of them satisfy the continuity condition.
arXiv Detail & Related papers (2021-08-25T17:09:01Z) - Local policy search with Bayesian optimization [73.0364959221845]
Reinforcement learning aims to find an optimal policy by interaction with an environment.
Policy gradients for local search are often obtained from random perturbations.
We develop an algorithm utilizing a probabilistic model of the objective function and its gradient.
arXiv Detail & Related papers (2021-06-22T16:07:02Z) - Improved Weighted Random Forest for Classification Problems [3.42658286826597]
The key to making a well-performing ensemble model lies in the diversity of the base models.
We propose several algorithms that intend to modify the weighting strategy of regular random forest.
The proposed models are able to introduce significant improvements compared to regular random forest.
arXiv Detail & Related papers (2020-09-01T16:08:45Z) - Stochastic Saddle-Point Optimization for Wasserstein Barycenters [69.68068088508505]
We consider the population Wasserstein barycenter problem for random probability measures supported on a finite set of points and generated by an online stream of data.
We employ the structure of the problem and obtain a convex-concave saddle-point reformulation of this problem.
In the setting when the distribution of random probability measures is discrete, we propose an optimization algorithm and estimate its complexity.
arXiv Detail & Related papers (2020-06-11T19:40:38Z) - Thompson Sampling Algorithms for Mean-Variance Bandits [97.43678751629189]
We develop Thompson Sampling-style algorithms for mean-variance MAB.
We also provide comprehensive regret analyses for Gaussian and Bernoulli bandits.
Our algorithms significantly outperform existing LCB-based algorithms for all risk tolerances.
arXiv Detail & Related papers (2020-02-01T15:33:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.