Reproducibility, energy efficiency and performance of pseudorandom
number generators in machine learning: a comparative study of Python, NumPy,
TensorFlow, and PyTorch implementations
- URL: http://arxiv.org/abs/2401.17345v2
- Date: Sat, 10 Feb 2024 12:09:18 GMT
- Title: Reproducibility, energy efficiency and performance of pseudorandom
number generators in machine learning: a comparative study of Python, NumPy,
TensorFlow, and PyTorch implementations
- Authors: Benjamin Antunes, David R.C. Hill
- Abstract summary: Pseudo-Random Number Generators (PRNGs) have become ubiquitous in machine learning technologies because numerous methods rely on them.
This study investigates whether the leading PRNGs employed in machine learning languages, libraries, and frameworks uphold statistical quality and numerical reproducibility when compared to the original C implementations.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pseudo-Random Number Generators (PRNGs) have become ubiquitous in machine
learning technologies because numerous methods rely on them. The
field of machine learning holds the potential for substantial advancements
across various domains, as exemplified by recent breakthroughs in Large
Language Models (LLMs). However, despite the growing interest, persistent
concerns include issues related to reproducibility and energy consumption.
Reproducibility is crucial for robust scientific inquiry and explainability,
while energy efficiency underscores the imperative to conserve finite global
resources. This study investigates whether the leading PRNGs employed in
machine learning languages,
libraries, and frameworks uphold statistical quality and numerical
reproducibility when compared to the original C implementation of the
respective PRNG algorithms. Additionally, we aim to evaluate the time
efficiency and energy consumption of various implementations. Our experiments
encompass Python, NumPy, TensorFlow, and PyTorch, utilizing the Mersenne
Twister, PCG, and Philox algorithms. We found that the runtime performance of
machine learning technologies closely aligns with that of the C-based
implementations, in some instances even surpassing them. Moreover, the ML
technologies consumed only 10% more energy than their C-implementation
counterparts. However, while statistical quality was comparable, numerical
reproducibility across different platforms for identical seeds and algorithms
was not achieved.
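To make the reproducibility question concrete, the sketch below seeds the Mersenne Twister in both CPython's random module and NumPy and compares the first draws. It is a minimal check in the spirit of the study, not the authors' benchmark harness; the seed and sample count are arbitrary.

```python
import random
import numpy as np

SEED = 12345

# CPython's random module uses the Mersenne Twister (MT19937).
py_rng = random.Random(SEED)

# NumPy exposes the same algorithm as a BitGenerator.
np_rng = np.random.Generator(np.random.MT19937(SEED))

py_draws = [py_rng.random() for _ in range(3)]
np_draws = np_rng.random(3).tolist()

print("CPython MT19937:", py_draws)
print("NumPy   MT19937:", np_draws)

# Same algorithm and seed, yet the streams generally differ: each
# implementation has its own seeding procedure and its own conversion
# from raw 32-bit words to floats in [0, 1).
print("identical:", py_draws == np_draws)
```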
Related papers
- Statistical Quality and Reproducibility of Pseudorandom Number Generators in Machine Learning technologies [0.0]
We compare the statistical quality of PRNGs used in ML frameworks against their original C implementations.
Our findings challenge claims of statistical robustness, revealing that even generators labeled "crush-resistant" (e.g., PCG, Philox) may fail certain statistical tests (a minimal uniformity check follows this entry).
arXiv Detail & Related papers (2025-07-02T09:38:00Z)
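For context, such studies aggregate hundreds of statistical tests into batteries; a single uniformity check can be sketched in a few lines. This is a hypothetical quick look with NumPy and SciPy, far weaker than the TestU01 Crush batteries behind the "crush-resistant" label.

```python
import numpy as np
from scipy import stats

# Draw from NumPy's Philox bit generator, one of the PRNGs in question.
rng = np.random.Generator(np.random.Philox(42))
samples = rng.random(100_000)

# Kolmogorov-Smirnov test against the uniform distribution on [0, 1).
result = stats.kstest(samples, "uniform")
print(f"KS statistic = {result.statistic:.5f}, p-value = {result.pvalue:.3f}")

# A tiny p-value would flag a deviation from uniformity; full batteries
# such as TestU01's Crush run hundreds of far more stringent tests.
```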
- Successive randomized compression: A randomized algorithm for the compressed MPO-MPS product [0.6554326244334868]
This paper introduces a new single-pass, randomized algorithm called successive randomized compression (SRC).
The performance of the new algorithm is evaluated on synthetic problems and unitary time evolution problems for quantum spin systems.
arXiv Detail & Related papers (2025-04-08T22:33:49Z)
- Regression and Classification with Single-Qubit Quantum Neural Networks [0.0]
We use a resource-efficient and scalable Single-Qubit Quantum Neural Network (SQQNN) for both regression and classification tasks.
For classification, we introduce a novel training method inspired by the Taylor series, which can efficiently find a global minimum in a single step.
The SQQNN exhibits strong, virtually error-free performance on regression and classification tasks, including the MNIST dataset.
arXiv Detail & Related papers (2024-12-12T17:35:36Z)
- Impacts of floating-point non-associativity on reproducibility for HPC and deep learning applications [0.0]
Run-to-run variability in parallel programs, caused by floating-point non-associativity, is known to significantly affect algorithms.
We investigate the statistical properties of floating-point non-associativity within modern parallel programming models.
We examine the recently added deterministic options in PyTorch in the context of GPU deployment for deep learning (a small demonstration follows this entry).
arXiv Detail & Related papers (2024-08-09T16:07:37Z)
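The grouping sensitivity is easy to demonstrate, and PyTorch's deterministic mode is a single call; a minimal sketch with arbitrary example values:

```python
import numpy as np
import torch

# Floating-point addition is not associative: grouping changes the result.
a, b, c = np.float32(1e8), np.float32(-1e8), np.float32(0.1)
print((a + b) + c)  # ~0.1: the large terms cancel first
print(a + (b + c))  # 0.0: 0.1 is absorbed when added to -1e8 in float32

# Parallel reductions (e.g., GPU sums) accumulate in nondeterministic
# order, so such rounding differences can vary run to run. PyTorch can
# trade speed for determinism:
torch.manual_seed(0)
torch.use_deterministic_algorithms(True)  # raises on ops lacking a deterministic kernel
```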
- Switchable Decision: Dynamic Neural Generation Networks [98.61113699324429]
We propose a switchable decision mechanism that accelerates inference by dynamically assigning resources to each data instance.
Our method reduces inference cost while maintaining the same accuracy.
arXiv Detail & Related papers (2024-05-07T17:44:54Z)
- Quantum Generative Modeling of Sequential Data with Trainable Token Embedding [0.0]
A quantum-inspired generative model known as the Born machine has shown great promise in learning classical and quantum data.
We generalize the embedding method to trainable quantum measurement operators that can be optimized simultaneously with the MPS.
Our study indicates that, combined with trainable embeddings, Born machines exhibit better performance and learn deeper correlations from the dataset.
arXiv Detail & Related papers (2023-11-08T22:56:37Z)
- Using Machine Learning To Identify Software Weaknesses From Software Requirement Specifications [49.1574468325115]
This research focuses on finding an efficient machine learning algorithm to identify software weaknesses from requirement specifications.
Keywords extracted using latent semantic analysis help map the CWE categories to PROMISE_exp. Naive Bayes, support vector machine (SVM), decision tree, neural network, and convolutional neural network (CNN) algorithms were tested (a toy pipeline sketch follows this entry).
arXiv Detail & Related papers (2023-08-10T13:19:10Z)
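A toy version of the LSA-plus-SVM route can be sketched with scikit-learn; the requirement strings and CWE-style labels below are invented stand-ins for the PROMISE_exp data, not excerpts from it.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Invented requirement texts and CWE-style labels, for illustration only.
requirements = [
    "The system shall store user passwords in encrypted form.",
    "The application shall log all failed authentication attempts.",
    "The user interface shall respond within two seconds.",
    "Reports shall be exportable as PDF documents.",
]
labels = ["CWE-256", "CWE-778", "none", "none"]

clf = make_pipeline(
    TfidfVectorizer(),             # keyword weighting
    TruncatedSVD(n_components=2),  # latent semantic analysis
    LinearSVC(),                   # the SVM from the comparison
)
clf.fit(requirements, labels)
print(clf.predict(["Passwords shall be kept in a plain text file."]))
```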
- A Comparative Study of Machine Learning Algorithms for Anomaly Detection in Industrial Environments: Performance and Environmental Impact [62.997667081978825]
This study addresses the demand for high-performance machine learning models alongside environmental sustainability.
Traditional machine learning algorithms, such as Decision Trees and Random Forests, demonstrate robust efficiency and performance.
However, superior outcomes were obtained with optimised configurations, albeit with a commensurate increase in resource consumption.
arXiv Detail & Related papers (2023-07-01T15:18:00Z)
- Multi-Fidelity Active Learning with GFlowNets [65.91555804996203]
We propose a multi-fidelity active learning algorithm with GFlowNets as a sampler to efficiently discover diverse, high-scoring candidates.
Our evaluation on molecular discovery tasks shows that multi-fidelity active learning with GFlowNets can discover high-scoring candidates at a fraction of the budget of its single-fidelity counterpart.
arXiv Detail & Related papers (2023-06-20T17:43:42Z)
- TSGM: A Flexible Framework for Generative Modeling of Synthetic Time Series [61.436361263605114]
Time series data are often scarce or highly sensitive, which precludes the sharing of data between researchers and industrial organizations.
We introduce Time Series Generative Modeling (TSGM), an open-source framework for the generative modeling of synthetic time series.
arXiv Detail & Related papers (2023-05-19T10:11:21Z)
- Performance and Energy Consumption of Parallel Machine Learning Algorithms [0.0]
Machine learning models have achieved remarkable success in various real-world applications.
Model training in machine learning requires large-scale datasets and many iterations before a model works properly.
Parallelizing training algorithms is a common strategy for speeding up training; the sketch after this entry shows one way to measure the time and energy cost of a workload.
arXiv Detail & Related papers (2023-05-01T13:04:39Z)
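One way to obtain such joint time and energy figures on Linux is to read the Intel RAPL counter around the workload. A rough sketch, assuming the powercap interface is present and readable; the path and the stand-in workload are assumptions:

```python
import time

# Package-0 energy counter in microjoules (Intel RAPL via Linux powercap).
RAPL = "/sys/class/powercap/intel-rapl:0/energy_uj"

def read_energy_uj() -> int:
    with open(RAPL) as f:
        return int(f.read())

e0, t0 = read_energy_uj(), time.perf_counter()
sum(i * i for i in range(10_000_000))  # stand-in for the training workload
elapsed = time.perf_counter() - t0
joules = (read_energy_uj() - e0) / 1e6  # note: the counter wraps on long runs

print(f"time: {elapsed:.2f} s  energy: {joules:.2f} J  power: {joules / elapsed:.1f} W")
```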
- Benchmarking Learning Efficiency in Deep Reservoir Computing [23.753943709362794]
We introduce a benchmark of increasingly difficult tasks together with a data-efficiency metric to measure how quickly machine learning models learn from training data.
We compare the learning speed of established sequential supervised models, such as RNNs, LSTMs, and Transformers, with relatively less known alternatives based on reservoir computing (a minimal echo state network sketch follows this entry).
arXiv Detail & Related papers (2022-09-29T08:16:52Z)
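Reservoir computing owes its learning speed to training only a linear readout on top of a fixed random recurrent network. Below is a minimal echo state network on a toy next-step prediction task; the sizes and scalings are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n_res = 100  # reservoir size

# Fixed random input and recurrent weights; only the readout is trained.
W_in = rng.uniform(-0.5, 0.5, n_res)
W = rng.uniform(-0.5, 0.5, (n_res, n_res))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))  # keep spectral radius below 1

def run_reservoir(u):
    x, states = np.zeros(n_res), []
    for u_t in u:
        x = np.tanh(W_in * u_t + W @ x)
        states.append(x)
    return np.array(states)

# Toy task: predict the next sample of a sine wave.
u = np.sin(np.linspace(0, 20 * np.pi, 1000))
X, y = run_reservoir(u[:-1]), u[1:]

# "Training" is a single least-squares solve for the readout weights.
w_out = np.linalg.lstsq(X, y, rcond=None)[0]
print("train MSE:", np.mean((X @ w_out - y) ** 2))
```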
- Advancing Reacting Flow Simulations with Data-Driven Models [50.9598607067535]
Key to the effective use of machine learning tools in multi-physics problems is coupling them to physical and computer models.
The present chapter reviews some of the open opportunities for the application of data-driven reduced-order modeling of combustion systems.
arXiv Detail & Related papers (2022-09-05T16:48:34Z)
- Performance Analysis and Comparison of Machine and Deep Learning Algorithms for IoT Data Classification [0.0]
This paper evaluates the performance of 11 popular machine and deep learning algorithms on classification tasks using six IoT-related datasets.
Considering all performance metrics, Random Forests performed better than the other machine learning models, while among deep learning models, ANN and CNN achieved the most promising results (an illustrative comparison follows this entry).
arXiv Detail & Related papers (2020-01-27T09:14:11Z)
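For a flavor of such comparisons, the sketch below cross-validates a Random Forest against a small MLP on synthetic data; the dataset and hyperparameters are placeholders, not the paper's six IoT datasets or its tuned models.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for an IoT classification dataset.
X, y = make_classification(n_samples=2000, n_features=20,
                           n_informative=10, random_state=0)

models = {
    "Random Forest": RandomForestClassifier(random_state=0),
    "ANN (MLP)": MLPClassifier(max_iter=500, random_state=0),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="f1")
    print(f"{name}: mean F1 = {scores.mean():.3f}")
```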
This list is automatically generated from the titles and abstracts of the papers on this site.