On Using Quasirandom Sequences in Machine Learning for Model Weight Initialization
- URL: http://arxiv.org/abs/2408.02654v1
- Date: Mon, 5 Aug 2024 17:33:09 GMT
- Title: On Using Quasirandom Sequences in Machine Learning for Model Weight Initialization
- Authors: Andriy Miranskyy, Adam Sorrenti, Viral Thakar
- Abstract summary: We investigate whether substituting low-discrepancy quasirandom number generators (QRNGs) for PRNGs as the source of randomness for initializers can improve model performance.
Our findings indicate that QRNG-based neural network initializers either reach a higher accuracy or achieve the same accuracy more quickly than PRNG-based initializers.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The effectiveness of training neural networks directly impacts computational costs, resource allocation, and model development timelines in machine learning applications. An optimizer's ability to train the model adequately (in terms of trained model performance) depends on the model's initial weights. Model weight initialization schemes use pseudorandom number generators (PRNGs) as a source of randomness. We investigate whether substituting low-discrepancy quasirandom number generators (QRNGs) -- namely Sobol' sequences -- for PRNGs as a source of randomness for initializers can improve model performance. We examine Multi-Layer Perceptron (MLP), Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and Transformer architectures trained on the MNIST, CIFAR-10, and IMDB datasets using the SGD and Adam optimizers. Our analysis uses ten initialization schemes: Glorot, He, and LeCun (each in Uniform and Normal variants), plus Orthogonal, Random Normal, Truncated Normal, and Random Uniform. Models with weights set using PRNG- and QRNG-based initializers are compared pairwise for each combination of dataset, architecture, optimizer, and initialization scheme. Our findings indicate that QRNG-based neural network initializers either reach a higher accuracy or achieve the same accuracy more quickly than PRNG-based initializers in 60% of the 120 experiments conducted. Thus, using QRNG-based initializers instead of PRNG-based initializers can speed up and improve model training.
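For intuition, here is a minimal sketch (not the authors' implementation; the mapping of Sobol' points to weight-matrix entries and all names are illustrative assumptions) of a Glorot-normal-style initializer driven by SciPy's scrambled Sobol' generator, with the inverse normal CDF turning low-discrepancy uniforms into normals:

```python
import numpy as np
from scipy.stats import norm, qmc

def sobol_glorot_normal(fan_in: int, fan_out: int, seed: int = 0) -> np.ndarray:
    """Glorot-normal-style weights drawn from a scrambled Sobol' sequence.

    Each of the fan_out rows is one fan_in-dimensional quasirandom point;
    this layout is an illustrative choice, not necessarily the paper's.
    """
    sampler = qmc.Sobol(d=fan_in, scramble=True, seed=seed)
    u = sampler.random(n=fan_out)            # low-discrepancy points in [0, 1)^fan_in
    u = np.clip(u, 1e-12, 1 - 1e-12)         # keep the inverse CDF finite
    z = norm.ppf(u)                          # map uniforms to standard normals
    std = np.sqrt(2.0 / (fan_in + fan_out))  # Glorot/Xavier scaling
    return (std * z).T                       # shape (fan_in, fan_out)

W = sobol_glorot_normal(fan_in=784, fan_out=128)  # 128 draws: a power of 2, as Sobol' prefers
```

Replacing norm.ppf with a simple affine rescaling of u would give the Uniform variants of the same schemes.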
Related papers
- Transformer models as an efficient replacement for statistical test suites to evaluate the quality of random numbers
We present a deep learning model that performs multiple NIST STS tests at once and runs much faster than the conventional test suite.
The model outputs multi-label classification results indicating which statistical tests a sequence passes.
We also compared this model to a conventional deep learning method for quantifying randomness and showed that our model achieves similar performance.
arXiv Detail & Related papers (2024-05-06T23:36:03Z)
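As a toy illustration of the multi-label idea in the entry above (the architecture, sizes, and names below are assumptions, not the paper's model), a small Transformer encoder can map a bitstream to one pass/fail logit per NIST STS test:

```python
import torch
import torch.nn as nn

NUM_TESTS = 15   # the NIST STS defines 15 core tests
SEQ_LEN = 1024   # toy bitstream length (assumption)

class RandomnessTester(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(2, 32)  # map bits {0, 1} to vectors
        layer = nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(32, NUM_TESTS)  # one pass/fail logit per test

    def forward(self, bits: torch.Tensor) -> torch.Tensor:
        h = self.encoder(self.embed(bits))  # (batch, seq, 32)
        return self.head(h.mean(dim=1))     # mean-pool, then classify

model = RandomnessTester()
bits = torch.randint(0, 2, (8, SEQ_LEN))              # toy bitstreams
labels = torch.randint(0, 2, (8, NUM_TESTS)).float()  # toy pass/fail targets
loss = nn.BCEWithLogitsLoss()(model(bits), labels)    # multi-label objective
loss.backward()
```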
- Universal Neural Functionals
A challenging problem in many modern machine learning tasks is to process weight-space features.
Recent works have developed promising weight-space models that are equivariant to the permutation symmetries of simple feedforward networks.
This work proposes an algorithm that automatically constructs permutation equivariant models for any weight space.
arXiv Detail & Related papers (2024-02-07T20:12:27Z)
- Learning to Simulate: Generative Metamodeling via Quantile Regression
We propose a new metamodeling concept, called generative metamodeling, which aims to construct a "fast simulator of the simulator."
Once constructed, a generative metamodel can generate a large amount of random outputs as soon as the inputs are specified.
We propose a new algorithm -- quantile-regression-based generative metamodeling (QRGMM) -- and study its convergence and rate of convergence.
arXiv Detail & Related papers (2023-11-29T16:46:24Z)
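A minimal sketch of the quantile-regression sampling idea behind such a metamodel (the details below are assumptions, not the QRGMM algorithm itself): fit conditional quantiles on simulator runs, then sample outputs by drawing uniform quantile levels and inverting:

```python
import numpy as np
from sklearn.linear_model import QuantileRegressor

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=(2000, 1))           # simulator inputs
y = 2 * x[:, 0] + rng.normal(0, 0.5 + x[:, 0])  # noisy simulator outputs

taus = np.linspace(0.05, 0.95, 19)              # grid of quantile levels
models = [QuantileRegressor(quantile=t, alpha=0.0).fit(x, y) for t in taus]

def generate(x_new: np.ndarray, n: int) -> np.ndarray:
    """Draw n approximate samples of Y | X = x_new from the metamodel."""
    q = np.array([m.predict(x_new.reshape(1, -1))[0] for m in models])
    q = np.sort(q)                          # enforce monotone quantiles (they can cross)
    u = rng.uniform(taus[0], taus[-1], n)   # uniform quantile levels
    return np.interp(u, taus, q)            # inverse-CDF sampling by interpolation

samples = generate(np.array([0.7]), n=1000)  # fast "simulation" at a new input
```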
- CodeGen2: Lessons for Training LLMs on Programming and Natural Languages
We unify encoder-based and decoder-based models into a single prefix-LM.
For learning methods, we explore the claim of a "free lunch" hypothesis.
For data distributions, the effect of a mixture distribution and multi-epoch training of programming and natural languages on model performance is explored.
arXiv Detail & Related papers (2023-05-03T17:55:25Z)
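A small sketch of the prefix-LM attention pattern in general (an assumption about the standard technique, not CodeGen2's exact implementation): prefix positions attend bidirectionally, while suffix positions attend causally:

```python
import numpy as np

def prefix_lm_mask(seq_len: int, prefix_len: int) -> np.ndarray:
    """mask[i, j] is True if position i may attend to position j."""
    mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))  # causal baseline
    mask[:, :prefix_len] = True  # every position sees the whole prefix
    return mask

# Positions 0-2 form the prefix (bidirectional); 3-5 decode causally.
print(prefix_lm_mask(seq_len=6, prefix_len=3).astype(int))
```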
- Physics-Informed Model-Based Reinforcement Learning
One of the drawbacks of traditional reinforcement learning algorithms is their poor sample efficiency.
We learn a model of the environment (essentially its transition dynamics and reward function), use it to generate imaginary trajectories, and backpropagate through them to update the policy.
We show that, in model-based RL, model accuracy mainly matters in environments that are sensitive to initial conditions.
We also show that, in challenging environments, physics-informed model-based RL achieves better average-return than state-of-the-art model-free RL algorithms.
arXiv Detail & Related papers (2022-12-05T11:26:10Z)
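The rollout-and-backpropagate loop in the entry above can be sketched as follows (a toy illustration with assumed names and shapes; the learned models would first be fitted on real transitions):

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, HORIZON = 4, 2, 10  # toy sizes (assumptions)

# Learned world model: in practice, fitted to real transitions beforehand.
dynamics = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.Tanh(),
                         nn.Linear(64, STATE_DIM))
reward = nn.Sequential(nn.Linear(STATE_DIM, 32), nn.Tanh(), nn.Linear(32, 1))
policy = nn.Sequential(nn.Linear(STATE_DIM, 32), nn.Tanh(),
                       nn.Linear(32, ACTION_DIM), nn.Tanh())
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

def imagined_return(s0: torch.Tensor) -> torch.Tensor:
    """Roll the policy through the learned model; gradients flow to the policy."""
    s, total = s0, torch.zeros(s0.shape[0], 1)
    for _ in range(HORIZON):
        a = policy(s)                            # action from current policy
        s = dynamics(torch.cat([s, a], dim=-1))  # imagined next state
        total = total + reward(s)                # accumulate predicted reward
    return total.mean()

s0 = torch.randn(32, STATE_DIM)  # batch of imagined start states
loss = -imagined_return(s0)      # maximize return via gradient ascent
opt.zero_grad(); loss.backward(); opt.step()
```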
- Intelligence Processing Units Accelerate Neuromorphic Learning
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
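For context on the SNN entry above, here is a plain-PyTorch sketch of a leaky integrate-and-fire neuron, the basic unit such packages simulate (illustrative only; snnTorch's actual API differs):

```python
import torch

beta, threshold, steps = 0.9, 1.0, 20  # leak factor, firing threshold, time steps
current = torch.rand(steps)            # toy input current per time step
mem, spikes = torch.tensor(0.0), []
for t in range(steps):
    mem = beta * mem + current[t]       # leaky integration of input
    spike = (mem >= threshold).float()  # fire when the threshold is crossed
    mem = mem - spike * threshold       # soft reset after a spike
    spikes.append(spike)
print(torch.stack(spikes))  # binary spike train over time
```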
- A Stable, Fast, and Fully Automatic Learning Algorithm for Predictive Coding Networks
Predictive coding networks are neuroscience-inspired models with roots in both Bayesian statistics and neuroscience.
We show how simply changing the temporal scheduling of the update rule for the synaptic weights leads to an algorithm that is much more efficient and stable than the original one.
arXiv Detail & Related papers (2022-11-16T00:11:04Z)
- Towards Theoretically Inspired Neural Initialization Optimization
We propose a differentiable quantity, named GradCosine, with theoretical insights to evaluate the initial state of a neural network.
We show that both the training and test performance of a network can be improved by maximizing GradCosine under a norm constraint.
Generalizing this sample-wise analysis to the real batch setting, the proposed Neural Initialization Optimization (NIO) algorithm automatically finds a better initialization at negligible cost.
arXiv Detail & Related papers (2022-10-12T06:49:16Z)
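Reading GradCosine as the average pairwise cosine similarity of per-sample gradients at initialization (an assumption based on the summary above), a toy computation looks like this:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy model and data; the paper's networks and datasets are much larger.
model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 10))
x, y = torch.randn(8, 20), torch.randint(0, 10, (8,))
params = list(model.parameters())

def flat_grad(xi: torch.Tensor, yi: torch.Tensor) -> torch.Tensor:
    """Flattened gradient of the loss on a single sample."""
    loss = F.cross_entropy(model(xi.unsqueeze(0)), yi.unsqueeze(0))
    grads = torch.autograd.grad(loss, params)
    return torch.cat([g.reshape(-1) for g in grads])

g = torch.stack([flat_grad(xi, yi) for xi, yi in zip(x, y)])
g = F.normalize(g, dim=1)                      # unit-norm per-sample gradients
cos = g @ g.T                                  # pairwise cosine similarities
n = g.shape[0]
grad_cosine = (cos.sum() - n) / (n * (n - 1))  # mean off-diagonal similarity
print(f"GradCosine at initialization: {grad_cosine.item():.4f}")
```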
- Tensor Networks for Probabilistic Sequence Modeling
We use a uniform matrix product state (u-MPS) model for probabilistic modeling of sequence data.
We then introduce a novel generative algorithm giving trained u-MPS the ability to efficiently sample from a wide variety of conditional distributions.
Experiments on sequence modeling with synthetic and real text data show u-MPS outperforming a variety of baselines.
arXiv Detail & Related papers (2020-03-02T17:16:05Z)
- Model Fusion via Optimal Transport
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)
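A toy sketch of the layer-wise alignment idea in the entry above (the paper uses optimal transport; here it is simplified to a hard one-to-one matching via the Hungarian algorithm, and the propagation of alignments from earlier layers is ignored):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
W_a = rng.normal(size=(16, 8))  # one layer of model A (16 neurons, 8 inputs)
W_b = rng.normal(size=(16, 8))  # the same layer of model B

# Cost of matching neuron i of A to neuron j of B: squared distance of weights.
cost = ((W_a[:, None, :] - W_b[None, :, :]) ** 2).sum(axis=-1)
rows, cols = linear_sum_assignment(cost)  # optimal one-to-one alignment

W_b_aligned = W_b[cols]              # reorder B's neurons to match A's
W_fused = 0.5 * (W_a + W_b_aligned)  # average the aligned layers
```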
This list is automatically generated from the titles and abstracts of the papers on this site.