Mean-field limit from general mixtures of experts to quantum neural networks
- URL: http://arxiv.org/abs/2501.14660v1
- Date: Fri, 24 Jan 2025 17:29:41 GMT
- Title: Mean-field limit from general mixtures of experts to quantum neural networks
- Authors: Anderson Melchor Hernandez, Davide Pastorello, Giacomo De Palma
- Abstract summary: We study the behavior of Mixture of Experts (MoE) trained via gradient flow on supervised learning problems.
Our main result establishes the propagation of chaos for a MoE as the number of experts diverges.
- Score: 3.7498611358320733
- Abstract: In this work, we study the asymptotic behavior of Mixture of Experts (MoE) trained via gradient flow on supervised learning problems. Our main result establishes the propagation of chaos for a MoE as the number of experts diverges. We demonstrate that the corresponding empirical measure of their parameters is close to a probability measure that solves a nonlinear continuity equation, and we provide an explicit convergence rate that depends solely on the number of experts. We apply our results to a MoE generated by a quantum neural network.
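To make the statement above concrete, the following is a minimal sketch of the standard mean-field setup for a mixture of N experts; the notation ($\Phi$ for the expert map, $V[\mu]$ for the first variation of the population risk) is assumed here and is not necessarily the paper's exact formulation.

```latex
% Minimal sketch of the mean-field setup, with assumed notation.
\documentclass{article}
\usepackage{amsmath}
\begin{document}
An MoE predictor with $N$ experts, parameters $\theta_1,\dots,\theta_N$, and
expert map $\Phi$ can be written as the average
\[
  f_N(x) = \frac{1}{N}\sum_{i=1}^{N} \Phi(\theta_i, x),
\]
and the empirical measure of the parameters at training time $t$ is
\[
  \mu^N_t = \frac{1}{N}\sum_{i=1}^{N} \delta_{\theta_i(t)}.
\]
Propagation of chaos asserts that, as $N \to \infty$, $\mu^N_t$ stays close to a
deterministic limit $\mu_t$ solving a nonlinear continuity equation of the form
\[
  \partial_t \mu_t = \nabla_\theta \cdot \bigl( \mu_t \, \nabla_\theta V[\mu_t] \bigr),
\]
where $V[\mu]$ is the first variation of the population risk at $\mu$; the
distance between $\mu^N_t$ and $\mu_t$ is controlled by a rate that is explicit
in $N$.
\end{document}
```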
Related papers
- Generalization Error Analysis for Sparse Mixture-of-Experts: A Preliminary Study [65.11303133775857]
Mixture-of-Experts (MoE) computation amalgamates predictions from several specialized sub-models (referred to as experts).
Sparse MoE selectively engages only a limited number, or even just one expert, significantly reducing overhead while empirically preserving, and sometimes even enhancing, performance.
arXiv Detail & Related papers (2024-03-26T05:48:02Z)
- On Least Square Estimation in Softmax Gating Mixture of Experts [78.3687645289918]
We investigate the performance of the least squares estimators (LSE) under a deterministic MoE model.
We establish a condition called strong identifiability to characterize the convergence behavior of various types of expert functions.
Our findings have important practical implications for expert selection.
arXiv Detail & Related papers (2024-02-05T12:31:18Z)
- Is Temperature Sample Efficient for Softmax Gaussian Mixture of Experts? [27.924615931679757]
We explore the impact of dense-to-sparse gating in a mixture of experts (MoE) on maximum likelihood estimation under that model.
We propose a novel activation dense-to-sparse gate, which routes the output of a linear layer through an activation function before passing it to the softmax.
arXiv Detail & Related papers (2024-01-25T01:09:09Z)
- Theory of non-Hermitian fermionic superfluidity on a honeycomb lattice: Interplay between exceptional manifolds and van Hove Singularity [0.0]
We study the non-Hermitian fermionic superfluidity subject to dissipation of Cooper pairs on a honeycomb lattice.
We demonstrate the emergence of the dissipation-induced superfluid phase that is anomalously enlarged by a cusp on the phase boundary.
arXiv Detail & Related papers (2023-09-28T06:21:55Z)
- Deep Gaussian Mixture Ensembles [9.673093148930874]
This work introduces a novel probabilistic deep learning technique called deep Gaussian mixture ensembles (DGMEs).
DGMEs are capable of approximating complex probability distributions, such as heavy-tailed or multimodal distributions.
Our experimental results demonstrate that DGMEs outperform state-of-the-art uncertainty quantifying deep learning models in handling complex predictive densities.
arXiv Detail & Related papers (2023-06-12T16:53:38Z)
- Towards Convergence Rates for Parameter Estimation in Gaussian-gated Mixture of Experts [40.24720443257405]
We provide a convergence analysis for maximum likelihood estimation (MLE) in the Gaussian-gated MoE model.
Our findings reveal that the MLE has distinct behaviors under two complementary settings of the location parameters of the Gaussian gating functions.
Notably, these behaviors can be characterized by the solvability of two different systems of equations.
arXiv Detail & Related papers (2023-05-12T16:02:19Z)
- Demystifying Softmax Gating Function in Gaussian Mixture of Experts [34.53974702114644]
We propose novel Voronoi loss functions among parameters and establish the convergence rates of maximum likelihood estimator (MLE) for solving parameter estimation.
Our findings show a connection between the convergence rate of the MLE and a solvability problem of a system of equations.
arXiv Detail & Related papers (2023-05-05T05:37:55Z)
- Model-Based Uncertainty in Value Functions [89.31922008981735]
We focus on characterizing the variance over values induced by a distribution over MDPs.
Previous work upper bounds the posterior variance over values by solving a so-called uncertainty Bellman equation.
We propose a new uncertainty Bellman equation whose solution converges to the true posterior variance over values.
arXiv Detail & Related papers (2023-02-24T09:18:27Z)
- MoEC: Mixture of Expert Clusters [93.63738535295866]
Sparse Mixture of Experts (MoE) has received great interest due to its promising scaling capability with affordable computational overhead.
MoE converts dense layers into sparse experts and uses a gated routing network to activate experts conditionally (see the routing sketch after this list).
However, as the number of experts grows, MoE with an outrageous number of parameters suffers from overfitting and sparse data allocation.
arXiv Detail & Related papers (2022-07-19T06:09:55Z)
- Momentum Diminishes the Effect of Spectral Bias in Physics-Informed Neural Networks [72.09574528342732]
Physics-informed neural network (PINN) algorithms have shown promising results in solving a wide range of problems involving partial differential equations (PDEs).
They often fail to converge to desirable solutions when the target function contains high-frequency features, due to a phenomenon known as spectral bias.
In the present work, we exploit neural tangent kernels (NTKs) to investigate the training dynamics of PINNs evolving under stochastic gradient descent with momentum (SGDM).
arXiv Detail & Related papers (2022-06-29T19:03:10Z)
- Bayesian Uncertainty Estimation of Learned Variational MRI Reconstruction [63.202627467245584]
We introduce a Bayesian variational framework to quantify the model-immanent (epistemic) uncertainty.
We demonstrate that our approach yields competitive results for undersampled MRI reconstruction.
arXiv Detail & Related papers (2021-02-12T18:08:14Z)
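As referenced in the MoEC entry above, here is a minimal sketch of top-k gated routing for a sparse MoE. All names, shapes, and the NumPy implementation are illustrative assumptions, not the method of any of the cited papers.

```python
# Minimal sketch of top-k gated routing for a sparse MoE, as described in the
# Sparse MoE and MoEC entries above. All names, shapes, and design choices here
# are illustrative assumptions, not the implementation of any cited paper.
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sparse_moe_forward(x, gate_w, experts, k=2):
    """Route each input row to its top-k experts and combine their outputs.

    x       : (batch, d_in) inputs
    gate_w  : (d_in, n_experts) weights of a single linear gating layer
    experts : list of callables, experts[j](x) -> (batch, d_out)
    k       : number of experts activated per input
    """
    logits = x @ gate_w                        # (batch, n_experts)
    # (a dense-to-sparse style gate would apply an activation to `logits` here)
    probs = softmax(logits)                    # dense gating distribution
    topk = np.argsort(-probs, axis=-1)[:, :k]  # indices of the k largest gates

    batch, d_out = x.shape[0], experts[0](x[:1]).shape[-1]
    out = np.zeros((batch, d_out))
    for i in range(batch):
        chosen = topk[i]
        weights = probs[i, chosen]
        weights = weights / weights.sum()      # renormalize over selected experts
        for w, j in zip(weights, chosen):
            out[i] += w * experts[j](x[i:i + 1])[0]
    return out

# Toy usage: 4 linear experts, each input routed to its top-2 experts.
rng = np.random.default_rng(0)
d_in, d_out, n_experts = 8, 3, 4
experts = [(lambda W: (lambda x: x @ W))(rng.normal(size=(d_in, d_out)))
           for _ in range(n_experts)]
gate_w = rng.normal(size=(d_in, n_experts))
y = sparse_moe_forward(rng.normal(size=(5, d_in)), gate_w, experts, k=2)
print(y.shape)  # (5, 3)
```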