Fast Deep Mixtures of Gaussian Process Experts
- URL: http://arxiv.org/abs/2006.13309v4
- Date: Fri, 1 Dec 2023 01:03:08 GMT
- Title: Fast Deep Mixtures of Gaussian Process Experts
- Authors: Clement Etienam, Kody Law, Sara Wade, Vitaly Zankin
- Abstract summary: Mixtures of experts have become an indispensable tool for flexible modelling in a supervised learning context.
In this article, we propose to design the gating network for selecting the experts from sparse GPs using a deep neural network (DNN).
A fast one-pass algorithm called Cluster-Classify-Regress (CCR) is leveraged to approximate the maximum a posteriori (MAP) estimator extremely quickly.
- Score: 0.6554326244334868
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mixtures of experts have become an indispensable tool for flexible modelling
in a supervised learning context, allowing not only the mean function but the
entire density of the output to change with the inputs. Sparse Gaussian
processes (GP) have shown promise as a leading candidate for the experts in
such models, and in this article, we propose to design the gating network for
selecting the experts from such mixtures of sparse GPs using a deep neural
network (DNN). Furthermore, a fast one-pass algorithm called
Cluster-Classify-Regress (CCR) is leveraged to approximate the maximum a
posteriori (MAP) estimator extremely quickly. This powerful combination of
model and algorithm delivers a novel method which is flexible, robust,
and extremely efficient. In particular, the method is able to outperform
competing methods in terms of accuracy and uncertainty quantification. The cost
is competitive on low-dimensional and small data sets, but is significantly
lower for higher-dimensional and big data sets. Iteratively maximizing the
distribution of experts given allocations and allocations given experts does
not provide significant improvement, which indicates that the algorithm
achieves a good approximation to the local MAP estimator very fast. This
insight can be useful also in the context of other mixture of experts models.
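The three stages of the CCR idea can be sketched as follows. This is a minimal illustration, not the paper's implementation: KMeans, MLPClassifier, and a dense GaussianProcessRegressor stand in for the paper's clustering step, DNN gating network, and sparse GP experts.

```python
# One-pass Cluster-Classify-Regress (CCR) sketch:
# 1) Cluster the joint (input, output) pairs to get expert allocations.
# 2) Classify: train a gating network to predict allocations from inputs.
# 3) Regress: fit one expert per cluster on its allocated points.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPClassifier
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
# Synthetic data with two regimes, so a two-expert mixture is natural.
y = np.where(X[:, 0] < 0, np.sin(X[:, 0]), 2.0 + 0.1 * X[:, 0])

# Cluster: label each point by clustering in the joint (x, y) space.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
    np.column_stack([X, y]))

# Classify: the gating network maps inputs to expert assignments.
gate = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                     random_state=0).fit(X, labels)

# Regress: fit one expert per cluster on its assigned points.
experts = {k: GaussianProcessRegressor().fit(X[labels == k], y[labels == k])
           for k in np.unique(labels)}

# Predict by routing each test input to its gated expert.
X_test = np.array([[-1.5], [2.0]])
assign = gate.predict(X_test)
y_pred = np.array([experts[k].predict(x[None, :])[0]
                   for k, x in zip(assign, X_test)])
```

Each stage runs once, which is what makes the algorithm "one-pass"; the abstract's observation is that iterating the classify and regress steps further yields little additional improvement.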
Related papers
- On Least Square Estimation in Softmax Gating Mixture of Experts [78.3687645289918]
We investigate the performance of the least squares estimators (LSE) under a deterministic MoE model.
We establish a condition called strong identifiability to characterize the convergence behavior of various types of expert functions.
Our findings have important practical implications for expert selection.
arXiv Detail & Related papers (2024-02-05T12:31:18Z)
- Compound Batch Normalization for Long-tailed Image Classification [77.42829178064807]
We propose a compound batch normalization method based on a Gaussian mixture.
It can model the feature space more comprehensively and reduce the dominance of head classes.
The proposed method outperforms existing methods on long-tailed image classification.
arXiv Detail & Related papers (2022-12-02T07:31:39Z)
- Validation Diagnostics for SBI algorithms based on Normalizing Flows [55.41644538483948]
This work proposes easy-to-interpret validation diagnostics for multi-dimensional conditional (posterior) density estimators based on NF.
It also offers theoretical guarantees based on results of local consistency.
This work should help the design of better specified models or drive the development of novel SBI-algorithms.
arXiv Detail & Related papers (2022-11-17T15:48:06Z)
- Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with distributionally robust optimization (DRO) using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
- Generalizable Mixed-Precision Quantization via Attribution Rank Preservation [90.26603048354575]
We propose a generalizable mixed-precision quantization (GMPQ) method for efficient inference.
Our method obtains competitive accuracy-complexity trade-off compared with the state-of-the-art mixed-precision networks.
arXiv Detail & Related papers (2021-08-05T16:41:57Z)
- Mixture of ELM based experts with trainable gating network [2.320417845168326]
We propose an ensemble learning method based on a mixture of experts (ME).
The structure of the ME consists of multi-layer perceptrons (MLPs) as base experts and a gating network.
In the proposed method, a trainable gating network is applied to aggregate the outputs of the experts.
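The gating-based aggregation this paper describes can be sketched in a few lines: a softmax gate produces input-dependent mixture weights, and the output is the weighted sum of the experts' predictions. The linear experts and single-layer gate below are illustrative stand-ins for the paper's ELM experts and trainable gating network.

```python
# Mixture-of-experts aggregation with a softmax gating network.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))          # 5 inputs, 3 features
W_experts = rng.normal(size=(4, 3))  # 4 linear experts (stand-ins for ELMs)
W_gate = rng.normal(size=(4, 3))     # gating weights (trainable in practice)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

expert_out = X @ W_experts.T               # (5, 4): each expert's prediction
gate_w = softmax(X @ W_gate.T)             # (5, 4): weights sum to 1 per input
y_hat = (gate_w * expert_out).sum(axis=1)  # aggregated mixture output
```

Because the gate is a differentiable function of the input, its weights can be trained jointly with (or after) the experts by gradient descent on the mixture output.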
arXiv Detail & Related papers (2021-05-25T07:13:35Z)
- Gaussian Experts Selection using Graphical Models [7.530615321587948]
Local approximations reduce time complexity by dividing the original dataset into subsets and training a local expert on each subset.
We leverage techniques from the literature on undirected graphical models, using sparse precision matrices that encode conditional dependencies between experts to select the most important experts.
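A rough sketch of the idea, under stated assumptions: treat each expert's predictions on shared points as a variable, estimate a sparse precision matrix with the graphical lasso, and read conditional dependencies off its nonzero off-diagonal entries. The connectivity-based ranking at the end is an illustrative selection criterion, not the paper's exact rule.

```python
# Selecting experts via a sparse precision matrix (graphical lasso sketch).
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
base = rng.normal(size=200)
# Predictions from 4 hypothetical experts; experts 0-2 share a common signal,
# expert 3 is independent noise.
preds = np.column_stack(
    [base + 0.5 * rng.normal(size=200) for _ in range(3)]
    + [rng.normal(size=200)])

# A sparse precision matrix encodes conditional dependencies between experts:
# a zero off-diagonal entry means two experts are conditionally independent.
gl = GraphicalLasso(alpha=0.3).fit(preds)
prec = gl.precision_

# Rank experts by connectivity (number of nonzero off-diagonal entries).
degree = (np.abs(prec) > 1e-6).sum(axis=1) - 1
ranked = np.argsort(-degree)
```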
arXiv Detail & Related papers (2021-02-02T14:12:11Z)
- Efficient semidefinite-programming-based inference for binary and multi-class MRFs [83.09715052229782]
We propose an efficient method for computing the partition function or MAP estimate in a pairwise MRF.
We extend semidefinite relaxations from the typical binary MRF to the full multi-class setting, and develop a compact semidefinite relaxation that can likewise be solved efficiently.
arXiv Detail & Related papers (2020-12-04T15:36:29Z)
- Aggregating Dependent Gaussian Experts in Local Approximation [8.4159776055506]
We propose a novel approach for aggregating the Gaussian experts by detecting strong violations of conditional independence.
The dependency between experts is determined by using a Gaussian graphical model, which yields the precision matrix.
Our new method outperforms other state-of-the-art (SOTA) DGP approaches while being substantially more time-efficient.
arXiv Detail & Related papers (2020-10-17T21:49:43Z)
- Compressive MR Fingerprinting reconstruction with Neural Proximal Gradient iterations [27.259916894535404]
ProxNet is a learned proximal gradient descent framework that incorporates the forward acquisition and Bloch dynamic models within a recurrent learning mechanism.
Our numerical experiments show that ProxNet achieves superior quantitative inference accuracy, much smaller storage requirements, and a runtime comparable to recent deep learning MRF baselines.
arXiv Detail & Related papers (2020-06-27T03:52:22Z)
- Beyond the Mean-Field: Structured Deep Gaussian Processes Improve the Predictive Uncertainties [12.068153197381575]
We propose a novel variational family that allows for retaining covariances between latent processes while achieving fast convergence.
We provide an efficient implementation of our new approach and apply it to several benchmark datasets.
It yields excellent results and strikes a better balance between accuracy and calibrated uncertainty estimates than its state-of-the-art alternatives.
arXiv Detail & Related papers (2020-05-22T11:10:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.