Gaussian Experts Selection using Graphical Models
- URL: http://arxiv.org/abs/2102.01496v2
- Date: Thu, 4 Feb 2021 11:55:06 GMT
- Title: Gaussian Experts Selection using Graphical Models
- Authors: Hamed Jalali, Martin Pawelczyk, Gjergji Kasneci
- Abstract summary: Local approximations reduce time complexity by dividing the original dataset into subsets and training a local expert on each subset.
We leverage techniques from the literature on undirected graphical models, using sparse precision matrices that encode conditional dependencies between experts to select the most important experts.
- Score: 7.530615321587948
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Local approximations are popular methods to scale Gaussian processes (GPs) to
big data. Local approximations reduce time complexity by dividing the original
dataset into subsets and training a local expert on each subset. Aggregating
the experts' predictions is done assuming either conditional dependence or
independence between the experts. Imposing the \emph{conditional independence
assumption} (CI) between the experts renders the aggregation of different
expert predictions time efficient at the cost of poor uncertainty
quantification. On the other hand, modeling dependent experts can provide
precise predictions and uncertainty quantification at the expense of
impractically high computational costs. By eliminating weak experts via a
theory-guided expert selection step, we substantially reduce the computational
cost of aggregating dependent experts while ensuring calibrated uncertainty
quantification. We leverage techniques from the literature on undirected
graphical models, using sparse precision matrices that encode conditional
dependencies between experts to select the most important experts.
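As a rough illustration of the pipeline the abstract describes, the sketch below trains local GP experts on random partitions, estimates a sparse precision matrix over their validation residuals with the graphical lasso, and keeps the experts with the strongest conditional dependencies. It assumes scikit-learn's GaussianProcessRegressor and GraphicalLasso as stand-ins for the paper's local experts and precision-matrix estimator, and the selection rule (largest off-diagonal precision mass) is an illustrative assumption rather than the paper's exact criterion.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

# Toy regression problem: y = sin(x) + noise.
X = rng.uniform(-5.0, 5.0, size=(1200, 1))
y = np.sin(X).ravel() + 0.2 * rng.standard_normal(len(X))

# Hold out a shared validation set; split the rest into M partitions, one per expert.
X_val, y_val = X[:200], y[:200]
M = 8
parts = np.array_split(rng.permutation(np.arange(200, len(X))), M)

# Local approximation step: train one GP expert per partition.
experts = []
for idx in parts:
    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
    gp.fit(X[idx], y[idx])
    experts.append(gp)

# Build the matrix of expert residuals on the validation set and fit a sparse
# precision matrix with the graphical lasso; zero entries correspond to
# conditional independence between the matching experts.
R = np.column_stack([gp.predict(X_val) - y_val for gp in experts])
R = (R - R.mean(axis=0)) / R.std(axis=0)
precision = GraphicalLasso(alpha=0.1).fit(R).precision_

# Illustrative selection rule (an assumption, not the paper's criterion):
# keep the experts with the largest off-diagonal precision mass, i.e. those
# most strongly conditionally dependent on the rest of the ensemble.
strength = np.abs(precision).sum(axis=1) - np.abs(np.diag(precision))
selected = np.argsort(strength)[-4:]
print("selected experts:", sorted(selected.tolist()))
```

The retained experts would then be aggregated with a dependence-aware rule; the sketch only shows where the sparse precision matrix enters the selection step.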
Related papers
- Mixture of Efficient Diffusion Experts Through Automatic Interval and Sub-Network Selection [63.96018203905272]
We propose to reduce the sampling cost by pruning a pretrained diffusion model into a mixture of efficient experts.
We demonstrate the effectiveness of our method, DiffPruning, across several datasets.
arXiv Detail & Related papers (2024-09-23T21:27:26Z)
- Sigmoid Gating is More Sample Efficient than Softmax Gating in Mixture of Experts [78.3687645289918]
We show that the sigmoid gating function enjoys a higher sample efficiency than the softmax gating for the statistical task of expert estimation.
We find that experts formulated as feed-forward networks with commonly used activation such as ReLU and GELU enjoy faster convergence rates under the sigmoid gating.
arXiv Detail & Related papers (2024-05-22T21:12:34Z)
- Generalization Error Analysis for Sparse Mixture-of-Experts: A Preliminary Study [65.11303133775857]
Mixture-of-Experts (MoE) computation amalgamates predictions from several specialized sub-models (referred to as experts).
Sparse MoE selectively engages only a limited number, or even just one expert, significantly reducing overhead while empirically preserving, and sometimes even enhancing, performance.
arXiv Detail & Related papers (2024-03-26T05:48:02Z)
- On Least Square Estimation in Softmax Gating Mixture of Experts [78.3687645289918]
We investigate the performance of the least squares estimators (LSE) under a deterministic MoE model.
We establish a condition called strong identifiability to characterize the convergence behavior of various types of expert functions.
Our findings have important practical implications for expert selection.
arXiv Detail & Related papers (2024-02-05T12:31:18Z)
- Entry Dependent Expert Selection in Distributed Gaussian Processes Using Multilabel Classification [12.622412402489951]
An ensemble technique combines local predictions from Gaussian experts trained on different partitions of the data.
This paper proposes a flexible expert selection approach based on the characteristics of entry data points.
arXiv Detail & Related papers (2022-11-17T23:23:26Z)
- MoEC: Mixture of Expert Clusters [93.63738535295866]
Sparse Mixture of Experts (MoE) has received great interest due to its promising scaling capability with affordable computational overhead.
MoE converts dense layers into sparse experts, and utilizes a gated routing network to make experts conditionally activated.
However, as the number of experts grows, MoE models with outrageously large parameter counts suffer from overfitting and sparse data allocation.
arXiv Detail & Related papers (2022-07-19T06:09:55Z)
- Correlated Product of Experts for Sparse Gaussian Process Regression [2.466065249430993]
We propose a new approach based on aggregating predictions from several local and correlated experts.
Our method recovers independent Product of Experts, sparse GP and full GP in the limiting cases.
We demonstrate superior performance, in a time vs. accuracy sense, of our proposed method against state-of-the-art GP approximation methods.
arXiv Detail & Related papers (2021-12-17T14:14:08Z)
- Bias-Variance Tradeoffs in Single-Sample Binary Gradient Estimators [100.58924375509659]
The straight-through (ST) estimator gained popularity due to its simplicity and efficiency.
Several techniques were proposed to improve over ST while keeping the same low computational complexity.
We conduct a theoretical analysis of Bias and Variance of these methods in order to understand tradeoffs and verify originally claimed properties.
arXiv Detail & Related papers (2021-10-07T15:16:07Z)
- Healing Products of Gaussian Processes [21.892542043785845]
We propose a new product-of-expert model that combines predictions of local experts by computing their Wasserstein barycenter (a minimal barycenter sketch follows at the end of this list).
arXiv Detail & Related papers (2021-02-14T08:53:43Z)
- Aggregating Dependent Gaussian Experts in Local Approximation [8.4159776055506]
We propose a novel approach for aggregating the Gaussian experts by detecting strong violations of conditional independence.
The dependency between experts is determined by using a Gaussian graphical model, which yields the precision matrix.
Our new method outperforms other state-of-the-art (SOTA) DGP approaches while being substantially more time-efficient.
arXiv Detail & Related papers (2020-10-17T21:49:43Z)
- Fast Deep Mixtures of Gaussian Process Experts [0.6554326244334868]
Mixtures of experts have become an indispensable tool for flexible modelling in a supervised learning context.
In this article, we propose to design the gating network for selecting the experts from sparse GPs using a deep neural network (DNN).
A fast one-pass algorithm called Cluster-Classify-Regress (CCR) is leveraged to approximate the maximum a posteriori (MAP) estimator extremely quickly.
arXiv Detail & Related papers (2020-06-11T18:52:34Z)
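For the "Healing Products of Gaussian Processes" entry above, the barycentric aggregation has a closed form in the univariate case: the 2-Wasserstein barycenter of Gaussians is again Gaussian, with mean and standard deviation equal to the weighted averages of the experts' means and standard deviations. The sketch below is a minimal illustration with made-up expert predictions and uniform weights; the helper name and the toy numbers are assumptions, not taken from that paper.

```python
import numpy as np

def gaussian_wasserstein_barycenter(means, stds, weights):
    """2-Wasserstein barycenter of univariate Gaussians N(mean_i, std_i^2).

    For univariate Gaussians the barycenter is itself Gaussian, with mean and
    standard deviation equal to the weighted averages of the experts' means
    and standard deviations.
    """
    means, stds, weights = map(np.asarray, (means, stds, weights))
    weights = weights / weights.sum()          # normalize the expert weights
    return float(weights @ means), float(weights @ stds)

# Toy example: three local GP experts predicting at the same test point.
mu, sigma = gaussian_wasserstein_barycenter(
    means=[0.9, 1.1, 1.4], stds=[0.30, 0.20, 0.50], weights=[1.0, 1.0, 1.0]
)
print(f"aggregated prediction: N({mu:.3f}, {sigma:.3f}^2)")
```

Unlike a precision-weighted product of experts, this barycentric rule averages standard deviations directly, so a single overconfident expert cannot collapse the aggregated uncertainty.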
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.