Varying-Coefficient Mixture of Experts Model
- URL: http://arxiv.org/abs/2601.01699v1
- Date: Mon, 05 Jan 2026 00:23:30 GMT
- Title: Varying-Coefficient Mixture of Experts Model
- Authors: Qicheng Zhao, Celia M. T. Greenwood, Qihuang Zhang
- Abstract summary: We propose a Varying-Coefficient Mixture of Experts (VCMoE) model that allows all coefficient effects in both the gating functions and expert models to vary along an indexing variable. We illustrate the proposed VCMoE model using a dataset of single-nucleus gene expression in embryonic mice to characterize the temporal dynamics of the associations between the expression levels of genes Satb2 and Bcl11b.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mixture-of-Experts (MoE) is a flexible framework that combines multiple specialized submodels (``experts'') by assigning covariate-dependent weights (``gating functions'') to each expert, and it has been commonly used for analyzing heterogeneous data. Existing statistical MoE formulations typically assume constant coefficients for covariate effects within the expert or gating models, which can be inadequate for longitudinal, spatial, or other dynamic settings where covariate influences and latent subpopulation structure evolve across a known dimension. We propose a Varying-Coefficient Mixture of Experts (VCMoE) model that allows all coefficient effects in both the gating functions and expert models to vary along an indexing variable. We establish identifiability and consistency of the proposed model, and we develop an estimation procedure, a label-consistent EM algorithm, for both fully functional and hybrid specifications, along with the corresponding asymptotic distributions of the resulting estimators. For inference, simultaneous confidence bands are constructed both from asymptotic theory for the maximum discrepancy between the estimated functional coefficients and their true counterparts and from bootstrap methods. In addition, a generalized likelihood ratio test is developed to examine whether a coefficient function is genuinely varying across the index variable. Simulation studies demonstrate good finite-sample performance, with acceptable bias and satisfactory coverage rates. We illustrate the proposed VCMoE model using a dataset of single-nucleus gene expression in embryonic mice to characterize the temporal dynamics of the associations between the expression levels of genes Satb2 and Bcl11b across two latent cell subpopulations of neurons, yielding results that are consistent with prior findings.
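As an illustration of the structure the abstract describes, the snippet below sketches a varying-coefficient MoE density: the gating weights are a softmax of index-dependent linear predictors, and each expert is a Gaussian regression whose coefficients also vary with the index t. This is a hedged, minimal sketch, not the paper's implementation; the names `vcmoe_density`, `gate_fns`, and `beta_fns` are hypothetical stand-ins for the paper's functional coefficients.

```python
import numpy as np

def softmax(v):
    # Numerically stable softmax over expert logits.
    v = v - v.max()
    e = np.exp(v)
    return e / e.sum()

def vcmoe_density(y, x, t, gate_fns, beta_fns, sigmas):
    """Mixture density of a varying-coefficient MoE at one observation.

    gate_fns[k](t) -> gating coefficient vector for expert k at index t
    beta_fns[k](t) -> expert regression coefficients for expert k at index t
    sigmas[k]      -> residual standard deviation of expert k
    (All names here are illustrative, not from the paper.)
    """
    K = len(beta_fns)
    # Covariate- and index-dependent gating weights.
    logits = np.array([x @ gate_fns[k](t) for k in range(K)])
    pi = softmax(logits)
    dens = 0.0
    for k in range(K):
        mu = x @ beta_fns[k](t)  # expert mean varies with the index t
        dens += pi[k] * np.exp(-0.5 * ((y - mu) / sigmas[k]) ** 2) \
                / (np.sqrt(2.0 * np.pi) * sigmas[k])
    return dens
```

With coefficient functions that are constant in t, this reduces to a standard softmax-gated Gaussian MoE; letting them vary with t is what distinguishes the VCMoE formulation.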
Related papers
- Mixture-of-experts Wishart model for covariance matrices with an application to Cancer drug screening [0.8594140167290097]
We introduce a comprehensive Bayesian framework for analyzing heterogeneous covariance data through both classical mixture models and a novel mixture-of-experts Wishart model. We develop an efficient Gibbs-within-Metropolis-Hastings sampling algorithm tailored to the geometry of the Wishart likelihood and the gating network. We present an innovative application of our methodology to a challenging dataset: cancer drug sensitivity profiles.
arXiv Detail & Related papers (2026-02-14T21:07:14Z) - Denoising diffusion networks for normative modeling in neuroimaging [1.0195618602298684]
Most neuroimaging pipelines fit one model per imaging-derived phenotype (IDP). We propose denoising diffusion probabilistic models (DDPMs) as a unified conditional density estimator for IDPs. We evaluate on a synthetic benchmark with heteroscedastic and multimodal age effects and on UK Biobank FreeSurfer phenotypes, scaling from dimension 2 to 200.
arXiv Detail & Related papers (2026-01-24T06:19:10Z) - Efficient Covariance Estimation for Sparsified Functional Data [51.69796254617083]
The proposed Random-knots (Random-knots-Spatial) and B-spline (Bspline-Spatial) estimators of the covariance function are computationally efficient. Asymptotic pointwise results for the covariance estimators are obtained for sparsified individual trajectories under some regularity conditions.
arXiv Detail & Related papers (2025-11-23T00:50:33Z) - Dendrograms of Mixing Measures for Softmax-Gated Gaussian Mixture of Experts: Consistency without Model Sweeps [41.371172458797524]
Non-identifiability of gating parameters up to common translations, intrinsic gate-expert interactions, and tight numerator-denominator coupling are addressed. For model selection, we adapt dendrogram-guided SGMoE, yielding a consistent, sweep-free selector of the number of experts that attains optimal parameter rates. On a dataset of drought-identifiable maize traits, our dendrogram-guided SGMoE selects two experts, exposes a clear mixing hierarchy, stabilizes the likelihood early, and yields interpretable genotype-phenotype maps.
arXiv Detail & Related papers (2025-10-14T17:23:44Z) - H-AddiVortes: Heteroscedastic (Bayesian) Additive Voronoi Tessellations [0.0]
The Heteroscedastic AddiVortes model simultaneously models the conditional mean and variance of a response variable. By employing a sum-of-tessellations approach for the mean and a product-of-tessellations approach for the variance, the model provides a flexible and interpretable means to capture complex, predictor-dependent relationships.
arXiv Detail & Related papers (2025-03-17T10:41:31Z) - Statistical ranking with dynamic covariates [6.729750785106628]
This paper introduces a general covariate-assisted statistical ranking model within the Plackett--Luce framework. We develop an efficient alternating algorithm to compute the maximum likelihood estimate (MLE). Numerical studies are conducted to demonstrate the model's application to real-world datasets, including horse racing and tennis competitions.
arXiv Detail & Related papers (2024-06-24T10:26:05Z) - On Least Square Estimation in Softmax Gating Mixture of Experts [78.3687645289918]
We investigate the performance of the least squares estimators (LSE) under a deterministic MoE model.
We establish a condition called strong identifiability to characterize the convergence behavior of various types of expert functions.
Our findings have important practical implications for expert selection.
arXiv Detail & Related papers (2024-02-05T12:31:18Z) - Structured Radial Basis Function Network: Modelling Diversity for Multiple Hypotheses Prediction [51.82628081279621]
Multi-modal regression is important for forecasting nonstationary processes or processes with a complex mixture of distributions.
A Structured Radial Basis Function Network is presented as an ensemble of multiple hypotheses predictors for regression problems.
It is proved that this structured model can efficiently interpolate this tessellation and approximate the multiple hypotheses target distribution.
arXiv Detail & Related papers (2023-09-02T01:27:53Z) - Adaptive LASSO estimation for functional hidden dynamic geostatistical model [69.10717733870575]
We propose a novel model selection algorithm based on a penalized maximum likelihood estimator (PMLE) for functional hidden dynamic geostatistical models (f-HDGM). The algorithm is based on iterative optimisation and uses an adaptive least absolute shrinkage and selection operator (GMSOLAS) penalty function, wherein the weights are obtained from the unpenalised f-HDGM maximum-likelihood estimators.
arXiv Detail & Related papers (2022-08-10T19:17:45Z) - Optimal regularizations for data generation with probabilistic graphical models [0.0]
Empirically, well-chosen regularization schemes dramatically improve the quality of the inferred models.
We consider the particular case of L2 and L1 regularizations in the Maximum A Posteriori (MAP) inference of generative pairwise graphical models.
arXiv Detail & Related papers (2021-12-02T14:45:16Z) - Pseudo-Spherical Contrastive Divergence [119.28384561517292]
We propose pseudo-spherical contrastive divergence (PS-CD) to generalize maximum likelihood learning of energy-based models.
PS-CD avoids the intractable partition function and provides a generalized family of learning objectives.
arXiv Detail & Related papers (2021-11-01T09:17:15Z) - Estimation of Bivariate Structural Causal Models by Variational Gaussian Process Regression Under Likelihoods Parametrised by Normalising Flows [74.85071867225533]
Causal mechanisms can be described by structural causal models.
One major drawback of state-of-the-art artificial intelligence is its lack of explainability.
arXiv Detail & Related papers (2021-09-06T14:52:58Z) - Robust Finite Mixture Regression for Heterogeneous Targets [70.19798470463378]
We propose an FMR model that finds sample clusters and jointly models multiple incomplete mixed-type targets.
We provide non-asymptotic oracle performance bounds for our model under a high-dimensional learning framework.
The results show that our model can achieve state-of-the-art performance.
arXiv Detail & Related papers (2020-10-12T03:27:07Z) - Non-asymptotic oracle inequalities for the Lasso in high-dimensional mixture of experts [2.794896499906838]
We consider the class of softmax-gated Gaussian MoE (SGMoE) models with softmax gating functions and Gaussian experts.
To the best of our knowledge, we are the first to investigate the $l_1$-regularization properties of SGMoE models from a non-asymptotic perspective.
We provide a lower bound on the regularization parameter of the Lasso penalty that ensures non-asymptotic theoretical control of the Kullback--Leibler loss of the Lasso estimator for SGMoE models.
arXiv Detail & Related papers (2020-09-22T15:23:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.