Covariate-assisted Grade of Membership Models via Shared Latent Geometry
- URL: http://arxiv.org/abs/2601.17265v1
- Date: Sat, 24 Jan 2026 02:30:36 GMT
- Title: Covariate-assisted Grade of Membership Models via Shared Latent Geometry
- Authors: Zhiyu Xu, Yuqi Gu
- Abstract summary: The grade of membership model is a flexible latent variable model for analyzing multivariate categorical data through individual-level mixed membership scores. Traditional approaches to incorporating auxiliary covariates typically rely on fully specified joint likelihoods, which are computationally intensive and sensitive to misspecification. We introduce a covariate-assisted grade of membership model that integrates response and covariate information by exploiting their shared low-rank simplex geometry.
- Score: 0.7939348535496568
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The grade of membership model is a flexible latent variable model for analyzing multivariate categorical data through individual-level mixed membership scores. In many modern applications, auxiliary covariates are collected alongside responses and encode information about the same latent structure. Traditional approaches to incorporating such covariates typically rely on fully specified joint likelihoods, which are computationally intensive and sensitive to misspecification. We introduce a covariate-assisted grade of membership model that integrates response and covariate information by exploiting their shared low-rank simplex geometry, rather than modeling their joint distribution. We propose a likelihood-free spectral estimation procedure that combines heterogeneous data sources through a balance parameter controlling their relative contribution. To accommodate high-dimensional and heteroskedastic noise, we employ heteroskedastic principal component analysis before performing simplex-based geometric recovery. Our theoretical analysis establishes weaker identifiability conditions than those required in the covariate-free model, and further derives finite-sample, entrywise error bounds for both mixed membership scores and item parameters. These results demonstrate that auxiliary covariates can provably improve latent structure recovery, yielding faster convergence rates in high-dimensional regimes. Simulation studies and an application to educational assessment data illustrate the computational efficiency, statistical accuracy, and interpretability gains of the proposed method. The code for reproducing these results is open-source and available at https://github.com/Toby-X/Covariate-Assisted-GoM
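The spectral pipeline the abstract describes (combine responses and covariates through a balance parameter, extract a low-rank subspace, then recover the simplex) can be illustrated with a rough sketch. This is not the authors' implementation, which is in the repository linked above. Several pieces are assumptions made purely for illustration: the two sources are combined by column concatenation with the covariates scaled by `sqrt(alpha)`, a plain truncated SVD stands in for heteroskedastic PCA, and a successive projection step stands in for the paper's simplex-based geometric recovery. All function and variable names are hypothetical.

```python
import numpy as np


def top_k_left_singular_vectors(M, K):
    """Plain truncated SVD (a stand-in for heteroskedastic PCA)."""
    U, _, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, :K]


def successive_projection(U, K):
    """Pick K rows of U that act as simplex vertices (successive projection)."""
    residual = U.copy()
    vertices = []
    for _ in range(K):
        norms = np.linalg.norm(residual, axis=1)
        j = int(np.argmax(norms))           # farthest remaining row
        vertices.append(j)
        direction = residual[j] / norms[j]
        # Remove the component along the selected vertex direction.
        residual = residual - np.outer(residual @ direction, direction)
    return vertices


def estimate_memberships(R, X, K, alpha=1.0):
    """Estimate N x K membership scores from responses R and covariates X.

    alpha is an assumed balance parameter weighting the covariate block.
    """
    combined = np.hstack([R, np.sqrt(alpha) * X])
    U = top_k_left_singular_vectors(combined, K)
    vertices = successive_projection(U, K)
    # Barycentric coordinates of each row with respect to the vertex rows.
    Pi = U @ np.linalg.inv(U[vertices])
    Pi = np.clip(Pi, 0.0, None)
    Pi /= Pi.sum(axis=1, keepdims=True)
    return Pi, vertices


# Noiseless synthetic check: both data sources share the same memberships.
rng = np.random.default_rng(0)
N, J, p, K = 200, 30, 5, 3
Pi_true = np.vstack([np.eye(K), rng.dirichlet(np.ones(K), size=N - K)])
Theta = rng.uniform(0.1, 0.9, size=(K, J))   # item parameters (synthetic)
B = rng.normal(size=(K, p))                  # covariate loadings (synthetic)
R = Pi_true @ Theta
X = Pi_true @ B
Pi_hat, vertices = estimate_memberships(R, X, K, alpha=0.5)
# Columns of Pi_hat follow the order in which vertices were selected.
max_err = np.abs(Pi_hat - Pi_true[:, vertices]).max()
```

In this noiseless setting the first `K` individuals are pure members of one profile, so the selected vertices coincide with them and the memberships are recovered up to a permutation of the profiles. With noisy binary or categorical responses the recovery would only be approximate, which is where the paper's heteroskedastic PCA step and error bounds become relevant.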
Related papers
- Efficient Covariance Estimation for Sparsified Functional Data [51.69796254617083]
The proposed Random-knots (Random-knots-Spatial) and B-spline (Bspline-Spatial) estimators of the covariance function are computationally efficient.
Asymptotic pointwise of the covariance are obtained for sparsified individual trajectories under some regularity conditions.
arXiv Detail & Related papers (2025-11-23T00:50:33Z)
- Identifiable Convex-Concave Regression via Sub-gradient Regularised Least Squares [1.9580473532948397]
We propose a novel nonparametric regression method that models complex input-relationships as the sum of convex and concave components.
The method, ICCNLS, decomposes sub-constrained shape-constrained additive decomposition.
arXiv Detail & Related papers (2025-06-22T15:53:12Z)
- Testing Hypotheses of Covariate Effects on Topics of Discourse [0.0]
We introduce an approach to topic modelling that remains tractable in the face of large text corpora.
This is achieved by de-emphasizing the role of parameter estimation in an underlying probabilistic model.
We argue that the simple, non-parametric approach advocated here is faster, more interpretable, and enjoys better inferential justification than said generative models.
arXiv Detail & Related papers (2025-06-05T20:28:49Z)
- Collaborative Heterogeneous Causal Inference Beyond Meta-analysis [68.4474531911361]
We propose a collaborative inverse propensity score estimator for causal inference with heterogeneous data.
Our method shows significant improvements over the methods based on meta-analysis when heterogeneity increases.
arXiv Detail & Related papers (2024-04-24T09:04:36Z)
- Combining propensity score methods with variational autoencoders for generating synthetic data in presence of latent sub-groups [0.0]
Heterogeneity might be known, e.g., as indicated by sub-groups labels, or might be unknown and reflected only in properties of distributions, such as bimodality or skewness.
We investigate how such heterogeneity can be preserved and controlled when obtaining synthetic data from variational autoencoders (VAEs), i.e., a generative deep learning technique.
arXiv Detail & Related papers (2023-12-12T22:49:24Z)
- Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose a model-agnostic framework to boost causal discovery performance by dynamically learning the adaptive weights for the Reweighted Score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z)
- Strong identifiability and parameter learning in regression with heterogeneous response [5.503319042839695]
We investigate conditions of strong identifiability, rates of convergence for conditional density and parameter estimation, and the Bayesian posterior contraction behavior arising in finite mixture of regression models.
We provide simulation studies and data illustrations, which shed some light on the parameter learning behavior found in several popular regression mixture models reported in the literature.
arXiv Detail & Related papers (2022-12-08T05:58:13Z)
- A new LDA formulation with covariates [3.1690891866882236]
The Latent Dirichlet Allocation model is a popular method for creating mixed-membership clusters.
We propose a new formulation for the LDA model which incorporates covariates.
We use slice sampling within a Gibbs sampling algorithm to estimate model parameters.
The model is illustrated using real data sets from three different areas: text-mining of Coronavirus articles, analysis of grocery shopping baskets, and ecology of tree species on Barro Colorado Island (Panama).
arXiv Detail & Related papers (2022-02-18T19:58:24Z)
- Autoregressive Score Matching [113.4502004812927]
We propose autoregressive conditional score models (AR-CSM) where we parameterize the joint distribution in terms of the derivatives of univariable log-conditionals (scores).
For AR-CSM models, this divergence between data and model distributions can be computed and optimized efficiently, requiring no expensive sampling or adversarial training.
We show with extensive experimental results that it can be applied to density estimation on synthetic data, image generation, image denoising, and training latent variable models with implicit encoders.
arXiv Detail & Related papers (2020-10-24T07:01:24Z)
- Robust Finite Mixture Regression for Heterogeneous Targets [70.19798470463378]
We propose an FMR model that finds sample clusters and jointly models multiple incomplete mixed-type targets simultaneously.
We provide non-asymptotic oracle performance bounds for our model under a high-dimensional learning framework.
The results show that our model can achieve state-of-the-art performance.
arXiv Detail & Related papers (2020-10-12T03:27:07Z)
- Repulsive Mixture Models of Exponential Family PCA for Clustering [127.90219303669006]
The mixture extension of exponential family principal component analysis (EPCA) was designed to encode much more structural information about data distribution than the traditional EPCA.
The traditional mixture of local EPCAs has the problem of model redundancy, i.e., overlaps among mixing components, which may cause ambiguity for data clustering.
In this paper, a repulsiveness-encouraging prior is introduced among mixing components and a diversified EPCA mixture (DEPCAM) model is developed in the Bayesian framework.
arXiv Detail & Related papers (2020-04-07T04:07:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.