Related papers: Repulsive Mixture Models of Exponential Family PCA for Clustering

URL: http://arxiv.org/abs/2004.03112v1
Date: Tue, 7 Apr 2020 04:07:29 GMT
Title: Repulsive Mixture Models of Exponential Family PCA for Clustering
Authors: Maoying Qiao, Tongliang Liu, Jun Yu, Wei Bian, Dacheng Tao
Abstract summary: The mixture extension of exponential family principal component analysis (EPCA) was designed to encode much more structural information about data distribution than the traditional EPCA. The traditional mixture of local EPCAs has the problem of model redundancy, i.e., overlaps among mixing components, which may cause ambiguity for data clustering. In this paper, a repulsiveness-encouraging prior is introduced among mixing components and a diversified EPCA mixture (DEPCAM) model is developed in the Bayesian framework.
Score: 127.90219303669006
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The mixture extension of exponential family principal component analysis (EPCA) was designed to encode much more structural information about data distribution than the traditional EPCA does. For example, due to the linearity of EPCA's essential form, nonlinear cluster structures cannot be easily handled, but they are explicitly modeled by the mixing extensions. However, the traditional mixture of local EPCAs has the problem of model redundancy, i.e., overlaps among mixing components, which may cause ambiguity for data clustering. To alleviate this problem, in this paper, a repulsiveness-encouraging prior is introduced among mixing components and a diversified EPCA mixture (DEPCAM) model is developed in the Bayesian framework. Specifically, a determinantal point process (DPP) is exploited as a diversity-encouraging prior distribution over the joint local EPCAs. As required, a matrix-valued measure for the L-ensemble kernel is designed, within which $\ell_1$ constraints are imposed to facilitate selecting effective PCs of local EPCAs, and an angular-based similarity measure is proposed. An efficient variational EM algorithm is derived to perform parameter learning and hidden variable inference. Experimental results on both synthetic and real-world datasets confirm the effectiveness of the proposed method in terms of model parsimony and generalization ability on unseen test data.
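Illustration: the abstract's idea of a DPP acting as a diversity-encouraging prior can be sketched in a few lines. Under an L-ensemble, a set of items receives unnormalized probability equal to the determinant of its kernel submatrix, so near-parallel items score close to zero. The sketch below is not the paper's matrix-valued kernel and omits the $\ell_1$ constraints; it assumes each mixture component is summarized by a single direction vector and uses plain cosine similarity as a stand-in for the angular measure. The function names are hypothetical.

```python
import math

def cosine_kernel(vectors):
    """Gram matrix of cosine similarities (assumed stand-in for an angular kernel)."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    return [
        [sum(a * b for a, b in zip(u, v)) / (nu * nv)
         for v, nv in zip(vectors, norms)]
        for u, nu in zip(vectors, norms)
    ]

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0  # singular kernel: the set is fully redundant
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det

def dpp_unnormalized_score(vectors):
    """Unnormalized L-ensemble score det(L) for a set of component directions."""
    return determinant(cosine_kernel(vectors))

# Hypothetical component directions: one diverse pair, one overlapping pair.
orthogonal = [[1.0, 0.0], [0.0, 1.0]]
overlapping = [[1.0, 0.0], [0.99, 0.14]]

print(dpp_unnormalized_score(orthogonal))
print(dpp_unnormalized_score(overlapping))
```

The orthogonal pair gets a much larger score than the nearly parallel pair, which is the sense in which such a prior penalizes overlapping mixing components.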

Related papers

Covariate-Adjusted Deep Causal Learning for Heterogeneous Panel Data Models [3.0040661953201475]
This paper studies the task of estimating heterogeneous treatment effects in causal panel data models. We propose a novel CoAdjusted Deep Causal Learning (Co) for panel data models that employs flexible model structures and powerful neural network architectures.
arXiv Detail & Related papers (2025-05-26T21:45:43Z)
Exponential Convergence of CAVI for Bayesian PCA [0.7929564340244416]
Probabilistic principal component analysis (PCA) and its Bayesian variant (BPCA) are widely used for dimension reduction in machine learning and statistics. In our paper, we prove a precise exponential convergence result in the case where the model uses a single principal component (PC). We also leverage recent tools to prove exponential convergence of CAVI for the model with any number of PCs.
arXiv Detail & Related papers (2025-05-22T02:44:00Z)
A Hybrid Mixture of $t$-Factor Analyzers for Clustering High-dimensional Data [0.07673339435080444]
This paper develops a novel hybrid approach for estimating the mixture model of $t$-factor analyzers (MtFA). The effectiveness of our approach is demonstrated through simulations showcasing its superior computational efficiency compared to the existing method. Our method is applied to cluster Gamma-ray bursts, reinforcing several claims in the literature that Gamma-ray bursts have heterogeneous subpopulations and providing characterizations of the estimated groups.
arXiv Detail & Related papers (2025-04-29T18:59:58Z)
Copula-based mixture model identification for subgroup clustering with imaging applications [2.285847431713438]
We consider the more flexible Copula-Based Mixture Models (CBMMs) for clustering. CBMMs allow heterogeneous component distributions composed by flexible choices of marginal and copula forms.
arXiv Detail & Related papers (2025-02-12T16:30:39Z)
Amortized Bayesian Mixture Models [1.3976439685325095]
This paper introduces a novel extension of Amortized Bayesian Inference (ABI) tailored to mixture models. We factorize the posterior into a distribution of the parameters and a distribution of (categorical) mixture indicators, which allows us to use a combination of generative neural networks. The proposed framework accommodates both independent and dependent mixture models, enabling filtering and smoothing.
arXiv Detail & Related papers (2025-01-17T14:51:03Z)
Assumption-Lean Post-Integrated Inference with Negative Control Outcomes [0.0]
We introduce a robust post-integrated inference (PII) method that adjusts for latent heterogeneity using negative control outcomes. Our method extends to projected direct effect estimands, accounting for hidden mediators, confounders, and moderators. The proposed doubly robust estimators are consistent and efficient under minimal assumptions and potential misspecification.
arXiv Detail & Related papers (2024-10-07T12:52:38Z)
Geodesic Optimization for Predictive Shift Adaptation on EEG data [53.58711912565724]
Domain adaptation methods struggle when distribution shifts occur simultaneously in $X$ and $y$. This paper proposes a novel method termed Geodesic Optimization for Predictive Shift Adaptation (GOPSA) to address test-time multi-source DA. GOPSA has the potential to combine the advantages of mixed-effects modeling with machine learning for biomedical applications of EEG.
arXiv Detail & Related papers (2024-07-04T12:15:42Z)
Coupled generator decomposition for fusion of electro- and magnetoencephalography data [1.7102695043811291]
Data fusion modeling can identify common features across diverse data sources while accounting for source-specific variability. We introduce the concept of a coupled generator decomposition and demonstrate how it generalizes sparse principal component analysis for data fusion.
arXiv Detail & Related papers (2024-03-02T12:09:16Z)
Variable Importance in High-Dimensional Settings Requires Grouping [19.095605415846187]
Conditional Permutation Importance (CPI) bypasses PI's limitations in such cases. Grouping variables statistically via clustering or some prior knowledge gains some power back. We show that the approach extended with stacking controls the type-I error even with highly-correlated groups.
arXiv Detail & Related papers (2023-12-18T00:21:47Z)
Likelihood Adjusted Semidefinite Programs for Clustering Heterogeneous Data [16.153709556346417]
Clustering is a widely deployed learning tool. iLA-SDP is less sensitive than EM to and more stable on high-dimensional data.
arXiv Detail & Related papers (2022-09-29T21:03:13Z)
Pseudo-Spherical Contrastive Divergence [119.28384561517292]
We propose pseudo-spherical contrastive divergence (PS-CD) to generalize maximum likelihood learning of energy-based models. PS-CD avoids the intractable partition function and provides a generalized family of learning objectives.
arXiv Detail & Related papers (2021-11-01T09:17:15Z)
Shared Independent Component Analysis for Multi-Subject Neuroimaging [107.29179765643042]
We introduce Shared Independent Component Analysis (ShICA) that models each view as a linear transform of shared independent components contaminated by additive Gaussian noise. We show that this model is identifiable if the components are either non-Gaussian or have enough diversity in noise variances. We provide empirical evidence on fMRI and MEG datasets that ShICA yields more accurate estimation of the components than alternatives.
arXiv Detail & Related papers (2021-10-26T08:54:41Z)
Identification of Probability weighted ARX models with arbitrary domains [75.91002178647165]
PieceWise Affine models guarantee universal approximation, local linearity and equivalence to other classes of hybrid systems. In this work, we focus on the identification of PieceWise Auto Regressive with eXogenous input models with arbitrary regions (NPWARX). The architecture is conceived following the Mixture of Experts concept, developed within the machine learning field.
arXiv Detail & Related papers (2020-09-29T12:50:33Z)
Principal Ellipsoid Analysis (PEA): Efficient non-linear dimension reduction & clustering [9.042239247913642]
This article focuses on improving upon PCA and k-means by allowing nonlinear relations in the data and more flexible cluster shapes. The key contribution is a new framework for Principal Ellipsoid Analysis (PEA), defining a simple and computationally efficient alternative to PCA. In a rich variety of real data clustering applications, PEA is shown to do as well as k-means for simple datasets, while dramatically improving performance in more complex settings.
arXiv Detail & Related papers (2020-08-17T06:25:50Z)
Model Fusion with Kullback--Leibler Divergence [58.20269014662046]
We propose a method to fuse posterior distributions learned from heterogeneous datasets. Our algorithm relies on a mean field assumption for both the fused model and the individual dataset posteriors.
arXiv Detail & Related papers (2020-07-13T03:27:45Z)

This list is automatically generated from the titles and abstracts of the papers on this site.