Repulsive Mixture Models of Exponential Family PCA for Clustering
- URL: http://arxiv.org/abs/2004.03112v1
- Date: Tue, 7 Apr 2020 04:07:29 GMT
- Title: Repulsive Mixture Models of Exponential Family PCA for Clustering
- Authors: Maoying Qiao, Tongliang Liu, Jun Yu, Wei Bian, Dacheng Tao
- Abstract summary: The mixture extension of exponential family principal component analysis (EPCA) was designed to encode much more structural information about data distribution than the traditional EPCA.
The traditional mixture of local EPCAs has the problem of model redundancy, i.e., overlaps among mixing components, which may cause ambiguity for data clustering.
In this paper, a repulsiveness-encouraging prior is introduced among mixing components and a diversified EPCA mixture (DEPCAM) model is developed in the Bayesian framework.
- Score: 127.90219303669006
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The mixture extension of exponential family principal component analysis
(EPCA) was designed to encode much more structural information about data
distribution than the traditional EPCA does. For example, due to the linearity
of EPCA's essential form, nonlinear cluster structures cannot be easily
handled, but they are explicitly modeled by the mixing extensions. However, the
traditional mixture of local EPCAs has the problem of model redundancy, i.e.,
overlaps among mixing components, which may cause ambiguity for data
clustering. To alleviate this problem, in this paper, a
repulsiveness-encouraging prior is introduced among mixing components and a
diversified EPCA mixture (DEPCAM) model is developed in the Bayesian framework.
Specifically, a determinantal point process (DPP) is exploited as a
diversity-encouraging prior distribution over the joint local EPCAs. As
required, a matrix-valued measure for the L-ensemble kernel is designed, within
which $\ell_1$ constraints are imposed to facilitate selecting effective PCs
of local EPCAs, and an angular-based similarity measure is proposed. An efficient
variational EM algorithm is derived to perform parameter learning and hidden
variable inference. Experimental results on both synthetic and real-world
datasets confirm the effectiveness of the proposed method in terms of model
parsimony and generalization ability on unseen test data.
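The repulsive effect of a DPP prior with an angular similarity kernel can be illustrated with a minimal sketch. This is not the paper's matrix-valued measure: it assumes each mixture component is summarized by a single direction vector, and it uses the squared cosine of the angle between directions as a hypothetical L-ensemble kernel. The function names `angular_kernel` and `dpp_log_prior` are illustrative only.

```python
import numpy as np


def angular_kernel(W):
    """Assumed kernel: L[i, j] = cos^2 of the angle between rows i and j of W."""
    U = W / np.linalg.norm(W, axis=1, keepdims=True)  # unit-length directions
    G = U @ U.T                                       # pairwise cosines
    return G ** 2                                     # elementwise square keeps L PSD


def dpp_log_prior(W, jitter=1e-10):
    """Unnormalized L-ensemble log-probability, log det(L), of the component set."""
    L = angular_kernel(W)
    # Small diagonal jitter guards against a singular kernel for parallel directions.
    _, logdet = np.linalg.slogdet(L + jitter * np.eye(len(L)))
    return logdet


# Two orthogonal component directions versus two nearly parallel ones.
diverse = np.array([[1.0, 0.0], [0.0, 1.0]])
redundant = np.array([[1.0, 0.0], [1.0, 0.05]])

print("diverse  :", dpp_log_prior(diverse))
print("redundant:", dpp_log_prior(redundant))
```

Under these assumptions, the orthogonal pair receives a higher log-prior than the nearly parallel pair, which is the sense in which a DPP prior penalizes overlapping mixing components.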
Related papers
- Assumption-Lean Post-Integrated Inference with Negative Control Outcomes [0.0]
We introduce a robust post-integrated inference (PII) method that adjusts for latent heterogeneity using negative control outcomes.
Our method extends to projected direct effect estimands, accounting for hidden mediators, confounders, and moderators.
The proposed doubly robust estimators are consistent and efficient under minimal assumptions and potential misspecification.
arXiv Detail & Related papers (2024-10-07T12:52:38Z) - Geodesic Optimization for Predictive Shift Adaptation on EEG data [53.58711912565724]
Domain adaptation methods struggle when distribution shifts occur simultaneously in $X$ and $y$.
This paper proposes a novel method termed Geodesic Optimization for Predictive Shift Adaptation (GOPSA) to address test-time multi-source DA.
GOPSA has the potential to combine the advantages of mixed-effects modeling with machine learning for biomedical applications of EEG.
arXiv Detail & Related papers (2024-07-04T12:15:42Z) - Coupled generator decomposition for fusion of electro- and magnetoencephalography data [1.7102695043811291]
Data fusion modeling can identify common features across diverse data sources while accounting for source-specific variability.
We introduce the concept of a coupled generator decomposition and demonstrate how it generalizes sparse principal component analysis for data fusion.
arXiv Detail & Related papers (2024-03-02T12:09:16Z) - Variable Importance in High-Dimensional Settings Requires Grouping [19.095605415846187]
Conditional Permutation Importance (CPI) bypasses PI's limitations in such cases.
Grouping variables statistically via clustering or some prior knowledge gains some power back.
We show that the approach extended with stacking controls the type-I error even with highly-correlated groups.
arXiv Detail & Related papers (2023-12-18T00:21:47Z) - Likelihood Adjusted Semidefinite Programs for Clustering Heterogeneous Data [16.153709556346417]
Clustering is a widely deployed learning tool.
iLA-SDP is less sensitive than EM to initialization and more stable on high-dimensional data.
arXiv Detail & Related papers (2022-09-29T21:03:13Z) - Pseudo-Spherical Contrastive Divergence [119.28384561517292]
We propose pseudo-spherical contrastive divergence (PS-CD) to generalize maximum likelihood learning of energy-based models.
PS-CD avoids the intractable partition function and provides a generalized family of learning objectives.
arXiv Detail & Related papers (2021-11-01T09:17:15Z) - Shared Independent Component Analysis for Multi-Subject Neuroimaging [107.29179765643042]
We introduce Shared Independent Component Analysis (ShICA) that models each view as a linear transform of shared independent components contaminated by additive Gaussian noise.
We show that this model is identifiable if the components are either non-Gaussian or have enough diversity in noise variances.
We provide empirical evidence on fMRI and MEG datasets that ShICA yields more accurate estimation of the components than alternatives.
arXiv Detail & Related papers (2021-10-26T08:54:41Z) - Identification of Probability weighted ARX models with arbitrary domains [75.91002178647165]
PieceWise Affine models guarantee universal approximation, local linearity and equivalence to other classes of hybrid systems.
In this work, we focus on the identification of PieceWise Auto Regressive with eXogenous input models with arbitrary regions (NPWARX).
The architecture is conceived following the Mixture of Expert concept, developed within the machine learning field.
arXiv Detail & Related papers (2020-09-29T12:50:33Z) - Principal Ellipsoid Analysis (PEA): Efficient non-linear dimension reduction & clustering [9.042239247913642]
This article focuses on improving upon PCA and k-means, by allowing nonlinear relations in the data and more flexible cluster shapes.
The key contribution is a new framework for Principal Ellipsoid Analysis (PEA), defining a simple and computationally efficient alternative to PCA.
In a rich variety of real data clustering applications, PEA is shown to do as well as k-means for simple datasets, while dramatically improving performance in more complex settings.
arXiv Detail & Related papers (2020-08-17T06:25:50Z) - Model Fusion with Kullback--Leibler Divergence [58.20269014662046]
We propose a method to fuse posterior distributions learned from heterogeneous datasets.
Our algorithm relies on a mean field assumption for both the fused model and the individual dataset posteriors.
arXiv Detail & Related papers (2020-07-13T03:27:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.