Model-based Clustering using Automatic Differentiation: Confronting
Misspecification and High-Dimensional Data
- URL: http://arxiv.org/abs/2007.12786v1
- Date: Wed, 8 Jul 2020 10:56:05 GMT
- Title: Model-based Clustering using Automatic Differentiation: Confronting
Misspecification and High-Dimensional Data
- Authors: Siva Rajesh Kasa, Vaibhav Rajan
- Abstract summary: We study two practically important cases of model-based clustering using Gaussian Mixture Models.
We show that EM has better clustering performance, as measured by the Adjusted Rand Index, than Gradient Descent in cases of misspecification.
We propose a new penalty term for the likelihood, based on the Kullback-Leibler divergence between pairs of fitted components.
- Score: 6.053629733936546
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study two practically important cases of model-based clustering using
Gaussian Mixture Models: (1) when there is misspecification and (2) on
high-dimensional data, in light of recent advances in Gradient Descent (GD)
based optimization using Automatic Differentiation (AD). Our simulation studies
show that EM has better clustering performance, measured by Adjusted Rand
Index, compared to GD in cases of misspecification, whereas on high-dimensional
data GD outperforms EM. We observe that with both EM and GD there are many
solutions with high likelihood but poor cluster interpretation. To address this
problem we design a new penalty term for the likelihood based on the
Kullback-Leibler divergence between pairs of fitted components. Closed-form
expressions for the gradients of this penalized likelihood are difficult to
derive, but AD computes them effortlessly, illustrating the advantage of
AD-based optimization. Extensions of this penalty for high-dimensional data and
for model selection are discussed. Numerical experiments on synthetic and real
datasets demonstrate the efficacy of clustering using the proposed penalized
likelihood approach.
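To make the role of AD concrete, here is a minimal JAX sketch of a penalized GMM objective. The paper's exact penalty differs; the pairwise `exp(-KL)` term, the weight `lam`, and the Cholesky parameterization below are illustrative assumptions, but the workflow (write down the penalized likelihood, let `jax.grad` differentiate it) is the one the abstract describes.

```python
import jax
import jax.numpy as jnp
from jax.scipy.special import logsumexp
from jax.scipy.stats import multivariate_normal

def kl_gaussians(mu0, cov0, mu1, cov1):
    """Closed-form KL( N(mu0, cov0) || N(mu1, cov1) )."""
    d = mu0.shape[0]
    cov1_inv = jnp.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (jnp.trace(cov1_inv @ cov0) + diff @ cov1_inv @ diff - d
                  + jnp.linalg.slogdet(cov1)[1] - jnp.linalg.slogdet(cov0)[1])

def penalized_nll(params, X, lam=1.0):
    """Negative GMM log-likelihood plus an illustrative pairwise-KL penalty."""
    mus, chols, logits = params               # Cholesky factors keep covariances PSD under GD
    covs = jnp.einsum('kij,klj->kil', chols, chols)
    log_w = jax.nn.log_softmax(logits)        # mixture weights stay on the simplex
    comp = jnp.stack([multivariate_normal.logpdf(X, mus[k], covs[k])
                      for k in range(mus.shape[0])], axis=1)
    loglik = logsumexp(comp + log_w, axis=1).sum()
    # Hypothetical penalty: discourage component pairs with small KL (i.e. heavy overlap).
    pen = sum(jnp.exp(-kl_gaussians(mus[i], covs[i], mus[j], covs[j]))
              for i in range(mus.shape[0]) for j in range(mus.shape[0]) if i != j)
    return -loglik + lam * pen

grad_fn = jax.grad(penalized_nll)  # AD supplies the gradient; no hand derivation needed
```

A GD loop would simply update `params` in the direction of `-grad_fn(params, X)`; reparameterizing covariances through Cholesky factors and weights through a softmax keeps iterates feasible without constrained optimization.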
Related papers
- A Gradient Analysis Framework for Rewarding Good and Penalizing Bad Examples in Language Models [63.949883238901414]
We present a gradient analysis of loss functions that simultaneously reward good examples and penalize bad ones in language models.
We find that ExMATE serves as a superior surrogate for MLE, and that combining DPO with ExMATE rather than MLE further improves both statistical (5-7%) and generative (+18% win rate) performance.
arXiv Detail & Related papers (2024-08-29T17:46:18Z)
- Latent Semantic Consensus For Deterministic Geometric Model Fitting [109.44565542031384]
We propose an effective method called Latent Semantic Consensus (LSC).
LSC formulates the model fitting problem into two latent semantic spaces based on data points and model hypotheses.
LSC is able to provide consistent and reliable solutions within only a few milliseconds for general multi-structural model fitting.
arXiv Detail & Related papers (2024-03-11T05:35:38Z)
- A distribution-free mixed-integer optimization approach to hierarchical modelling of clustered and longitudinal data [0.0]
We introduce an innovative algorithm that evaluates cluster effects for new data points, thereby increasing the robustness and precision of this model.
The inferential and predictive efficacy of this approach is further illustrated through applications to student scoring and protein expression data.
arXiv Detail & Related papers (2023-02-06T23:34:51Z)
- Likelihood Adjusted Semidefinite Programs for Clustering Heterogeneous Data [16.153709556346417]
Clustering is a widely deployed learning tool.
iLA-SDP is less sensitive to initialization than EM and more stable on high-dimensional data.
arXiv Detail & Related papers (2022-09-29T21:03:13Z)
- Hierarchical mixtures of Gaussians for combined dimensionality reduction and clustering [5.819751855626331]
We show how a family of such two-stage models, in which dimensionality reduction is followed by clustering, can be combined into a single, hierarchical model that we call a hierarchical mixture of Gaussians (HMoG).
An HMoG simultaneously captures both dimensionality-reduction and clustering, and its performance is quantified in closed-form by the likelihood function.
We apply HMoGs to synthetic data and RNA sequencing data, and demonstrate how they exceed the limitations of two-stage models.
arXiv Detail & Related papers (2022-06-10T02:03:18Z)
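HMoG's exact parameterization is in the paper above; as a rough sketch of how a single likelihood can couple dimensionality reduction with clustering, the snippet below evaluates a mixture of factor analyzers, a related model in which each component's covariance is low-rank-plus-diagonal. The MFA stand-in and function names are assumptions, not HMoG's actual construction.

```python
import numpy as np
from scipy.stats import multivariate_normal

def mfa_loglik(X, weights, mus, Ws, psis):
    """Log-likelihood of a mixture of factor analyzers:
    x | cluster k ~ N(mu_k, W_k W_k^T + diag(psi_k)),
    where W_k maps a low-dimensional latent factor into data space,
    so one likelihood scores both the reduction and the clustering."""
    comp = np.stack([
        multivariate_normal.logpdf(X, mean=mus[k],
                                   cov=Ws[k] @ Ws[k].T + np.diag(psis[k]))
        for k in range(len(weights))], axis=1)
    return np.logaddexp.reduce(comp + np.log(weights), axis=1).sum()
```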
- Scalable Regularised Joint Mixture Models [2.0686407686198263]
In many applications, data can be heterogeneous in the sense of spanning latent groups with different underlying distributions.
We propose an approach for heterogeneous data that allows joint learning of (i) explicit multivariate feature distributions, (ii) high-dimensional regression models and (iii) latent group labels.
The approach is demonstrably effective in high dimensions, combining data reduction for computational efficiency with a re-weighting scheme that retains key signals even when the number of features is large.
arXiv Detail & Related papers (2022-05-03T13:38:58Z)
- An Adaptive Alternating-direction-method-based Nonnegative Latent Factor Model [2.857044909410376]
An alternating-direction-method-based nonnegative latent factor model can perform efficient representation learning on a high-dimensional and incomplete (HDI) matrix.
This paper proposes an Adaptive Alternating-direction-method-based Nonnegative Latent Factor model (A2NLF), whose hyperparameter adaptation follows the principle of particle swarm optimization.
Empirical studies on nonnegative HDI matrices generated by industrial applications indicate that A2NLF outperforms several state-of-the-art models in computational and storage efficiency, while maintaining highly competitive accuracy in estimating an HDI matrix's missing data.
arXiv Detail & Related papers (2022-04-11T03:04:26Z)
- BCD Nets: Scalable Variational Approaches for Bayesian Causal Discovery [97.79015388276483]
A structural equation model (SEM) is an effective framework for reasoning over causal relationships represented via a directed acyclic graph (DAG).
Recent advances enabled effective maximum-likelihood point estimation of DAGs from observational data.
We propose BCD Nets, a variational framework for estimating a distribution over DAGs characterizing a linear-Gaussian SEM.
arXiv Detail & Related papers (2021-12-06T03:35:21Z)
- Riemannian classification of EEG signals with missing values [67.90148548467762]
This paper proposes two strategies to handle missing data for the classification of electroencephalograms.
The first approach estimates the covariance from imputed data with the $k$-nearest neighbors algorithm; the second relies on the observed data by leveraging the observed-data likelihood within an expectation-maximization algorithm.
As the results show, the proposed strategies outperform classification based on observed data alone and maintain high accuracy even as the missing-data ratio increases.
arXiv Detail & Related papers (2021-10-19T14:24:50Z)
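A minimal sketch of the first strategy above, assuming scikit-learn's `KNNImputer` as the k-nearest-neighbors imputer followed by a plain sample covariance; the paper's actual pipeline, and its EM-based second strategy, involve more than this.

```python
import numpy as np
from sklearn.impute import KNNImputer

def covariance_from_imputed(X_missing, n_neighbors=5):
    """Strategy 1 (illustrative): impute missing entries with k-NN,
    then estimate the covariance matrix a Riemannian classifier would use.
    Rows are samples, columns are EEG channels; missing values are NaN."""
    X_imp = KNNImputer(n_neighbors=n_neighbors).fit_transform(X_missing)
    return np.cov(X_imp, rowvar=False)
```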
- Cauchy-Schwarz Regularized Autoencoder [68.80569889599434]
Variational autoencoders (VAEs) are a powerful and widely used class of generative models.
We introduce a new constrained objective based on the Cauchy-Schwarz divergence, which can be computed analytically for GMMs.
Our objective improves upon variational auto-encoding models in density estimation, unsupervised clustering, semi-supervised learning, and face analysis.
arXiv Detail & Related papers (2021-01-06T17:36:26Z)
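The analytic tractability claimed above rests on the closed form of the Gaussian product integral. A small sketch, assuming two single Gaussian components for brevity; for full GMMs each integral expands into a weighted double sum of the same terms.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gauss_overlap(mu1, cov1, mu2, cov2):
    """Closed form for integral of N(x; mu1, cov1) * N(x; mu2, cov2) dx."""
    return multivariate_normal.pdf(mu1, mean=mu2, cov=cov1 + cov2)

def cs_divergence(mu1, cov1, mu2, cov2):
    """Cauchy-Schwarz divergence between two Gaussians:
    -log Int(pq) + 0.5*log Int(p^2) + 0.5*log Int(q^2)."""
    z12 = gauss_overlap(mu1, cov1, mu2, cov2)
    z11 = gauss_overlap(mu1, cov1, mu1, cov1)
    z22 = gauss_overlap(mu2, cov2, mu2, cov2)
    return -np.log(z12) + 0.5 * np.log(z11) + 0.5 * np.log(z22)
```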
- Autoregressive Score Matching [113.4502004812927]
We propose autoregressive conditional score models (AR-CSM), which parameterize the joint distribution in terms of the derivatives of univariate log-conditionals (scores).
For AR-CSM models, the resulting divergence between the data and model distributions can be computed and optimized efficiently, requiring no expensive sampling or adversarial training.
We show with extensive experimental results that it can be applied to density estimation on synthetic data, image generation, image denoising, and training latent variable models with implicit encoders.
arXiv Detail & Related papers (2020-10-24T07:01:24Z)
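A minimal sketch of the "score" being matched above: the derivative of a univariate log-conditional in its argument, obtained by AD rather than manual differentiation. The Gaussian conditional here is a hypothetical stand-in for AR-CSM's network-parameterized conditionals.

```python
import jax
import jax.numpy as jnp

def log_conditional(x_i, mu, log_sigma):
    """A hypothetical univariate Gaussian log p(x_i | x_<i); in AR-CSM the
    parameters mu and log_sigma would come from an autoregressive network."""
    return (-0.5 * ((x_i - mu) / jnp.exp(log_sigma)) ** 2
            - log_sigma - 0.5 * jnp.log(2.0 * jnp.pi))

# The score is d/dx_i log p(x_i | x_<i), delivered directly by AD.
score = jax.grad(log_conditional, argnums=0)
print(score(0.5, 0.0, 0.0))  # d/dx log N(x; 0, 1) at x = 0.5 -> -0.5
```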
This list is automatically generated from the titles and abstracts of the papers on this site.