Blocked Clusterwise Regression
- URL: http://arxiv.org/abs/2001.11130v1
- Date: Wed, 29 Jan 2020 23:29:31 GMT
- Title: Blocked Clusterwise Regression
- Authors: Max Cytrynbaum
- Abstract summary: We generalize previous approaches to discrete unobserved heterogeneity by allowing each unit to have multiple latent variables.
We contribute to the theory of clustering with an over-specified number of clusters and derive new convergence rates for this setting.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A recent literature in econometrics models unobserved cross-sectional
heterogeneity in panel data by assigning each cross-sectional unit a
one-dimensional, discrete latent type. Such models have been shown to allow
estimation and inference by regression clustering methods. This paper is
motivated by the finding that the clustered heterogeneity models studied in
this literature can be badly misspecified, even when the panel has significant
discrete cross-sectional structure. To address this issue, we generalize
previous approaches to discrete unobserved heterogeneity by allowing each unit
to have multiple, imperfectly-correlated latent variables that describe its
response-type to different covariates. We give inference results for a k-means
style estimator of our model and develop information criteria to jointly select
the number of clusters for each latent variable. Monte Carlo simulations confirm
our theoretical results and give intuition about the finite-sample performance
of estimation and model selection. We also contribute to the theory of
clustering with an over-specified number of clusters and derive new convergence
rates for this setting. Our results suggest that over-fitting can be severe in
k-means style estimators when the number of clusters is over-specified.
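To make the estimation idea concrete, the sketch below implements a k-means style alternating procedure for a blocked clusterwise regression in which each unit carries one latent group label per covariate block. The data layout, the function name `blocked_clusterwise_regression`, the random initialization, and the update order are illustrative assumptions, not the paper's exact algorithm or its inference procedure.

```python
import numpy as np

def blocked_clusterwise_regression(Y, X_blocks, n_groups, n_iter=50, seed=0):
    """Illustrative k-means style estimator for a blocked clusterwise regression.

    Y        : (N, T) array of panel outcomes.
    X_blocks : list of B arrays, block b of shape (N, T, p_b) -- covariates in block b.
    n_groups : list of B ints -- assumed number of latent groups for each block.
    """
    rng = np.random.default_rng(seed)
    N, T = Y.shape
    B = len(X_blocks)
    # One latent label per unit per covariate block, initialized at random.
    groups = [rng.integers(0, K_b, size=N) for K_b in n_groups]
    thetas = [np.zeros((K_b, Xb.shape[2])) for K_b, Xb in zip(n_groups, X_blocks)]

    def fitted(i, override=None):
        # Fitted values for unit i under the current labels/coefficients;
        # optionally force block override[0] to use group override[1].
        f = np.zeros(T)
        for b in range(B):
            g = groups[b][i]
            if override is not None and override[0] == b:
                g = override[1]
            f += X_blocks[b][i] @ thetas[b][g]
        return f

    for _ in range(n_iter):
        # Step 1: update block coefficients by pooled OLS within each group,
        # regressing the partial residual (other blocks removed) on block b's covariates.
        for b in range(B):
            for k in range(n_groups[b]):
                members = np.where(groups[b] == k)[0]
                if members.size == 0:
                    continue
                Xs = np.vstack([X_blocks[b][i] for i in members])
                rs = np.concatenate(
                    [Y[i] - (fitted(i) - X_blocks[b][i] @ thetas[b][k]) for i in members]
                )
                thetas[b][k] = np.linalg.lstsq(Xs, rs, rcond=None)[0]
        # Step 2: reassign each unit's block-b label to the group with the smallest SSR.
        for b in range(B):
            for i in range(N):
                ssr = [np.sum((Y[i] - fitted(i, (b, k))) ** 2) for k in range(n_groups[b])]
                groups[b][i] = int(np.argmin(ssr))
    return groups, thetas
```

The abstract's information criteria for jointly selecting the number of clusters per latent variable are not reproduced here; a natural usage pattern, consistent with the abstract but not taken from the paper, is to run the estimator over a grid of group counts and choose the combination that minimizes a penalized sum of squared residuals.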
Related papers
- Finite Mixtures of Multivariate Poisson-Log Normal Factor Analyzers for
Clustering Count Data [0.8499685241219366]
A class of eight parsimonious mixture models based on the mixtures of factor analyzers model is introduced.
The proposed models are explored in the context of clustering discrete data arising from RNA sequencing studies.
arXiv Detail & Related papers (2023-11-13T21:23:15Z) - Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches instance-specific lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z) - High-dimensional variable clustering based on maxima of a weakly dependent random process [1.1999555634662633]
We propose a new class of models for variable clustering called Asymptotic Independent block (AI-block) models.
This class of models is identifiable, in the sense that there exists a maximal element under a partial order between partitions, which allows for statistical inference.
We also present an algorithm, depending on a tuning parameter, that recovers the clusters of variables without specifying the number of clusters a priori.
arXiv Detail & Related papers (2023-02-02T08:24:26Z) - clusterBMA: Bayesian model averaging for clustering [1.2021605201770345]
We introduce clusterBMA, a method that enables weighted model averaging across results from unsupervised clustering algorithms.
We use clustering internal validation criteria to develop an approximation of the posterior model probability, used for weighting the results from each model.
In addition to outperforming other ensemble clustering methods on simulated data, clusterBMA offers unique features including probabilistic allocation to averaged clusters.
arXiv Detail & Related papers (2022-09-09T04:55:20Z) - Personalized Federated Learning via Convex Clustering [72.15857783681658]
We propose a family of algorithms for personalized federated learning with locally convex user costs.
The proposed framework is based on a generalization of convex clustering in which the differences between different users' models are penalized.
arXiv Detail & Related papers (2022-02-01T19:25:31Z) - Multi-objective Semi-supervised Clustering for Finding Predictive
Clusters [0.5371337604556311]
This study focuses on clustering problems and aims to find compact clusters that are informative regarding the outcome variable.
The main goal is to partition the data points so that observations within each cluster are similar while, at the same time, the outcome variable can be predicted from the resulting clusters.
arXiv Detail & Related papers (2022-01-26T06:24:38Z) - Local versions of sum-of-norms clustering [77.34726150561087]
We show that our method can separate arbitrarily close balls in the ball model.
We prove a quantitative bound on the error incurred in the clustering of disjoint connected sets.
arXiv Detail & Related papers (2021-09-20T14:45:29Z) - Correlation Clustering Reconstruction in Semi-Adversarial Models [70.11015369368272]
Correlation Clustering is an important clustering problem with many applications.
We study the reconstruction version of this problem in which one is seeking to reconstruct a latent clustering corrupted by random noise and adversarial modifications.
arXiv Detail & Related papers (2021-08-10T14:46:17Z) - Vine copula mixture models and clustering for non-Gaussian data [0.0]
We propose a novel vine copula mixture model for continuous data.
We show that the model-based clustering algorithm with vine copula mixture models outperforms the other model-based clustering techniques.
arXiv Detail & Related papers (2021-02-05T16:04:26Z) - Robust Finite Mixture Regression for Heterogeneous Targets [70.19798470463378]
We propose an FMR model that finds sample clusters and jointly models multiple incomplete mixed-type targets.
We provide non-asymptotic oracle performance bounds for our model under a high-dimensional learning framework.
The results show that our model can achieve state-of-the-art performance.
arXiv Detail & Related papers (2020-10-12T03:27:07Z) - Learning from Aggregate Observations [82.44304647051243]
We study the problem of learning from aggregate observations where supervision signals are given to sets of instances.
We present a general probabilistic framework that accommodates a variety of aggregate observations.
Simple maximum likelihood solutions can be applied to various differentiable models.
arXiv Detail & Related papers (2020-04-14T06:18:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all content) and is not responsible for any consequences arising from its use.