Why the Rich Get Richer? On the Balancedness of Random Partition Models
- URL: http://arxiv.org/abs/2201.12697v1
- Date: Sun, 30 Jan 2022 01:19:41 GMT
- Title: Why the Rich Get Richer? On the Balancedness of Random Partition Models
- Authors: Changwoo J. Lee, Huiyan Sang
- Abstract summary: We study the balancedness of exchangeable random partition models.
We demonstrate that the "rich-get-richer" characteristic of many existing popular random partition models is an inevitable consequence of two common assumptions.
We also introduce the "rich-get-poorer" random partition models and illustrate their application to entity resolution tasks.
- Score: 1.776746672434207
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Random partition models are widely used in Bayesian methods for various
clustering tasks, such as mixture models, topic models, and community detection
problems. While the number of clusters induced by random partition models has
been studied extensively, another important model property regarding the
balancedness of cluster sizes has been largely neglected. We formulate a
framework to define and theoretically study the balancedness of exchangeable
random partition models, by analyzing how a model assigns probabilities to
partitions with different levels of balancedness. We demonstrate that the
"rich-get-richer" characteristic of many existing popular random partition
models is an inevitable consequence of two common assumptions: product-form
exchangeability and projectivity. We propose a principled way to compare the
balancedness of random partition models, which gives a better understanding of
which models work better for different applications and which do not. We also
introduce the "rich-get-poorer" random partition models and illustrate their
application to entity resolution tasks.
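The abstract does not name a specific model. As a minimal sketch of what "rich-get-richer" means, the snippet below uses the Chinese restaurant process (CRP) as an assumed illustrative example: a new item joins an existing cluster with probability proportional to that cluster's current size, or opens a new cluster with probability proportional to a concentration parameter. The function names `crp_predictive` and `sample_crp_sizes` and the parameter `alpha` are hypothetical and are not taken from the paper.

```python
import random


def crp_predictive(sizes, alpha):
    """Return the CRP probabilities for the next item.

    The first len(sizes) entries are the probabilities of joining each
    existing cluster; the last entry is the probability of a new cluster.
    Assumes alpha > 0.
    """
    total = sum(sizes) + alpha
    return [s / total for s in sizes] + [alpha / total]


def sample_crp_sizes(n, alpha, seed=0):
    """Sequentially assign n items under the CRP and return the cluster sizes."""
    rng = random.Random(seed)
    sizes = []
    for _ in range(n):
        probs = crp_predictive(sizes, alpha)
        k = rng.choices(range(len(probs)), weights=probs)[0]
        if k == len(sizes):
            sizes.append(1)   # open a new cluster
        else:
            sizes[k] += 1     # join an existing cluster
    return sizes


if __name__ == "__main__":
    # A cluster of size 3 is three times as likely to grow as one of size 1.
    print(crp_predictive([3, 1], alpha=1.0))
    # Cluster sizes from one simulated partition, largest first.
    print(sorted(sample_crp_sizes(200, alpha=1.0, seed=1), reverse=True))
```

Because the probability of joining a cluster grows with its size, large clusters tend to keep growing, which typically produces unbalanced partitions. This is only an illustration of the phenomenon, not the paper's framework for measuring balancedness.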
Related papers
- An Entropy-Based Test and Development Framework for Uncertainty Modeling in Level-Set Visualizations [2.5449631655313896]
We use an entropy calculation directly on ensemble data to establish an expected result.
We show that fewer bins in nonparametric histogram models are more effective whereas large numbers of bins in quantile models approach data accuracy.
arXiv Detail & Related papers (2024-09-13T00:31:16Z)
- Random Models for Fuzzy Clustering Similarity Measures [0.0]
The Adjusted Rand Index (ARI) is a widely used method for comparing hard clusterings.
We propose a single framework for computing the ARI with three random models that are intuitive and explainable for both hard and fuzzy clusterings.
arXiv Detail & Related papers (2023-12-16T00:07:04Z)
- Locking and Quacking: Stacking Bayesian model predictions by log-pooling and superposition [0.5735035463793007]
We present two novel tools for combining predictions from different models.
These are generalisations of model stacking, but combine posterior densities by log-linear pooling and quantum superposition.
To optimise model weights while avoiding the burden of normalising constants, we investigate the Hyvarinen score of the combined posterior predictions.
arXiv Detail & Related papers (2023-05-12T09:26:26Z)
- Investigating Ensemble Methods for Model Robustness Improvement of Text Classifiers [66.36045164286854]
We analyze a set of existing bias features and demonstrate there is no single model that works best for all the cases.
By choosing an appropriate bias model, we can obtain a better robustness result than baselines with a more sophisticated model design.
arXiv Detail & Related papers (2022-10-28T17:52:10Z)
- A Twin Neural Model for Uplift [59.38563723706796]
Uplift is a particular case of conditional treatment effect modeling.
We propose a new loss function defined by leveraging a connection with the Bayesian interpretation of the relative risk.
We show our proposed method is competitive with the state-of-the-art in simulation setting and on real data from large scale randomized experiments.
arXiv Detail & Related papers (2021-05-11T16:02:39Z)
- Characterizing Fairness Over the Set of Good Models Under Selective Labels [69.64662540443162]
We develop a framework for characterizing predictive fairness properties over the set of models that deliver similar overall performance.
We provide tractable algorithms to compute the range of attainable group-level predictive disparities.
We extend our framework to address the empirically relevant challenge of selectively labelled data.
arXiv Detail & Related papers (2021-01-02T02:11:37Z)
- Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors.
We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method.
Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
arXiv Detail & Related papers (2020-10-25T18:51:15Z)
- Robust Finite Mixture Regression for Heterogeneous Targets [70.19798470463378]
We propose an FMR model that finds sample clusters and jointly models multiple incomplete mixed-type targets simultaneously.
We provide non-asymptotic oracle performance bounds for our model under a high-dimensional learning framework.
The results show that our model can achieve state-of-the-art performance.
arXiv Detail & Related papers (2020-10-12T03:27:07Z)
- Decision-Making with Auto-Encoding Variational Bayes [71.44735417472043]
We show that a posterior approximation distinct from the variational distribution should be used for making decisions.
Motivated by these theoretical results, we propose learning several approximate proposals for the best model.
In addition to toy examples, we present a full-fledged case study of single-cell RNA sequencing.
arXiv Detail & Related papers (2020-02-17T19:23:36Z)