repliclust: Synthetic Data for Cluster Analysis
- URL: http://arxiv.org/abs/2303.14301v1
- Date: Fri, 24 Mar 2023 23:45:27 GMT
- Title: repliclust: Synthetic Data for Cluster Analysis
- Authors: Michael J. Zellinger and Peter B\"uhlmann
- Abstract summary: repliclust is a Python package for generating synthetic data sets with clusters.
The project webpage, repliclust.org, provides a concise user guide and thorough documentation.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present repliclust (from repli-cate and clust-er), a Python package for
generating synthetic data sets with clusters. Our approach is based on data set
archetypes, high-level geometric descriptions from which the user can create
many different data sets, each possessing the desired geometric
characteristics. The architecture of our software is modular and
object-oriented, decomposing data generation into algorithms for placing
cluster centers, sampling cluster shapes, selecting the number of data points
for each cluster, and assigning probability distributions to clusters. The
project webpage, repliclust.org, provides a concise user guide and thorough
documentation.
Related papers
- Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering.
In our proposed method, cluster number determination and unsupervised representation learning are unified into a uniform framework.
In order to conduct feedback actions, the clustering-oriented reward function is proposed to enhance the cohesion of the same clusters and separate the different clusters.
arXiv Detail & Related papers (2023-08-13T18:12:28Z) - Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z) - Interpretable Deep Clustering for Tabular Data [7.972599673048582]
Clustering is a fundamental learning task widely used in data analysis.
We propose a new deep-learning framework that predicts interpretable cluster assignments at the instance and cluster levels.
We show that the proposed method can reliably predict cluster assignments in biological, text, image, and physics datasets.
arXiv Detail & Related papers (2023-06-07T21:08:09Z) - Hard Regularization to Prevent Deep Online Clustering Collapse without
Data Augmentation [65.268245109828]
Online deep clustering refers to the joint use of a feature extraction network and a clustering model to assign cluster labels to each new data point or batch as it is processed.
While faster and more versatile than offline methods, online clustering can easily reach the collapsed solution where the encoder maps all inputs to the same point and all are put into a single cluster.
We propose a method that does not require data augmentation, and that, differently from existing methods, regularizes the hard assignments.
arXiv Detail & Related papers (2023-03-29T08:23:26Z) - Generating Multidimensional Clusters With Support Lines [0.0]
We present Clugen, a modular procedure for synthetic data generation.
Cluken is open source, comprehensively unit tested and documented.
We demonstrate that Clugen is fit for use in the assessment of clustering algorithms.
arXiv Detail & Related papers (2023-01-24T22:08:24Z) - Cluster Explanation via Polyhedral Descriptions [0.0]
Clustering is an unsupervised learning problem that aims to partition unlabelled data points into groups with similar features.
Traditional clustering algorithms provide limited insight into the groups they find as their main focus is accuracy and not the interpretability of the group assignments.
We introduce a new approach to explain clusters by constructing polyhedra around each cluster while minimizing either the complexity of the resulting polyhedra or the number of features used in the description.
arXiv Detail & Related papers (2022-10-17T07:26:44Z) - Deep Clustering: A Comprehensive Survey [53.387957674512585]
Clustering analysis plays an indispensable role in machine learning and data mining.
Deep clustering, which can learn clustering-friendly representations using deep neural networks, has been broadly applied in a wide range of clustering tasks.
Existing surveys for deep clustering mainly focus on the single-view fields and the network architectures, ignoring the complex application scenarios of clustering.
arXiv Detail & Related papers (2022-10-09T02:31:32Z) - A framework for benchmarking clustering algorithms [2.900810893770134]
Clustering algorithms can be tested on a variety of benchmark problems.
Many research papers and graduate theses consider only a small number of datasets.
We have developed a framework whose aim is to introduce a consistent methodology for testing clustering algorithms.
arXiv Detail & Related papers (2022-09-20T06:10:41Z) - Personalized Federated Learning via Convex Clustering [72.15857783681658]
We propose a family of algorithms for personalized federated learning with locally convex user costs.
The proposed framework is based on a generalization of convex clustering in which the differences between different users' models are penalized.
arXiv Detail & Related papers (2022-02-01T19:25:31Z) - Clustering Plotted Data by Image Segmentation [12.443102864446223]
Clustering algorithms are one of the main analytical methods to detect patterns in unlabeled data.
In this paper, we present a wholly different way of clustering points in 2-dimensional space, inspired by how humans cluster data.
Our approach, Visual Clustering, has several advantages over traditional clustering algorithms.
arXiv Detail & Related papers (2021-10-06T06:19:30Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.