Automatic Parameter Selection for Non-Redundant Clustering
- URL: http://arxiv.org/abs/2312.11952v1
- Date: Tue, 19 Dec 2023 08:53:00 GMT
- Title: Automatic Parameter Selection for Non-Redundant Clustering
- Authors: Collin Leiber and Dominik Mautz and Claudia Plant and Christian B\"ohm
- Abstract summary: High-dimensional datasets often contain multiple meaningful clusterings in different subspaces.
We propose a framework that automatically detects the number of subspaces and clusters per subspace.
We also introduce an encoding strategy that allows us to detect outliers in each subspace.
- Score: 11.68971888446462
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: High-dimensional datasets often contain multiple meaningful clusterings in
different subspaces. For example, objects can be clustered either by color,
weight, or size, revealing different interpretations of the given dataset. A
variety of approaches are able to identify such non-redundant clusterings.
However, most of these methods require the user to specify the expected number
of subspaces and clusters for each subspace. Stating these values is a
non-trivial problem and usually requires detailed knowledge of the input
dataset. In this paper, we propose a framework that utilizes the Minimum
Description Length Principle (MDL) to detect the number of subspaces and
clusters per subspace automatically. We describe an efficient procedure that
greedily searches the parameter space by splitting and merging subspaces and
clusters within subspaces. Additionally, an encoding strategy is introduced
that allows us to detect outliers in each subspace. Extensive experiments show
that our approach is highly competitive to state-of-the-art methods.
Related papers
- Clustering Based on Density Propagation and Subcluster Merging [92.15924057172195]
We propose a density-based node clustering approach that automatically determines the number of clusters and can be applied in both data space and graph space.
Unlike traditional density-based clustering methods, which necessitate calculating the distance between any two nodes, our proposed technique determines density through a propagation process.
arXiv Detail & Related papers (2024-11-04T04:09:36Z) - Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z) - Inv-SENnet: Invariant Self Expression Network for clustering under
biased data [17.25929452126843]
We propose a novel framework for jointly removing unwanted attributes (biases) while learning to cluster data points in individual subspaces.
Our experimental result on synthetic and real-world datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2022-11-13T01:19:06Z) - Deep Clustering: A Comprehensive Survey [53.387957674512585]
Clustering analysis plays an indispensable role in machine learning and data mining.
Deep clustering, which can learn clustering-friendly representations using deep neural networks, has been broadly applied in a wide range of clustering tasks.
Existing surveys for deep clustering mainly focus on the single-view fields and the network architectures, ignoring the complex application scenarios of clustering.
arXiv Detail & Related papers (2022-10-09T02:31:32Z) - Enriched Robust Multi-View Kernel Subspace Clustering [5.770309971945476]
Subspace clustering is to find underlying low-dimensional subspaces and cluster the data points correctly.
Most existing methods suffer from two critical issues.
We propose a novel multi-view subspace clustering method.
arXiv Detail & Related papers (2022-05-21T03:06:24Z) - Analysis of Sparse Subspace Clustering: Experiments and Random
Projection [0.0]
Clustering is a technique that is used in many domains, such as face clustering, plant categorization, image segmentation, document classification.
We analyze one of these techniques: a powerful clustering algorithm called Sparse Subspace Clustering.
We demonstrate several experiments using this method and then introduce a new approach that can reduce the computational time required to perform sparse subspace clustering.
arXiv Detail & Related papers (2022-04-01T23:55:53Z) - Clustering Plotted Data by Image Segmentation [12.443102864446223]
Clustering algorithms are one of the main analytical methods to detect patterns in unlabeled data.
In this paper, we present a wholly different way of clustering points in 2-dimensional space, inspired by how humans cluster data.
Our approach, Visual Clustering, has several advantages over traditional clustering algorithms.
arXiv Detail & Related papers (2021-10-06T06:19:30Z) - A local approach to parameter space reduction for regression and
classification tasks [0.0]
We propose a new method called local active subspaces (LAS), which explores the synergies of active subspaces with supervised clustering techniques.
LAS is particularly useful for the community working on surrogate modelling.
arXiv Detail & Related papers (2021-07-22T18:06:04Z) - Overcomplete Deep Subspace Clustering Networks [80.16644725886968]
Experimental results on four benchmark datasets show the effectiveness of the proposed method over DSC and other clustering methods in terms of clustering error.
Our method is also not as dependent as DSC is on where pre-training should be stopped to get the best performance and is also more robust to noise.
arXiv Detail & Related papers (2020-11-16T22:07:18Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z) - Learnable Subspace Clustering [76.2352740039615]
We develop a learnable subspace clustering paradigm to efficiently solve the large-scale subspace clustering problem.
The key idea is to learn a parametric function to partition the high-dimensional subspaces into their underlying low-dimensional subspaces.
To the best of our knowledge, this paper is the first work to efficiently cluster millions of data points among the subspace clustering methods.
arXiv Detail & Related papers (2020-04-09T12:53:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.