Related papers: Automatic Parameter Selection for Non-Redundant Clustering

Automatic Parameter Selection for Non-Redundant Clustering

URL: http://arxiv.org/abs/2312.11952v1
Date: Tue, 19 Dec 2023 08:53:00 GMT
Title: Automatic Parameter Selection for Non-Redundant Clustering
Authors: Collin Leiber and Dominik Mautz and Claudia Plant and Christian B\"ohm
Abstract summary: High-dimensional datasets often contain multiple meaningful clusterings in different subspaces. We propose a framework that automatically detects the number of subspaces and clusters per subspace. We also introduce an encoding strategy that allows us to detect outliers in each subspace.
Score: 11.68971888446462
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: High-dimensional datasets often contain multiple meaningful clusterings in different subspaces. For example, objects can be clustered either by color, weight, or size, revealing different interpretations of the given dataset. A variety of approaches are able to identify such non-redundant clusterings. However, most of these methods require the user to specify the expected number of subspaces and clusters for each subspace. Stating these values is a non-trivial problem and usually requires detailed knowledge of the input dataset. In this paper, we propose a framework that utilizes the Minimum Description Length Principle (MDL) to detect the number of subspaces and clusters per subspace automatically. We describe an efficient procedure that greedily searches the parameter space by splitting and merging subspaces and clusters within subspaces. Additionally, an encoding strategy is introduced that allows us to detect outliers in each subspace. Extensive experiments show that our approach is highly competitive to state-of-the-art methods.

Related papers

Scalable Context-Preserving Model-Aware Deep Clustering for Hyperspectral Images [51.95768218975529]
Subspace clustering has become widely adopted for the unsupervised analysis of hyperspectral images (HSIs)<n>Recent model-aware deep subspace clustering methods often use a two-stage framework, involving the calculation of a self-representation matrix with complexity of O(n2), followed by spectral clustering.<n>We propose a scalable, context-preserving deep clustering method based on basis representation, which jointly captures local and non-local structures for efficient HSI clustering.
arXiv Detail & Related papers (2025-06-12T16:43:09Z)
Depth-Based Local Center Clustering: A Framework for Handling Different Clustering Scenarios [46.164361878412656]
Cluster analysis plays a crucial role across numerous scientific and engineering domains.<n>Despite the wealth of clustering methods proposed over the past decades, each method is typically designed for specific scenarios.<n>In this paper, we propose depth-based clustering (DLCC)<n>DLCC makes use of a local version of data depth that is based on subsets of data
arXiv Detail & Related papers (2025-05-14T16:08:11Z)
Adaptive and Robust DBSCAN with Multi-agent Reinforcement Learning [53.527506374566485]
We propose a novel Adaptive and Robust DBSCAN with Multi-agent Reinforcement Learning cluster framework, namely AR-DBSCAN.<n>We show that AR-DBSCAN not only improves clustering accuracy by up to 144.1% and 175.3% in the NMI and ARI metrics, respectively, but also is capable of robustly finding dominant parameters.
arXiv Detail & Related papers (2025-05-07T11:37:23Z)
Hierarchical clustering with maximum density paths and mixture models [39.42511559155036]
Hierarchical clustering is an effective and interpretable technique for analyzing structure in data. It is particularly helpful in settings where the exact number of clusters is unknown, and provides a robust framework for exploring complex datasets. Our method addresses this limitation by leveraging a two-stage approach, first employing a Gaussian or Student's t mixture model to overcluster the data, and then hierarchically merging clusters based on the induced density landscape. This approach yields state-of-the-art clustering performance while also providing a meaningful hierarchy, making it a valuable tool for exploratory data analysis.
arXiv Detail & Related papers (2025-03-19T15:37:51Z)
Clustering Based on Density Propagation and Subcluster Merging [92.15924057172195]
We propose a density-based node clustering approach that automatically determines the number of clusters and can be applied in both data space and graph space. Unlike traditional density-based clustering methods, which necessitate calculating the distance between any two nodes, our proposed technique determines density through a propagation process.
arXiv Detail & Related papers (2024-11-04T04:09:36Z)
Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels. We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
Inv-SENnet: Invariant Self Expression Network for clustering under biased data [17.25929452126843]
We propose a novel framework for jointly removing unwanted attributes (biases) while learning to cluster data points in individual subspaces. Our experimental result on synthetic and real-world datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2022-11-13T01:19:06Z)
Deep Clustering: A Comprehensive Survey [53.387957674512585]
Clustering analysis plays an indispensable role in machine learning and data mining. Deep clustering, which can learn clustering-friendly representations using deep neural networks, has been broadly applied in a wide range of clustering tasks. Existing surveys for deep clustering mainly focus on the single-view fields and the network architectures, ignoring the complex application scenarios of clustering.
arXiv Detail & Related papers (2022-10-09T02:31:32Z)
A One-shot Framework for Distributed Clustered Learning in Heterogeneous Environments [54.172993875654015]
The paper proposes a family of communication efficient methods for distributed learning in heterogeneous environments. One-shot approach, based on local computations at the users and a clustering based aggregation step at the server is shown to provide strong learning guarantees. For strongly convex problems it is shown that, as long as the number of data points per user is above a threshold, the proposed approach achieves order-optimal mean-squared error rates in terms of the sample size.
arXiv Detail & Related papers (2022-09-22T09:04:10Z)
Enriched Robust Multi-View Kernel Subspace Clustering [5.770309971945476]
Subspace clustering is to find underlying low-dimensional subspaces and cluster the data points correctly. Most existing methods suffer from two critical issues. We propose a novel multi-view subspace clustering method.
arXiv Detail & Related papers (2022-05-21T03:06:24Z)
Analysis of Sparse Subspace Clustering: Experiments and Random Projection [0.0]
Clustering is a technique that is used in many domains, such as face clustering, plant categorization, image segmentation, document classification. We analyze one of these techniques: a powerful clustering algorithm called Sparse Subspace Clustering. We demonstrate several experiments using this method and then introduce a new approach that can reduce the computational time required to perform sparse subspace clustering.
arXiv Detail & Related papers (2022-04-01T23:55:53Z)
Clustering Plotted Data by Image Segmentation [12.443102864446223]
Clustering algorithms are one of the main analytical methods to detect patterns in unlabeled data. In this paper, we present a wholly different way of clustering points in 2-dimensional space, inspired by how humans cluster data. Our approach, Visual Clustering, has several advantages over traditional clustering algorithms.
arXiv Detail & Related papers (2021-10-06T06:19:30Z)
A local approach to parameter space reduction for regression and classification tasks [0.0]
We propose a new method called local active subspaces (LAS), which explores the synergies of active subspaces with supervised clustering techniques. LAS is particularly useful for the community working on surrogate modelling.
arXiv Detail & Related papers (2021-07-22T18:06:04Z)
Overcomplete Deep Subspace Clustering Networks [80.16644725886968]
Experimental results on four benchmark datasets show the effectiveness of the proposed method over DSC and other clustering methods in terms of clustering error. Our method is also not as dependent as DSC is on where pre-training should be stopped to get the best performance and is also more robust to noise.
arXiv Detail & Related papers (2020-11-16T22:07:18Z)
Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed. We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
Learnable Subspace Clustering [76.2352740039615]
We develop a learnable subspace clustering paradigm to efficiently solve the large-scale subspace clustering problem. The key idea is to learn a parametric function to partition the high-dimensional subspaces into their underlying low-dimensional subspaces. To the best of our knowledge, this paper is the first work to efficiently cluster millions of data points among the subspace clustering methods.
arXiv Detail & Related papers (2020-04-09T12:53:28Z)

This list is automatically generated from the titles and abstracts of the papers in this site.