Sparse clustering via the Deterministic Information Bottleneck algorithm
- URL: http://arxiv.org/abs/2601.20628v1
- Date: Wed, 28 Jan 2026 14:05:44 GMT
- Title: Sparse clustering via the Deterministic Information Bottleneck algorithm
- Authors: Efthymios Costa, Ioanna Papatsouma, Angelos Markos
- Abstract summary: When a cluster structure is confined to a subset of the feature space, traditional clustering techniques face considerable challenges. We present an information-theoretic framework that overcomes the problems associated with sparse data, allowing for joint feature weighting and clustering.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cluster analysis relates to the task of assigning objects to groups which ideally present some desirable characteristics. When a cluster structure is confined to a subset of the feature space, traditional clustering techniques face considerable challenges. We present an information-theoretic framework that overcomes the problems associated with sparse data, allowing for joint feature weighting and clustering. Our proposal constitutes a competitive alternative to existing clustering algorithms for sparse data, as demonstrated through simulations on synthetic data. The effectiveness of our method is established by an application to a real-world genomics data set.
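For context, the Deterministic Information Bottleneck of Strouse and Schwab, which the title references, replaces the mutual-information compression term of the classical Information Bottleneck with an entropy term. In generic notation (ours, not necessarily the paper's), with data $X$, cluster assignment $T$, relevance variable $Y$, and trade-off parameter $\beta$, the objective over deterministic encoders is:

```latex
\min_{f:\, \mathcal{X} \to \mathcal{T}} \; H(T) - \beta \, I(T; Y),
\qquad T = f(X)
```

Minimising the entropy of the assignment while retaining information about the relevance variable yields hard (deterministic) cluster assignments at the optimum, which is what makes the method usable as a clustering algorithm.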
Related papers
- PretopoMD: Pretopology-based Mixed Data Hierarchical Clustering [0.0]
This article presents a novel pretopology-based algorithm designed to address the challenges of clustering mixed data without the need for dimensionality reduction. Our approach formulates customizable logical rules and adjustable hyperparameters that allow for user-defined hierarchical cluster construction. Empirical findings highlight the algorithm's robustness in constructing meaningful clusters and reveal its potential in overcoming issues related to the explainability of clustered data.
arXiv Detail & Related papers (2025-11-27T08:20:22Z) - Mixed Data Clustering Survey and Challenges [0.0]
This paper introduces a clustering method grounded in pretopological spaces. Benchmarking against classical numerical clustering algorithms yields insights into the performance and effectiveness of the proposed method.
arXiv Detail & Related papers (2025-11-27T08:20:05Z) - Clustering by Attention: Leveraging Prior Fitted Transformers for Data Partitioning [3.4530027457862005]
We introduce a novel clustering approach based on meta-learning. We employ a pre-trained Prior-Data Fitted Transformer Network (PFN) to perform clustering. We show that our approach is superior to state-of-the-art clustering techniques.
arXiv Detail & Related papers (2025-07-27T17:53:19Z) - A3S: A General Active Clustering Method with Pairwise Constraints [66.74627463101837]
A3S features strategic active clustering adjustment on the initial cluster result, which is obtained by an adaptive clustering algorithm.
In extensive experiments across diverse real-world datasets, A3S achieves desired results with significantly fewer human queries.
arXiv Detail & Related papers (2024-07-14T13:37:03Z) - A Deterministic Information Bottleneck Method for Clustering Mixed-Type Data [0.0]
We present an information-theoretic method for clustering mixed-type data, that is, data consisting of both continuous and categorical variables. The proposed approach extends the Information Bottleneck principle to heterogeneous data through generalised product kernels. We demonstrate that the proposed method, named DIBmix, achieves superior performance compared to four established methods.
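The phrase "generalised product kernels" usually refers to multiplying a per-coordinate kernel across features, with a different kernel family for continuous and categorical coordinates. A minimal sketch under that reading, using a Gaussian kernel for continuous features and an Aitchison-Aitken kernel for categorical ones (the function name and kernel choices are our illustration, not DIBmix's actual construction):

```python
import math

def product_kernel(x, y, is_cat, h=1.0, lam=0.2):
    """Generalised product kernel between two mixed-type points.

    Continuous coordinates (is_cat[j] == False) use a Gaussian kernel
    with bandwidth h; categorical coordinates use the Aitchison-Aitken
    kernel with smoothing parameter lam in [0, 1). Illustrative sketch
    only, not the DIBmix paper's exact specification.
    """
    k = 1.0
    for j, cat in enumerate(is_cat):
        if cat:
            # Aitchison-Aitken: mass 1 - lam on a match, lam on a mismatch.
            k *= (1.0 - lam) if x[j] == y[j] else lam
        else:
            u = (x[j] - y[j]) / h
            k *= math.exp(-0.5 * u * u) / (h * math.sqrt(2.0 * math.pi))
    return k
```

The product form lets each feature contribute a similarity on its own scale, so continuous and categorical variables can be combined without dummy-coding the categories.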
arXiv Detail & Related papers (2024-07-03T09:06:19Z) - Fuzzy K-Means Clustering without Cluster Centroids [21.256564324236333]
Fuzzy K-Means clustering is a critical technique in unsupervised data analysis.
This paper proposes a novel Fuzzy K-Means clustering algorithm that entirely eliminates the reliance on cluster centroids.
arXiv Detail & Related papers (2024-04-07T12:25:03Z) - Unified Multi-View Orthonormal Non-Negative Graph Based Clustering Framework [74.25493157757943]
We formulate a novel clustering model, which exploits the non-negative feature property and incorporates the multi-view information into a unified joint learning framework.
We also explore, for the first time, the multi-model non-negative graph-based approach to clustering data based on deep features.
arXiv Detail & Related papers (2022-11-03T08:18:27Z) - Deep Clustering: A Comprehensive Survey [53.387957674512585]
Clustering analysis plays an indispensable role in machine learning and data mining.
Deep clustering, which can learn clustering-friendly representations using deep neural networks, has been broadly applied in a wide range of clustering tasks.
Existing surveys for deep clustering mainly focus on the single-view fields and the network architectures, ignoring the complex application scenarios of clustering.
arXiv Detail & Related papers (2022-10-09T02:31:32Z) - Detection and Evaluation of Clusters within Sequential Data [58.720142291102135]
Clustering algorithms for Block Markov Chains possess theoretical optimality guarantees.
In particular, our sequential data is derived from human DNA, written text, animal movement data and financial markets.
It is found that the Block Markov Chain model assumption can indeed produce meaningful insights in exploratory data analyses.
arXiv Detail & Related papers (2022-10-04T15:22:39Z) - Clustering Optimisation Method for Highly Connected Biological Data [0.0]
We show how a simple metric for connectivity clustering evaluation leads to an optimised segmentation of biological data.
The novelty of the work resides in the creation of a simple optimisation method for clustering crowded data.
arXiv Detail & Related papers (2022-08-08T17:33:32Z) - Differentially-Private Clustering of Easy Instances [67.04951703461657]
In differentially private clustering, the goal is to identify $k$ cluster centers without disclosing information on individual data points.
We provide implementable differentially private clustering algorithms that offer utility when the data is "easy".
We propose a framework that allows us to apply non-private clustering algorithms to the easy instances and privately combine the results.
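A standard primitive for "privately combining" non-private results is output perturbation via the Laplace mechanism; the sketch below releases a noisy mean of one cluster's bounded, one-dimensional points. This is a toy illustration of the building block only, not the paper's actual combination procedure:

```python
import math
import random

def noisy_mean(points, epsilon, bound):
    """epsilon-differentially-private mean of 1-D points clipped to
    [-bound, bound], via the Laplace mechanism. Toy sketch only."""
    clipped = [max(-bound, min(bound, p)) for p in points]
    mean = sum(clipped) / len(clipped)
    # Replacing one of n bounded points moves the mean by at most
    # 2 * bound / n, which is the L1 sensitivity we calibrate to.
    scale = (2.0 * bound / len(clipped)) / epsilon
    # Inverse-CDF sampling of Laplace(0, scale).
    u = random.random() - 0.5
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return mean + noise
```

Clipping to a known bound is what makes the sensitivity finite; without it a single outlier could shift the mean arbitrarily and no finite noise scale would protect it.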
arXiv Detail & Related papers (2021-12-29T08:13:56Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z) - A semi-supervised sparse K-Means algorithm [3.04585143845864]
An unsupervised sparse clustering method can be employed to detect the subgroup of features necessary for clustering.
A semi-supervised method can use the labelled data to create constraints and enhance the clustering solution.
We show that the algorithm maintains the high performance of other semi-supervised algorithms and, in addition, preserves the ability to distinguish informative from uninformative features.
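In the sparse K-Means family this entry builds on (Witten and Tibshirani's formulation), feature selection arises from an L1-penalised weight update: each feature's between-cluster sum of squares (BCSS) is soft-thresholded and the weights renormalised, so uninformative features receive weight zero. A minimal sketch of that weight step (the function name and threshold parametrisation are illustrative; the semi-supervised variant adds constraint handling on top):

```python
import math

def feature_weights(bcss, threshold):
    """Soft-threshold per-feature BCSS values and renormalise to unit
    L2 norm. Features whose BCSS falls below the threshold get weight
    zero, which is how sparsity over features arises."""
    soft = [max(b - threshold, 0.0) for b in bcss]
    norm = math.sqrt(sum(s * s for s in soft))
    if norm == 0.0:
        return [0.0] * len(bcss)
    return [s / norm for s in soft]
```

In the full algorithm this step alternates with a weighted K-Means assignment step, and the threshold is chosen so the weights satisfy an L1 budget that controls how many features survive.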
arXiv Detail & Related papers (2020-03-16T02:05:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.