Mixed Data Clustering Survey and Challenges
- URL: http://arxiv.org/abs/2512.03070v1
- Date: Thu, 27 Nov 2025 08:20:05 GMT
- Title: Mixed Data Clustering Survey and Challenges
- Authors: Guillaume Guerard, Sonia Djebali
- Abstract summary: This paper introduces a clustering method grounded in pretopological spaces. Benchmarking against classical numerical clustering algorithms yields insights into the performance and effectiveness of the proposed method.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The advent of the big data paradigm has transformed how industries manage and analyze information, ushering in an era of unprecedented data volume, velocity, and variety. Within this landscape, mixed-data clustering has become a critical challenge, requiring innovative methods that can effectively exploit heterogeneous data types, including numerical and categorical variables. Traditional clustering techniques, typically designed for homogeneous datasets, often struggle to capture the additional complexity introduced by mixed data, underscoring the need for approaches specifically tailored to this setting. Hierarchical and explainable algorithms are particularly valuable in this context, as they provide structured, interpretable clustering results that support informed decision-making. This paper introduces a clustering method grounded in pretopological spaces. In addition, benchmarking against classical numerical clustering algorithms and existing pretopological approaches yields insights into the performance and effectiveness of the proposed method within the big data paradigm.
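The abstract does not spell out how a pretopological space yields clusters. The sketch below illustrates the general pretopological mechanism only. It is not the authors' algorithm. It defines a pseudo-closure a(A) on a toy mixed dataset with one numeric and one categorical feature. It then iterates a(A) to a fixed point to obtain a closure, and computes the elementary closed subset F({x}) of every point. The conjunctive rule, the record layout, the `radius` parameter and all function names are hypothetical. The rule admits a point if it lies within `radius` of some member of A on the numeric feature and also shares its category with some member of A.

```python
from typing import Dict, FrozenSet, Iterable, List, Sequence, Tuple

# Hypothetical mixed record: (numeric feature, categorical feature)
Record = Tuple[float, str]


def pseudo_closure(A: FrozenSet[int], data: Sequence[Record], radius: float) -> FrozenSet[int]:
    """a(A): A plus every point x that is within `radius` of some member of A on the
    numeric feature AND shares its category with some member of A (a hypothetical
    conjunctive logical rule). Extensive (A is a subset of a(A)) and a(empty) = empty."""
    result = set(A)
    for x, (num_x, cat_x) in enumerate(data):
        if x in A:
            continue
        near_numeric = any(abs(num_x - data[y][0]) <= radius for y in A)
        same_category = any(cat_x == data[y][1] for y in A)
        if near_numeric and same_category:
            result.add(x)
    return frozenset(result)


def closure(A: Iterable[int], data: Sequence[Record], radius: float) -> FrozenSet[int]:
    """Iterate the pseudo-closure until a(F) == F. The set only grows and the
    data set is finite, so the loop terminates."""
    current = frozenset(A)
    while True:
        expanded = pseudo_closure(current, data, radius)
        if expanded == current:
            return current
        current = expanded


def elementary_closed_subsets(data: Sequence[Record], radius: float) -> Dict[int, FrozenSet[int]]:
    """Elementary closed subset F({x}) for every point x."""
    return {x: closure({x}, data, radius) for x in range(len(data))}


def distinct_closed_subsets(ecs: Dict[int, FrozenSet[int]]) -> List[FrozenSet[int]]:
    """Deduplicate the elementary closed subsets, ordered by their smallest member."""
    return sorted(set(ecs.values()), key=min)


if __name__ == "__main__":
    toy = [(1.0, "a"), (1.5, "a"), (2.0, "a"), (10.0, "a"), (1.2, "b")]
    ecs = elementary_closed_subsets(toy, radius=0.6)
    for subset in distinct_closed_subsets(ecs):
        print(sorted(subset))
```

A single pseudo-closure step is not idempotent: a({0}) is {0, 1}, and only the iterated closure reaches {0, 1, 2}. With `radius=0.6` on this toy data, points 0, 1 and 2 chain together into one closed subset. The isolated numeric value (point 3) and the lone category "b" (point 4) each remain singleton closed subsets.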
Related papers
- Sparse clustering via the Deterministic Information Bottleneck algorithm
When a cluster structure is confined to a subset of the feature space, traditional clustering techniques face unprecedented challenges. We present an information-theoretic framework that overcomes the problems associated with sparse data, allowing for joint feature weighting and clustering.
(2026-01-28T14:05:44Z)
- PretopoMD: Pretopology-based Mixed Data Hierarchical Clustering
This article presents a novel pretopology-based algorithm designed to address the challenges of clustering mixed data without the need for dimensionality reduction. Our approach formulates customizable logical rules and adjustable hyperparameters that allow for user-defined hierarchical cluster construction. Empirical findings highlight the algorithm's robustness in constructing meaningful clusters and reveal its potential in overcoming issues related to clustered data explainability.
(2025-11-27T08:20:22Z)
- Categorical data clustering: 25 years beyond K-modes
Categorical data clustering is a common and important task in computer science. This review provides a comprehensive synthesis of categorical data clustering in the past twenty-five years. It elucidates the pivotal role of categorical data clustering in diverse fields such as health sciences, natural sciences, social sciences, education, engineering and economics.
(2024-08-30T12:36:00Z)
- A Deterministic Information Bottleneck Method for Clustering Mixed-Type Data
We present an information-theoretic method for clustering mixed-type data, that is, data consisting of both continuous and categorical variables. The proposed approach extends the Information Bottleneck principle to heterogeneous data through generalised product kernels. We demonstrate that the proposed method, named DIBmix, achieves superior performance compared to four established methods.
(2024-07-03T09:06:19Z)
- Tackling Diverse Minorities in Imbalanced Classification
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
(2023-08-28T18:48:34Z)
- Unified Multi-View Orthonormal Non-Negative Graph Based Clustering Framework
We formulate a novel clustering model, which exploits the non-negative feature property and incorporates the multi-view information into a unified joint learning framework.
We also explore, for the first time, the multi-model non-negative graph-based approach to clustering data based on deep features.
(2022-11-03T08:18:27Z)
- Deep Clustering: A Comprehensive Survey
Clustering analysis plays an indispensable role in machine learning and data mining.
Deep clustering, which can learn clustering-friendly representations using deep neural networks, has been broadly applied in a wide range of clustering tasks.
Existing surveys for deep clustering mainly focus on the single-view fields and the network architectures, ignoring the complex application scenarios of clustering.
(2022-10-09T02:31:32Z)
- Detection and Evaluation of Clusters within Sequential Data
Clustering algorithms for Block Markov Chains possess theoretical optimality guarantees.
Our sequential data is derived from human DNA, written text, animal movement data and financial markets.
It is found that the Block Markov Chain model assumption can indeed produce meaningful insights in exploratory data analyses.
(2022-10-04T15:22:39Z)
- Clustering Optimisation Method for Highly Connected Biological Data
We show how a simple metric for connectivity clustering evaluation leads to an optimised segmentation of biological data.
The novelty of the work resides in the creation of a simple optimisation method for clustering crowded data.
(2022-08-08T17:33:32Z)
- A Comprehensive Survey on Deep Clustering: Taxonomy, Challenges, and Future Directions
Clustering is a fundamental machine learning task which has been widely studied in the literature.
Deep Clustering, i.e., jointly optimizing the representation learning and clustering, has been proposed and hence attracted growing attention in the community.
We summarize the essential components of deep clustering and categorize existing methods by the ways they design interactions between deep representation learning and clustering.
(2022-06-15T15:05:13Z)
- Unsupervised Multi-view Clustering by Squeezing Hybrid Knowledge from Cross View and Each View
This paper proposes a new multi-view clustering method, low-rank subspace multi-view clustering based on adaptive graph regularization.
Experimental results for five widely used multi-view benchmarks show that our proposed algorithm surpasses other state-of-the-art methods by a clear margin.
(2020-08-23T08:25:06Z)
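The DIBmix summary in the list mentions generalised product kernels for mixed-type data without saying which kernels are combined. As an illustration only, the sketch below multiplies two kinds of per-variable kernels. It uses a Gaussian kernel on each continuous variable and an Aitchison-Aitken kernel on each nominal variable. These kernel choices are assumptions, as are the parameters `bandwidths`, `lambdas` and `n_levels` and all function names. DIBmix itself may use different kernels or bandwidth selection.

```python
import math
from typing import Sequence


def gaussian_kernel(x: float, y: float, bandwidth: float) -> float:
    """Unnormalised Gaussian kernel for one continuous variable."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    return math.exp(-0.5 * ((x - y) / bandwidth) ** 2)


def aitchison_aitken_kernel(x: str, y: str, lam: float, n_levels: int) -> float:
    """Aitchison-Aitken kernel for one nominal variable with n_levels categories:
    1 - lam on a match, lam / (n_levels - 1) on a mismatch."""
    if n_levels < 2:
        raise ValueError("need at least two categories")
    if not 0.0 <= lam <= (n_levels - 1) / n_levels:
        raise ValueError("lam must lie in [0, (n_levels - 1) / n_levels]")
    return 1.0 - lam if x == y else lam / (n_levels - 1)


def product_kernel(
    x_cont: Sequence[float], y_cont: Sequence[float],
    x_cat: Sequence[str], y_cat: Sequence[str],
    bandwidths: Sequence[float], lambdas: Sequence[float], n_levels: Sequence[int],
) -> float:
    """Product of per-variable kernels over the continuous and categorical parts."""
    if not (len(x_cont) == len(y_cont) == len(bandwidths)):
        raise ValueError("continuous inputs and bandwidths must align")
    if not (len(x_cat) == len(y_cat) == len(lambdas) == len(n_levels)):
        raise ValueError("categorical inputs, lambdas and n_levels must align")
    value = 1.0
    for x, y, h in zip(x_cont, y_cont, bandwidths):
        value *= gaussian_kernel(x, y, h)
    for x, y, lam, c in zip(x_cat, y_cat, lambdas, n_levels):
        value *= aitchison_aitken_kernel(x, y, lam, c)
    return value


if __name__ == "__main__":
    # Toy comparison: one continuous variable and one categorical variable with 3 levels
    same = product_kernel([1.0], [1.0], ["a"], ["a"], [1.0], [0.2], [3])
    differ = product_kernel([0.0], [1.0], ["a"], ["b"], [1.0], [0.2], [3])
    print(same, differ)
```

Identical points score 1 × 0.8 = 0.8 rather than 1. The categorical kernel spreads mass λ = 0.2 over the non-matching levels. A kernel-based clustering method would evaluate such a function over all pairs of records to build a similarity matrix.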