Learning Unified Distance Metric for Heterogeneous Attribute Data Clustering
- URL: http://arxiv.org/abs/2603.04458v1
- Date: Tue, 03 Mar 2026 08:13:16 GMT
- Title: Learning Unified Distance Metric for Heterogeneous Attribute Data Clustering
- Authors: Yiqun Zhang, Mingjie Zhao, Yizhou Chen, Yang Lu, Yiu-ming Cheung,
- Abstract summary: Heterogeneous Attribute Reconstruction and Representation (HARR) learning paradigm for cluster analysis.<n>HarR is parameter-free, convergence-guaranteed, and can more effectively self-adapt to different sought number of clusters $k$.
- Score: 60.05209293008078
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Datasets composed of numerical and categorical attributes (also called mixed data hereinafter) are common in real clustering tasks. Differing from numerical attributes that indicate tendencies between two concepts (e.g., high and low temperature) with their values in well-defined Euclidean distance space, categorical attribute values are different concepts (e.g., different occupations) embedded in an implicit space. Simultaneously exploiting these two very different types of information is an unavoidable but challenging problem, and most advanced attempts either encode the heterogeneous numerical and categorical attributes into one type, or define a unified metric for them for mixed data clustering, leaving their inherent connection unrevealed. This paper, therefore, studies the connection among any-type of attributes and proposes a novel Heterogeneous Attribute Reconstruction and Representation (HARR) learning paradigm accordingly for cluster analysis. The paradigm transforms heterogeneous attributes into a homogeneous status for distance metric learning, and integrates the learning with clustering to automatically adapt the metric to different clustering tasks. Differing from most existing works that directly adopt defined distance metrics or learn attribute weights to search clusters in a subspace. We propose to project the values of each attribute into unified learnable multiple spaces to more finely represent and learn the distance metric for categorical data. HARR is parameter-free, convergence-guaranteed, and can more effectively self-adapt to different sought number of clusters $k$. Extensive experiments illustrate its superiority in terms of accuracy and efficiency.
Related papers
- Bridging the Semantic Gap for Categorical Data Clustering via Large Language Models [64.58262227709842]
ARISE (Attention-weighted Representation with Integrated Semantic Embeddings) is presented.<n>It builds semantic-aware representations that complement the metric space of categorical data for accurate clustering.<n>Experiments on eight benchmark datasets demonstrate consistent improvements over seven representative counterparts.
arXiv Detail & Related papers (2026-01-03T11:37:46Z) - Break the Tie: Learning Cluster-Customized Category Relationships for Categorical Data Clustering [51.11677202873771]
Categorical attributes with qualitative values are ubiquitous in cluster analysis of real datasets.<n>Unlike the Euclidean distance of numerical attributes, the categorical attributes lack well-defined relationships of their possible values.<n>This paper breaks the intrinsic relationship tie of attribute categories and learns customized distance metrics suitable for flexibly revealing various cluster distributions.
arXiv Detail & Related papers (2025-11-12T06:57:24Z) - Hybrid Discriminative Attribute-Object Embedding Network for Compositional Zero-Shot Learning [83.10178754323955]
Hybrid Discriminative Attribute-Object Embedding (HDA-OE) network is proposed to solve the problem of complex interactions between attributes and object visual representations.<n>To increase the variability of training data, HDA-OE introduces an attribute-driven data synthesis (ADDS) module.<n>To further improve the discriminative ability of the model, HDA-OE introduces the subclass-driven discriminative embedding (SDDE) module.<n>The proposed model has been evaluated on three benchmark datasets, and the results verify its effectiveness and reliability.
arXiv Detail & Related papers (2024-11-28T09:50:25Z) - Categorical Data Clustering via Value Order Estimated Distance Metric Learning [53.28598689867732]
This paper introduces a novel order distance metric learning approach to intuitively represent categorical attribute values.<n>A new joint learning paradigm is developed to alternatively perform clustering and order distance metric learning.<n>The proposed method achieves superior clustering accuracy on categorical and mixed datasets.
arXiv Detail & Related papers (2024-11-19T08:23:25Z) - Mixed-type Distance Shrinkage and Selection for Clustering via Kernel Metric Learning [0.0]
We propose a metric called KDSUM that uses mixed kernels to measure dissimilarity.
We demonstrate that KDSUM is a shrinkage method from existing mixed-type metrics to a uniform dissimilarity metric.
arXiv Detail & Related papers (2023-06-02T19:51:48Z) - Transferable Deep Metric Learning for Clustering [1.2762298148425795]
Clustering in high spaces is a difficult task; the usual dimension distance metrics may no longer be appropriate under the curse of dimensionality.
We show that we can learn a metric on a labelled dataset, then apply it to cluster a different dataset.
We achieve results competitive with the state-of-the-art while using only a small number of labelled training datasets and shallow networks.
arXiv Detail & Related papers (2023-02-13T17:09:59Z) - Unsupervised Heterogeneous Coupling Learning for Categorical
Representation [50.1603042640492]
This work introduces a UNsupervised heTerogeneous couplIng lEarning (UNTIE) approach for representing coupled categorical data by untying the interactions between couplings.
UNTIE is efficiently optimized w.r.t. a kernel k-means objective function for unsupervised representation learning of heterogeneous and hierarchical value-to-object couplings.
The UNTIE-learned representations make significant performance improvement against the state-of-the-art categorical representations and deep representation models.
arXiv Detail & Related papers (2020-07-21T11:23:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.