Fugu-MT Paper Translation (Abstract): Learning Unified Distance Metric for Heterogeneous Attribute Data Clustering

Paper summary: Learning Unified Distance Metric for Heterogeneous Attribute Data Clustering

arxiv url: http://arxiv.org/abs/2603.04458v1
Date: Tue, 03 Mar 2026 08:13:16 GMT
Status: Translation complete
Last updated in system: 2026-03-06 22:06:10.900598
Title: Learning Unified Distance Metric for Heterogeneous Attribute Data Clustering
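Title (reference translation): Learning a Unified Distance Metric for Heterogeneous Attribute Data Clustering
Authors: Yiqun Zhang, Mingjie Zhao, Yizhou Chen, Yang Lu, Yiu-ming Cheung
Abstract summary: A Heterogeneous Attribute Reconstruction and Representation (HARR) learning paradigm for cluster analysis. HARR is parameter-free, convergence-guaranteed, and can more effectively self-adapt to different sought numbers of clusters $k$.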
Reference score (independently computed attention score): 60.05209293008078
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Datasets composed of numerical and categorical attributes (also called mixed data hereinafter) are common in real clustering tasks. Differing from numerical attributes that indicate tendencies between two concepts (e.g., high and low temperature) with their values in a well-defined Euclidean distance space, categorical attribute values are different concepts (e.g., different occupations) embedded in an implicit space. Simultaneously exploiting these two very different types of information is an unavoidable but challenging problem, and most advanced attempts either encode the heterogeneous numerical and categorical attributes into one type, or define a unified metric for them for mixed data clustering, leaving their inherent connection unrevealed. This paper, therefore, studies the connection among attributes of any type and proposes a novel Heterogeneous Attribute Reconstruction and Representation (HARR) learning paradigm accordingly for cluster analysis. The paradigm transforms heterogeneous attributes into a homogeneous status for distance metric learning, and integrates the learning with clustering to automatically adapt the metric to different clustering tasks. Differing from most existing works that directly adopt defined distance metrics or learn attribute weights to search clusters in a subspace, we propose to project the values of each attribute into unified learnable multiple spaces to more finely represent and learn the distance metric for categorical data. HARR is parameter-free, convergence-guaranteed, and can more effectively self-adapt to different sought numbers of clusters $k$. Extensive experiments illustrate its superiority in terms of accuracy and efficiency.
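Abstract (reference translation): Datasets consisting of numerical and categorical attributes (hereafter, mixed data) are common in real clustering tasks. Unlike numerical attributes, whose values lie in a well-defined Euclidean distance space and indicate a tendency between two concepts (for example, high and low temperature), categorical attribute values are distinct concepts (for example, different occupations) embedded in an implicit space. Exploiting these two very different types of information at the same time is unavoidable but difficult, and most advanced attempts either encode the heterogeneous numerical and categorical attributes into a single type or define a unified metric for mixed data clustering, leaving the inherent connection between the attributes unrevealed. This work therefore studies the connection among attributes of any type and proposes a new Heterogeneous Attribute Reconstruction and Representation (HARR) learning paradigm for cluster analysis. The paradigm transforms heterogeneous attributes into a homogeneous state for distance metric learning, and integrates learning with clustering so that the metric adapts automatically to different clustering tasks. In contrast to most existing work, which directly adopts predefined distance metrics or learns attribute weights to search for clusters in a subspace, the paper proposes projecting the values of each attribute into multiple unified learnable spaces in order to represent and learn the distance metric for categorical data at a finer granularity. HARR is parameter-free, convergence-guaranteed, and can more effectively self-adapt to different sought numbers of clusters $k$. Extensive experiments illustrate its superiority in terms of accuracy and efficiency.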
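Note: The abstract contrasts HARR with approaches that "define a unified metric" for mixed data. As a point of reference only, the sketch below shows one such predefined metric, a Gower-style distance that averages a range-normalized absolute difference over numerical attributes and a 0/1 mismatch over categorical attributes. This is not the HARR method, which the abstract does not specify in enough detail to implement; the function name `mixed_distance` and the parameters `is_categorical` and `ranges` are hypothetical names chosen for illustration.

```python
from typing import Sequence, Union

Value = Union[float, str]


def mixed_distance(
    x: Sequence[Value],
    y: Sequence[Value],
    is_categorical: Sequence[bool],
    ranges: Sequence[float],
) -> float:
    """Gower-style distance between two mixed-type records (illustrative only).

    is_categorical[i] marks attribute i as categorical.
    ranges[i] is the value range (max - min) of numerical attribute i;
    it is ignored for categorical attributes.
    """
    total = 0.0
    for a, b, categorical, value_range in zip(x, y, is_categorical, ranges):
        if categorical:
            # Categorical values are distinct concepts: 0 if equal, 1 otherwise.
            total += 0.0 if a == b else 1.0
        else:
            # Numerical values: absolute difference scaled into [0, 1].
            total += abs(a - b) / value_range if value_range > 0 else 0.0
    # Average over attributes so every attribute contributes equally.
    return total / len(x)


# Two records with one numerical attribute (temperature) and one
# categorical attribute (occupation); the temperature range is assumed to be 40.
d = mixed_distance([20.0, "teacher"], [30.0, "nurse"], [False, True], [40.0, 1.0])
print(d)
```

A fixed metric like this treats every pair of distinct categories as equally far apart, which is the kind of limitation the abstract's learnable projection of attribute values is meant to address.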

Related papers