Fugu-MT Paper Translation (Summary): Khatri-Rao Clustering for Data Summarization

Paper summary: Khatri-Rao Clustering for Data Summarization

arxiv url: http://arxiv.org/abs/2603.06602v2
Date: Tue, 10 Mar 2026 09:29:46 GMT
Status: Translation complete
Last updated (in system): 2026-03-15 16:38:22.416634
Title: Khatri-Rao Clustering for Data Summarization
Authors: Martino Ciaperoni, Collin Leiber, Aristides Gionis, Heikki Mannila
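Abstract summary: Centroid-based clustering, a widely adopted approach, finds informative summaries of a dataset in terms of a few prototypes. Despite this wide adoption, the resulting data summaries often contain redundancies. The authors introduce the Khatri-Rao clustering paradigm, which extends traditional centroid-based clustering to produce more succinct but equally accurate data summaries.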
Reference score (attention score computed by the site): 16.986754788004642
License: http://creativecommons.org/licenses/by/4.0/
Abstract: As datasets continue to grow in size and complexity, finding succinct yet accurate data summaries poses a key challenge. Centroid-based clustering, a widely adopted approach to address this challenge, finds informative summaries of datasets in terms of few prototypes, each representing a cluster in the data. Despite their wide adoption, the resulting data summaries often contain redundancies, limiting their effectiveness particularly in datasets characterized by a large number of underlying clusters. To overcome this limitation, we introduce the Khatri-Rao clustering paradigm that extends traditional centroid-based clustering to produce more succinct but equally accurate data summaries by postulating that centroids arise from the interaction of two or more succinct sets of protocentroids. We study two central approaches to centroid-based clustering, namely the well-established k-Means algorithm and the increasingly popular topic of deep clustering, under the lens of the Khatri-Rao paradigm. To this end, we introduce the Khatri-Rao k-Means algorithm and the Khatri-Rao deep clustering framework. Extensive experiments show that Khatri-Rao k-Means can strike a more favorable trade-off between succinctness and accuracy in data summarization than standard k-Means. Leveraging representation learning, the Khatri-Rao deep clustering framework offers even greater benefits, reducing even more the size of data summaries given by deep clustering while preserving their accuracy.
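The abstract's core idea, that centroids arise from the interaction of smaller sets of protocentroids, can be illustrated with a minimal Python sketch. It assumes exactly two protocentroid sets and two illustrative ways of combining a pair: an elementwise sum, or an elementwise (Hadamard) product, which corresponds to the rows of the column-wise Khatri-Rao product of the two protocentroid matrices. The function name `khatri_rao_centroids` and its `mode` parameter are hypothetical; the paper's exact aggregation functions and notation may differ.

```python
from itertools import product
from typing import List, Sequence

Vector = List[float]


def khatri_rao_centroids(protos_a: Sequence[Vector], protos_b: Sequence[Vector],
                         mode: str = "sum") -> List[Vector]:
    """Combine every protocentroid in set A with every protocentroid in set B.

    Hypothetical helper. mode="sum" adds each pair elementwise; mode="product"
    multiplies it elementwise (Hadamard product), which yields the rows of the
    column-wise Khatri-Rao product of the two protocentroid matrices.
    Result order: (a_0, b_0), (a_0, b_1), ..., (a_1, b_0), ...
    """
    if mode not in ("sum", "product"):
        raise ValueError("mode must be 'sum' or 'product'")
    centroids: List[Vector] = []
    for a, b in product(protos_a, protos_b):
        if len(a) != len(b):
            raise ValueError("protocentroids must have the same dimension")
        if mode == "sum":
            centroids.append([x + y for x, y in zip(a, b)])
        else:
            centroids.append([x * y for x, y in zip(a, b)])
    return centroids


# 2 + 3 = 5 stored protocentroids describe 2 * 3 = 6 centroids.
A = [[0.0, 0.0], [10.0, 0.0]]
B = [[0.0, 0.0], [0.0, 5.0], [0.0, 10.0]]
C = khatri_rao_centroids(A, B)
print(len(C))  # 6
print(C[4])    # [10.0, 5.0]  (a_1 + b_1)
```

The saving grows multiplicatively: with two sets of 10 protocentroids, 20 stored vectors stand for 100 centroids, whereas standard k-Means would store all 100.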
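A Khatri-Rao variant of k-Means can be sketched as Lloyd-style alternating minimization. This is one plausible scheme under stated assumptions, not necessarily the authors' algorithm: centroids are assumed to be pairwise sums a_i + b_j, each point is assigned to the nearest such sum, and each protocentroid set is then refit with the other held fixed. For the sum aggregation, the optimal a_i given B and the labels is the mean of x - b_j over points labelled (i, j), and symmetrically for b_j. No step can increase the squared error, so the recorded cost is non-increasing. The function `kr_kmeans_sum`, its parameters, and the initialization are hypothetical, illustrative choices.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def _sqdist(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def _add(u: Sequence[float], v: Sequence[float]) -> Vector:
    """Elementwise sum: the assumed way two protocentroids combine."""
    return [x + y for x, y in zip(u, v)]


def _mean(vectors: List[Vector]) -> Vector:
    """Coordinate-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def kr_kmeans_sum(data: List[Vector], h1: int, h2: int,
                  max_iter: int = 100, seed: int = 0):
    """Hypothetical Khatri-Rao k-Means with sum aggregation.

    Learns h1 + h2 protocentroids whose pairwise sums act as h1 * h2
    centroids. Returns (A, B, labels, cost_history).
    """
    rng = random.Random(seed)
    center = _mean(data)
    # A starts at sampled data points; B starts at sampled points minus the
    # data mean, so that the initial pairwise sums are not all identical.
    A = [list(p) for p in rng.sample(data, h1)]
    B = [[x - c for x, c in zip(p, center)] for p in rng.sample(data, h2)]
    pairs = [(i, j) for i in range(h1) for j in range(h2)]
    labels: List[Tuple[int, int]] = []
    history: List[float] = []
    for _ in range(max_iter):
        # Assignment step: nearest combined centroid a_i + b_j.
        cents = {ij: _add(A[ij[0]], B[ij[1]]) for ij in pairs}
        new_labels = [min(pairs, key=lambda ij: _sqdist(x, cents[ij])) for x in data]
        history.append(sum(_sqdist(x, cents[ij]) for x, ij in zip(data, new_labels)))
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update A with B fixed: a_i = mean of (x - b_j) over points labelled (i, j).
        for i in range(h1):
            res = [[xk - bk for xk, bk in zip(x, B[j])]
                   for x, (ii, j) in zip(data, labels) if ii == i]
            if res:  # a protocentroid with no points keeps its old value
                A[i] = _mean(res)
        # Update B with A fixed: b_j = mean of (x - a_i) over points labelled (i, j).
        for j in range(h2):
            res = [[xk - ak for xk, ak in zip(x, A[i])]
                   for x, (i, jj) in zip(data, labels) if jj == j]
            if res:
                B[j] = _mean(res)
    return A, B, labels, history


# Toy data: 5 points around each node of a 3 x 3 grid with spacing 10.
offsets = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1)]
data = [[10.0 * gx + 0.5 * dx, 10.0 * gy + 0.5 * dy]
        for gx in range(3) for gy in range(3) for dx, dy in offsets]
A, B, labels, history = kr_kmeans_sum(data, h1=3, h2=3, seed=1)
print(len(A) + len(B), "protocentroids,", len(A) * len(B), "centroids")  # 6 protocentroids, 9 centroids
print("final cost:", round(history[-1], 3))
```

As with standard k-Means, the result depends on initialization and may be a local optimum, so in practice several seeds would typically be tried and the lowest-cost run kept.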
