Fugu-MT 論文翻訳(概要): Superclustering by finding statistically significant separable groups of optimal gaussian clusters

論文の概要: Superclustering by finding statistically significant separable groups of optimal gaussian clusters

arxiv url: http://arxiv.org/abs/2309.02623v2
Date: Sun, 29 Oct 2023 05:21:37 GMT
ステータス: 翻訳完了
システム内更新日: 2023-10-31 20:10:35.893117
Title: Superclustering by finding statistically significant separable groups of optimal gaussian clusters
Title（参考訳）: 最適ガウスクラスターの統計的に有意な分離群を見つけるスーパークラスタリング
Authors: Oleg I.Berngardt
Abstract要約: 本稿では,BIC基準の観点から,最適なデータセットをグループ化することで,データセットをクラスタリングするアルゴリズムを提案する。このアルゴリズムの重要な利点は、既に訓練済みのクラスタに基づいて、新しいデータの正しいスーパークラスタを予測する能力である。
参考スコア（独自算出の注目度）: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The paper presents the algorithm for clustering a dataset by grouping the optimal, from the point of view of the BIC criterion, number of Gaussian clusters into the optimal, from the point of view of their statistical separability, superclusters. The algorithm consists of three stages: representation of the dataset as a mixture of Gaussian distributions - clusters, which number is determined based on the minimum of the BIC criterion; using the Mahalanobis distance, to estimate the distances between the clusters and cluster sizes; combining the resulting clusters into superclusters using the DBSCAN method by finding its hyperparameter (maximum distance) providing maximum value of introduced matrix quality criterion at maximum number of superclusters. The matrix quality criterion corresponds to the proportion of statistically significant separated superclusters among all found superclusters. The algorithm has only one hyperparameter - statistical significance level, and automatically detects optimal number and shape of superclusters based of statistical hypothesis testing approach. The algorithm demonstrates a good results on test datasets in noise and noiseless situations. An essential advantage of the algorithm is its ability to predict correct supercluster for new data based on already trained clusterer and perform soft (fuzzy) clustering. The disadvantages of the algorithm are: its low speed and stochastic nature of the final clustering. It requires a sufficiently large dataset for clustering, which is typical for many statistical methods.
Abstract（参考訳）: 本稿では, bic基準の観点から, ガウスクラスターの数を, 統計分離性の観点から, 最適クラスタに分類し, データセットをクラスタリングするアルゴリズムを提案する。 The algorithm consists of three stages: representation of the dataset as a mixture of Gaussian distributions - clusters, which number is determined based on the minimum of the BIC criterion; using the Mahalanobis distance, to estimate the distances between the clusters and cluster sizes; combining the resulting clusters into superclusters using the DBSCAN method by finding its hyperparameter (maximum distance) providing maximum value of introduced matrix quality criterion at maximum number of superclusters. 行列の品質基準は、すべてのスーパークラスター間で統計的に有意に分離されたスーパークラスタの割合に対応する。このアルゴリズムは1つのハイパーパラメーター(統計的重要性レベル)しか持たず、統計仮説テストアプローチに基づいて、スーパークラスタの最適数と形状を自動的に検出する。このアルゴリズムは、ノイズやノイズのない状況におけるテストデータセットに対して良い結果を示す。このアルゴリズムの重要な利点は、既にトレーニング済みのclustererをベースにした新しいデータに対して正しいスーパークラスタを予測し、ソフト(ファズィ)クラスタリングを実行する能力である。アルゴリズムの欠点は、その低速さと最終的なクラスタリングの確率的性質である。クラスタリングには十分大きなデータセットが必要であり、多くの統計的手法で典型的である。

関連論文リスト

Estimating the Optimal Number of Clusters in Categorical Data Clustering by Silhouette Coefficient [0.5939858158928473]
本稿では,分類データクラスタリングにおける最適kを推定するアルゴリズムk-SCCを提案する。 k-SCCの性能を比較するために, 合成データセットと実データセットの比較実験を行った。
論文参考訳（メタデータ） (2025-01-26T14:29:11Z)
Fuzzy K-Means Clustering without Cluster Centroids [21.256564324236333]
ファジィK平均クラスタリングは教師なしデータ分析において重要な手法である。本稿では,クラスタセントロイドへの依存を完全に排除する,ファジィテクストK-Meansクラスタリングアルゴリズムを提案する。
論文参考訳（メタデータ） (2024-04-07T12:25:03Z)
Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
観測されたラベルを用いてクラスタを復元する効率的なアルゴリズムを考案する。本稿では,期待値と高い確率でこれらの下位境界との性能を一致させる最初のアルゴリズムであるIACを提案する。
論文参考訳（メタデータ） (2023-06-18T08:46:06Z)
A Computational Theory and Semi-Supervised Algorithm for Clustering [0.0]
半教師付きクラスタリングアルゴリズムを提案する。クラスタリング法のカーネルは、Mohammadの異常検出アルゴリズムである。結果は、合成および実世界のデータセットで示される。
論文参考訳（メタデータ） (2023-06-12T09:15:58Z)
Rethinking k-means from manifold learning perspective [122.38667613245151]
平均推定なしで直接データのクラスタを検出する新しいクラスタリングアルゴリズムを提案する。具体的には,バタワースフィルタを用いてデータ点間の距離行列を構成する。異なる視点に埋め込まれた相補的な情報をうまく活用するために、テンソルのSchatten p-norm正規化を利用する。
論文参考訳（メタデータ） (2023-05-12T03:01:41Z)
A sampling-based approach for efficient clustering in large datasets [0.8952229340927184]
本稿では,多数のクラスタを持つ高次元データに対して,簡便かつ効率的なクラスタリング手法を提案する。私たちのコントリビューションは、データポイントとクラスタの完全な比較を必要としないため、k-meansよりもはるかに効率的です。
論文参考訳（メタデータ） (2021-12-29T19:15:20Z)
Determinantal consensus clustering [77.34726150561087]
本稿では,クラスタリングアルゴリズムのランダム再起動における決定点プロセス (DPP) の利用を提案する。 DPPは部分集合内の中心点の多様性を好んでいる。 DPPとは対照的に、この手法は多様性の確保と、すべてのデータフェースについて良好なカバレッジを得るために失敗することを示す。
論文参考訳（メタデータ） (2021-02-07T23:48:24Z)
Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
既存のスケーラブルな階層的クラスタリング手法は、スピードの質を犠牲にする。我々は、品質を犠牲にせず、数十億のデータポイントまでスケールする、スケーラブルで集約的な階層的クラスタリング法を提案する。
論文参考訳（メタデータ） (2020-10-22T15:58:35Z)
Multi-View Spectral Clustering with High-Order Optimal Neighborhood Laplacian Matrix [57.11971786407279]
マルチビュースペクトルクラスタリングは、データ間の固有のクラスタ構造を効果的に明らかにすることができる。本稿では,高次最適近傍ラプラシア行列を学習するマルチビュースペクトルクラスタリングアルゴリズムを提案する。提案アルゴリズムは, 1次ベースと高次ベースの両方の線形結合の近傍を探索し, 最適ラプラシア行列を生成する。
論文参考訳（メタデータ） (2020-08-31T12:28:40Z)
Computationally efficient sparse clustering [67.95910835079825]
我々はPCAに基づく新しいクラスタリングアルゴリズムの有限サンプル解析を行う。ここでは,ミニマックス最適誤クラスタ化率を,体制$|theta infty$で達成することを示す。
論文参考訳（メタデータ） (2020-05-21T17:51:30Z)
A New Validity Index for Fuzzy-Possibilistic C-Means Clustering [6.174448419090291]
Fuzzy-Possibilistic (FP)指数は、形状や密度の異なるクラスターの存在下でうまく機能する。 FPCMはファジィの度合いと典型性の度合いを事前選択する必要がある。
論文参考訳（メタデータ） (2020-05-19T01:48:13Z)
Statistical power for cluster analysis [0.0]
クラスターアルゴリズムは、生物医学研究でますます人気がある。シミュレーションにより,共通解析におけるパワーと精度を推定する。我々は,大規模なサブグループ分離が期待される場合にのみ,クラスタ分析を適用することを推奨する。
論文参考訳（メタデータ） (2020-03-01T02:43:15Z)
Clustering Binary Data by Application of Combinatorial Optimization Heuristics [52.77024349608834]
本稿では,2値データのクラスタリング手法について検討し,まず,クラスタのコンパクトさを計測するアグリゲーション基準を定義した。近隣地域と人口動態最適化メタヒューリスティックスを用いた5つの新しいオリジナル手法が導入された。準モンテカルロ実験によって生成された16のデータテーブルから、L1の相似性と階層的クラスタリング、k-means(メドイドやPAM)の1つのアグリゲーションの比較を行う。
論文参考訳（メタデータ） (2020-01-06T23:33:31Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。