Fugu-MT 論文翻訳(概要): Clustering Mixtures of Bounded Covariance Distributions Under Optimal Separation

論文の概要: Clustering Mixtures of Bounded Covariance Distributions Under Optimal Separation

arxiv url: http://arxiv.org/abs/2312.11769v1
Date: Tue, 19 Dec 2023 01:01:53 GMT
ステータス: 翻訳完了
システム内更新日: 2023-12-20 17:37:29.044571
Title: Clustering Mixtures of Bounded Covariance Distributions Under Optimal Separation
Title（参考訳）: 最適分離下における有界共分散分布のクラスタリング混合
Authors: Ilias Diakonikolas, Daniel M. Kane, Jasper C. H. Lee, Thanasis Pittas
Abstract要約: 境界共分散分布の混合に対するクラスタリング問題について検討する。このクラスタリングタスクに対して,最初のポリ時間アルゴリズムを提案する。我々のアルゴリズムは、対数外乱の$Omega(alpha)$-fractionに対して堅牢である。
参考スコア（独自算出の注目度）: 44.25945344950543
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We study the clustering problem for mixtures of bounded covariance distributions, under a fine-grained separation assumption. Specifically, given samples from a $k$-component mixture distribution $D = \sum_{i =1}^k w_i P_i$, where each $w_i \ge \alpha$ for some known parameter $\alpha$, and each $P_i$ has unknown covariance $\Sigma_i \preceq \sigma^2_i \cdot I_d$ for some unknown $\sigma_i$, the goal is to cluster the samples assuming a pairwise mean separation in the order of $(\sigma_i+\sigma_j)/\sqrt{\alpha}$ between every pair of components $P_i$ and $P_j$. Our contributions are as follows: For the special case of nearly uniform mixtures, we give the first poly-time algorithm for this clustering task. Prior work either required separation scaling with the maximum cluster standard deviation (i.e. $\max_i \sigma_i$) [DKK+22b] or required both additional structural assumptions and mean separation scaling as a large degree polynomial in $1/\alpha$ [BKK22]. For general-weight mixtures, we point out that accurate clustering is information-theoretically impossible under our fine-grained mean separation assumptions. We introduce the notion of a clustering refinement -- a list of not-too-small subsets satisfying a similar separation, and which can be merged into a clustering approximating the ground truth -- and show that it is possible to efficiently compute an accurate clustering refinement of the samples. Furthermore, under a variant of the "no large sub-cluster'' condition from in prior work [BKK22], we show that our algorithm outputs an accurate clustering, not just a refinement, even for general-weight mixtures. As a corollary, we obtain efficient clustering algorithms for mixtures of well-conditioned high-dimensional log-concave distributions. Moreover, our algorithm is robust to $\Omega(\alpha)$-fraction of adversarial outliers.
Abstract（参考訳）: 境界共分散分布の混合に対するクラスタリング問題をきめ細かい分離仮定の下で検討する。具体的には、$k$-コンポーネントの混合分散 $d = \sum_{i =1}^k w_i p_i$, ここで、既知のパラメータ $\alpha$ に対して各$w_i \ge \alpha$, そして、各$p_i$ は未知の共分散 $\sigma_i \preceq \sigma^2_i \cdot i_d$ を持つ。我々の貢献は以下の通りである: ほぼ均一な混合の場合、このクラスタリングタスクのための最初のポリ時間アルゴリズムを与える。以前の作業では、最大クラスタ標準偏差による分離スケーリング(例えば、$\max_i \sigma_i$) [DKK+22b] が必要か、あるいは1/\alpha$ [BKK22] の高次多項式として追加の構造仮定と平均分離スケーリングの両方が必要であった。一般質量混合では, 正確なクラスタリングは, 詳細な平均分離仮定の下では情報理論的に不可能である。我々は,類似した分離を満足し,基底真理を近似するクラスタリングにマージ可能な,比較的小さなサブセットのリストであるクラスタリング改良の概念を導入し,サンプルの正確なクラスタリング改良を効率的に計算できることを示す。さらに,先行研究 [bkk22] による "no large sub-cluster'' 条件の変種において,本アルゴリズムが単に改良しただけでなく,一般重量混合においても正確なクラスタリングを出力することを示す。コーナリーとして、よく条件付き高次元対数凹分布の混合に対する効率的なクラスタリングアルゴリズムを得る。さらに,我々のアルゴリズムは,逆数外乱の$\Omega(\alpha)$-fractionに対して頑健である。

関連論文リスト

Dimension Reduction via Sum-of-Squares and Improved Clustering Algorithms for Non-Spherical Mixtures [5.668124846154999]
我々は,非球面(すなわち任意の成分共分散)ガウス混合モデルをサブルーチンでクラスタリングするための新しいアプローチを開発した。本手法は特異値分解に基づく古典的次元減少の非球面アナログを与える。我々のアルゴリズムは自然に任意の外れ値の次元非依存の分数に許容するように拡張する。
論文参考訳（メタデータ） (2024-11-19T11:58:51Z)
Are Easy Data Easy (for K-Means) [0.0]
本稿では、$k$-meansアルゴリズムの様々なブランドによって、適切に分離されたクラスタを復元する能力について検討する。シード選択時に繰り返しサブサンプリングによって$k$-means++のバリエーションが提案される。
論文参考訳（メタデータ） (2023-08-02T09:40:19Z)
Revisiting Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [69.15976031704687]
IAC (Instance-Adaptive Clustering, インスタンス適応クラスタリング) を提案する。 IACは$ MathcalO(n, textpolylog(n) $の計算複雑性を維持しており、大規模問題に対してスケーラブルで実用的なものである。
論文参考訳（メタデータ） (2023-06-18T08:46:06Z)
Replicable Clustering [57.19013971737493]
我々は,統計学的な$k$-medians,統計学的な$k$-means,統計学的な$k$-centers問題のアルゴリズムをブラックボックス方式で近似ルーチンを用いて提案する。理論的結果を検証するブラックボックスとしてsklearnの$k$-means++実装を用いた2次元合成分布の実験も行っている。
論文参考訳（メタデータ） (2023-02-20T23:29:43Z)
Differentially-Private Clustering of Easy Instances [67.04951703461657]
異なるプライベートクラスタリングでは、個々のデータポイントに関する情報を公開せずに、$k$のクラスタセンターを特定することが目標だ。我々は、データが"簡単"である場合にユーティリティを提供する実装可能な差分プライベートクラスタリングアルゴリズムを提供する。我々は、非プライベートクラスタリングアルゴリズムを簡単なインスタンスに適用し、結果をプライベートに組み合わせることのできるフレームワークを提案する。
論文参考訳（メタデータ） (2021-12-29T08:13:56Z)
Clustering Mixtures with Almost Optimal Separation in Polynomial Time [21.509452343745252]
高次元における平均分離ガウス多様体のクラスタリング混合問題について考察する。 Delta = Theta (sqrtlog k)$ の分離は必要であり、良いクラスタリングを回復するのに十分である。我々は多くのサンプルと時間を要し、優れたクラスタリングをうまく回復できるアルゴリズムを提示する。
論文参考訳（メタデータ） (2021-12-01T18:34:09Z)
Fuzzy Clustering with Similarity Queries [56.96625809888241]
ファジィ(fuzzy, soft objective)は、よく知られた$k$-means問題の一般化である。クエリを少なくすることで、問題の解決が容易になる。
論文参考訳（メタデータ） (2021-06-04T02:32:26Z)
K-expectiles clustering [0.0]
本稿では,期待値に基づく分割クラスタリングアルゴリズムを提案する。固定$tau$クラスタリングと適応$tau$クラスタリングの2つのスキームを提案します。
論文参考訳（メタデータ） (2021-03-16T21:14:56Z)
Computationally efficient sparse clustering [67.95910835079825]
我々はPCAに基づく新しいクラスタリングアルゴリズムの有限サンプル解析を行う。ここでは,ミニマックス最適誤クラスタ化率を,体制$|theta infty$で達成することを示す。
論文参考訳（メタデータ） (2020-05-21T17:51:30Z)
Robustly Learning any Clusterable Mixture of Gaussians [55.41573600814391]
本研究では,高次元ガウス混合系の対向ロバスト条件下での効率的な学習性について検討する。理論的に最適に近い誤り証明である$tildeO(epsilon)$の情報を、$epsilon$-corrupted $k$-mixtureで学習するアルゴリズムを提供する。我々の主な技術的貢献は、ガウス混合系からの新しい頑健な識別可能性証明クラスターであり、これは正方形の定度証明システムによって捉えることができる。
論文参考訳（メタデータ） (2020-05-13T16:44:12Z)
Outlier-Robust Clustering of Non-Spherical Mixtures [5.863264019032882]
統計的に分離されたd-次元ガウスアン(k-GMM)の混合をクラスタリングするための最初のアウトリー・ローバストアルゴリズムを与える。この結果は、$d$次元単位球面上の均一分布の任意のアフィン変換のクラスタリング混合に拡張される。
論文参考訳（メタデータ） (2020-05-06T17:24:27Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。