Fugu-MT 論文翻訳(概要): Clustering Mixtures with Almost Optimal Separation in Polynomial Time

論文の概要: Clustering Mixtures with Almost Optimal Separation in Polynomial Time

arxiv url: http://arxiv.org/abs/2112.00706v1
Date: Wed, 1 Dec 2021 18:34:09 GMT
ステータス: 翻訳完了
システム内更新日: 2021-12-02 14:40:22.833889
Title: Clustering Mixtures with Almost Optimal Separation in Polynomial Time
Title（参考訳）: 多項式時間でのほぼ最適分離によるクラスタリング混合
Authors: Jerry Li, Allen Liu
Abstract要約: 高次元における平均分離ガウス多様体のクラスタリング混合問題について考察する。 Delta = Theta (sqrtlog k)$ の分離は必要であり、良いクラスタリングを回復するのに十分である。我々は多くのサンプルと時間を要し、優れたクラスタリングをうまく回復できるアルゴリズムを提示する。
参考スコア（独自算出の注目度）: 21.509452343745252
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We consider the problem of clustering mixtures of mean-separated Gaussians in high dimensions. We are given samples from a mixture of $k$ identity covariance Gaussians, so that the minimum pairwise distance between any two pairs of means is at least $\Delta$, for some parameter $\Delta > 0$, and the goal is to recover the ground truth clustering of these samples. It is folklore that separation $\Delta = \Theta (\sqrt{\log k})$ is both necessary and sufficient to recover a good clustering, at least information theoretically. However, the estimators which achieve this guarantee are inefficient. We give the first algorithm which runs in polynomial time, and which almost matches this guarantee. More precisely, we give an algorithm which takes polynomially many samples and time, and which can successfully recover a good clustering, so long as the separation is $\Delta = \Omega (\log^{1/2 + c} k)$, for any $c > 0$. Previously, polynomial time algorithms were only known for this problem when the separation was polynomial in $k$, and all algorithms which could tolerate $\textsf{poly}( \log k )$ separation required quasipolynomial time. We also extend our result to mixtures of translations of a distribution which satisfies the Poincar\'{e} inequality, under additional mild assumptions. Our main technical tool, which we believe is of independent interest, is a novel way to implicitly represent and estimate high degree moments of a distribution, which allows us to extract important information about high-degree moments without ever writing down the full moment tensors explicitly.
Abstract（参考訳）: 高次元における平均分離ガウスのクラスタリング混合の問題を考える。我々は、$k$恒等共分散ガウシアンの混合物からサンプルを与えられるので、任意の2つの手段間の最小対距離が少なくとも$\Delta$で、あるパラメータ$\Delta > 0$に対して$\Delta$であり、これらのサンプルの基底真理クラスタリングを復元することが目的である。分離 $\delta = \theta (\sqrt{\log k})$ は、少なくとも理論的には、良好なクラスタリングを回復するのに必要かつ十分である。しかし、この保証を達成する推定者は非効率である。この保証にほぼ一致する多項式時間で実行される最初のアルゴリズムを与える。より正確には、任意の$c > 0$ に対して、分離が $\delta = \omega (\log^{1/2 + c} k)$ である限り、多項式的に多くのサンプルと時間をとり、良好なクラスタリングを回復するアルゴリズムを与える。これまで多項式時間アルゴリズムは、分離が$k$の多項式であるときにのみ知られており、$\textsf{poly}( \log k )$分離に必要な準ポリノミカル時間を許容できる全てのアルゴリズムが知られていた。また,poincar\'{e}不等式を満たす分布の翻訳の混合にも,さらに軽度な仮定の下で結果を拡張した。我々の主要な技術ツールは、独立した関心事であると信じており、分布の高次モーメントを暗黙的に表現し、推定する新しい方法であり、これにより、全モーメントテンソルを明示的に書き留めることなく、高次モーメントに関する重要な情報を抽出することができる。

関連論文リスト

Near-Optimal Clustering in Mixture of Markov Chains [74.3828414695655]
我々は、長さ$H$の軌跡を、大きさ$S$の有限状態空間上の未知のエルゴードマルコフ鎖の1つによって生成される、$T$ trajectories of length $H$の問題を研究する。我々は、連鎖の遷移核間の重み付きKL分散によって支配されるクラスタリングエラー率に基づいて、インスタンス依存で高い確率の低い境界を導出する。次に,新しい2段階クラスタリングアルゴリズムを提案する。
論文参考訳（メタデータ） (2025-06-02T05:10:40Z)
Dimension Reduction via Sum-of-Squares and Improved Clustering Algorithms for Non-Spherical Mixtures [5.668124846154999]
我々は,非球面(すなわち任意の成分共分散)ガウス混合モデルをサブルーチンでクラスタリングするための新しいアプローチを開発した。本手法は特異値分解に基づく古典的次元減少の非球面アナログを与える。我々のアルゴリズムは自然に任意の外れ値の次元非依存の分数に許容するように拡張する。
論文参考訳（メタデータ） (2024-11-19T11:58:51Z)
Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs [56.237917407785545]
本稿では,円滑なベルマン作用素を持つ連続空間マルコフ決定過程(MDP)の一般クラスにおいて,$varepsilon$-optimal Policyを学習する問題を考察する。我々のソリューションの鍵となるのは、調和解析のアイデアに基づく新しい射影技術である。我々の結果は、連続空間 MDP における2つの人気と矛盾する視点のギャップを埋めるものである。
論文参考訳（メタデータ） (2024-05-10T09:58:47Z)
Learning general Gaussian mixtures with efficient score matching [16.06356123715737]
我々は、$d$次元で$k$ガウシアンの混合を学習する問題を研究する。我々は、下層の混合成分について分離を前提としない。我々は、ターゲット混合物から$dmathrmpoly(k/varepsilon)$サンプルを抽出し、サンプル-ポリノミカル時間で実行し、サンプリング器を構築するアルゴリズムを提案する。
論文参考訳（メタデータ） (2024-04-29T17:30:36Z)
Clustering Mixtures of Bounded Covariance Distributions Under Optimal Separation [44.25945344950543]
境界共分散分布の混合に対するクラスタリング問題について検討する。このクラスタリングタスクに対して,最初のポリ時間アルゴリズムを提案する。我々のアルゴリズムは、対数外乱の$Omega(alpha)$-fractionに対して堅牢である。
論文参考訳（メタデータ） (2023-12-19T01:01:53Z)
Do you know what q-means? [50.045011844765185]
クラスタリングは、大規模なデータセットを分析する上で最も重要なツールの1つである。クラスタリングのための"$q$-means"アルゴリズムの改良版を提案する。また、$Obig(frack2varepsilon2(sqrtkd + log(Nd))big で実行される $varepsilon に対する "dequantized" アルゴリズムも提示する。
論文参考訳（メタデータ） (2023-08-18T17:52:12Z)
Private estimation algorithms for stochastic block models and mixture models [63.07482515700984]
効率的なプライベート推定アルゴリズムを設計するための一般的なツール。最初の効率的な$(epsilon, delta)$-differentially private algorithm for both weak recovery and exact recovery。
論文参考訳（メタデータ） (2023-01-11T09:12:28Z)
Fuzzy Clustering with Similarity Queries [56.96625809888241]
ファジィ(fuzzy, soft objective)は、よく知られた$k$-means問題の一般化である。クエリを少なくすることで、問題の解決が容易になる。
論文参考訳（メタデータ） (2021-06-04T02:32:26Z)
A Polynomial Time Algorithm for Learning Halfspaces with Tsybakov Noise [55.45544638410858]
本研究では,Tsybakovノイズの存在下でのPAC学習の相同性半空間の問題について検討する。我々のアルゴリズムは、任意の精度$epsilon$で真のハーフスペースを学習する。
論文参考訳（メタデータ） (2020-10-04T22:19:06Z)
Computationally efficient sparse clustering [67.95910835079825]
我々はPCAに基づく新しいクラスタリングアルゴリズムの有限サンプル解析を行う。ここでは,ミニマックス最適誤クラスタ化率を,体制$|theta infty$で達成することを示す。
論文参考訳（メタデータ） (2020-05-21T17:51:30Z)
Robustly Learning any Clusterable Mixture of Gaussians [55.41573600814391]
本研究では,高次元ガウス混合系の対向ロバスト条件下での効率的な学習性について検討する。理論的に最適に近い誤り証明である$tildeO(epsilon)$の情報を、$epsilon$-corrupted $k$-mixtureで学習するアルゴリズムを提供する。我々の主な技術的貢献は、ガウス混合系からの新しい頑健な識別可能性証明クラスターであり、これは正方形の定度証明システムによって捉えることができる。
論文参考訳（メタデータ） (2020-05-13T16:44:12Z)
Outlier-Robust Clustering of Non-Spherical Mixtures [5.863264019032882]
統計的に分離されたd-次元ガウスアン(k-GMM)の混合をクラスタリングするための最初のアウトリー・ローバストアルゴリズムを与える。この結果は、$d$次元単位球面上の均一分布の任意のアフィン変換のクラスタリング混合に拡張される。
論文参考訳（メタデータ） (2020-05-06T17:24:27Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。