Fugu-MT 論文翻訳(概要): Replicable Clustering

論文の概要: Replicable Clustering

arxiv url: http://arxiv.org/abs/2302.10359v3
Date: Fri, 27 Oct 2023 00:14:27 GMT
ステータス: 翻訳完了
システム内更新日: 2023-10-30 18:45:05.912457
Title: Replicable Clustering
Title（参考訳）: replicableクラスタリング
Authors: Hossein Esfandiari, Amin Karbasi, Vahab Mirrokni, Grigoris Velegkas, Felix Zhou
Abstract要約: 我々は,統計学的な$k$-medians,統計学的な$k$-means,統計学的な$k$-centers問題のアルゴリズムをブラックボックス方式で近似ルーチンを用いて提案する。理論的結果を検証するブラックボックスとしてsklearnの$k$-means++実装を用いた2次元合成分布の実験も行っている。
参考スコア（独自算出の注目度）: 57.19013971737493
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We design replicable algorithms in the context of statistical clustering under the recently introduced notion of replicability from Impagliazzo et al. [2022]. According to this definition, a clustering algorithm is replicable if, with high probability, its output induces the exact same partition of the sample space after two executions on different inputs drawn from the same distribution, when its internal randomness is shared across the executions. We propose such algorithms for the statistical $k$-medians, statistical $k$-means, and statistical $k$-centers problems by utilizing approximation routines for their combinatorial counterparts in a black-box manner. In particular, we demonstrate a replicable $O(1)$-approximation algorithm for statistical Euclidean $k$-medians ($k$-means) with $\operatorname{poly}(d)$ sample complexity. We also describe an $O(1)$-approximation algorithm with an additional $O(1)$-additive error for statistical Euclidean $k$-centers, albeit with $\exp(d)$ sample complexity. In addition, we provide experiments on synthetic distributions in 2D using the $k$-means++ implementation from sklearn as a black-box that validate our theoretical results.
Abstract（参考訳）: 最近導入されたimpagliazzoらによる再現性の概念に基づいて,統計クラスタリングの文脈でレプリカブルアルゴリズムを設計する。 [2022]. この定義によれば、クラスタリングアルゴリズムは、高い確率で、その出力が同一の分布から引き出された異なる入力に対する2つの実行の後、その実行中に内部ランダム性が共有されると、サンプル空間の全く同じ分割を誘導する。そこで本研究では,統計量k$-medians,統計値k$-means,統計値k$-centers問題に対する近似ルーチンをブラックボックス方式で利用するアルゴリズムを提案する。特に、統計的ユークリッドの$k$-medians(k$-means)に対して$\operatorname{poly}(d)$サンプル複雑性を持つレプリカブルな$O(1)$-approximationアルゴリズムを実証する。また、統計的ユークリッド$k$-centersに対して$O(1)$-approximationアルゴリズムを付加的な$O(1)$-additive errorで記述する。さらに,sklearn の $k$-means++ 実装をブラックボックスとして2次元の合成分布実験を行い,理論的結果を検証する。

関連論文リスト

Simple, Scalable and Effective Clustering via One-Dimensional Projections [10.807367640692021]
クラスタリングは、教師なし機械学習における基本的な問題であり、データ分析に多くの応用がある。任意の$k$に対して、期待時間$O(mathrmnnz(X) + nlog n)$で確実に動作する単純なランダム化クラスタリングアルゴリズムを導入する。我々は,このアルゴリズムが$k$-means目的の任意の入力データセットに対して,近似比$smashwidetildeO(k4)$を達成することを証明した。
論文参考訳（メタデータ） (2023-10-25T16:37:45Z)
Private estimation algorithms for stochastic block models and mixture models [63.07482515700984]
効率的なプライベート推定アルゴリズムを設計するための一般的なツール。最初の効率的な$(epsilon, delta)$-differentially private algorithm for both weak recovery and exact recovery。
論文参考訳（メタデータ） (2023-01-11T09:12:28Z)
Robust Sparse Mean Estimation via Sum of Squares [42.526664955704746]
本研究では,高次元スパース平均推定の問題点を,逆数外乱の$epsilon$-fractionの存在下で検討する。我々のアルゴリズムは、サム・オブ・スクエア(Sum-of-Squares)ベースのアルゴリズムアプローチに従う。
論文参考訳（メタデータ） (2022-06-07T16:49:54Z)
Systematically improving existing k-means initialization algorithms at nearly no cost, by pairwise-nearest-neighbor smoothing [1.2570180539670577]
PNN-smoothingと呼ばれる$k$-meansクラスタリングアルゴリズムを初期化するメタメソッドを提案する。与えられたデータセットを$J$のランダムなサブセットに分割し、各データセットを個別にクラスタリングし、結果のクラスタリングをペアワイズ・アネレス・ニーバーメソッドとマージする。
論文参考訳（メタデータ） (2022-02-08T15:56:30Z)
Clustering Mixture Models in Almost-Linear Time via List-Decodable Mean Estimation [58.24280149662003]
本稿では,データセットの大部分を敵が破壊できるリストデコタブル平均推定の問題について検討する。我々は、ほぼ最適な統計的保証を達成するために、リストデコダブル平均推定のための新しいアルゴリズムを開発した。
論文参考訳（メタデータ） (2021-06-16T03:34:14Z)
Fuzzy Clustering with Similarity Queries [56.96625809888241]
ファジィ(fuzzy, soft objective)は、よく知られた$k$-means問題の一般化である。クエリを少なくすることで、問題の解決が容易になる。
論文参考訳（メタデータ） (2021-06-04T02:32:26Z)
List-Decodable Mean Estimation in Nearly-PCA Time [50.79691056481693]
高次元におけるリストデコタブル平均推定の基本的な課題について検討する。我々のアルゴリズムは、すべての$k = O(sqrtd) cup Omega(d)$に対して$widetildeO(ndk)$で実行されます。我々のアルゴリズムの変種は、すべての$k$に対してランタイム$widetildeO(ndk)$を持ち、リカバリ保証の$O(sqrtlog k)$ Factorを犠牲にしている。
論文参考訳（メタデータ） (2020-11-19T17:21:37Z)
A Randomized Algorithm to Reduce the Support of Discrete Measures [79.55586575988292]
離散確率測度が$N$原子と$n$実数値関数の集合で成り立つと、元の$N$原子の$n+1$の部分集合で支えられる確率測度が存在する。我々は、負の円錐によるバリセンターの簡単な幾何学的特徴付けを与え、この新しい測度を「グリード幾何学的サンプリング」によって計算するランダム化アルゴリズムを導出する。次に、その性質を研究し、それを合成および実世界のデータにベンチマークして、$Ngg n$ regimeにおいて非常に有益であることを示す。
論文参考訳（メタデータ） (2020-06-02T16:38:36Z)
Learning Sparse Classifiers: Continuous and Mixed Integer Optimization Perspectives [10.291482850329892]
混合整数計画法(MIP)は、(最適に) $ell_0$-正規化回帰問題を解くために用いられる。数分で5万ドルの機能を処理できる正確なアルゴリズムと、$papprox6$でインスタンスに対処できる近似アルゴリズムの2つのクラスを提案する。さらに,$ell$-regularizedsに対する新しい推定誤差境界を提案する。
論文参考訳（メタデータ） (2020-01-17T18:47:02Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。