Fugu-MT paper translation (abstract): Sharper Bounds for $\ell

Paper summary: Sharper Bounds for $\ell_p$ Sensitivity Sampling

arxiv url: http://arxiv.org/abs/2306.00732v1
Date: Thu, 1 Jun 2023 14:27:28 GMT
Status: translation complete
System update date: 2023-06-02 15:35:36.987359
Title: Sharper Bounds for $\ell_p$ Sensitivity Sampling
Title (reference translation): Sharp Bounds for $\ell_p$ Sensitivity Sampling
Authors: David P. Woodruff, Taisuke Yasuda
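Abstract summary: In large-scale machine learning, random sampling is a popular way to approximate datasets by a small representative subset of examples. This work shows the first bounds for sensitivity sampling for $\ell_p$ subspace embeddings with $p\neq 2$ that improve over the general $\mathfrak S d$ bound. The sensitivity sampling results yield the best known sample complexity for a wide class of structured matrices that have small $\ell_p$ sensitivity.
Reference score (independently computed attention score): 62.38157566916501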
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In large scale machine learning, random sampling is a popular way to approximate datasets by a small representative subset of examples. In particular, sensitivity sampling is an intensely studied technique which provides provable guarantees on the quality of approximation, while reducing the number of examples to the product of the VC dimension $d$ and the total sensitivity $\mathfrak S$ in remarkably general settings. However, guarantees going beyond this general bound of $\mathfrak S d$ are known in perhaps only one setting, for $\ell_2$ subspace embeddings, despite intense study of sensitivity sampling in prior work. In this work, we show the first bounds for sensitivity sampling for $\ell_p$ subspace embeddings for $p\neq 2$ that improve over the general $\mathfrak S d$ bound, achieving a bound of roughly $\mathfrak S^{2/p}$ for $1\leq p<2$ and $\mathfrak S^{2-2/p}$ for $2<p<\infty$. For $1\leq p<2$, we show that this bound is tight, in the sense that there exist matrices for which $\mathfrak S^{2/p}$ samples are necessary. Furthermore, our techniques yield further new results in the study of sampling algorithms, showing that the root leverage score sampling algorithm achieves a bound of roughly $d$ for $1\leq p<2$, and that a combination of leverage score and sensitivity sampling achieves an improved bound of roughly $d^{2/p}\mathfrak S^{2-4/p}$ for $2<p<\infty$. Our sensitivity sampling results yield the best known sample complexity for a wide class of structured matrices that have small $\ell_p$ sensitivity.
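Abstract (reference translation): In large-scale machine learning, random sampling is a common way to approximate a dataset by a small representative subset of examples. In particular, sensitivity sampling is a heavily studied technique that gives provable guarantees on the quality of the approximation while reducing the number of examples to the product of the VC dimension $d$ and the total sensitivity $\mathfrak S$ in very general settings. However, despite extensive study of sensitivity sampling in prior work, guarantees beyond this general $\mathfrak S d$ bound are known in perhaps only one setting, $\ell_2$ subspace embeddings. This work shows the first bounds for sensitivity sampling for $\ell_p$ subspace embeddings with $p\neq 2$ that improve over the general $\mathfrak S d$ bound, achieving roughly $\mathfrak S^{2/p}$ for $1\leq p<2$ and $\mathfrak S^{2-2/p}$ for $2<p<\infty$. For $1\leq p<2$ the bound is tight, in the sense that there exist matrices for which $\mathfrak S^{2/p}$ samples are necessary. The techniques also yield further new results on sampling algorithms: root leverage score sampling achieves a bound of roughly $d$ for $1\leq p<2$, and a combination of leverage score and sensitivity sampling achieves an improved bound of roughly $d^{2/p}\mathfrak S^{2-4/p}$ for $2<p<\infty$. The sensitivity sampling results give the best known sample complexity for a wide class of structured matrices with small $\ell_p$ sensitivity.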
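To make the exponents in the abstract easier to compare, here is a minimal Python sketch that evaluates only the leading-order expressions quoted above. It is not the paper's algorithm: the function name `rough_sample_bounds` and the dictionary keys are hypothetical, and the abstract says "roughly", so any logarithmic or accuracy-dependent factors are left out.

```python
def rough_sample_bounds(p: float, S: float, d: float) -> dict:
    """Evaluate the leading-order sample bounds quoted in the abstract.

    p: the l_p norm parameter; S: total sensitivity; d: dimension.
    Hypothetical helper for illustration only; logarithmic and
    accuracy-dependent factors are deliberately ignored.
    """
    if not (1 <= p < 2 or 2 < p < float("inf")):
        raise ValueError("the quoted bounds cover 1 <= p < 2 and 2 < p < infinity")

    # General bound known before this work: total sensitivity times dimension.
    bounds = {"general": S * d}

    if p < 2:
        # Sensitivity sampling for 1 <= p < 2: roughly S^(2/p).
        bounds["sensitivity"] = S ** (2 / p)
        # Root leverage score sampling for 1 <= p < 2: roughly d.
        bounds["root_leverage"] = d
    else:
        # Sensitivity sampling for 2 < p < infinity: roughly S^(2 - 2/p).
        bounds["sensitivity"] = S ** (2 - 2 / p)
        # Leverage score combined with sensitivity sampling:
        # roughly d^(2/p) * S^(2 - 4/p).
        bounds["leverage_plus_sensitivity"] = d ** (2 / p) * S ** (2 - 4 / p)

    return bounds


if __name__ == "__main__":
    for name, value in rough_sample_bounds(p=4, S=16, d=16).items():
        print(f"{name}: {value:g}")
```

For example, with $p=4$ and $\mathfrak S = d = 16$, the general expression evaluates to 256, while $\mathfrak S^{2-2/p}$ and $d^{2/p}\mathfrak S^{2-4/p}$ both evaluate to 64.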

Related papers