Fugu-MT paper translation (abstract): Computing Approximate $\ell_p$ Sensitivities

Paper summary: Computing Approximate $\ell_p$ Sensitivities

arxiv url: http://arxiv.org/abs/2311.04158v2
Date: Tue, 21 Nov 2023 14:55:52 GMT
Status: Translation complete
Last updated in system: 2023-11-23 04:14:58.578068
Title: Computing Approximate $\ell_p$ Sensitivities
Title (reference translation): Computation of Approximate $\ell_p$ Sensitivities
Authors: Swati Padmanabhan, David P. Woodruff, and Qiuyi Zhang
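Abstract summary: We provide efficient algorithms for approximating the $\ell_p$ sensitivities and related summary statistics of a given matrix. For a wide class of matrices in real-world datasets, the total sensitivity can be approximated quickly and is significantly smaller than the theoretical prediction.
Reference score (independently computed attention score): 46.95464524720463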
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent works in dimensionality reduction for regression tasks have introduced the notion of sensitivity, an estimate of the importance of a specific datapoint in a dataset, offering provable guarantees on the quality of the approximation after removing low-sensitivity datapoints via subsampling. However, fast algorithms for approximating $\ell_p$ sensitivities, which we show is equivalent to approximate $\ell_p$ regression, are known for only the $\ell_2$ setting, in which they are termed leverage scores. In this work, we provide efficient algorithms for approximating $\ell_p$ sensitivities and related summary statistics of a given matrix. In particular, for a given $n \times d$ matrix, we compute $\alpha$-approximation to its $\ell_1$ sensitivities at the cost of $O(n/\alpha)$ sensitivity computations. For estimating the total $\ell_p$ sensitivity (i.e. the sum of $\ell_p$ sensitivities), we provide an algorithm based on importance sampling of $\ell_p$ Lewis weights, which computes a constant factor approximation to the total sensitivity at the cost of roughly $O(\sqrt{d})$ sensitivity computations. Furthermore, we estimate the maximum $\ell_1$ sensitivity, up to a $\sqrt{d}$ factor, using $O(d)$ sensitivity computations. We generalize all these results to $\ell_p$ norms for $p > 1$. Lastly, we experimentally show that for a wide class of matrices in real-world datasets, the total sensitivity can be quickly approximated and is significantly smaller than the theoretical prediction, demonstrating that real-world datasets have low intrinsic effective dimensionality.
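Abstract (reference translation): Recent work on dimensionality reduction for regression tasks has introduced the notion of sensitivity, an estimate of the importance of a specific datapoint in a dataset, which provides provable guarantees on the quality of the approximation after low-sensitivity datapoints are removed by subsampling. However, fast algorithms for approximating $\ell_p$ sensitivities, a task shown to be equivalent to approximate $\ell_p$ regression, are known only for the $\ell_2$ setting, where the sensitivities are called leverage scores. This work proposes efficient algorithms for approximating the $\ell_p$ sensitivities and related summary statistics of a given matrix. In particular, for a given $n \times d$ matrix, an $\alpha$-approximation to its $\ell_1$ sensitivities is computed at the cost of $O(n/\alpha)$ sensitivity computations. To estimate the total $\ell_p$ sensitivity (i.e. the sum of the $\ell_p$ sensitivities), an algorithm based on importance sampling of $\ell_p$ Lewis weights is provided, which computes a constant-factor approximation to the total sensitivity at the cost of roughly $O(\sqrt{d})$ sensitivity computations. Furthermore, the maximum $\ell_1$ sensitivity is estimated, up to a $\sqrt{d}$ factor, using $O(d)$ sensitivity computations. All of these results are generalized to $\ell_p$ norms for $p > 1$. Finally, experiments show that for a wide class of matrices in real-world datasets, the total sensitivity can be approximated quickly and is significantly smaller than the theoretical prediction, indicating that real-world datasets have low intrinsic effective dimensionality.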
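
For orientation, the $\ell_p$ sensitivity of the $i$-th row $a_i$ of a matrix $A$ is usually defined as $\max_{x : Ax \neq 0} |a_i^\top x|^p / \|Ax\|_p^p$. In the $\ell_2$ setting mentioned in the abstract this quantity is the leverage score $a_i^\top (A^\top A)^{+} a_i$, and the total $\ell_2$ sensitivity equals the rank of $A$. The following is a minimal sketch of that $\ell_2$ special case only, in pure Python. It is not the paper's algorithm for general $p$; the function name `leverage_scores` and the example matrix are illustrative assumptions, and the sketch assumes the input has full column rank.

```python
import math

def leverage_scores(rows):
    """Return the l2 sensitivities (leverage scores) of a matrix given as a list of rows.

    Assumption: the matrix has full column rank. Modified Gram-Schmidt builds an
    orthonormal basis Q of the column space; the i-th leverage score is the
    squared norm of the i-th row of Q.
    """
    n, d = len(rows), len(rows[0])
    # cols[j] is the j-th column of the matrix.
    cols = [[rows[i][j] for i in range(n)] for j in range(d)]
    basis = []
    for col in cols:
        v = list(col)
        # Remove the components along the basis vectors found so far.
        for q in basis:
            coeff = sum(qi * vi for qi, vi in zip(q, v))
            v = [vi - coeff * qi for qi, vi in zip(q, v)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm < 1e-12:
            raise ValueError("matrix must have full column rank")
        basis.append([vi / norm for vi in v])
    # Squared row norms of Q.
    return [sum(q[i] ** 2 for q in basis) for i in range(n)]

# Illustrative 4 x 2 matrix (hypothetical data, not from the paper).
A = [
    [1.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
]
scores = leverage_scores(A)
total = sum(scores)  # total l2 sensitivity; equals rank(A) = 2 for this matrix
```

For this matrix the scores come out to approximately 0.4, 0.4, 0.6 and 0.6, which sum to the rank, 2. Rows with larger scores are the ones a sensitivity-based subsampling scheme would keep with higher probability.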

Related papers