Fugu-MT paper translation (abstract): Active Sampling for Linear Regression Beyond the $\ell

Paper summary: Active Sampling for Linear Regression Beyond the $\ell_2$ Norm

arxiv url: http://arxiv.org/abs/2111.04888v1
Date: Tue, 9 Nov 2021 00:20:01 GMT
Status: translation complete
Last updated in system: 2021-11-10 23:21:46.469527
Title: Active Sampling for Linear Regression Beyond the $\ell_2$ Norm
Title (reference translation): Active sampling for linear regression beyond the $\ell_2$ norm
Authors: Cameron Musco, Christopher Musco, David P. Woodruff, Taisuke Yasuda
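Abstract summary: We study active sampling algorithms for linear regression, which aim to query only a small number of entries of the target vector. We show that this dependence on $d$ is optimal up to logarithmic factors. We also provide the first total sensitivity upper bound of $O(d^{\max\{1,p/2\}}\log^2 n)$ for loss functions with at most degree $p$ polynomial growth.
Reference score (independently computed attention score): 70.49273459706546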
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We study active sampling algorithms for linear regression, which aim to query only a small number of entries of a target vector $b\in\mathbb{R}^n$ and output a near minimizer to $\min_{x\in\mathbb{R}^d}\|Ax-b\|$, where $A\in\mathbb{R}^{n \times d}$ is a design matrix and $\|\cdot\|$ is some loss function. For $\ell_p$ norm regression for any $0<p<\infty$, we give an algorithm based on Lewis weight sampling that outputs a $(1+\epsilon)$ approximate solution using just $\tilde{O}(d^{\max(1,{p/2})}/\mathrm{poly}(\epsilon))$ queries to $b$. We show that this dependence on $d$ is optimal, up to logarithmic factors. Our result resolves a recent open question of Chen and Dereziński, who gave near optimal bounds for the $\ell_1$ norm, and suboptimal bounds for $\ell_p$ regression with $p\in(1,2)$. We also provide the first total sensitivity upper bound of $O(d^{\max\{1,p/2\}}\log^2 n)$ for loss functions with at most degree $p$ polynomial growth. This improves a recent result of Tukan, Maalouf, and Feldman. By combining this with our techniques for the $\ell_p$ regression result, we obtain an active regression algorithm making $\tilde O(d^{1+\max\{1,p/2\}}/\mathrm{poly}(\epsilon))$ queries, answering another open question of Chen and Dereziński. For the important special case of the Huber loss, we further improve our bound to an active sample complexity of $\tilde O(d^{(1+\sqrt2)/2}/\epsilon^c)$ and a non-active sample complexity of $\tilde O(d^{4-2\sqrt 2}/\epsilon^c)$, improving a previous $d^4$ bound for Huber regression due to Clarkson and Woodruff. Our sensitivity bounds have further implications, improving a variety of previous results using sensitivity sampling, including Orlicz norm subspace embeddings and robust subspace approximation. Finally, our active sampling results give the first sublinear time algorithms for Kronecker product regression under every $\ell_p$ norm.
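Abstract (reference translation): We study active sampling algorithms for linear regression that query only a small number of entries of a target vector $b\in\mathbb{R}^n$ and output a near minimizer of $\min_{x\in\mathbb{R}^d}\|Ax-b\|$. For $\ell_p$ norm regression with any $0<p<\infty$, we give an algorithm based on Lewis weight sampling that outputs a $(1+\epsilon)$-approximate solution using $\tilde{O}(d^{\max(1,{p/2})}/\mathrm{poly}(\epsilon))$ queries. We show that this dependence on $d$ is optimal up to logarithmic factors. This result resolves a recent open question of Chen and Dereziński, who gave near-optimal bounds for the $\ell_1$ norm and suboptimal bounds for $\ell_p$ regression with $p\in(1,2)$. We also provide the first total sensitivity upper bound, $O(d^{\max\{1,p/2\}}\log^2 n)$, for loss functions with at most degree $p$ polynomial growth. This improves a recent result of Tukan, Maalouf, and Feldman. Combining this with the techniques behind the $\ell_p$ regression result yields an active regression algorithm making $\tilde O(d^{1+\max\{1,p/2\}}/\mathrm{poly}(\epsilon))$ queries, answering another open question of Chen and Dereziński. For the important special case of the Huber loss, we further improve the bound to an active sample complexity of $\tilde O(d^{(1+\sqrt2)/2}/\epsilon^c)$ and a non-active sample complexity of $\tilde O(d^{4-2\sqrt 2}/\epsilon^c)$, improving the previous $d^4$ bound for Huber regression due to Clarkson and Woodruff. Our sensitivity bounds have further implications, improving a variety of previous results that use sensitivity sampling, including Orlicz norm subspace embeddings and robust subspace approximation. Finally, our active sampling results give the first sublinear time algorithms for Kronecker product regression under every $\ell_p$ norm.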
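
As a rough illustration of the idea in the abstract (sample rows by Lewis weights, query $b$ only on the sampled rows, then solve a reweighted $\ell_p$ problem), here is a minimal Python sketch. It is a simplified one-shot scheme and not the paper's algorithm: the function names `lewis_weights` and `active_lp_regression`, the sample budget `m`, the fixed-point iteration for the weights (assumed to be used only for $0<p<4$), and the IRLS solver (assumed to be used only for $1\le p\le 2$) are all assumptions made for this example.

```python
import numpy as np

def lewis_weights(A, p, iters=30):
    """Approximate l_p Lewis weights by fixed-point iteration (assumes 0 < p < 4)."""
    n, d = A.shape
    w = np.ones(n)
    for _ in range(iters):
        # M = A^T W^{1 - 2/p} A
        M = A.T @ (A * (w ** (1.0 - 2.0 / p))[:, None])
        # tau_i = a_i^T M^{-1} a_i for every row a_i of A
        tau = np.einsum("ij,ij->i", A @ np.linalg.inv(M), A)
        w = tau ** (p / 2.0)
    return w

def active_lp_regression(A, query_b, p=1.0, m=200, irls_iters=50, seed=0):
    """One-shot sketch: sample rows by Lewis weights, query b only there, solve by IRLS."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    w = lewis_weights(A, p)
    probs = np.minimum(1.0, m * w / w.sum())      # about m rows kept in expectation
    idx = np.flatnonzero(rng.random(n) < probs)   # rows whose labels get queried
    scale = probs[idx] ** (-1.0 / p)              # keeps ||S(Ax-b)||_p^p unbiased
    A_s = A[idx] * scale[:, None]
    b_s = query_b(idx) * scale                    # the only access to b
    x = np.linalg.lstsq(A_s, b_s, rcond=None)[0]  # least-squares starting point
    for _ in range(irls_iters):
        r = np.abs(A_s @ x - b_s)
        v = np.sqrt(np.maximum(r, 1e-8) ** (p - 2.0))  # sqrt of IRLS weights |r|^(p-2)
        x = np.linalg.lstsq(A_s * v[:, None], b_s * v, rcond=None)[0]
    return x, idx

# Hypothetical demo data: a planted solution plus a few large outliers in b.
rng = np.random.default_rng(0)
n, d = 2000, 5
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true
outliers = rng.choice(n, size=n // 20, replace=False)
b[outliers] += 10.0 * rng.standard_normal(outliers.size)

x_hat, queried = active_lp_regression(A, lambda i: b[i], p=1.0, m=200)
print("labels queried:", queried.size, "of", n)
print("error vs planted solution:", np.linalg.norm(x_hat - x_true))
```

Passing `query_b` as a callable makes the label access explicit: the routine reads `b` only at the sampled indices, which is the quantity the query bounds in the abstract count.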

Related papers