Fugu-MT 論文翻訳(概要): Improved Generalization Bound and Learning of Sparsity Patterns for Data-Driven Low-Rank Approximation

論文の概要: Improved Generalization Bound and Learning of Sparsity Patterns for Data-Driven Low-Rank Approximation

arxiv url: http://arxiv.org/abs/2209.08281v1
Date: Sat, 17 Sep 2022 08:26:27 GMT
ステータス: 翻訳完了
システム内更新日: 2022-09-20 19:56:16.004992
Title: Improved Generalization Bound and Learning of Sparsity Patterns for Data-Driven Low-Rank Approximation
Title（参考訳）: データ駆動低ランク近似のための一般化境界の改善とスパーシティパターンの学習
Authors: Shinsaku Sakaue, Taihei Oki
Abstract要約: より優れた$tildemathrmO(nsk)$ bound for rank-k$ approximationを示す。また、非ゼロの学習位置が脂肪破砕次元を$mathrmO(nslog n)$でのみ増加させることも証明した。
参考スコア（独自算出の注目度）: 15.191184049312467
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Learning sketching matrices for fast and accurate low-rank approximation (LRA) has gained increasing attention. Recently, Bartlett, Indyk, and Wagner (COLT 2022) presented a generalization bound for the learning-based LRA. Specifically, for rank-$k$ approximation using an $m \times n$ learned sketching matrix with $s$ non-zeros in each column, they proved an $\tilde{\mathrm{O}}(nsm)$ bound on the \emph{fat shattering dimension} ($\tilde{\mathrm{O}}$ hides logarithmic factors). We build on their work and make two contributions. 1. We present a better $\tilde{\mathrm{O}}(nsk)$ bound ($k \le m$). En route to obtaining the bound, we give a low-complexity \emph{Goldberg--Jerrum algorithm} for computing pseudo-inverse matrices, which would be of independent interest. 2. We alleviate an assumption of the previous study that the sparsity pattern of sketching matrices is fixed. We prove that learning positions of non-zeros increases the fat shattering dimension only by ${\mathrm{O}}(ns\log n)$. Also, experiments confirm the practical benefit of learning sparsity patterns.
Abstract（参考訳）: 高速かつ高精度な低ランク近似(LRA)のためのスケッチ行列の学習が注目されている。最近、Bartlett、Indyk、Wagner (COLT 2022) は学習ベースLRAの一般化を発表した。具体的には、$m \times n$ の学習されたスケッチ行列を用いてランク-$k$ 近似を行い、各列に $s$ の非零点を持つ、$\tilde{\mathrm{o}}(nsm)$ が \emph{fat shattering dimension} (\tilde{\mathrm{o}}$ hides logarithmic factors) に縛られることを証明した。私たちは彼らの仕事を構築し、2つの貢献をします。 1. より優れた $\tilde{\mathrm{O}}(nsk)$ bound$k \le m$ を提示します。境界を得るために、擬逆行列を計算するための低複素性 \emph{Goldberg--Jerrum algorithm} を与える。 2. スケッチ行列のスパーシティパターンが固定されているという従来の研究の仮定を緩和する。非ゼロの学習位置が脂肪破砕次元を${\mathrm{O}}(ns\log n)$でしか増加しないことを示す。また,実験により,スパーシティパターンの学習の実際的メリットが確認できた。

関連論文リスト

The Communication Complexity of Approximating Matrix Rank [50.6867896228563]
この問題は通信複雑性のランダム化を$Omega(frac1kcdot n2log|mathbbF|)$とする。アプリケーションとして、$k$パスを持つ任意のストリーミングアルゴリズムに対して、$Omega(frac1kcdot n2log|mathbbF|)$スペースローバウンドを得る。
論文参考訳（メタデータ） (2024-10-26T06:21:42Z)
Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms [50.15964512954274]
線形スケッチを用いた行列とベクトルノルムの残差誤差推定問題について検討する。これは、前作とほぼ同じスケッチサイズと精度で、経験的にかなり有利であることを示す。また、スパースリカバリ問題に対して$Omega(k2/pn1-2/p)$低いバウンダリを示し、これは$mathrmpoly(log n)$ factorまで厳密である。
論文参考訳（メタデータ） (2024-08-16T02:33:07Z)
Provably learning a multi-head attention layer [55.2904547651831]
マルチヘッドアテンション層は、従来のフィードフォワードモデルとは分離したトランスフォーマーアーキテクチャの重要な構成要素の1つである。本研究では,ランダムな例から多面的注意層を実証的に学習する研究を開始する。最悪の場合、$m$に対する指数的依存は避けられないことを示す。
論文参考訳（メタデータ） (2024-02-06T15:39:09Z)
Bottleneck Structure in Learned Features: Low-Dimension vs Regularity Tradeoff [12.351756386062291]
低次元表現の学習と特徴写像の複雑性/不規則性の最小化のバランスを定式化する。大深度の場合、ほとんどすべての隠れ表現はおよそ$R(0)(f)$次元であり、ほとんど全ての重み行列は$W_ell$が$R(0)(f)$特異値である。興味深いことに、大きな学習率の使用は、ほぼすべての層の表現の無限深度収束を保証する注文$O(L)$ NTKを保証するために要求される。
論文参考訳（メタデータ） (2023-05-30T13:06:26Z)
Randomized and Deterministic Attention Sparsification Algorithms for Over-parameterized Feature Dimension [18.57735939471469]
我々は注意問題のスパシフィケーションを考慮する。超大規模特徴量の場合、文の長さをほぼ線形に縮めることができる。
論文参考訳（メタデータ） (2023-04-10T05:52:38Z)
Learning a Single Neuron with Adversarial Label Noise via Gradient Descent [50.659479930171585]
モノトン活性化に対する $mathbfxmapstosigma(mathbfwcdotmathbfx)$ の関数について検討する。学習者の目標は仮説ベクトル $mathbfw$ that $F(mathbbw)=C, epsilon$ を高い確率で出力することである。
論文参考訳（メタデータ） (2022-06-17T17:55:43Z)
Learning a Latent Simplex in Input-Sparsity Time [58.30321592603066]
我々は、$AinmathbbRdtimes n$へのアクセスを考えると、潜入$k$-vertex simplex $KsubsetmathbbRdtimes n$を学習する問題を考える。実行時間における$k$への依存は、トップ$k$特異値の質量が$a$であるという自然な仮定から不要であることを示す。
論文参考訳（メタデータ） (2021-05-17T16:40:48Z)
$Q$-learning with Logarithmic Regret [60.24952657636464]
楽観的な$Q$は$mathcalOleft(fracSAcdot mathrmpolyleft(Hright)Delta_minlogleft(SATright)right)$ cumulative regret bound, where $S$ is the number of state, $A$ is the number of action, $H$ is the planning horizon, $T$ is the total number of steps, $Delta_min$ is the least sub-Optitimality gap。
論文参考訳（メタデータ） (2020-06-16T13:01:33Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。