Fugu-MT Paper Translation (Abstract): On the Optimal Weighted $\ell_2$ Regularization in Overparameterized Linear Regression

Paper summary: On the Optimal Weighted $\ell_2$ Regularization in Overparameterized Linear Regression

arxiv url: http://arxiv.org/abs/2006.05800v4
Date: Tue, 3 Nov 2020 02:20:13 GMT
Status: Translation complete
Last updated in system: 2022-11-23 05:06:35.218408
Title: On the Optimal Weighted $\ell_2$ Regularization in Overparameterized Linear Regression
Title (reference translation): On optimal weighted $\ell_2$ regularization in overparameterized linear regression
Authors: Denny Wu and Ji Xu
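Abstract summary: We consider the linear model $\mathbf{y} = \mathbf{X} \mathbf{\beta}_\star + \mathbf{\epsilon}$ with $\mathbf{X}\in \mathbb{R}^{n\times p}$ in the overparameterized regime $p>n$. We provide an exact characterization of the prediction risk $\mathbb{E}(y-\mathbf{x}^T\hat{\mathbf{\beta}}_\lambda)^2$ in the proportional asymptotic limit $p/n\rightarrow \gamma \in (1,\infty)$.
Reference score (attention score computed independently by this site): 23.467801864841526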
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We consider the linear model $\mathbf{y} = \mathbf{X} \mathbf{\beta}_\star + \mathbf{\epsilon}$ with $\mathbf{X}\in \mathbb{R}^{n\times p}$ in the overparameterized regime $p>n$. We estimate $\mathbf{\beta}_\star$ via generalized (weighted) ridge regression: $\hat{\mathbf{\beta}}_\lambda = \left(\mathbf{X}^T\mathbf{X} + \lambda \mathbf{\Sigma}_w\right)^\dagger \mathbf{X}^T\mathbf{y}$, where $\mathbf{\Sigma}_w$ is the weighting matrix. Under a random design setting with general data covariance $\mathbf{\Sigma}_x$ and anisotropic prior on the true coefficients $\mathbb{E}\mathbf{\beta}_\star\mathbf{\beta}_\star^T = \mathbf{\Sigma}_\beta$, we provide an exact characterization of the prediction risk $\mathbb{E}(y-\mathbf{x}^T\hat{\mathbf{\beta}}_\lambda)^2$ in the proportional asymptotic limit $p/n\rightarrow \gamma \in (1,\infty)$. Our general setup leads to a number of interesting findings. We outline precise conditions that decide the sign of the optimal setting $\lambda_{\rm opt}$ for the ridge parameter $\lambda$ and confirm the implicit $\ell_2$ regularization effect of overparameterization, which theoretically justifies the surprising empirical observation that $\lambda_{\rm opt}$ can be negative in the overparameterized regime. We also characterize the double descent phenomenon for principal component regression (PCR) when both $\mathbf{X}$ and $\mathbf{\beta}_\star$ are anisotropic. Finally, we determine the optimal weighting matrix $\mathbf{\Sigma}_w$ for both the ridgeless ($\lambda\to 0$) and optimally regularized ($\lambda = \lambda_{\rm opt}$) case, and demonstrate the advantage of the weighted objective over standard ridge regression and PCR.
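Abstract (reference translation): We consider the linear model $\mathbf{y} = \mathbf{X} \mathbf{\beta}_\star + \mathbf{\epsilon}$ with $\mathbf{X}\in \mathbb{R}^{n\times p}$ in the overparameterized regime $p>n$. We estimate $\mathbf{\beta}_\star$ by generalized (weighted) ridge regression, $\hat{\mathbf{\beta}}_\lambda = \left(\mathbf{X}^T\mathbf{X} + \lambda \mathbf{\Sigma}_w\right)^\dagger \mathbf{X}^T\mathbf{y}$, where $\mathbf{\Sigma}_w$ is the weighting matrix. Under a random design setting with general data covariance $\mathbf{\Sigma}_x$ and an anisotropic prior on the true coefficients, $\mathbb{E}\mathbf{\beta}_\star\mathbf{\beta}_\star^T = \mathbf{\Sigma}_\beta$, we give an exact characterization of the prediction risk $\mathbb{E}(y-\mathbf{x}^T\hat{\mathbf{\beta}}_\lambda)^2$ in the proportional asymptotic limit $p/n\rightarrow \gamma \in (1,\infty)$. Our general setup leads to a number of interesting findings. We outline precise conditions that determine the sign of the optimal setting $\lambda_{\rm opt}$ of the ridge parameter $\lambda$ and confirm the implicit $\ell_2$ regularization effect of overparameterization. We also characterize the double descent phenomenon of principal component regression (PCR) when both $\mathbf{X}$ and $\mathbf{\beta}_\star$ are anisotropic. Finally, we determine the optimal weighting matrix $\mathbf{\Sigma}_w$ for both the ridgeless ($\lambda\to 0$) and optimally regularized ($\lambda = \lambda_{\rm opt}$) cases, and show the advantage of the weighted objective over standard ridge regression and PCR.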
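The estimator and the risk named in the abstract can be illustrated numerically. The following is a minimal sketch, not the authors' code: the function names `weighted_ridge` and `prediction_risk`, the diagonal choices of $\mathbf{\Sigma}_x$ and $\mathbf{\Sigma}_w$, the Gaussian data, and all numeric values are assumptions made for illustration. The risk is computed for a single training sample (conditionally on the training data), assuming a zero-mean test point with covariance $\mathbf{\Sigma}_x$ and independent noise; it is not the asymptotic formula derived in the paper.

```python
import numpy as np


def weighted_ridge(X, y, lam, Sigma_w=None):
    """Return (X^T X + lam * Sigma_w)^+ X^T y, with ^+ the Moore-Penrose pseudoinverse."""
    p = X.shape[1]
    if Sigma_w is None:
        Sigma_w = np.eye(p)  # identity weighting recovers standard ridge regression
    return np.linalg.pinv(X.T @ X + lam * Sigma_w) @ (X.T @ y)


def prediction_risk(beta_hat, beta_star, Sigma_x, noise_var):
    """E(y - x^T beta_hat)^2 for a fresh zero-mean x with covariance Sigma_x
    and independent noise of variance noise_var (training data held fixed)."""
    diff = beta_hat - beta_star
    return float(diff @ Sigma_x @ diff + noise_var)


# Synthetic overparameterized problem, p > n (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n, p, noise_var, lam = 50, 100, 0.25, 0.5

sigma_x_diag = np.linspace(0.5, 2.0, p)                   # eigenvalues of a diagonal Sigma_x
Sigma_x = np.diag(sigma_x_diag)
X = rng.standard_normal((n, p)) * np.sqrt(sigma_x_diag)   # rows ~ N(0, Sigma_x)
beta_star = rng.standard_normal(p) / np.sqrt(p)
y = X @ beta_star + np.sqrt(noise_var) * rng.standard_normal(n)

Sigma_w = np.diag(np.linspace(2.0, 0.5, p))               # an arbitrary positive-definite weighting
beta_hat = weighted_ridge(X, y, lam, Sigma_w)
risk = prediction_risk(beta_hat, beta_star, Sigma_x, noise_var)
print(f"prediction risk at lambda={lam}: {risk:.4f}")
```

Calling `weighted_ridge(X, y, lam)` without `Sigma_w` falls back to the identity weighting, which gives a standard-ridge baseline to compare against under the same data.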

Related papers

Sharp Gap-Dependent Variance-Aware Regret Bounds for Tabular MDPs [54.28273395444243]
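We show that the Monotonic Value Propagation (MVP) algorithm achieves a variance-aware gap-dependent regret bound of $\tilde{O}\big(\big(\sum_{\Delta_h(s,a)>0} \frac{H^2 \log K \land \mathtt{Var}_{\max}^{\text{c}}}{\cdots} \cdots\big)\cdots\big)$.
Paper reference translation (metadata) (2025-06-06T20:33:57Z)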
In-depth Analysis of Low-rank Matrix Factorisation in a Federated Setting [21.002519159190538]
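We analyze a distributed algorithm that computes a low-rank matrix factorization over $N$ clients. A global $\mathbf{V}$ in $\mathbb{R}^{d \times r}$ is common to all clients, and each client obtains a local $\mathbf{U}^i$ in $\mathbb{R}^{n_i\times r}$.
Paper reference translation (metadata) (2024-09-13T12:28:42Z)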
Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms [50.15964512954274]
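We study the problem of residual error estimation for matrix and vector norms using a linear sketch. The method is shown to have a considerable empirical advantage at roughly the same sketch size and accuracy as prior work. We also show an $\Omega(k^{2/p}n^{1-2/p})$ lower bound for the sparse recovery problem, which is tight up to a $\mathrm{poly}(\log n)$ factor.
Paper reference translation (metadata) (2024-08-16T02:33:07Z)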
Provably learning a multi-head attention layer [55.2904547651831]
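The multi-head attention layer is one of the key components of the transformer architecture that sets it apart from traditional feed-forward models. In this work, we initiate the study of provably learning a multi-head attention layer from random examples. We show that, in the worst case, an exponential dependence on $m$ is unavoidable.
Paper reference translation (metadata) (2024-02-06T15:39:09Z)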
Piecewise Linearity of Min-Norm Solution Map of a Nonconvexly Regularized Convex Sparse Model [8.586951231230596]
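This paper studies the constant sparsity pattern of $\mathbf{x}_\star(\mathbf{y},\lambda)$ within each linear zone. We iteratively compute a closed-form expression of $\mathbf{x}_\star(\mathbf{y},\lambda)$ in each linear zone.
Paper reference translation (metadata) (2023-11-30T10:39:47Z)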
Optimal Estimator for Linear Regression with Shuffled Labels [17.99906229036223]
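This paper considers the task of linear regression with shuffled labels, with $\mathbf{Y} \in \mathbb{R}^{n\times m}$, $\mathbf{\Pi} \in \mathbb{R}^{n\times n}$, $\mathbf{X} \in \mathbb{R}^{n\times p}$, $\mathbf{B} \in \mathbb{R}^{p\times m}$, and $\mathbf{W} \in \mathbb{R}^{n\times m}$.
Paper reference translation (metadata) (2023-10-02T16:44:47Z)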
A Unified Framework for Uniform Signal Recovery in Nonlinear Generative Compressed Sensing [68.80803866919123]
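For nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^*$ rather than for all $\mathbf{x}^*$ simultaneously. Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples. We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
Paper reference translation (metadata) (2023-09-25T17:54:19Z)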
Statistically Optimal Robust Mean and Covariance Estimation for Anisotropic Gaussians [3.5788754401889014]
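In the strong $\varepsilon$-contamination model, we assume that an $\varepsilon$ fraction of the vectors in the original Gaussian sample has been replaced by other vectors. We construct an estimator $\widehat{\Sigma}$ of the covariance matrix $\Sigma$ whose guarantee holds with probability at least $1-\delta$.
Paper reference translation (metadata) (2023-01-21T23:28:55Z)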
Learning a Single Neuron with Adversarial Label Noise via Gradient Descent [50.659479930171585]
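We study functions of the form $\mathbf{x}\mapsto\sigma(\mathbf{w}\cdot\mathbf{x})$ for monotone activations $\sigma$. The goal of the learner is to output, with high probability, a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w})=C\,\epsilon$.
Paper reference translation (metadata) (2022-06-17T17:55:43Z)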
Beyond Independent Measurements: General Compressed Sensing with GNN Application [4.924126492174801]
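We consider the problem of recovering a structured signal $\mathbf{x} \in \mathbb{R}^n$ from noisy observations. An effective $\mathbf{B}$ may be used as a surrogate for the number of measurements.
Paper reference translation (metadata) (2021-10-30T20:35:56Z)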
Spectral properties of sample covariance matrices arising from random matrices with independent non identically distributed columns [50.053491972003656]
The functionals $\text{tr}(AR(z))$, for $R(z) = (\frac{1}{n}XX^T- zI_p)^{-1}$ and $A\in \mathcal{M}_p$ deterministic, have a standard deviation of order $O(\|A\|_* / \sqrt n)$. Here, we show a bound on $\|\mathbb{E}[R(z)] - \tilde R(z)\|_F$.
Paper reference translation (metadata) (2021-09-06T14:21:43Z)
On the computational and statistical complexity of over-parameterized matrix sensing [30.785670369640872]
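We consider solving low-rank matrix sensing with the Factorized Gradient Descent (FGD) method. By decomposing the factorized matrix $\mathbf{F}$ into separate column spaces, we show that $\|\mathbf{F}_t\mathbf{F}_t^\top - \mathbf{X}^*\|_F^2$ converges to a statistical error.
Paper reference translation (metadata) (2021-01-27T04:23:49Z)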
Near-Optimal SQ Lower Bounds for Agnostically Learning Halfspaces and ReLUs under Gaussian Marginals [49.60752558064027]
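We study the fundamental problems of agnostically learning halfspaces and ReLUs under Gaussian marginals. Our lower bounds provide strong evidence that the current upper bounds for these tasks are essentially best possible.
Paper reference translation (metadata) (2020-06-29T17:10:10Z)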
Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss. For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2})+\epsilon$.
Paper reference translation (metadata) (2020-05-29T07:20:35Z)

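The related papers list is generated automatically from the titles and abstracts of the papers on this site.

This is information about the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from the use of this site (including all information and translations).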