Fugu-MT 論文翻訳(概要): Learning a Single Neuron with Adversarial Label Noise via Gradient Descent

論文の概要: Learning a Single Neuron with Adversarial Label Noise via Gradient Descent

arxiv url: http://arxiv.org/abs/2206.08918v1
Date: Fri, 17 Jun 2022 17:55:43 GMT
ステータス: 翻訳完了
システム内更新日: 2022-06-20 13:32:39.780925
Title: Learning a Single Neuron with Adversarial Label Noise via Gradient Descent
Title（参考訳）: 逆行性ラベルノイズを伴う単一ニューロンのグラディエントDescentによる学習
Authors: Ilias Diakonikolas, Vasilis Kontonis, Christos Tzamos, Nikos Zarifis
Abstract要約: モノトン活性化に対する $mathbfxmapstosigma(mathbfwcdotmathbfx)$ の関数について検討する。学習者の目標は仮説ベクトル $mathbfw$ that $F(mathbbw)=C, epsilon$ を高い確率で出力することである。
参考スコア（独自算出の注目度）: 50.659479930171585
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We study the fundamental problem of learning a single neuron, i.e., a function of the form $\mathbf{x}\mapsto\sigma(\mathbf{w}\cdot\mathbf{x})$ for monotone activations $\sigma:\mathbb{R}\mapsto\mathbb{R}$, with respect to the $L_2^2$-loss in the presence of adversarial label noise. Specifically, we are given labeled examples from a distribution $D$ on $(\mathbf{x}, y)\in\mathbb{R}^d \times \mathbb{R}$ such that there exists $\mathbf{w}^\ast\in\mathbb{R}^d$ achieving $F(\mathbf{w}^\ast)=\epsilon$, where $F(\mathbf{w})=\mathbf{E}_{(\mathbf{x},y)\sim D}[(\sigma(\mathbf{w}\cdot \mathbf{x})-y)^2]$. The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbb{w})=C\, \epsilon$ with high probability, where $C>1$ is a universal constant. As our main contribution, we give efficient constant-factor approximate learners for a broad class of distributions (including log-concave distributions) and activation functions. Concretely, for the class of isotropic log-concave distributions, we obtain the following important corollaries: For the logistic activation, we obtain the first polynomial-time constant factor approximation (even under the Gaussian distribution). Our algorithm has sample complexity $\widetilde{O}(d/\epsilon)$, which is tight within polylogarithmic factors. For the ReLU activation, we give an efficient algorithm with sample complexity $\tilde{O}(d\, \polylog(1/\epsilon))$. Prior to our work, the best known constant-factor approximate learner had sample complexity $\tilde{\Omega}(d/\epsilon)$. In both of these settings, our algorithms are simple, performing gradient-descent on the (regularized) $L_2^2$-loss. The correctness of our algorithms relies on novel structural results that we establish, showing that (essentially all) stationary points of the underlying non-convex loss are approximately optimal.
Abstract（参考訳）: 単一ニューロンを学習する基本的な問題、すなわち、逆ラベルノイズの存在下での$L_2^2$-lossに関して、モノトン活性化に対する $\mathbf{x}\mapsto\sigma(\mathbf{w}\cdot\mathbf{x})$ という形の関数について研究する。具体的には、$(\mathbf{x}, y)\in\mathbb{R}^d \times \mathbb{R}$ が存在するような分布 $D$ on $(\mathbf{x}, y)\in\mathbb{R}^d \times \mathbb{R}$ が存在し、$F(\mathbf{w}^\ast)=\epsilon$, ここで$F(\mathbf{w})=\mathbf{E}_{(\mathbf{x},y)\sim D}[(\sigma(\mathbf{w}\cdot \mathbf{x})-y)^2]$ が与えられる。学習者の目標は、高い確率で$f(\mathbb{w})=c\, \epsilon$となるような仮説ベクトル$\mathbf{w}$を出力することである。本研究の主な貢献として,多種多様な分布(対数凹分布を含む)とアクティベーション関数に対して,効率的な定数近似学習者を与える。具体的には、等方性対数凸分布のクラスに対して、次の重要なコロールを得る: ロジスティック活性化のために、(ガウス分布の下でも)最初の多項式時間定数係数近似を得る。我々のアルゴリズムは、多対数因子の中で厳密なサンプル複雑性$\widetilde{O}(d/\epsilon)$を持つ。 ReLU 活性化のために、サンプル複雑性 $\tilde{O}(d\, \polylog(1/\epsilon))$ の効率的なアルゴリズムを与える。我々の研究に先立ち、最もよく知られている定数要素近似学習者はサンプル複雑性$\tilde{\Omega}(d/\epsilon)$であった。どちらの設定でも、我々のアルゴリズムは単純で、(正規化)$L_2^2$-lossで勾配差を発生させる。アルゴリズムの正しさは、確立した新しい構造的結果に依存し、基礎となる非凸損失の定常点がほぼ最適であることを示す。

関連論文リスト

Learning a Single Neuron Robustly to Distributional Shifts and Adversarial Label Noise [38.551072383777594]
本研究では, 対向分布シフトの存在下でのL2$損失に対して, 単一ニューロンを学習する問題について検討した。ベクトルベクトル二乗損失を$chi2$divergenceから$mathcalp_0$に近似するアルゴリズムを開発した。
論文参考訳（メタデータ） (2024-11-11T03:43:52Z)
Sample and Computationally Efficient Robust Learning of Gaussian Single-Index Models [37.42736399673992]
シングルインデックスモデル (SIM) は $sigma(mathbfwast cdot mathbfx)$ という形式の関数であり、$sigma: mathbbR to mathbbR$ は既知のリンク関数であり、$mathbfwast$ は隠れ単位ベクトルである。適切な学習者が$L2$-error of $O(mathrmOPT)+epsilon$。
論文参考訳（メタデータ） (2024-11-08T17:10:38Z)
Near-Optimal and Tractable Estimation under Shift-Invariance [0.21756081703275998]
そのような信号のクラスは、非常にリッチである:$mathbbCn$ 上のすべての指数振動を含み、合計$s$ である。このクラスの統計複雑性は、$(delta)$-confidence $ell$-ballの半径2乗最小マックス周波数によって測定されるが、$s$-sparse信号のクラス、すなわち$Oleft(slog(en) + log(delta-1)right) cdot log(en/s)とほぼ同じであることを示す。
論文参考訳（メタデータ） (2024-11-05T18:11:23Z)
Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit [75.4661041626338]
単一インデックス対象関数 $f_*(boldsymbolx) = textstylesigma_*left(langleboldsymbolx,boldsymbolthetarangleright)$ の勾配勾配勾配学習問題について検討する。 SGDに基づくアルゴリズムにより最適化された2層ニューラルネットワークは、情報指数に支配されない複雑さで$f_*$を学習する。
論文参考訳（メタデータ） (2024-06-03T17:56:58Z)
Provably learning a multi-head attention layer [55.2904547651831]
マルチヘッドアテンション層は、従来のフィードフォワードモデルとは分離したトランスフォーマーアーキテクチャの重要な構成要素の1つである。本研究では,ランダムな例から多面的注意層を実証的に学習する研究を開始する。最悪の場合、$m$に対する指数的依存は避けられないことを示す。
論文参考訳（メタデータ） (2024-02-06T15:39:09Z)
A Unified Framework for Uniform Signal Recovery in Nonlinear Generative Compressed Sensing [68.80803866919123]
非線形測定では、ほとんどの先行結果は一様ではない、すなわち、すべての$mathbfx*$に対してではなく、固定された$mathbfx*$に対して高い確率で保持される。本フレームワークはGCSに1ビット/一様量子化観測と単一インデックスモデルを標準例として適用する。また、指標集合が計量エントロピーが低い製品プロセスに対して、より厳密な境界を生み出す濃度不等式も開発する。
論文参考訳（メタデータ） (2023-09-25T17:54:19Z)
Efficiently Learning One-Hidden-Layer ReLU Networks via Schur Polynomials [50.90125395570797]
正方形損失に関して、標準的なガウス分布の下での$k$ReLU活性化の線形結合をPAC学習する問題をmathbbRd$で検討する。本研究の主な成果は,この学習課題に対して,サンプルおよび計算複雑性が$(dk/epsilon)O(k)$で,epsilon>0$が目標精度である。
論文参考訳（メタデータ） (2023-07-24T14:37:22Z)
An Over-parameterized Exponential Regression [18.57735939471469]
LLM(Large Language Models)の分野での最近の発展は、指数的アクティベーション関数の使用への関心を喚起している。ニューラル関数 $F: mathbbRd times m times mathbbRd times mathbbRd times mathbbRd times mathbbRd times mathbbRd times mathbbRd times mathbbRd times mathbbRdd
論文参考訳（メタデータ） (2023-03-29T07:29:07Z)
Threshold Phenomena in Learning Halfspaces with Massart Noise [56.01192577666607]
ガウス境界の下でのマスアートノイズ付きmathbbRd$におけるPAC学習ハーフスペースの問題について検討する。この結果は,Massartモデルにおける学習ハーフスペースの複雑さを定性的に特徴づけるものである。
論文参考訳（メタデータ） (2021-08-19T16:16:48Z)
Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
期待される正方形損失から、最も適合した単一ニューロンを学習することの問題点を考察する。 ReLUアクティベーションでは、我々の人口リスク保証は$O(mathsfOPT1/2)+epsilon$である。 ReLUアクティベーションでは、我々の人口リスク保証は$O(mathsfOPT1/2)+epsilon$である。
論文参考訳（メタデータ） (2020-05-29T07:20:35Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。