Fugu-MT paper translation (abstract): Backward Feature Correction: How Deep Learning Performs Deep Learning

arxiv url: http://arxiv.org/abs/2001.04413v5
Date: Sat, 13 Mar 2021 12:05:09 GMT
Status: translation complete
Last updated in system: 2023-01-11 22:23:26.380278
Title: Backward Feature Correction: How Deep Learning Performs Deep Learning
Authors: Zeyuan Allen-Zhu and Yuanzhi Li
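Abstract summary: We show that, on certain hierarchical learning tasks, deep neural networks trained with SGD are sample and time efficient. We establish a new principle called "backward feature correction", where training higher-level layers in the network improves the features of lower-level ones.
Reference score (site-computed attention score): 66.05472746340142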
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: How does a 110-layer ResNet learn a high-complexity classifier using relatively few training examples and short training time? We present a theory towards explaining this in terms of Hierarchical Learning. By hierarchical learning we mean that the learner learns to represent a complicated target function by decomposing it into a sequence of simpler functions, reducing sample and time complexity. We formally analyze how multi-layer neural networks can perform such hierarchical learning efficiently and automatically by applying SGD. On the conceptual side, we present, to the best of our knowledge, the FIRST theory result indicating how deep neural networks can still be sample and time efficient using SGD on certain hierarchical learning tasks, when NO KNOWN existing algorithm is efficient. We establish a new principle called "backward feature correction", where training higher-level layers in the network can improve the features of lower-level ones. We believe this is the key to understanding the deep learning process in multi-layer neural networks. On the technical side, we show that, for regression and even binary classification, for every input dimension $d>0$, there is a concept class of degree $\omega(1)$ polynomials so that, using $\omega(1)$-layer neural networks as learners, SGD can learn any function from this class in $\mathsf{poly}(d)$ time and sample complexity to any $\frac{1}{\mathsf{poly}(d)}$ error, through learning to represent it as a composition of $\omega(1)$ layers of quadratic functions. In contrast, we do not know any other simple algorithm (including layer-wise training or applying kernel method sequentially) that can learn this concept class in $\mathsf{poly}(d)$ time even to any $d^{-0.01}$ error. As a side result, we prove $d^{\omega(1)}$ lower bounds for several non-hierarchical learners, including any kernel methods, neural tangent or neural compositional kernels.
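The abstract's phrase "a composition of $\omega(1)$ layers of quadratic functions" can be illustrated with a small sketch. The Python snippet below builds a toy target as a stack of quadratic layers and shows that the polynomial degree doubles with each layer. It is only an illustration under stated assumptions: the function names (`make_quadratic_layers`, `hierarchical_target`, `target_degree`), the random Gaussian weights, and the plain feed-forward stacking are hypothetical choices, not the paper's concept class and not its SGD learner.

```python
import numpy as np


def make_quadratic_layers(d, width, depth, seed=0):
    """Draw random weight matrices for a toy stack of quadratic layers.

    Assumption: Gaussian weights scaled by 1/sqrt(fan_in); this is an
    illustrative choice, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    dims = [d] + [width] * depth
    return [
        rng.standard_normal((dims[i + 1], dims[i])) / np.sqrt(dims[i])
        for i in range(depth)
    ]


def hierarchical_target(x, layers):
    """Evaluate the toy target: repeatedly apply h -> (W h)^2 elementwise.

    Each coordinate of (W h)^2 is a quadratic function of h, so the
    polynomial degree in x doubles with every layer.
    """
    h = np.asarray(x, dtype=float)
    for W in layers:
        h = (W @ h) ** 2
    return float(h.sum())


def target_degree(depth):
    """Degree in x of the toy target after `depth` quadratic layers."""
    return 2 ** depth


if __name__ == "__main__":
    layers = make_quadratic_layers(d=8, width=8, depth=3, seed=0)
    x = np.random.default_rng(1).standard_normal(8)
    # The target has no bias terms, so it is homogeneous of degree 2**depth:
    # scaling the input by t scales the output by t**(2**depth).
    ratio = hierarchical_target(2.0 * x, layers) / hierarchical_target(x, layers)
    print(target_degree(3), ratio)
```

With three layers the toy target has degree $2^3 = 8$, so doubling the input multiplies the output by $2^8 = 256$. A hierarchical learner would try to recover the intermediate quadratic features one level at a time rather than fit the degree-8 polynomial directly.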

Related papers

Learning Hierarchical Polynomials of Multiple Nonlinear Features with Three-Layer Networks [46.190882811878744]
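In deep learning theory, a key question is how neural networks learn hierarchical features. This work studies the learning of hierarchical polynomials of multiple nonlinear features using three-layer neural networks.
Reference translation of the paper (metadata) (2024-11-26T08:14:48Z)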
An Improved Finite-time Analysis of Temporal Difference Learning with Deep Neural Networks [11.925232472331494]
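We improve the non-asymptotic analysis of neural TD methods with a general $L$-layer neural network. New proof techniques are developed, and a new $\tilde{\mathcal{O}}(\epsilon^{-1})$ sample complexity is derived.
Reference translation of the paper (metadata) (2024-05-07T05:29:55Z)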
Learning Hierarchical Polynomials with Three-Layer Neural Networks [56.71223169861528]
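We study the problem of learning hierarchical polynomials over the standard Gaussian distribution with three-layer neural networks. For a large subclass of degree-$k$ polynomials $p$, a three-layer neural network trained via layerwise gradient descent on the square loss learns the target $h$ up to vanishing test error. This work demonstrates the ability of three-layer neural networks to learn complex features and, as a result, to learn a broad class of hierarchical functions.
Reference translation of the paper (metadata) (2023-11-23T02:19:32Z)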
Efficiently Learning One-Hidden-Layer ReLU Networks via Schur Polynomials [50.90125395570797]
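We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss. Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon>0$ is the target accuracy.
Reference translation of the paper (metadata) (2023-07-24T14:37:22Z)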
How Two-Layer Neural Networks Learn, One (Giant) Step at a Time [23.380148043514215]
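This work theoretically investigates how the features of two-layer neural networks adapt to the structure of the target function. We show that a batch size of $n = \mathcal{O}(d)$ is sufficient to learn multiple target directions satisfying a staircase property.
Reference translation of the paper (metadata) (2023-05-29T17:43:44Z)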
Understanding Deep Neural Function Approximation in Reinforcement Learning via $\epsilon$-Greedy Exploration [53.90873926758026]
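This paper presents a theoretical study of deep neural function approximation in reinforcement learning (RL). We focus on value-based algorithms with $\epsilon$-greedy exploration via deep (and two-layer) neural networks endowed by Besov (and Barron) function spaces. Our analysis reformulates the temporal difference error in an $L^2(\mathrm{d}\mu)$-integrable space over a certain averaged measure $\mu$ and transforms it into a generalization problem under the non-iid setting.
Reference translation of the paper (metadata) (2022-09-15T15:42:47Z)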
Training Overparametrized Neural Networks in Sublinear Time [14.918404733024332]
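Deep learning comes at a tremendous computational and energy cost. We present a new view of neural networks as a set of binary search trees, where each iteration corresponds to modifying a small subset of the nodes in the tree. We believe this view has further applications in the analysis of deep neural networks (DNNs).
Reference translation of the paper (metadata) (2022-08-09T02:29:42Z)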
Neural Networks can Learn Representations with Gradient Descent [68.95262816363288]
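In certain regimes, neural networks trained by gradient descent behave like kernel methods. In practice, however, neural networks are known to strongly outperform their associated kernels.
Reference translation of the paper (metadata) (2022-06-30T09:24:02Z)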
Beyond Lazy Training for Over-parameterized Tensor Decomposition [69.4699995828506]
Our results show that gradient descent on an over-parameterized objective can go beyond the lazy training regime and exploit certain low-rank structure in the data.
Reference translation of the paper (metadata) (2020-10-22T00:32:12Z)

The related papers list is generated automatically from the titles and abstracts of the papers on this site.
