Fugu-MT 論文翻訳(概要): Poly-time universality and limitations of deep learning

論文の概要: Poly-time universality and limitations of deep learning

arxiv url: http://arxiv.org/abs/2001.02992v1
Date: Tue, 7 Jan 2020 08:31:50 GMT
ステータス: 翻訳完了
システム内更新日: 2023-01-13 20:34:03.519520
Title: Poly-time universality and limitations of deep learning
Title（参考訳）: 深層学習の多時間普遍性と限界
Authors: Emmanuel Abbe and Colin Sandon
Abstract要約: 本研究の目的は,ディープラーニングが多時間で学べる,あるいは学べない関数分布を特徴付けることである。 SGDに基づくディープラーニングの結果が証明され、GDに基づくディープラーニングの非ユニバーサリティ結果が証明される。
参考スコア（独自算出の注目度）: 23.150465075637698
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The goal of this paper is to characterize function distributions that deep learning can or cannot learn in poly-time. A universality result is proved for SGD-based deep learning and a non-universality result is proved for GD-based deep learning; this also gives a separation between SGD-based deep learning and statistical query algorithms: (1) {\it Deep learning with SGD is efficiently universal.} Any function distribution that can be learned from samples in poly-time can also be learned by a poly-size neural net trained with SGD on a poly-time initialization with poly-steps, poly-rate and possibly poly-noise. Therefore deep learning provides a universal learning paradigm: it was known that the approximation and estimation errors could be controlled with poly-size neural nets, using ERM that is NP-hard; this new result shows that the optimization error can also be controlled with SGD in poly-time. The picture changes for GD with large enough batches: (2) {\it Result (1) does not hold for GD:} Neural nets of poly-size trained with GD (full gradients or large enough batches) on any initialization with poly-steps, poly-range and at least poly-noise cannot learn any function distribution that has super-polynomial {\it cross-predictability,} where the cross-predictability gives a measure of ``average'' function correlation -- relations and distinctions to the statistical dimension are discussed. In particular, GD with these constraints can learn efficiently monomials of degree $k$ if and only if $k$ is constant. Thus (1) and (2) point to an interesting contrast: SGD is universal even with some poly-noise while full GD or SQ algorithms are not (e.g., parities).
Abstract（参考訳）: 本研究の目的は,多時間で学習できる,あるいはできない関数分布を特徴付けることである。 SGDに基づく深層学習には普遍性があり、GDに基づく深層学習には非ユニバーシティ結果が証明され、また、SGDに基づく深層学習と統計的クエリアルゴリズムを分離する: (1) SGDによる深部学習は効率よく普遍的である。 } ポリタイムでサンプルから学習できる関数分布は、ポリステップ、ポリレート、おそらくポリノイズによるポリタイム初期化において、SGDで訓練されたポリサイズニューラルネットワークによって学習することができる。したがって、深層学習は普遍的な学習パラダイムを提供する: 近似と推定誤差はNPハードのERMを用いて多次元ニューラルネットワークで制御できることが知られている; この新たな結果は、最適化誤差も多時間でSGDで制御できることを示している。 The picture changes for GD with large enough batches: (2) {\it Result (1) does not hold for GD:} Neural nets of poly-size trained with GD (full gradients or large enough batches) on any initialization with poly-steps, poly-range and at least poly-noise cannot learn any function distribution that has super-polynomial {\it cross-predictability,} where the cross-predictability gives a measure of ``average'' function correlation -relations and distinctions to the statistical dimension are discussed. 特に、これらの制約を持つ GD は、$k$ が定数である場合に限り、$k$ の単項を効率的に学習することができる。したがって、(1) と (2) は興味深いコントラストを指している: SGD はある種のポリノイズでも普遍的であるが、完全な GD や SQ のアルゴリズムは(パリティのように)そうでない。

関連論文リスト

STRIDE: Sparse Techniques for Regression in Deep Gaussian Processes [0.3277163122167433]
大規模データを用いた深部GPトレーニングのための粒子ベース予測予測トレーニング手法を開発した。標準ベンチマーク問題に対して,本手法を検証した。
論文参考訳（メタデータ） (2025-05-16T15:18:15Z)
Learning Multi-Index Models with Neural Networks via Mean-Field Langevin Dynamics [21.55547541297847]
平均場ランゲヴィンアルゴリズムを用いて学習した2層ニューラルネットワークを用いて,高次元のマルチインデックスモデルを学習する問題について検討する。軽度の分布仮定の下では、サンプルと計算の複雑さの両方を制御する実効次元 $d_mathrmeff$ を特徴づける。
論文参考訳（メタデータ） (2024-08-14T02:13:35Z)
Random features and polynomial rules [0.0]
本稿では,ガウスデータを用いた一般教師付き学習問題に対するランダム特徴モデルの性能の一般化について述べる。我々は、$Dto infty$と$P/DK$,$N/DL$の間の少なくとも一方が有限である極限から遠く離れた良い合意を見出す。
論文参考訳（メタデータ） (2024-02-15T18:09:41Z)
Domain Invariant Learning for Gaussian Processes and Bayesian Exploration [39.83530605880014]
そこで本研究では,確率を最小限に最適化したガウス過程(DIL-GP)の領域不変学習アルゴリズムを提案する。数値実験により、複数の合成および実世界のデータセットの予測におけるDIL-GPの優位性を示す。
論文参考訳（メタデータ） (2023-12-18T16:13:34Z)
SGD learning on neural networks: leap complexity and saddle-to-saddle dynamics [22.365592070563764]
等方性データを用いた完全連結ニューラルネットワークにおけるSGD学習の時間的複雑さについて検討する。我々は、"階層的"なターゲット関数がどのようにあるかを測定する複雑さの尺度(跳躍)を提唱した。トレーニングは,サドル・トゥ・サドル・ダイナミックで関数サポートを逐次学習することを示す。
論文参考訳（メタデータ） (2023-02-21T23:16:23Z)
Differentially-Private Bayes Consistency [70.92545332158217]
差分プライバシー(DP)を満たすベイズ一貫した学習ルールを構築する。ほぼ最適なサンプル複雑性を持つ半教師付き環境で,任意のVCクラスをプライベートに学習できることを実証する。
論文参考訳（メタデータ） (2022-12-08T11:57:30Z)
Scaling Structured Inference with Randomization [64.18063627155128]
本稿では、構造化されたモデルを数万の潜在状態に拡張するためにランダム化された動的プログラミング(RDP)のファミリを提案する。我々の手法は古典的DPベースの推論に広く適用できる。また、自動微分とも互換性があり、ニューラルネットワークとシームレスに統合できる。
論文参考訳（メタデータ） (2021-12-07T11:26:41Z)
Non-Gaussian Gaussian Processes for Few-Shot Regression [71.33730039795921]
乱変数ベクトルの各成分上で動作し,パラメータを全て共有する可逆なODEベースのマッピングを提案する。 NGGPは、様々なベンチマークとアプリケーションに対する競合する最先端のアプローチよりも優れています。
論文参考訳（メタデータ） (2021-10-26T10:45:25Z)
On the Power of Differentiable Learning versus PAC and SQ Learning [46.161371858592666]
本研究では,ミニバッチ勾配降下(SGD)による人口減少の学習力と,実証的損失のバッチ・グラディエント・Descent(GD)について検討した。 SGDとGDは、常に統計的クエリ(SQ)で学習をシミュレートできるが、それを超える能力は勾配計算の精度$rho$に依存する。
論文参考訳（メタデータ） (2021-08-09T17:25:06Z)
On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces [208.67848059021915]
強化学習のコアにおける探索・探索トレードオフについて検討する。特に、関数クラス $mathcalF$ の複雑さが関数の複雑さを特徴づけていることを証明する。私たちの後悔の限界はエピソードの数とは無関係です。
論文参考訳（メタデータ） (2020-11-09T18:32:22Z)
SURF: A Simple, Universal, Robust, Fast Distribution Learning Algorithm [64.13217062232874]
SURFは分布を断片的に近似するアルゴリズムである。実験では最先端のアルゴリズムよりも優れています。
論文参考訳（メタデータ） (2020-02-22T01:03:33Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。