Fugu-MT paper translation (abstract): Geometric compression of invariant manifolds in neural nets

Paper overview: Geometric compression of invariant manifolds in neural nets

arxiv url: http://arxiv.org/abs/2007.11471v4
Date: Thu, 11 Mar 2021 08:58:04 GMT
Status: translation complete
Last updated in system: 2022-11-07 22:11:27.609994
Title: Geometric compression of invariant manifolds in neural nets
Title (reference translation): Geometric compression of invariant manifolds in neural networks
Authors: Jonas Paccolat, Leonardo Petrini, Mario Geiger, Kevin Tyloo and Matthieu Wyart
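Abstract summary: We study how neural networks compress uninformative input space in models where data lie in $d$ dimensions. For a one-hidden-layer FC network trained with gradient descent, the first layer of weights becomes nearly insensitive to the $d_\perp=d-d_\parallel$ uninformative directions. We then show that compression shapes the evolution of the Neural Tangent Kernel (NTK) over time, so that its top eigenvectors become more informative and display a larger projection on the labels.
Reference score (attention score computed by this site): 2.461575510055098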
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We study how neural networks compress uninformative input space in models where data lie in $d$ dimensions, but whose labels only vary within a linear manifold of dimension $d_\parallel < d$. We show that for a one-hidden-layer network initialized with infinitesimal weights (i.e. in the feature learning regime) trained with gradient descent, the first layer of weights evolves to become nearly insensitive to the $d_\perp=d-d_\parallel$ uninformative directions. These are effectively compressed by a factor $\lambda\sim \sqrt{p}$, where $p$ is the size of the training set. We quantify the benefit of such a compression on the test error $\epsilon$. For large initialization of the weights (the lazy training regime), no compression occurs and for regular boundaries separating labels we find that $\epsilon \sim p^{-\beta}$, with $\beta_\text{Lazy} = d / (3d-2)$. Compression improves the learning curves so that $\beta_\text{Feature} = (2d-1)/(3d-2)$ if $d_\parallel = 1$ and $\beta_\text{Feature} = (d + d_\perp/2)/(3d-2)$ if $d_\parallel > 1$. We test these predictions for a stripe model where boundaries are parallel interfaces ($d_\parallel=1$) as well as for a cylindrical boundary ($d_\parallel=2$). Next we show that compression shapes the Neural Tangent Kernel (NTK) evolution in time, so that its top eigenvectors become more informative and display a larger projection on the labels. Consequently, kernel learning with the frozen NTK at the end of training outperforms the initial NTK. We confirm these predictions both for a one-hidden-layer FC network trained on the stripe model and for a 16-layer CNN trained on MNIST, for which we also find $\beta_\text{Feature}>\beta_\text{Lazy}$.
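Abstract (reference translation): We study how neural networks compress uninformative input space in models where data lie in $d$ dimensions but the labels vary only within a linear manifold of dimension $d_\parallel < d$. For a one-hidden-layer network initialized with infinitesimal weights (i.e. in the feature learning regime) and trained with gradient descent, the first layer of weights evolves to become nearly insensitive to the $d_\perp=d-d_\parallel$ uninformative directions. These are effectively compressed by a factor $\lambda\sim \sqrt{p}$, where $p$ is the size of the training set. We quantify the benefit of such compression on the test error $\epsilon$. For large initialization of the weights (the lazy training regime), no compression occurs, and for regular boundaries separating labels, $\epsilon \sim p^{-\beta}$ with $\beta_\text{Lazy} = d / (3d-2)$. Compression improves the learning curves, so that $\beta_\text{Feature} = (2d-1)/(3d-2)$ if $d_\parallel = 1$ and $\beta_\text{Feature} = (d + d_\perp/2)/(3d-2)$ if $d_\parallel > 1$. We test these predictions on a stripe model where boundaries are parallel interfaces ($d_\parallel=1$) and on a cylindrical boundary ($d_\parallel=2$). Next, we show that compression shapes the evolution of the Neural Tangent Kernel (NTK) in time, so that its top eigenvectors become more informative and display a larger projection on the labels. Consequently, kernel learning with the frozen NTK at the end of training outperforms the initial NTK. We confirm these predictions both for a one-hidden-layer FC network trained on the stripe model and for a 16-layer CNN trained on MNIST, for which we also find $\beta_\text{Feature}>\beta_\text{Lazy}$.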
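As a rough illustration of the learning-curve exponents quoted in the abstract, the following Python sketch evaluates $\beta_\text{Lazy}$ and $\beta_\text{Feature}$ for given $d$ and $d_\parallel$. The function names `beta_lazy` and `beta_feature` are hypothetical and do not come from the paper; the code only restates the formulas above under the assumption $1 \le d_\parallel < d$.

```python
from fractions import Fraction


def beta_lazy(d: int) -> Fraction:
    """Lazy-regime exponent from the abstract: d / (3d - 2)."""
    return Fraction(d, 3 * d - 2)


def beta_feature(d: int, d_parallel: int) -> Fraction:
    """Feature-regime exponent from the abstract.

    (2d - 1) / (3d - 2)          if d_parallel == 1
    (d + d_perp / 2) / (3d - 2)  if d_parallel > 1, with d_perp = d - d_parallel
    """
    # Assumption: the abstract's setting requires 1 <= d_parallel < d.
    if not 1 <= d_parallel < d:
        raise ValueError("expected 1 <= d_parallel < d")
    if d_parallel == 1:
        return Fraction(2 * d - 1, 3 * d - 2)
    d_perp = d - d_parallel
    # (d + d_perp / 2) / (3d - 2), written with integers only
    return Fraction(2 * d + d_perp, 2 * (3 * d - 2))


if __name__ == "__main__":
    # Illustrative values only: a stripe-like case (d_parallel = 1)
    # and a cylinder-like case (d_parallel = 2), both with d = 3.
    for d, d_par in [(3, 1), (3, 2)]:
        print(d, d_par, beta_lazy(d), beta_feature(d, d_par))
```

Under these formulas the feature-regime exponent is larger than the lazy-regime one whenever $1 \le d_\parallel < d$, which matches the abstract's statement that compression improves the learning curves.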

Related papers

Contextual Bandit Optimization with Pre-Trained Neural Networks [0.0]
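We investigate how pre-training can help in the regime of smaller models. We show sublinear regret for E2TC when the dimension of the last layer and the number of actions $K$ are much smaller than the horizon $T$. In the weak training regime, when only the last layer is learned, the problem reduces to a misspecified linear bandit.
Paper reference translation (metadata) (2025-01-09T10:21:19Z)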
Bayesian Inference with Deep Weakly Nonlinear Networks [57.95116787699412]
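We show, at a physics level of rigor, that Bayesian inference with fully connected neural networks is solvable. We provide techniques to compute the model evidence and the posterior to arbitrary order in $1/N$ and at arbitrary temperature.
Paper reference translation (metadata) (2024-05-26T17:08:04Z)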
Learning Hierarchical Polynomials with Three-Layer Neural Networks [56.71223169861528]
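We study the problem of learning hierarchical functions over the standard Gaussian distribution with three-layer neural networks. For a large subclass of degree-$k$ polynomials $p$, a three-layer neural network trained via layerwise gradient descent on the square loss learns the target $h$ up to vanishing test error. This work demonstrates the ability of three-layer neural networks to learn complex features and, as a result, to learn a broad class of hierarchical functions.
Paper reference translation (metadata) (2023-11-23T02:19:32Z)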
Neural Networks Efficiently Learn Low-Dimensional Representations with SGD [22.703825902761405]
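We show that ReLU NNs trained with SGD can learn a single-index target of the form $y=f(\langle\boldsymbol{u},\boldsymbol{x}\rangle) + \epsilon$ by recovering the principal direction. We also provide compression guarantees for NNs using the approximate low-rank structure produced by SGD.
Paper reference translation (metadata) (2022-09-29T15:29:10Z)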
High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation [89.21686761957383]
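We study a gradient descent step on the first-layer parameters $\boldsymbol{W}$ in a two-layer network. Our results show that even a single step can yield a considerable advantage over random features.
Paper reference translation (metadata) (2022-05-03T12:09:59Z)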
Locality defeats the curse of dimensionality in convolutional teacher-student scenarios [69.2027612631023]
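We show that locality is key in determining the learning curve exponent $\beta$. We conclude by proving, under a natural assumption, that performing kernel regression with a ridge that decreases with the size of the training set yields learning curve exponents similar to those of the ridgeless case.
Paper reference translation (metadata) (2021-06-16T08:27:31Z)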
An Exponential Improvement on the Memorization Capacity of Deep Threshold Networks [40.489350374378645]
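We prove that $\widetilde{\mathcal{O}}(e^{1/\delta^2}+\sqrt{n})$ neurons and $\widetilde{\mathcal{O}}(\frac{d}{\delta}+n)$ weights are sufficient. We also prove new lower bounds by connecting neural networks to the purely geometric problem of separating $n$ points on a sphere using hyperplanes.
Paper reference translation (metadata) (2021-06-14T19:42:32Z)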
On the emergence of tetrahedral symmetry in the final and penultimate layers of neural network classifiers [9.975163460952045]
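Even the final output of the classifier $h$ is not uniform over data samples from a class $c_i$ if $h$ is a shallow network. This work explains this observation analytically in toy models of highly expressive deep neural networks.
Paper reference translation (metadata) (2020-12-10T02:32:52Z)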
Beyond Lazy Training for Over-parameterized Tensor Decomposition [69.4699995828506]
Our results suggest that gradient descent on an over-parameterized objective can go beyond the lazy training regime and exploit certain low-rank structure in the data.
Paper reference translation (metadata) (2020-10-22T00:32:12Z)
Deep Learning Meets Projective Clustering [66.726500395069]
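A common approach for compressing NLP networks is to encode the embedding layer as a matrix $A\in\mathbb{R}^{n\times d}$. Inspired by projective clustering from computational geometry, we propose replacing this subspace with a set of $k$ subspaces.
Paper reference translation (metadata) (2020-10-08T22:47:48Z)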
How isotropic kernels perform on simple invariants [0.5729426778193397]
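We investigate how the training curve of isotropic kernel methods depends on the symmetry of the task to be learned. We show that for large bandwidth, $\beta = \frac{d-1+\xi}{3d-3+\xi}$, where $\xi\in (0,2)$ is an exponent characterizing the kernel at the origin.
Paper reference translation (metadata) (2020-06-17T09:59:18Z)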

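The related-papers list is generated automatically from the titles and abstracts of the papers on this site.

This is the information for the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any results arising from the use of this site (including all information and translations).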