Fugu-MT Paper Translation (Overview): When Both Layers Learn: Training Dynamics of Representing Linear Models via ReLU Networks

Paper overview: When Both Layers Learn: Training Dynamics of Representing Linear Models via ReLU Networks

arxiv url: http://arxiv.org/abs/2606.04476v1
Date: Wed, 03 Jun 2026 05:44:30 GMT
Status: Translation complete
Last updated in system: 2026-06-04 20:44:18.571519
Title: When Both Layers Learn: Training Dynamics of Representing Linear Models via ReLU Networks
Authors: Berk Tinaz, Changzhi Xie, Mahdi Soltanolkotabi
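Abstract summary: We study the gradient descent dynamics of jointly training both layers of a one-hidden-layer ReLU network to fit a linear target function. The analysis tracks the trajectory through three phases, beginning with an alignment phase in which the hidden weights progressively align with the planted direction while the output weights maintain the correct sign pattern. We also establish novel uniform concentration results that hold along the entire trajectory and are essential for obtaining order-wise optimal sample complexity.
Reference score (in-house attention score): 22.81761236732655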
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this paper, we study the gradient descent dynamics for jointly training both layers of a one-hidden-layer ReLU network to fit a linear target function. Concretely, we consider a realizable setting where inputs are drawn i.i.d. from a Gaussian distribution and labels follow a planted linear model. This stylized framework captures salient features of end-to-end training in inverse problems and certain auto-encoder models. Despite its apparent simplicity, the dynamics remain poorly understood, in part because the loss landscape contains multiple non-strict saddle points, making it unclear why gradient descent from random initialization reliably escapes bad stationary regions. We provide a detailed characterization of the optimization landscape and prove that gradient descent from a moderately small random initialization, training both layers simultaneously, converges to a global minimizer at a linear rate with order-wise optimal sample complexity. Our analysis tracks the trajectory through three phases: an alignment phase in which hidden weights progressively align with the planted direction while the output weights maintain the correct sign pattern; a growth phase in which the norms of both layers increase while preserving alignment; and a local refinement phase in which the aligned neurons rapidly converge to the planted direction, yielding fast local convergence. To rigorously show that GD avoids non-strict saddles, we develop trajectory-level control arguments for the end-to-end dynamics. In addition, we establish novel uniform concentration results that hold along the entire trajectory, and are essential for obtaining order-wise optimal sample complexity. We corroborate our theory with extensive experiments across a range of configurations.
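
For reference, the setting described in the abstract can be written out as below. The notation (f, W, a, σ, θ*, r_i, η) is ours and is an assumption; the paper's exact parametrization (for example any 1/√m scaling, biases, or normalization of θ*) may differ. The last line shows why the problem is realizable: because σ(z) − σ(−z) = z, the choice a_1 = 1, w_1 = θ*, a_2 = −1, w_2 = −θ*, and a_j = 0 for j ≥ 3 reproduces the linear target exactly, which is one natural way to read the abstract's remark that the output weights must keep the correct sign pattern.

```latex
\begin{aligned}
&\text{Model:} && f(x; W, a) = \sum_{j=1}^{m} a_j\,\sigma(w_j^\top x), \qquad \sigma(z) = \max\{z, 0\}, \\
&\text{Data:} && x_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, I_d), \qquad y_i = \langle \theta^\star, x_i \rangle, \qquad i = 1, \dots, n, \\
&\text{Loss:} && \mathcal{L}(W, a) = \frac{1}{2n} \sum_{i=1}^{n} r_i^2, \qquad r_i = f(x_i; W, a) - y_i, \\
&\text{Gradients:} && \nabla_{a_j} \mathcal{L} = \frac{1}{n} \sum_{i=1}^{n} r_i\,\sigma(w_j^\top x_i), \qquad
\nabla_{w_j} \mathcal{L} = \frac{1}{n} \sum_{i=1}^{n} r_i\,a_j\,\mathbb{1}\{w_j^\top x_i > 0\}\,x_i, \\
&\text{Joint GD:} && a_j \leftarrow a_j - \eta\,\nabla_{a_j} \mathcal{L}, \qquad w_j \leftarrow w_j - \eta\,\nabla_{w_j} \mathcal{L} \quad (\text{all } j \text{ updated simultaneously}), \\
&\text{Realizability:} && \sigma(z) - \sigma(-z) = z \;\Rightarrow\; \sigma(\langle \theta^\star, x \rangle) - \sigma(-\langle \theta^\star, x \rangle) = \langle \theta^\star, x \rangle.
\end{aligned}
```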
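
To make the training setup concrete, here is a minimal pure-Python sketch of joint gradient descent on both layers from a small random initialization, under assumptions that are ours rather than the paper's: a unit-norm planted vector θ*, no biases, no output scaling, d = 3 inputs, m = 4 hidden neurons, n = 200 samples, step size 0.1, and initialization scale 0.01. The function names (make_data, loss_and_grads, cosine_to, train) are hypothetical. It records the empirical loss at every step and, at the end, the cosine between each hidden weight w_j and θ*, which is one simple way to observe alignment; it is not the authors' code or experimental protocol.

```python
import math
import random


def relu(z: float) -> float:
    """ReLU activation sigma(z) = max(z, 0)."""
    return z if z > 0.0 else 0.0


def make_data(d: int, n: int, rng: random.Random):
    """Planted linear model: x_i ~ N(0, I_d) i.i.d., y_i = <theta*, x_i>."""
    theta = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(t * t for t in theta))
    theta = [t / norm for t in theta]  # assumption: unit-norm planted direction
    xs = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    ys = [sum(t * xk for t, xk in zip(theta, x)) for x in xs]
    return theta, xs, ys


def loss_and_grads(W, a, xs, ys):
    """Empirical loss (1/2n) sum (f(x_i) - y_i)^2 and its gradients in both layers."""
    m, d, n = len(W), len(W[0]), len(xs)
    grad_W = [[0.0] * d for _ in range(m)]
    grad_a = [0.0] * m
    loss = 0.0
    for x, y in zip(xs, ys):
        pre = [sum(wk * xk for wk, xk in zip(w, x)) for w in W]
        r = sum(aj * relu(pj) for aj, pj in zip(a, pre)) - y  # residual f(x) - y
        loss += 0.5 * r * r
        for j in range(m):
            if pre[j] > 0.0:  # ReLU derivative taken as 1{z > 0}
                grad_a[j] += r * pre[j]
                coef = r * a[j]
                for k in range(d):
                    grad_W[j][k] += coef * x[k]
    grad_W = [[g / n for g in row] for row in grad_W]
    grad_a = [g / n for g in grad_a]
    return loss / n, grad_W, grad_a


def cosine_to(theta, w):
    """Cosine between hidden weight w and the unit vector theta (0 if w = 0)."""
    norm = math.sqrt(sum(wk * wk for wk in w))
    if norm == 0.0:
        return 0.0
    return sum(t * wk for t, wk in zip(theta, w)) / norm


def train(d=3, m=4, n=200, steps=400, lr=0.1, init_scale=0.01, seed=0):
    """Joint gradient descent on both layers from a small random initialization."""
    rng = random.Random(seed)
    theta, xs, ys = make_data(d, n, rng)
    W = [[rng.gauss(0.0, init_scale) for _ in range(d)] for _ in range(m)]
    a = [rng.gauss(0.0, init_scale) for _ in range(m)]
    losses = []
    for _ in range(steps):
        loss, grad_W, grad_a = loss_and_grads(W, a, xs, ys)
        losses.append(loss)
        # Simultaneous update: both gradients were computed at the current (W, a).
        for j in range(m):
            a[j] -= lr * grad_a[j]
            W[j] = [wk - lr * gk for wk, gk in zip(W[j], grad_W[j])]
    losses.append(loss_and_grads(W, a, xs, ys)[0])
    return {
        "losses": losses,
        "alignment": [cosine_to(theta, w) for w in W],
        "output_weights": list(a),
    }


if __name__ == "__main__":
    result = train()
    print("initial loss:", result["losses"][0], "final loss:", result["losses"][-1])
    for c, aj in zip(result["alignment"], result["output_weights"]):
        print(f"cos(w_j, theta*) = {c:+.3f}, a_j = {aj:+.3f}")
```

The sizes are deliberately tiny so the pure-Python loops finish quickly; they are illustrative choices, not the configurations used in the paper's experiments. Reading each printed cosine next to its output weight is a rough check of the sign pattern: since σ(z) − σ(−z) = z, a neuron aligned with +θ* needs a positive a_j and one aligned with −θ* a negative a_j for the network to reproduce ⟨θ*, x⟩.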


Related papers