Fugu-MT Paper Translation (Summary): A Derandomization Framework for Structure Discovery: Applications in Neural Networks and Beyond

Paper summary: A Derandomization Framework for Structure Discovery: Applications in Neural Networks and Beyond

arxiv url: http://arxiv.org/abs/2510.19382v1
Date: Wed, 22 Oct 2025 08:55:00 GMT
Status: Translation complete
Last updated in system: 2025-10-25 03:08:15.438867
Title: A Derandomization Framework for Structure Discovery: Applications in Neural Networks and Beyond
Title (reference translation): A Derandomization Framework for Structure Discovery: Applications to Neural Networks and More
Authors: Nikos Tsikouras, Yorgos Pantis, Ioannis Mitliagkas, Christos Tzamos
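Abstract summary: We focus on the structure discovery aspect and study it under weaker assumptions. At the core of our approach is a key $\textit{derandomization}$ lemma. This lemma directly explains structure discovery and has immediate applications in other domains.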
Reference score (independently computed attention score): 25.592330047318274
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Understanding the dynamics of feature learning in neural networks (NNs) remains a significant challenge. The work of Mousavi-Hosseini et al. (2023) analyzes a multiple index teacher-student setting and shows that a two-layer student attains a low-rank structure in its first-layer weights when trained with stochastic gradient descent (SGD) and a strong regularizer. This structural property is known to reduce sample complexity of generalization. Indeed, in a second step, the same authors establish algorithm-specific learning guarantees under additional assumptions. In this paper, we focus exclusively on the structure discovery aspect and study it under weaker assumptions, more specifically: we allow (a) NNs of arbitrary size and depth, (b) with all parameters trainable, (c) under any smooth loss function, (d) tiny regularization, and (e) trained by any method that attains a second-order stationary point (SOSP), e.g., perturbed gradient descent (PGD). At the core of our approach is a key $\textit{derandomization}$ lemma, which states that optimizing the function $\mathbb{E}_{\mathbf{x}} \left[g_{\theta}(\mathbf{W}\mathbf{x} + \mathbf{b})\right]$ converges to a point where $\mathbf{W} = \mathbf{0}$, under mild conditions. The fundamental nature of this lemma directly explains structure discovery and has immediate applications in other domains including an end-to-end approximation for MAXCUT, and computing Johnson-Lindenstrauss embeddings.
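Abstract (reference translation): Understanding the dynamics of feature learning in neural networks (NNs) remains a major challenge. The work of Mousavi-Hosseini et al. (2023) analyzes a multiple index teacher-student setting and shows that a two-layer student attains a low-rank structure in its first-layer weights when trained with stochastic gradient descent (SGD) and a strong regularizer. This structural property is known to reduce the sample complexity of generalization. Indeed, in a second step, the same authors establish algorithm-specific learning guarantees under additional assumptions. In this paper, we focus exclusively on the structure discovery aspect and study it under weaker assumptions: (a) NNs of arbitrary size and depth, (b) with all parameters trainable, (c) under any smooth loss function, (d) with tiny regularization, and (e) trained by any method that attains a second-order stationary point (SOSP), e.g., perturbed gradient descent (PGD). At the core of our approach is a key $\textit{derandomization}$ lemma, which states that optimizing the function $\mathbb{E}_{\mathbf{x}} \left[g_{\theta}(\mathbf{W}\mathbf{x} + \mathbf{b})\right]$ converges to a point where $\mathbf{W} = \mathbf{0}$, under mild conditions. The fundamental nature of this lemma directly explains structure discovery and has immediate applications in other domains, including an end-to-end approximation for MAXCUT and computing Johnson-Lindenstrauss embeddings.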
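
The derandomization statement in the abstract can be illustrated with a toy one-dimensional sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the choice $g(z) = \cos(z)$, a standard Gaussian input $x \sim \mathcal{N}(0, 1)$, the starting point, the step size, and the use of plain gradient descent in place of an SOSP-attaining method such as PGD. Under these assumptions the expectation has the closed form $\mathbb{E}_x[\cos(wx + b)] = e^{-w^2/2}\cos(b)$, so the objective can be minimized without sampling.

```python
import math

def objective(w, b):
    # Closed form of E_x[cos(w*x + b)] for x ~ N(0, 1) (toy assumption).
    return math.exp(-0.5 * w * w) * math.cos(b)

def gradient(w, b):
    # Partial derivatives of the closed-form objective with respect to w and b.
    decay = math.exp(-0.5 * w * w)
    return -w * decay * math.cos(b), -decay * math.sin(b)

def gradient_descent(w=1.5, b=2.0, lr=0.1, steps=5000):
    # Plain gradient descent; the start point and step size are illustrative.
    for _ in range(steps):
        gw, gb = gradient(w, b)
        w -= lr * gw
        b -= lr * gb
    return w, b

w_final, b_final = gradient_descent()
print(f"w = {w_final:.6f}, b = {b_final:.6f}, "
      f"objective = {objective(w_final, b_final):.6f}")
```

In this toy case the weight on the random input is driven to zero while $b$ moves to a minimizer of $\cos$, so the optimized function no longer depends on $x$. It is only an instance consistent with the lemma's conclusion, not a check of the lemma's actual conditions.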

Related papers