Fugu-MT Paper Translation (Abstract): Local Convergence of Approximate Newton Method for Two Layer Nonlinear Regression

Paper overview: Local Convergence of Approximate Newton Method for Two Layer Nonlinear Regression

arxiv url: http://arxiv.org/abs/2311.15390v1
Date: Sun, 26 Nov 2023 19:19:02 GMT
Status: Translation complete
Last updated in system: 2023-11-28 17:55:56.194241
Title: Local Convergence of Approximate Newton Method for Two Layer Nonlinear Regression
Title (reference translation): Local convergence of an approximate Newton method for two-layer nonlinear regression
Authors: Zhihang Li, Zhao Song, Zifan Wang, Junze Yin
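Abstract summary: A two-layer regression problem has been well studied in prior work, in which the first layer is activated by a ReLU unit and the second layer by a softmax unit. We prove that the Hessian matrix of the loss function is positive definite and Lipschitz continuous under certain assumptions.
Reference score (attention score computed independently by this site): 21.849997443967705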
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: There have been significant advancements made by large language models (LLMs) in various aspects of our daily lives. LLMs serve as a transformative force in natural language processing, finding applications in text generation, translation, sentiment analysis, and question-answering. The accomplishments of LLMs have led to a substantial increase in research efforts in this domain. One specific two-layer regression problem has been well-studied in prior works, where the first layer is activated by a ReLU unit, and the second layer is activated by a softmax unit. While previous works provide a solid analysis of building a two-layer regression, there is still a gap in the analysis of constructing regression problems with more than two layers. In this paper, we take a crucial step toward addressing this problem: we provide an analysis of a two-layer regression problem. In contrast to previous works, our first layer is activated by a softmax unit. This sets the stage for future analyses of creating more activation functions based on the softmax function. Rearranging the softmax function leads to significantly different analyses. Our main results involve analyzing the convergence properties of an approximate Newton method used to minimize the regularized training loss. We prove that the Hessian matrix of the loss function is positive definite and Lipschitz continuous under certain assumptions. This enables us to establish local convergence guarantees for the proposed training algorithm. Specifically, with an appropriate initialization and after $O(\log(1/\epsilon))$ iterations, our algorithm can find an $\epsilon$-approximate minimizer of the training loss with high probability. Each iteration requires approximately $O(\mathrm{nnz}(C) + d^\omega)$ time, where $d$ is the model size, $C$ is the input matrix, and $\omega < 2.374$ is the matrix multiplication exponent.
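Abstract (reference translation): Large language models (LLMs) have brought remarkable advances to many aspects of daily life. LLMs act as a transformative force in natural language processing, with applications in text generation, translation, sentiment analysis, and question answering. Their success has led to a substantial increase in research effort in this field. One specific two-layer regression problem has been well studied in prior work, in which the first layer is activated by a ReLU unit and the second layer by a softmax unit. Although prior work provides a solid analysis of building a two-layer regression, a gap remains in the analysis of constructing regression problems with more than two layers. As an important step toward addressing this problem, this paper analyzes a two-layer regression problem. In contrast to prior work, the first layer is activated by a softmax unit. This sets the stage for future analyses that create more activation functions based on the softmax function. Rearranging the softmax function leads to a significantly different analysis. The main results analyze the convergence properties of an approximate Newton method used to minimize the regularized training loss. We prove that the Hessian matrix of the loss function is positive definite and Lipschitz continuous under certain assumptions. This makes it possible to establish local convergence guarantees for the proposed algorithm. Specifically, with an appropriate initialization and after $O(\log(1/\epsilon))$ iterations, the algorithm finds an $\epsilon$-approximate minimizer of the training loss with high probability. Each iteration requires approximately $O(\mathrm{nnz}(C) + d^\omega)$ time, where $d$ is the model size, $C$ is the input matrix, and $\omega < 2.374$ is the matrix multiplication exponent.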
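The abstract's central claim, that a Newton-type iteration with an approximate Hessian reaches an $\epsilon$-accurate point in $O(\log(1/\epsilon))$ steps from a good initialization, can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the paper: the objective is a convex softmax cross-entropy surrogate $f(x) = \log\sum_i \exp((Ax)_i) - \langle b, Ax \rangle + \frac{\lambda}{2}\|x\|_2^2$ rather than the paper's two-layer loss, the Hessian approximation simply drops the rank-one term of the exact Hessian, and the function names and the 3-by-2 data are hypothetical.

```python
import math

def softmax(z):
    """Numerically stable softmax of a list of floats."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def gradient_and_approx_hessian(A, b, x, lam):
    """For f(x) = logsumexp(Ax) - <b, Ax> + (lam/2)||x||^2 with x in R^2.

    Returns the exact gradient A^T (p - b) + lam * x, where p = softmax(Ax),
    and the approximate Hessian A^T diag(p) A + lam * I. The exact Hessian
    also contains the term -A^T p p^T A, which is dropped here on purpose
    (an illustrative stand-in, not the paper's approximation).
    """
    z = [row[0] * x[0] + row[1] * x[1] for row in A]
    p = softmax(z)
    g = [lam * x[0], lam * x[1]]
    H = [[lam, 0.0], [0.0, lam]]
    for row, p_i, b_i in zip(A, p, b):
        for j in range(2):
            g[j] += row[j] * (p_i - b_i)
            for k in range(2):
                H[j][k] += p_i * row[j] * row[k]
    return g, H

def solve_2x2(H, g):
    """Solve H s = g for a 2x2 matrix H using Cramer's rule."""
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    return [(H[1][1] * g[0] - H[0][1] * g[1]) / det,
            (H[0][0] * g[1] - H[1][0] * g[0]) / det]

def approx_newton(A, b, x0, lam=0.1, eps=1e-10, max_iter=200):
    """Iterate x <- x - H_approx^{-1} grad until the gradient norm is <= eps.

    Returns the final point and the number of steps taken.
    """
    x = list(x0)
    for t in range(max_iter):
        g, H = gradient_and_approx_hessian(A, b, x, lam)
        if math.hypot(g[0], g[1]) <= eps:
            return x, t
        step = solve_2x2(H, g)
        x = [x[0] - step[0], x[1] - step[1]]
    return x, max_iter

if __name__ == "__main__":
    # Hypothetical toy data: 3 outputs, 2 parameters.
    A = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
    b = [0.5, 0.3, 0.2]
    for eps in (1e-2, 1e-4, 1e-6, 1e-8, 1e-10):
        x, steps = approx_newton(A, b, [0.0, 0.0], eps=eps)
        print(f"eps={eps:.0e}  steps={steps}  x={x}")
```

Because the Hessian is only approximated, the iteration in this sketch converges linearly rather than quadratically near the minimizer, so the step count grows roughly in proportion to $\log(1/\epsilon)$ as the tolerance is tightened.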

Related papers

Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs [56.237917407785545]
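We consider the problem of learning an $\varepsilon$-optimal policy in a general class of continuous-space Markov decision processes (MDPs) with smooth Bellman operators. The key to our solution is a novel projection technique based on ideas from harmonic analysis. Our results bridge the gap between two popular but conflicting perspectives on continuous-space MDPs.
Paper reference translation (metadata) (2024-05-10T09:58:47Z)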
How to Inverting the Leverage Score Distribution? [16.744561210470632]
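Although leverage scores are widely used as a tool, this paper studies a new problem: inverting leverage scores. We use iterative shrinking and the induction hypothesis to ensure the global convergence rate of Newton's method. This study of inverting statistical leverage opens up many new applications in interpretation, data recovery, and security.
Paper reference translation (metadata) (2024-04-21T21:36:42Z)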
A Unified Scheme of ResNet and Softmax [8.556540804058203]
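We theoretically analyze the regression problem $\| \langle \exp(Ax) + Ax, \mathbf{1}_n \rangle^{-1} ( \exp(Ax) + Ax ) \ldots$ This regression problem is a unified scheme that combines softmax regression and ResNet.
Paper reference translation (metadata) (2023-09-23T21:41:01Z)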
Convergence of Two-Layer Regression with Nonlinear Units [10.295897511849034]
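We introduce a greedy algorithm based on an approximate Newton method, which converges in the sense of the distance to the optimal solution. We relax the Lipschitz condition and prove convergence in the sense of the loss value.
Paper reference translation (metadata) (2023-08-16T13:30:45Z)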
An Iterative Algorithm for Rescaled Hyperbolic Functions Regression [7.578147116161996]
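Large language models (LLMs) have numerous real-world applications across various domains. LLMs have the potential to revolutionize the field of natural language processing (NLP).
Paper reference translation (metadata) (2023-05-01T05:16:07Z)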
Refined Regret for Adversarial MDPs with Linear Function Approximation [50.00022394876222]
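We consider learning in an adversarial Markov decision process (MDP) where the loss functions can change arbitrarily over $K$ episodes. This paper proposes two algorithms that improve the regret to $\tilde{\mathcal{O}}(K^{2/3})$ in the same setting.
Paper reference translation (metadata) (2023-01-30T14:37:21Z)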
Nonparametric regression with modified ReLU networks [77.34726150561087]
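We consider regression estimation with modified ReLU neural networks, in which the network weight matrices are first modified by a function $\alpha$ before being multiplied by the input vectors.
Paper reference translation (metadata) (2022-07-17T21:46:06Z)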
ReLU Regression with Massart Noise [52.10842036932169]
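We study the fundamental problem of ReLU regression, where the goal is to fit Rectified Linear Units (ReLUs) to data. We focus on ReLU regression in the Massart noise model, a natural and well-studied semi-random noise model. In this model, we develop an efficient algorithm that achieves exact parameter recovery.
Paper reference translation (metadata) (2021-09-10T02:13:22Z)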
Randomized Exploration for Reinforcement Learning with General Value Function Approximation [122.70803181751135]
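We propose a model-free reinforcement learning algorithm inspired by the randomized least squares value iteration (RLSVI) algorithm. The proposed algorithm drives exploration by simply perturbing the training data with scalar noise. We complement the theory with an empirical evaluation across known difficult exploration tasks.
Paper reference translation (metadata) (2021-06-15T02:23:07Z)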
Learning to extrapolate using continued fractions: Predicting the critical temperature of superconductor materials [5.905364646955811]
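In artificial intelligence (AI) and machine learning (ML), approximating an unknown target function $y=f(\mathbf{x})$ is a common objective. Referring to $S$ as the training set, the aim is to identify a low-complexity mathematical model that can effectively approximate this target function for new instances $\mathbf{x}$.
Paper reference translation (metadata) (2020-11-27T04:57:40Z)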
Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression, where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$. (i) $X$ is L4-L2 hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number and $\epsilon$ has bounded variance, (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is
Paper reference translation (metadata) (2020-07-16T06:44:44Z)
Piecewise Linear Regression via a Difference of Convex Functions [50.89452535187813]
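We propose a piecewise linear regression method that fits a difference of convex functions (DC functions) to the data. We show that it can be implemented in practice and empirically validate that its performance on real-world datasets is comparable to existing regression/classification methods.
Paper reference translation (metadata) (2020-07-05T18:58:47Z)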
Statistical-Query Lower Bounds via Functional Gradients [19.5924910463796]
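We show that any statistical-query algorithm with tolerance $n^{-(1/\epsilon)^b}$ must use at least $2^{n^c} \epsilon$ queries for some constants $b$ and $c$. The result applies to general (as opposed to correlational) SQ learning algorithms, which is unusual for real-valued learning problems.
Paper reference translation (metadata) (2020-06-29T05:15:32Z)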

The related papers list is generated automatically from the titles and abstracts of the papers on this site.
