Fugu-MT Paper Translation (Summary): Rolling Ball Optimizer: Learning by ironing out loss landscape wrinkles

Paper summary: Rolling Ball Optimizer: Learning by ironing out loss landscape wrinkles

arxiv url: http://arxiv.org/abs/2505.19527v2
Date: Sun, 12 Oct 2025 12:30:19 GMT
Status: Translation complete
Last updated in system: 2025-10-14 15:48:08.650953
Title: Rolling Ball Optimizer: Learning by ironing out loss landscape wrinkles
Authors: Mohammed D. Belgoumri, Mohamed Reda Bouadjenek, Hakim Hacid, Imran Razzak, Sunil Aryal,
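Abstract summary: Training large neural networks (NNs) requires optimizing high-dimensional, data-dependent loss functions. These functions are often highly complex, textured, and even fractal-like. Noise in the training data propagates forward, giving rise to unrepresentative small-scale geometry.
Reference score (attention score calculated by the site): 19.667068548957143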
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Training large neural networks (NNs) requires optimizing high-dimensional data-dependent loss functions. The optimization landscape of these functions is often highly complex and textured, even fractal-like, with many spurious local minima, ill-conditioned valleys, degenerate points, and saddle points. Complicating things further is the fact that these landscape characteristics are a function of the data, meaning that noise in the training data can propagate forward and give rise to unrepresentative small-scale geometry. This poses a difficulty for gradient-based optimization methods, which rely on local geometry to compute updates and are, therefore, vulnerable to being derailed by noisy data. In practice, this translates to a strong dependence of the optimization dynamics on the noise in the data, i.e., poor generalization performance. To remediate this problem, we propose a new optimization procedure: Rolling Ball Optimizer (RBO), that breaks this spatial locality by incorporating information from a larger region of the loss landscape in its updates. We achieve this by simulating the motion of a rigid sphere of finite radius rolling on the loss landscape, a straightforward generalization of Gradient Descent (GD) that simplifies into it in the infinitesimal limit. The radius serves as a hyperparameter that determines the scale at which RBO sees the loss landscape, allowing control over the granularity of its interaction therewith. We are motivated by the intuition that the large-scale geometry of the loss landscape is less data-specific than its fine-grained structure, and that it is easier to optimize. We support this intuition by proving that our algorithm has a smoothing effect on the loss function. Evaluation against SGD, SAM, and Entropy-SGD, on MNIST and CIFAR-10/100 demonstrates promising results in terms of convergence speed, training accuracy, and generalization performance.
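The abstract describes RBO only at the level of intuition, so the following is a rough 1D illustration and not the authors' algorithm. It assumes a simple model in which a ball of radius r whose center sits above position c rests at height g(c) = max over |t| < r of f(c + t) + sqrt(r² − t²), and the ball descends along the slope of g. That slope depends only on the contact offset t*, so the update "sees" the whole footprint of the ball rather than the single point c. The toy loss `wrinkled_loss`, the helpers `contact_offset`, `rolling_ball_grad`, and `descend`, and all radii and step sizes are hypothetical choices for this sketch; brute-force sampling of the footprint is only practical in one dimension.

```python
import math


def wrinkled_loss(x):
    # Hypothetical toy loss: a smooth bowl 0.5*x^2 plus small, high-frequency "wrinkles".
    return 0.5 * x * x + 0.05 * math.sin(40.0 * x)


def wrinkled_grad(x):
    # Exact derivative of wrinkled_loss, used for plain gradient descent.
    return x + 2.0 * math.cos(40.0 * x)


def contact_offset(f, c, radius, samples=1001):
    # Assumed model: a ball whose center is above position c rests at height
    # g(c) = max_t f(c + t) + sqrt(radius^2 - t^2) with |t| < radius.
    # Return the sampled maximizer t*, the horizontal offset of the contact point.
    best_t, best_h = 0.0, -math.inf
    for i in range(samples):
        # Midpoint samples keep t strictly inside (-radius, radius).
        t = radius * (2.0 * (i + 0.5) / samples - 1.0)
        h = f(c + t) + math.sqrt(radius * radius - t * t)
        if h > best_h:
            best_t, best_h = t, h
    return best_t


def rolling_ball_grad(f, c, radius, samples=1001):
    # Assuming a single interior contact point, the envelope theorem gives
    # dg/dc = f'(c + t*), and tangency of ball and graph at the contact point
    # makes this equal to t* / sqrt(radius^2 - t*^2), so only t* is needed.
    t = contact_offset(f, c, radius, samples)
    return t / math.sqrt(radius * radius - t * t)


def descend(grad_fn, x0, lr, steps):
    # Fixed-step descent: x <- x - lr * grad_fn(x).
    x = x0
    for _ in range(steps):
        x -= lr * grad_fn(x)
    return x


# Plain GD on the wrinkled loss versus descent of the ball's center (radius 0.5).
x_gd = descend(wrinkled_grad, x0=1.5, lr=0.01, steps=300)
x_ball = descend(lambda c: rolling_ball_grad(wrinkled_loss, c, radius=0.5),
                 x0=1.5, lr=0.05, steps=300)
print(f"GD:           x = {x_gd:.3f}, loss = {wrinkled_loss(x_gd):.3f}")
print(f"rolling ball: x = {x_ball:.3f}, loss = {wrinkled_loss(x_ball):.3f}")
```

In this toy setting, plain gradient descent stays trapped in a wrinkle next to its starting point x = 1.5, while the ball, being too large to fit into the wrinkles, rolls over them toward the bottom of the bowl. Shrinking `radius` makes `rolling_ball_grad` approach the ordinary derivative (for f(x) = 0.5x² at c = 1 and radius 1e-3 it returns roughly 1), which matches the abstract's statement that RBO reduces to GD in the infinitesimal limit.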

Related papers

Scaling Collapse Reveals Universal Dynamics in Compute-Optimally Trained Neural Networks [59.552873049024775]
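We show that compute-optimally trained models exhibit universality with remarkable precision. When the learning rate decays, the collapse becomes so tight that the differences between the models' normalized curves fall below the noise floor. We explain these phenomena by connecting the collapse to the power-law structure of typical neural scaling laws.
Reference translation (metadata) (2025-07-02T20:03:34Z)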
Can Stability be Detrimental? Better Generalization through Gradient Descent Instabilities [14.741581246137404]
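We show that instabilities induced by large learning rates move the model parameters toward flatter regions of the loss landscape. On modern benchmark datasets, we find that these instabilities lead to excellent generalization performance.
Reference translation (metadata) (2024-12-23T14:32:53Z)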
Dynamical loss functions shape landscape topography and improve learning in artificial neural networks [0.9208007322096533]
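We show how to transform cross-entropy and mean squared error into dynamical loss functions, and demonstrate that doing so substantially improves validation accuracy for networks of different sizes.
Reference translation (metadata) (2024-10-14T16:27:03Z)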
On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
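We introduce the unhinged loss, a concise loss function that makes it mathematically tractable to analyze closed-form dynamics. The unhinged loss also allows us to examine more practical techniques, such as time-varying learning rates and feature normalization.
Reference translation (metadata) (2023-12-13T02:11:07Z)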
Gradient constrained sharpness-aware prompt learning for vision-language models [99.74832984957025]
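This paper raises a new trade-off problem in generalizable prompt learning for vision-language models (VLMs). By analyzing the loss landscapes of state-of-the-art methods and of vanilla SAM-based sharpness-aware minimization, we conclude that the trade-off performance correlates with both the loss value and the loss sharpness. We propose a new SAM-based method for prompt learning, denoted Gradient Constrained Sharpness-Aware Context Optimization (GCSCoOp).
Reference translation (metadata) (2023-09-14T17:13:54Z)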
Stabilizing Transformer Training by Preventing Attention Entropy Collapse [56.45313891694746]
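This work studies the training dynamics of Transformers by examining the evolution of the attention layers. We show that σReparam prevents entropy collapse in the attention layers, promoting more stable training. We conduct experiments with σReparam on image classification, image self-supervised learning, machine translation, speech recognition, and language modeling tasks.
Reference translation (metadata) (2023-03-11T03:30:47Z)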
Understanding and Combating Robust Overfitting via Input Loss Landscape Analysis and Regularization [5.1024659285813785]
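Adversarial training tends to overfit, and the cause has remained unclear. We find that robust overfitting results from standard training, in particular from minimizing the clean loss. We propose a new regularizer that smooths the loss landscape by penalizing the weighted logit variation along the adversarial direction.
Reference translation (metadata) (2022-12-09T16:55:30Z)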
Adapting Stepsizes by Momentumized Gradients Improves Optimization and Generalization [89.66571637204012]
AdaMomentum ... on vision, and achieves state-of-the-art results on other tasks, including language processing.
Reference translation (metadata) (2021-06-22T03:13:23Z)
Tilting the playing field: Dynamical loss functions for machine learning [18.831125493827766]
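We show that learning can be improved by using loss functions that evolve cyclically during training, emphasizing one class at a time. The improvement arises from the interplay between the dynamics of the system as it evolves to minimize the loss and the changing loss landscape.
Reference translation (metadata) (2021-02-07T13:15:08Z)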
Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
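We show that a host of variations can be covered by the unified framework we propose. We prove the convergence of this method and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
Reference translation (metadata) (2020-06-10T08:22:41Z)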
The Break-Even Point on Optimization Trajectories of Deep Neural Networks [64.7563588124004]
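We argue for the existence of a "break-even" point on this trajectory. We show that using a large learning rate in the initial phase of training reduces the variance of the gradient. We also show that using a low learning rate results in poor conditioning of the loss surface, even for neural networks with batch normalization layers.
Reference translation (metadata) (2020-02-21T22:55:51Z)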

The related-papers list is generated automatically from the titles and abstracts of papers on this site.

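This is the information for the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from its use (including all information and translations).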