Fugu-MT Paper Translation (Abstract): Combining Adam and its Inverse Counterpart to Enhance Generalization of Deep Learning Optimizers

Paper overview: Combining Adam and its Inverse Counterpart to Enhance Generalization of Deep Learning Optimizers

arxiv url: http://arxiv.org/abs/2603.07122v1
Date: Sat, 07 Mar 2026 09:15:30 GMT
Status: Translation complete
Last updated in system: 2026-03-10 15:13:13.892034
Title: Combining Adam and its Inverse Counterpart to Enhance Generalization of Deep Learning Optimizers
Authors: Tao Shi, Liangming Chen, Long Jin, Mengchu Zhou
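Abstract summary: In the training of neural networks, adaptive moment estimation (Adam) typically converges fast but exhibits suboptimal generalization performance. To enhance its ability to find flat minima, the authors propose a new variant named inverse Adam (InvAdam). InvAdam computes the element-wise multiplication of the first-order and second-order moments, whereas Adam computes the element-wise division of these two moments.
Reference score (independently computed attention score): 57.049014152026864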
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In the training of neural networks, adaptive moment estimation (Adam) typically converges fast but exhibits suboptimal generalization performance. A widely accepted explanation for its defect in generalization is that it often tends to converge to sharp minima. To enhance its ability to find flat minima, we propose its new variant named inverse Adam (InvAdam). The key improvement of InvAdam lies in its parameter update mechanism, which is opposite to that of Adam. Specifically, it computes element-wise multiplication of the first-order and second-order moments, while Adam computes the element-wise division of these two moments. This modification aims to increase the step size of the parameter update when the elements in the second-order moments are large and vice versa, which helps the parameter escape sharp minima and stay at flat ones. However, InvAdam's update mechanism may face challenges in convergence. To address this challenge, we propose dual Adam (DualAdam), which integrates the update mechanisms of both Adam and InvAdam, ensuring convergence while enhancing generalization performance. Additionally, we introduce the diffusion theory to mathematically demonstrate InvAdam's ability to escape sharp minima. Extensive experiments are conducted on image classification tasks and large language model (LLM) fine-tuning. The results validate that DualAdam outperforms Adam and its state-of-the-art variants in terms of generalization performance. The code is publicly available at https://github.com/LongJin-lab/DualAdam.
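The contrast between the two update mechanisms described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (that is in the repository linked above). The abstract only says "element-wise multiplication" and "element-wise division", so the use of the square root of the second moment in the InvAdam direction, the omission of bias correction, and the `blended_direction` helper with its `mix` weight are all assumptions made for illustration. In particular, `blended_direction` is a hypothetical convex combination and should not be read as the paper's DualAdam rule.

```python
import math


def update_moments(m, v, grad, beta1=0.9, beta2=0.999):
    """Exponential moving averages of the gradient and the squared gradient,
    as in standard Adam (bias correction omitted for brevity)."""
    m_new = [beta1 * mi + (1 - beta1) * g for mi, g in zip(m, grad)]
    v_new = [beta2 * vi + (1 - beta2) * g * g for vi, g in zip(v, grad)]
    return m_new, v_new


def adam_direction(m, v, eps=1e-8):
    """Adam: element-wise division of the first moment by the square root
    of the second moment."""
    return [mi / (math.sqrt(vi) + eps) for mi, vi in zip(m, v)]


def inv_direction(m, v):
    """Inverse-style direction: element-wise multiplication.
    ASSUMPTION: sqrt(v) is used to mirror Adam's denominator; the abstract
    does not specify the exact form."""
    return [mi * math.sqrt(vi) for mi, vi in zip(m, v)]


def blended_direction(m, v, mix=0.5, eps=1e-8):
    """HYPOTHETICAL convex blend of the two directions, for illustration only.
    `mix` is an invented parameter, not taken from the paper."""
    a = adam_direction(m, v, eps)
    i = inv_direction(m, v)
    return [mix * ai + (1 - mix) * ii for ai, ii in zip(a, i)]


# Two coordinates with the same first moment but different second moments.
m = [0.1, 0.1]
v = [0.01, 4.0]
adam_step = adam_direction(m, v)
inv_step = inv_direction(m, v)
blend_step = blended_direction(m, v)
```

In this toy setting the coordinate with the larger second moment gets the smaller step under the Adam-style direction and the larger step under the multiplicative direction, which matches the behavior the abstract attributes to InvAdam.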
