Fugu-MT Paper Translation (Abstract): Memory Augmented Optimizers for Deep Learning

Paper summary: Memory Augmented Optimizers for Deep Learning

arxiv url: http://arxiv.org/abs/2106.10708v1
Date: Sun, 20 Jun 2021 14:58:08 GMT
Status: Translation complete
Last updated in system: 2021-06-23 09:42:59.142061
Title: Memory Augmented Optimizers for Deep Learning
Title (reference translation): Memory-Augmented Optimization for Deep Learning
Authors: Paul-Aymeric McRae, Prasanna Parthasarathi, Mahmoud Assran, Sarath Chandar
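Abstract summary: This paper proposes a framework of memory-augmented gradient descent optimizers that retain a limited view of their gradient history in memory. It shows that this class of optimizers with fixed-size memory converges under assumptions of strong convexity.
Reference score (independently computed attention score): 10.541705775336657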
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Popular approaches for minimizing loss in data-driven learning often involve an abstraction or an explicit retention of the history of gradients for efficient parameter updates. The aggregated history of gradients nudges the parameter updates in the right direction even when the gradients at any given step are not informative. Although the history of gradients summarized in meta-parameters or explicitly stored in memory has been shown effective in theory and practice, the question of whether $all$ or only a subset of the gradients in the history are sufficient in deciding the parameter updates remains unanswered. In this paper, we propose a framework of memory-augmented gradient descent optimizers that retain a limited view of their gradient history in their internal memory. Such optimizers scale well to large real-life datasets, and our experiments show that the memory augmented extensions of standard optimizers enjoy accelerated convergence and improved performance on a majority of computer vision and language tasks that we considered. Additionally, we prove that the proposed class of optimizers with fixed-size memory converge under assumptions of strong convexity, regardless of which gradients are selected or how they are linearly combined to form the update step.
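Abstract (reference translation): Common approaches to minimizing loss in data-driven learning often abstract or explicitly retain the history of gradients to make parameter updates efficient. The aggregated gradient history steers parameter updates in the right direction even when the gradient at a given step is uninformative. A gradient history summarized in meta-parameters or stored explicitly in memory has been shown to be effective in both theory and practice, but whether $all$ of the gradients in the history or only a subset of them are sufficient to decide the parameter updates remains an open question. This paper proposes a framework of memory-augmented gradient descent optimizers that keep a limited view of the gradient history in internal memory. Such optimizers scale well to large real-life datasets, and the experiments show that memory-augmented extensions of standard optimizers converge faster and perform better on the majority of the computer vision and language tasks considered. The paper also proves that the proposed class of optimizers with fixed-size memory converges under assumptions of strong convexity, regardless of which gradients are selected or how they are linearly combined to form the update step.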
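The idea in the abstract, a gradient descent step built from a linear combination of gradients held in a fixed-size memory, can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not say which gradients are selected or how they are weighted, so the selection rule here (keep the most recent `memory_size` gradients) and the combination (a uniform average) are assumptions chosen only for illustration. The function names `memory_augmented_gd` and `quadratic_grad`, the learning rate, and the test objective are likewise hypothetical.

```python
from collections import deque
from typing import Callable, Deque, List

Vector = List[float]


def memory_augmented_gd(
    grad_fn: Callable[[Vector], Vector],
    x0: Vector,
    lr: float = 0.05,
    memory_size: int = 3,
    steps: int = 500,
) -> Vector:
    """Gradient descent whose step averages a fixed-size buffer of gradients."""
    x = list(x0)
    # Fixed-size memory: appending to a full deque evicts the oldest gradient.
    # ASSUMPTION: "keep the most recent gradients" is an illustrative
    # selection rule, not one taken from the paper.
    memory: Deque[Vector] = deque(maxlen=memory_size)
    for _ in range(steps):
        memory.append(grad_fn(x))
        # ASSUMPTION: a uniform average is one possible linear combination
        # of the stored gradients.
        update = [sum(g[i] for g in memory) / len(memory) for i in range(len(x))]
        x = [xi - lr * ui for xi, ui in zip(x, update)]
    return x


def quadratic_grad(x: Vector) -> Vector:
    """Gradient of the strongly convex f(x) = 0.5 * (x[0]**2 + 10 * x[1]**2)."""
    return [1.0 * x[0], 10.0 * x[1]]


x_final = memory_augmented_gd(quadratic_grad, x0=[5.0, -3.0])
print(x_final)  # both coordinates end up very close to the minimizer at 0
```

With `memory_size=1` the buffer holds only the current gradient, so the sketch reduces to plain gradient descent; larger values blend in older gradients at the cost of storing `memory_size` gradient vectors.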

Related papers

Breaking Memory Limits: Gradient Wavelet Transform Enhances LLMs Training [45.225732322141994]
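Large language models (LLMs) deliver strong performance across a range of natural language processing tasks. Their enormous number of parameters causes significant memory challenges during training. Existing memory-efficient algorithms often rely on techniques such as singular value decomposition projection or weight freezing. This paper proposes a novel solution called Gradient Wavelet Transform (GWT).
Paper reference translation (metadata) (2025-01-13T11:35:09Z)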
Stable Hadamard Memory: Revitalizing Memory-Augmented Agents for Reinforcement Learning [64.93848182403116]
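Current deep learning memory models struggle in reinforcement learning environments that are partially observable and long-term. This paper introduces Stable Hadamard Memory, a new memory model for reinforcement learning agents. The method significantly outperforms state-of-the-art memory-based methods on challenging partially observable benchmarks.
Paper reference translation (metadata) (2024-10-14T03:50:17Z)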
An Effective Dynamic Gradient Calibration Method for Continual Learning [11.555822066922508]
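Continual learning (CL) is a fundamental topic in machine learning, where the goal is to train a model on continuously arriving data and tasks. Because memory limits prevent storing all historical data, CL faces the problem of catastrophic forgetting. The authors develop an effective algorithm that calibrates the gradient at each update step of the model.
Paper reference translation (metadata) (2024-07-30T16:30:09Z)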
SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
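This paper proposes an innovative METL strategy called SHERL for resource-limited scenarios. In the early route, intermediate outputs are consolidated through an anti-redundancy operation. In the late route, using a minimal number of late pre-trained layers alleviates the peak demand on memory overhead.
Paper reference translation (metadata) (2024-07-10T10:22:35Z)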
AdaLomo: Low-memory Optimization with Adaptive Learning Rate [59.64965955386855]
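This paper proposes low-memory optimization with adaptive learning rate (AdaLomo) for large language models. AdaLomo achieves results on par with AdamW while significantly reducing memory requirements, lowering the hardware barrier to training large language models.
Paper reference translation (metadata) (2023-10-16T09:04:28Z)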
EMO: Episodic Memory Optimization for Few-Shot Meta-Learning [69.50380510879697]
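Episodic memory optimization for meta-learning, called EMO, is inspired by the human ability to recall past learning experiences from the brain's memory. EMO nudges parameter updates in the right direction even when the gradients provided by a limited number of examples are uninformative. EMO scales well with most few-shot classification benchmarks and improves the performance of optimization-based meta-learning methods.
Paper reference translation (metadata) (2023-06-08T13:39:08Z)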
Tom: Leveraging trend of the observed gradients for faster convergence [0.0]
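Tom is a new variant of Adam that takes into account the trend of the gradients observed on the loss landscape traversed by the neural network. Tom outperforms Adagrad, Adadelta, RMSProp, and Adam in accuracy and also converges faster.
Paper reference translation (metadata) (2021-09-07T20:19:40Z)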
Adapting Stepsizes by Momentumized Gradients Improves Optimization and Generalization [89.66571637204012]
AdaMomentum on vision, and achieves state-of-the-art results on other tasks including language processing.
Paper reference translation (metadata) (2021-06-22T03:13:23Z)
Self-Tuning Stochastic Optimization with Curvature-Aware Gradient Filtering [53.523517926927894]
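Using per-sample Hessian-vector products and gradients, the authors construct a self-tuning quadratic model. They prove that the model-based procedure converges in the noisy-gradient setting. This is an interesting step toward constructing such self-tuning quadratic models.
Paper reference translation (metadata) (2020-11-09T22:07:30Z)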

The related papers list is generated automatically from the titles and abstracts of papers on this site.

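This page provides information on the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and assumes no responsibility for any consequences arising from the use of this site (including all information and translations).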