Fugu-MT paper translation (abstract): ATLAS: Learning to Optimally Memorize the Context at Test Time

Paper summary: ATLAS: Learning to Optimally Memorize the Context at Test Time

arxiv url: http://arxiv.org/abs/2505.23735v1
Date: Thu, 29 May 2025 17:57:16 GMT
Status: Translation complete
System update date: 2025-05-30 18:14:08.067936
Title: ATLAS: Learning to Optimally Memorize the Context at Test Time
Title (reference translation): ATLAS: Learning to optimally memorize the context at test time
Authors: Ali Behrouz, Zeman Li, Praneeth Kacham, Majid Daliri, Yuan Deng, Peilin Zhong, Meisam Razaviyayn, Vahab Mirrokni
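Abstract summary: ATLAS is a long-term memory module with a high capacity for memorizing the context. The paper introduces a new family of Transformer-like architectures, called DeepTransformers, that are strict generalizations of the original Transformer architecture.
Reference score (independently computed attention score): 31.41718170413687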
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Transformers have been established as the most popular backbones in sequence modeling, mainly due to their effectiveness in in-context retrieval tasks and the ability to learn at scale. Their quadratic memory and time complexity, however, bound their applicability in longer sequences and so have motivated researchers to explore effective alternative architectures such as modern recurrent neural networks (a.k.a. long-term recurrent memory modules). Despite their recent success in diverse downstream tasks, they struggle in tasks that require long context understanding and extrapolation to longer sequences. We observe that these shortcomings come from three disjoint aspects in their design: (1) limited memory capacity that is bounded by the architecture of memory and feature mapping of the input; (2) online nature of update, i.e., optimizing the memory only with respect to the last input; and (3) less expressive management of their fixed-size memory. To enhance all three of these aspects, we present ATLAS, a long-term memory module with high capacity that learns to memorize the context by optimizing the memory based on the current and past tokens, overcoming the online nature of long-term memory models. Building on this insight, we present a new family of Transformer-like architectures, called DeepTransformers, that are strict generalizations of the original Transformer architecture. Our experimental results on language modeling, common-sense reasoning, recall-intensive, and long-context understanding tasks show that ATLAS surpasses the performance of Transformers and recent linear recurrent models. ATLAS further improves the long context performance of Titans, achieving +80% accuracy at 10M context length on the BABILong benchmark.
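Abstract (reference translation): Transformers have been established as the most popular backbones in sequence modeling. However, their quadratic complexity in memory and time limits their applicability to longer sequences, which has motivated the exploration of effective alternative architectures such as modern recurrent neural networks (also known as long-term recurrent memory modules). Despite their recent success in various downstream tasks, these models struggle in tasks that require long-context understanding and extrapolation to longer sequences. These shortcomings stem from three aspects: (1) limited memory capacity, bounded by the architecture of the memory and the feature mapping of the input; (2) the online nature of the update, i.e., optimizing the memory only with respect to the last input; and (3) less expressive management of the fixed-size memory. To enhance all three aspects, ATLAS is a high-capacity long-term memory module that learns to memorize the context by optimizing the memory based on the current and past tokens, overcoming the online nature of long-term memory models. Building on this insight, we present a new family of Transformer-like architectures, called DeepTransformers, that are strict generalizations of the original Transformer architecture. Experimental results on language modeling, common-sense reasoning, recall-intensive, and long-context understanding tasks show that ATLAS surpasses the performance of Transformers and recent linear recurrent models. ATLAS further improves the long-context performance of Titans, achieving +80% accuracy at 10M context length on the BABILong benchmark.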

Related papers