Fugu-MT paper translation (abstract): Compact Recurrent Transformer with Persistent Memory

Paper summary: Compact Recurrent Transformer with Persistent Memory

arxiv url: http://arxiv.org/abs/2505.00929v1
Date: Fri, 02 May 2025 00:11:44 GMT
Status: Translation complete
Last updated in system: 2025-05-05 17:21:19.869582
Title: Compact Recurrent Transformer with Persistent Memory
Title (reference translation): Compact Recurrent Transformer with Persistent Memory
Authors: Edison Mucllari, Zachary Daniels, David Zhang, Qiang Ye
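Abstract summary: The Transformer architecture has shown significant success in many language processing and visual tasks. We propose an efficient Compact Recurrent Transformer (CRT). CRT combines shallow Transformer models that process short local segments with recurrent neural networks to compress and manage a single persistent memory vector. We evaluate CRT on WordPTB and WikiText-103 for next-token prediction, and on the Toyota Smarthome video dataset for classification.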
Reference score (independently computed attention score): 16.48606806238812
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The Transformer architecture has shown significant success in many language processing and visual tasks. However, the method faces challenges in efficiently scaling to long sequences because the self-attention computation is quadratic with respect to the input length. To overcome this limitation, several approaches scale to longer sequences by breaking long sequences into a series of segments, restricting self-attention to local dependencies between tokens within each segment and using a memory mechanism to manage information flow between segments. However, these approaches generally introduce additional compute overhead that restricts them from being used for applications where limited compute memory and power are of great concern (such as edge computing). We propose a novel and efficient Compact Recurrent Transformer (CRT), which combines shallow Transformer models that process short local segments with recurrent neural networks to compress and manage a single persistent memory vector that summarizes long-range global information between segments. We evaluate CRT on WordPTB and WikiText-103 for next-token-prediction tasks, as well as on the Toyota Smarthome video dataset for classification. CRT achieves comparable or superior prediction results to full-length Transformers in the language datasets while using significantly shorter segments (half or quarter size) and substantially reduced FLOPs. Our approach also demonstrates state-of-the-art performance on the Toyota Smarthome video dataset.
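Abstract (reference translation): The Transformer architecture has achieved great success in many language processing and vision tasks. However, the method struggles to scale efficiently to long sequences, because the self-attention computation is quadratic in the input length. To overcome this limitation, several approaches split a long sequence into a series of segments, restrict self-attention to local dependencies between tokens within each segment, and use a memory mechanism to manage the flow of information between segments. These approaches, however, generally introduce additional compute overhead, which limits their use in applications where compute memory and power are major constraints (such as edge computing). We propose a novel and efficient Compact Recurrent Transformer (CRT), which combines shallow Transformer models that process short local segments with recurrent neural networks to compress and manage a single persistent memory vector summarizing long-range global information between segments. We evaluate CRT on next-token prediction on WordPTB and WikiText-103, and on classification on the Toyota Smarthome video dataset. On the language datasets, CRT achieves prediction results comparable or superior to full-length Transformers while using much shorter segments (half or quarter size) and substantially fewer FLOPs. The approach also demonstrates state-of-the-art performance on the Toyota Smarthome video dataset.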
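The segment-plus-memory idea described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the identity-projection attention in `attend`, the plain tanh update in `run_segments`, the weight matrices `w_mem` and `w_in`, and the helper `attention_scores` are all hypothetical stand-ins chosen for brevity. The abstract does not specify which recurrent unit or attention configuration CRT uses. The sketch only shows how a single memory vector can be carried across short segments, and how segmenting reduces the number of pairwise attention scores.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # stabilise the exponentials
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend(tokens):
    """Single-head self-attention with identity projections (illustrative only)."""
    d = tokens.shape[-1]
    scores = tokens @ tokens.T / np.sqrt(d)  # (n, n): grows quadratically with n
    return softmax(scores) @ tokens

def run_segments(sequence, segment_len, w_mem, w_in):
    """Process `sequence` (N, d) segment by segment, carrying one memory vector."""
    n, d = sequence.shape
    memory = np.zeros(d)  # the single persistent memory vector
    outputs = []
    for start in range(0, n, segment_len):
        segment = sequence[start:start + segment_len]
        # Prepend the memory so local tokens can attend to the long-range summary.
        hidden = attend(np.vstack([memory[None, :], segment]))
        outputs.append(hidden[1:])  # drop the memory slot from the output
        # Assumed recurrent update: a plain tanh cell over the segment mean.
        memory = np.tanh(w_mem @ memory + w_in @ hidden[1:].mean(axis=0))
    return np.vstack(outputs), memory

def attention_scores(n_tokens, segment_len):
    """Pairwise attention scores with one extra memory slot per segment.

    Exact when segment_len divides n_tokens, otherwise an upper bound.
    """
    n_segments = -(-n_tokens // segment_len)  # ceiling division
    return n_segments * (segment_len + 1) ** 2

rng = np.random.default_rng(0)
d = 8
seq = rng.normal(size=(64, d))
w_mem = rng.normal(scale=0.1, size=(d, d))
w_in = rng.normal(scale=0.1, size=(d, d))

out, mem = run_segments(seq, segment_len=16, w_mem=w_mem, w_in=w_in)
print(out.shape, mem.shape)
print(64 * 64, attention_scores(64, 16))  # full attention vs. segmented
```

With 64 tokens and quarter-length segments of 16, full self-attention computes 64 × 64 = 4096 pairwise scores, while the segmented version computes 4 × 17² = 1156, at the cost of routing all cross-segment information through one vector.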


Related papers