Fugu-MT Paper Translation (Summary): Echo State Transformer: When chaos brings memory

Paper summary: Echo State Transformer: When chaos brings memory

arxiv url: http://arxiv.org/abs/2507.02917v1
Date: Wed, 25 Jun 2025 09:56:25 GMT
Status: Translation complete
Last updated in system: 2025-07-13 12:05:57.520026
Title: Echo State Transformer: When chaos brings memory
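Title (reference translation): Echo State Transformer: When Chaos Brings Memory
Authors: Yannis Bendi-Ouis, Xavier Hinaut
Abstract summary: This paper introduces Echo State Transformers (EST), a hybrid architecture for sequential data processing. EST integrates the Transformer attention mechanism with principles from Reservoir Computing to create a fixed-size window distributed memory system. EST achieves constant computational complexity at each processing step, effectively breaking the quadratic scaling problem of standard Transformers.
Reference score (independently computed attention level): 2.07180164747172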
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: While Large Language Models and their underlying Transformer architecture are remarkably efficient, they do not reflect how our brain processes and learns a diversity of cognitive tasks such as language and working memory. Furthermore, sequential data processing with Transformers encounters a fundamental barrier: quadratic complexity growth with sequence length. Motivated by these limitations, our ambition is to create more efficient models that are less reliant on intensive computations and massive volumes of data. We introduce Echo State Transformers (EST), a hybrid architecture that elegantly resolves this challenge while demonstrating exceptional performance in low-data regimes. EST integrates the Transformer attention mechanisms with principles from Reservoir Computing to create a fixed-size window distributed memory system. Drawing inspiration from Echo State Networks, the most prominent instance of the Reservoir Computing paradigm, our architecture integrates a new module called "Working Memory" based on several reservoirs (i.e. random recurrent networks) working in parallel. These reservoirs work as independent memory units with distinct internal dynamics. A novelty here is that the classical reservoir hyperparameters controlling the dynamics are now trained. Thus, the EST dynamically adapts the memory/non-linearity trade-off in reservoirs. By maintaining a fixed number of memory units regardless of sequence length, EST achieves constant computational complexity at each processing step, effectively breaking the quadratic scaling problem of standard Transformers. Evaluations on the STREAM benchmark, which comprises 12 diverse sequential processing tasks, demonstrate that EST outperforms GRUs, LSTMs, and even Transformers on 8 of these tasks. These findings highlight that Echo State Transformers can be an effective replacement for GRUs and LSTMs while complementing standard Transformers, at least in resource-constrained environments and low-data scenarios across diverse sequential processing tasks.
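Abstract (reference translation): Large Language Models and their underlying Transformer architecture are highly efficient, but they do not reflect how our brains process and learn a diversity of cognitive tasks such as language and working memory. In addition, sequential data processing with Transformers faces a fundamental barrier: complexity that grows quadratically with sequence length. Motivated by these limitations, our ambition is to build more efficient models that depend less on intensive computation and large volumes of data. We introduce Echo State Transformers (EST), a hybrid architecture that resolves this challenge elegantly and shows exceptional performance in low-data regimes. EST integrates the Transformer attention mechanism with principles from Reservoir Computing to create a fixed-size window distributed memory system. Drawing inspiration from Echo State Networks, the most prominent example of the Reservoir Computing paradigm, our architecture integrates a new module called "Working Memory", built on several reservoirs (random recurrent networks) operating in parallel. These reservoirs act as independent memory units with distinct internal dynamics. The novelty here is that the classical reservoir hyperparameters controlling the dynamics are now trained, so EST dynamically adapts the memory/non-linearity trade-off in the reservoirs. By keeping a fixed number of memory units regardless of sequence length, EST achieves constant computational complexity at each processing step, effectively breaking the quadratic scaling problem of standard Transformers. Evaluations on the STREAM benchmark, which comprises 12 diverse sequential processing tasks, show that EST outperforms GRUs, LSTMs, and even Transformers on 8 of these tasks. These results suggest that Echo State Transformers can effectively replace GRUs and LSTMs while complementing standard Transformers, at least in resource-constrained environments and low-data scenarios across diverse sequential processing tasks.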
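
The fixed-size memory idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses the standard leaky-integrator Echo State Network update, omits the attention component entirely, and sets the reservoir hyperparameters (spectral radius and leak rate) by hand, whereas the abstract states that EST trains them. All names and values (`make_reservoir`, `step`, `run`, the sizes and rates) are hypothetical and chosen only for illustration.

```python
import numpy as np


def make_reservoir(n_units, n_inputs, spectral_radius, rng):
    """Build random recurrent and input weights for one reservoir (hypothetical helper)."""
    w = rng.normal(size=(n_units, n_units))
    # Rescale so the largest absolute eigenvalue equals the requested spectral radius.
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    w_in = rng.uniform(-1.0, 1.0, size=(n_units, n_inputs))
    return w, w_in


def step(state, u, w, w_in, leak_rate):
    """One leaky-integrator ESN update: blend the old state with a tanh response."""
    return (1.0 - leak_rate) * state + leak_rate * np.tanh(w @ state + w_in @ u)


rng = np.random.default_rng(0)
n_inputs, n_units = 3, 20

# Assumed, hand-picked dynamics per memory unit; EST learns these instead.
spectral_radii = [0.5, 0.9, 1.2]
leak_rates = [0.1, 0.5, 1.0]
reservoirs = [make_reservoir(n_units, n_inputs, sr, rng) for sr in spectral_radii]


def run(sequence):
    """Feed a sequence through all reservoirs in parallel and return the final states."""
    states = [np.zeros(n_units) for _ in reservoirs]
    for u in sequence:
        # Per-step cost depends only on the number and size of the reservoirs,
        # not on how many inputs have already been processed.
        states = [
            step(s, u, w, w_in, a)
            for s, (w, w_in), a in zip(states, reservoirs, leak_rates)
        ]
    return np.stack(states)


short_memory = run(rng.normal(size=(10, n_inputs)))
long_memory = run(rng.normal(size=(1000, n_inputs)))
print(short_memory.shape, long_memory.shape)
```

Both calls return a memory of the same shape (number of reservoirs by units per reservoir), whether the input has 10 steps or 1000. This is the property the abstract relies on: anything that reads from this memory works over a fixed number of units, so the cost per step stays constant instead of growing with sequence length.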

Related papers