Fugu-MT Paper Translation (Abstract): Characterizing Mamba's Selective Memory using Auto-Encoders

Paper summary: Characterizing Mamba's Selective Memory using Auto-Encoders

arxiv url: http://arxiv.org/abs/2512.15653v1
Date: Wed, 17 Dec 2025 18:05:25 GMT
Status: Translation complete
Last updated in system: 2025-12-18 17:06:27.092218
Title: Characterizing Mamba's Selective Memory using Auto-Encoders
Title (reference translation): Characterization of Mamba's Selective Memory Using Auto-Encoders
Authors: Tamanna Hossain, Robert L. Logan, Ganesh Jagadeesan, Sameer Singh, Joel Tetreault, Alejandro Jaimes
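Abstract summary: State space models (SSMs) are a promising alternative to transformers for language modeling because they use fixed memory during inference. Prior work has studied the sequence length at which the resulting information loss occurs, but has not characterized the types of information that SSM language models (LMs) tend to forget. The authors identify the types of tokens (e.g., parts of speech, named entities) and sequences (e.g., code, math problems) that SSM LMs frequently forget.
Reference score (independently computed attention score): 49.83619099242128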
License: http://creativecommons.org/licenses/by/4.0/
Abstract: State space models (SSMs) are a promising alternative to transformers for language modeling because they use fixed memory during inference. However, this fixed memory usage requires some information loss in the hidden state when processing long sequences. While prior work has studied the sequence length at which this information loss occurs, it does not characterize the types of information SSM language models (LMs) tend to forget. In this paper, we address this knowledge gap by identifying the types of tokens (e.g., parts of speech, named entities) and sequences (e.g., code, math problems) that are more frequently forgotten by SSM LMs. We achieve this by training an auto-encoder to reconstruct sequences from the SSM's hidden state, and measure information loss by comparing inputs with their reconstructions. We perform experiments using the Mamba family of SSM LMs (130M–1.4B) on sequences ranging from 4–256 tokens. Our results show significantly higher rates of information loss on math-related tokens (e.g., numbers, variables), mentions of organization entities, and alternative dialects to Standard American English. We then examine the frequency that these tokens appear in Mamba's pretraining data and find that less prevalent tokens tend to be the ones Mamba is most likely to forget. By identifying these patterns, our work provides clear direction for future research to develop methods that better control Mamba's ability to retain important information.
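Abstract (reference translation): State space models (SSMs) are a promising alternative to transformers for language modeling because they use fixed memory during inference. However, this fixed memory usage entails some loss of information in the hidden state when processing long sequences. Prior work has studied the sequence length at which this loss occurs, but has not characterized the types of information that SSM language models (LMs) tend to forget. This paper addresses that gap by identifying the types of tokens (e.g., parts of speech, named entities) and sequences (e.g., code, math problems) that SSM LMs forget more frequently. An auto-encoder is trained to reconstruct sequences from the SSM's hidden state, and information loss is measured by comparing inputs with their reconstructions. Experiments use the Mamba family of SSM LMs (130M to 1.4B) on sequences of 4 to 256 tokens. The results show significantly higher rates of information loss for math-related tokens (e.g., numbers, variables), mentions of organization entities, and dialects other than Standard American English. The frequency of these tokens in Mamba's pretraining data is then examined, showing that less prevalent tokens tend to be the ones Mamba is most likely to forget. Identifying these patterns gives future research a clear direction for developing methods that better control Mamba's ability to retain important information.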
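
The abstract describes measuring information loss by comparing inputs with their reconstructions and then breaking the results down by token type. The following Python fragment is a minimal sketch of that bookkeeping step only, not the authors' implementation. It assumes position-wise exact match as the error measure (the paper may use a different metric), and the function name `loss_rate_by_category` and the tag labels are hypothetical.

```python
from collections import defaultdict


def loss_rate_by_category(inputs, reconstructions, categories):
    """Return, per category, the fraction of tokens that were not reconstructed.

    inputs, reconstructions: lists of token lists, aligned position by position.
    categories: lists of tags (e.g. POS or entity labels), same shape as inputs.
    Assumption: a token counts as "forgotten" when the reconstructed token at
    the same position differs from the input token.
    """
    missed = defaultdict(int)
    total = defaultdict(int)
    for src, rec, tags in zip(inputs, reconstructions, categories):
        if not (len(src) == len(rec) == len(tags)):
            raise ValueError("sequences must be aligned token by token")
        for s, r, t in zip(src, rec, tags):
            total[t] += 1
            if s != r:
                missed[t] += 1
    return {t: missed[t] / total[t] for t in total}


# Toy data (invented for illustration, not taken from the paper).
example_inputs = [["Acme", "sold", "42", "units"], ["x", "equals", "7"]]
example_recons = [["Acme", "sold", "17", "units"], ["x", "equals", "7"]]
example_tags = [["ORG", "VERB", "NUM", "NOUN"], ["VAR", "VERB", "NUM"]]

example_rates = loss_rate_by_category(example_inputs, example_recons, example_tags)
```

In this toy data one of the two `NUM` tokens is reconstructed incorrectly, so `example_rates["NUM"]` is 0.5 while every other category is 0.0. Rates computed this way could then be set against token frequencies in the pretraining data, as the abstract outlines.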

Related papers