Fugu-MT paper translation (abstract): Transformers perform adaptive partial pooling

Paper abstract: Transformers perform adaptive partial pooling

arxiv url: http://arxiv.org/abs/2602.03980v1
Date: Tue, 03 Feb 2026 20:05:01 GMT
Status: translation complete
Last updated in system: 2026-02-05 19:45:11.259034
Title: Transformers perform adaptive partial pooling
Authors: Vsevolod Kapatsinski
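Abstract summary: In hierarchical regression, the model's predictions for behavior in a context are affected by observations from other similar contexts. This is called adaptive partial pooling. This paper shows that next-word predictions of a transformer (GPT2) become increasingly unaffected by observations from outside the current context as training proceeds.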
Reference score (independently computed attention metric): 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Because language is creative, any reasonable language model must generalize, deciding what to say in novel contexts by using information from similar contexts. But what about contexts that are not novel but merely infrequent? In hierarchical regression, the model's predictions for behavior in a context are affected by observations from other similar contexts to the extent that 1) the current context is infrequent and 2) different contexts behave similarly. This is called adaptive partial pooling of evidence. This paper shows that next-word predictions of a transformer (GPT2) are increasingly unaffected by observations from outside the current context across epochs of training (the amount of pooling reduces with training), and that the extent of pooling is affected by context frequency, context number (type frequency) and context variability in a similar way to hierarchical regression. These characteristics of learning in transformers are argued to be realistic on both rational and empirical grounds.
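
The two conditions for pooling named in the abstract can be illustrated with the textbook normal-normal shrinkage formula. The sketch below is only an illustration under stated assumptions: it is not the paper's model or analysis, it assumes known within-context and between-context variances, and the function and parameter names are hypothetical. The estimate for a context is a weighted average of that context's own mean and the grand mean across contexts, with the weight on the context's own data given by n / (n + within_var / between_var).

```python
def partial_pooling_estimate(context_mean, n_obs, grand_mean,
                             within_var, between_var):
    """Shrinkage estimate for one context (normal-normal model, known variances).

    All names here are hypothetical and chosen for illustration only.
    """
    # Weight on the context's own observations. It grows with the number of
    # observations (frequent contexts rely on their own data) and with the
    # between-context variance (dissimilar contexts share less evidence).
    weight = n_obs / (n_obs + within_var / between_var)
    # The remainder of the weight goes to the pooled (grand) mean.
    return weight * context_mean + (1.0 - weight) * grand_mean


if __name__ == "__main__":
    # A context whose own mean (0.9) differs from the grand mean (0.5).
    for n_obs in (1, 10, 99):
        estimate = partial_pooling_estimate(0.9, n_obs, 0.5, 1.0, 1.0)
        print(f"n_obs={n_obs:3d}  estimate={estimate:.3f}")
```

With few observations the estimate stays close to the grand mean, and with many it approaches the context's own mean. Lowering `between_var` (contexts that behave similarly) pulls the estimate further toward the grand mean for the same `n_obs`.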

Related papers