Fugu-MT Paper Translation (Abstract): Learning Pseudorandom Numbers with Transformers: Permuted Congruential Generators, Curricula, and Interpretability

Paper abstract: Learning Pseudorandom Numbers with Transformers: Permuted Congruential Generators, Curricula, and Interpretability

arxiv url: http://arxiv.org/abs/2510.26792v1
Date: Thu, 30 Oct 2025 17:59:09 GMT
Status: Translation complete
Last updated in system: 2025-10-31 16:05:09.971883
Title: Learning Pseudorandom Numbers with Transformers: Permuted Congruential Generators, Curricula, and Interpretability
Title (reference translation): Learning Pseudorandom Numbers with Transformers: Permuted Congruential Generators, Curricula, and Interpretability
Authors: Tao Tao, Maissam Barkeshli
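Abstract summary: We study the ability of Transformer models to learn sequences generated by Permuted Congruential Generators (PCGs). PCGs introduce substantial additional difficulty over linear congruential generators (LCGs) by applying a series of bit-wise shifts, XORs, rotations and truncations to the hidden state. We show that Transformers can nevertheless successfully perform in-context prediction on unseen sequences from diverse PCG variants.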
Reference score (independently computed attention score): 10.75037955193936
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We study the ability of Transformer models to learn sequences generated by Permuted Congruential Generators (PCGs), a widely used family of pseudo-random number generators (PRNGs). PCGs introduce substantial additional difficulty over linear congruential generators (LCGs) by applying a series of bit-wise shifts, XORs, rotations and truncations to the hidden state. We show that Transformers can nevertheless successfully perform in-context prediction on unseen sequences from diverse PCG variants, in tasks that are beyond published classical attacks. In our experiments we scale moduli up to $2^{22}$ using up to $50$ million model parameters and datasets with up to $5$ billion tokens. Surprisingly, we find even when the output is truncated to a single bit, it can be reliably predicted by the model. When multiple distinct PRNGs are presented together during training, the model can jointly learn them, identifying structures from different permutations. We demonstrate a scaling law with modulus $m$: the number of in-context sequence elements required for near-perfect prediction grows as $\sqrt{m}$. For larger moduli, optimization enters extended stagnation phases; in our experiments, learning moduli $m \geq 2^{20}$ requires incorporating training data from smaller moduli, demonstrating a critical necessity for curriculum learning. Finally, we analyze embedding layers and uncover a novel clustering phenomenon: the model spontaneously groups the integer inputs into bitwise rotationally-invariant clusters, revealing how representations can transfer from smaller to larger moduli.
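Abstract (reference translation): We study the ability of Transformer models to learn sequences generated by Permuted Congruential Generators (PCGs), a widely used family of pseudo-random number generators (PRNGs). By applying a series of bit-wise shifts, XORs, rotations and truncations to the hidden state, PCGs are substantially harder to learn than linear congruential generators (LCGs). We show that Transformers can nevertheless perform in-context prediction on unseen sequences from diverse PCG variants, in tasks beyond published classical attacks. In our experiments we scale moduli up to $2^{22}$, using up to $50$ million model parameters and datasets of up to $5$ billion tokens. Surprisingly, even when the output is truncated to a single bit, the model can predict it reliably. When multiple distinct PRNGs are presented together during training, the model learns them jointly, identifying structures from different permutations. We demonstrate a scaling law with modulus $m$: the number of in-context sequence elements required for near-perfect prediction grows as $\sqrt{m}$. For larger moduli, optimization enters extended stagnation phases; in our experiments, learning moduli $m \geq 2^{20}$ requires incorporating training data from smaller moduli, which shows that curriculum learning is critical. Finally, we analyze the embedding layers and uncover a novel clustering phenomenon: the model spontaneously groups the integer inputs into bitwise rotationally-invariant clusters, revealing how representations can transfer from smaller to larger moduli.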
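To make the abstract's description of a PCG concrete, here is a minimal Python sketch of one common variant: an LCG hidden state followed by an output permutation built from a shift, an XOR, a truncation and a rotation (XSH-RR). The multiplier, bit widths and seeding steps are those of the widely distributed 64-bit `pcg32` generator and are an assumption for illustration; they are not taken from the paper, whose experiments use much smaller moduli (up to $2^{22}$) and several PCG variants. The function names are hypothetical.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1
# Default 64-bit LCG multiplier of the pcg32 generator (assumption: not from the paper).
MULTIPLIER = 6364136223846793005


def lcg_step(state, increment):
    """Advance the hidden LCG state: s -> a*s + c (mod 2^64)."""
    return (state * MULTIPLIER + increment) & MASK64


def xsh_rr_output(state):
    """Permute a 64-bit state into a 32-bit output (xorshift high bits, random rotation)."""
    xorshifted = (((state >> 18) ^ state) >> 27) & MASK32  # shift, XOR, truncate to 32 bits
    rot = state >> 59  # the top 5 bits of the state select the rotation amount
    # Rotate the 32-bit word right by `rot` positions.
    return ((xorshifted >> rot) | (xorshifted << ((-rot) & 31))) & MASK32


def pcg32_stream(seed, stream, count):
    """Return `count` outputs; `seed` and `stream` select the starting state and increment."""
    increment = ((stream << 1) | 1) & MASK64  # the LCG increment must be odd
    state = lcg_step(0, increment)
    state = (state + seed) & MASK64
    state = lcg_step(state, increment)
    outputs = []
    for _ in range(count):
        outputs.append(xsh_rr_output(state))  # emit from the current state
        state = lcg_step(state, increment)    # then advance the hidden state
    return outputs


if __name__ == "__main__":
    print([hex(x) for x in pcg32_stream(seed=42, stream=54, count=6)])
```

An in-context predictor sees only the values returned by `pcg32_stream`, never `state`, so the permutation hides the linear recurrence that makes plain LCG outputs comparatively easy to predict.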

Related papers