Fugu-MT paper translation (abstract): Learning Large-Scale Modular Addition with an Auxiliary Modulus

Paper overview: Learning Large-Scale Modular Addition with an Auxiliary Modulus

arxiv url: http://arxiv.org/abs/2605.07648v1
Date: Fri, 08 May 2026 12:16:07 GMT
Status: translation complete
Last updated in system: 2026-05-11 19:43:39.034231
Title: Learning Large-Scale Modular Addition with an Auxiliary Modulus
Title (reference translation): Learning Large-Scale Modular Addition with an Auxiliary Modulus
Authors: Hanato Kikuchi, Ryosuke Masuya, Kazuhiko Kawamoto, Hiroshi Kera
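Abstract summary: A recent study substantially scaled modular addition learning in both the number of summands and the modulus. This study theoretically and empirically analyzes the covariate shift that approach induces as a side effect and proposes a covariate-shift-free method for modular addition.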
Reference score (independently computed attention score): 11.864560633772678
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Learning parity functions, or more generally modular addition, is a challenging machine learning task due to its input sensitivity. A recent study substantially scaled modular addition learning in both the number of summands and the modulus. Its key idea is to increase the number of zeros in training sequences, which reduces the effective number of summands and thus controls training difficulty; however, this induces covariate shift between the training and test input distributions. This study theoretically and empirically analyzes this side effect and proposes a covariate-shift-free method for modular addition. Specifically, we introduce an auxiliary modulus $Kq$ during training, which reduces wrap-around frequency and problem difficulty while preserving the same input distribution across training and testing. Experiments show strong scalability and sample efficiency: even for large input length $N$, large modulus $q$, and small datasets (where the sparse method fails to learn), our method achieves equal or better match accuracy and relaxed $τ$-accuracy. For example, at $N=64$ and $q=974269$, our method trained on 100K samples achieves $97.0\%$ $τ$-accuracy at $τ=0.05$, while the sparse method achieves only $9.5\%$ with the same data size and $93.9\%$ even when extended to 1M samples.
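Abstract (reference translation): Learning parity functions, or more generally modular addition, is a difficult machine learning task because of its sensitivity to the input. A recent study substantially scaled up the learning of modular addition in both the number of summands and the modulus. Its key idea is to increase the number of zeros in the training sequences, reducing the effective number of summands and thereby controlling the difficulty of training; however, this induces a covariate shift between the training and test input distributions. This study analyzes that side effect theoretically and empirically and proposes a covariate-shift-free method for modular addition. Specifically, it introduces an auxiliary modulus $Kq$ during training, which lowers the wrap-around frequency and the difficulty of the problem while keeping the same input distribution for training and testing. Experiments show strong scalability and sample efficiency: even for large input length $N$, large modulus $q$, and small datasets, where the sparse method fails to learn, the proposed method achieves equal or better match accuracy and relaxed $τ$-accuracy. For example, at $N=64$ and $q=974269$, the proposed method trained on 100K samples achieves $97.0\%$ $τ$-accuracy at $τ=0.05$, whereas the sparse method achieves only $9.5\%$ with the same data size and $93.9\%$ even when extended to 1M samples.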
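
The arithmetic behind the auxiliary modulus can be illustrated with a minimal sketch. It is not the authors' training procedure, which the abstract does not specify; it only checks two facts that hold for any positive integer $K$: a label taken modulo $Kq$ still determines the label modulo $q$, and the integer sum wraps around $Kq$ less often than around $q$. The value `k=8`, the function name `demo`, and the uniform sampling of inputs from $[0, q)$ are assumptions made for illustration; $N=64$ and $q=974269$ are taken from the abstract.

```python
import random


def demo(n=64, q=974269, k=8, trials=1000, seed=0):
    """Compare wrap-around counts for modulus q and auxiliary modulus k*q.

    Inputs are drawn the same way for both targets, so the input
    distribution is unchanged; only the label modulus differs.
    k=8 and uniform sampling are illustrative assumptions.
    """
    rng = random.Random(seed)
    wraps_q = 0
    wraps_kq = 0
    for _ in range(trials):
        xs = [rng.randrange(q) for _ in range(n)]  # summands in [0, q)
        s = sum(xs)                                # plain integer sum
        y_q = s % q                                # original target
        y_kq = s % (k * q)                         # auxiliary target
        # Because q divides k*q, reducing the auxiliary label mod q
        # recovers the original label exactly.
        assert y_kq % q == y_q
        wraps_q += s // q          # times the sum wrapped around q
        wraps_kq += s // (k * q)   # times the sum wrapped around k*q
    return wraps_q / trials, wraps_kq / trials


if __name__ == "__main__":
    avg_q, avg_kq = demo()
    print(f"mean wrap-arounds mod q:  {avg_q:.2f}")
    print(f"mean wrap-arounds mod Kq: {avg_kq:.2f}")
```

With these settings the mean wrap-around count under $Kq$ is roughly a factor of $K$ smaller than under $q$, which matches the abstract's claim that the auxiliary modulus lowers wrap-around frequency without changing the inputs.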

Related papers