Fugu-MT Paper Translation (Summary): HRM-Text: Efficient Pretraining Beyond Scaling

Paper Summary: HRM-Text: Efficient Pretraining Beyond Scaling

arxiv url: http://arxiv.org/abs/2605.20613v1
Date: Wed, 20 May 2026 01:59:50 GMT
Status: Translation complete
Last updated in system: 2026-05-21 19:19:56.432831
Title: HRM-Text: Efficient Pretraining Beyond Scaling
Authors: Guan Wang, Changling Liu, Chenyu Wang, Cai Zhou, Yuhao Sun, Yifei Wu, Shuai Zhen, Luca Scimeca, Yasin Abbasi Yadkori,
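Abstract summary: Current pretraining of large language models relies on massive compute and internet-scale raw text. We introduce HRM-Text, which decouples computation into slow-evolving strategic and fast-evolving execution layers. We train exclusively on instruction-response pairs using a task-completion objective and PrefixLM masking.
Reference score (attention score calculated by the site): 26.800179678712976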
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The current pretraining paradigm for large language models relies on massive compute and internet-scale raw text, creating a significant barrier to foundational research. In contrast, biological systems demonstrate highly sample-efficient learning through multi-timescale processing, such as the functional organization of the frontoparietal loop. Taking this as inspiration, we introduce HRM-Text, which replaces standard Transformers with a Hierarchical Recurrent Model (HRM) that decouples computation into slow-evolving strategic and fast-evolving execution layers. To stabilize this deep recurrence for language modeling, we introduce MagicNorm and warmup deep credit assignment. Furthermore, instead of standard raw-text pretraining, we train exclusively on instruction-response pairs using a task-completion objective and PrefixLM masking. Serving as an empirical existence proof of efficient pretraining, a 1B-parameter HRM-Text model trained from scratch on only 40 billion unique tokens with a $1,500 budget achieves 60.7% on MMLU, 81.9% on ARC-C, 82.2% on DROP, 84.5% on GSM8K, and 56.2% on MATH. Despite utilizing roughly 100-900x fewer training tokens and 96-432x less estimated compute than standard baselines, HRM-Text performs competitively with 2-7B parameter open models. These results demonstrate that co-designing architectures and objectives can radically reduce the compute-to-performance ratio, making pretraining from scratch accessible to the broader research community.
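The abstract describes the HRM core only at a high level: a slow-evolving strategic state and a fast-evolving execution state. The following is a minimal sketch of such a two-timescale schedule. It assumes the nested-loop form described for the earlier Hierarchical Reasoning Model, in which the low-level state updates at every step and the high-level state updates once per cycle of `t_steps` low-level steps. The update functions `toy_low` and `toy_high`, the vector sizes, and the parameters `n_cycles` and `t_steps` are hypothetical placeholders. MagicNorm and warmup deep credit assignment are not modeled, and this is not HRM-Text's actual implementation.

```python
import math
from typing import Callable, List, Tuple

Vec = List[float]
LowFn = Callable[[Vec, Vec, Vec], Vec]
HighFn = Callable[[Vec, Vec], Vec]


def hierarchical_recurrence(
    x: Vec,
    z_h: Vec,
    z_l: Vec,
    f_low: LowFn,
    f_high: HighFn,
    n_cycles: int,
    t_steps: int,
) -> Tuple[Vec, Vec, List[Vec]]:
    """Two-timescale recurrence (illustrative sketch, not HRM-Text's code).

    The fast low-level state z_l is updated t_steps times per cycle,
    conditioned on the current slow state z_h and the input x. The slow
    high-level state z_h is updated once at the end of each cycle from z_l.
    Returns the final states plus the high-level state after every cycle.
    """
    trace: List[Vec] = []
    for _ in range(n_cycles):
        for _ in range(t_steps):
            z_l = f_low(z_l, z_h, x)  # fast "execution" update, every step
        z_h = f_high(z_h, z_l)  # slow "strategic" update, once per cycle
        trace.append(z_h)
    return z_h, z_l, trace


# Hypothetical toy update rules standing in for the real neural modules.
def toy_low(z_l: Vec, z_h: Vec, x: Vec) -> Vec:
    return [math.tanh(a + b + c) for a, b, c in zip(z_l, z_h, x)]


def toy_high(z_h: Vec, z_l: Vec) -> Vec:
    return [math.tanh(a + b) for a, b in zip(z_h, z_l)]


if __name__ == "__main__":
    z_h, z_l, trace = hierarchical_recurrence(
        x=[1.0, -0.5], z_h=[0.0, 0.0], z_l=[0.0, 0.0],
        f_low=toy_low, f_high=toy_high, n_cycles=2, t_steps=3,
    )
    print(len(trace), z_h)
```

With `n_cycles=2` and `t_steps=3`, the low-level state is updated six times and the high-level state only twice. This is the separation of timescales the abstract refers to. In a real model, `f_low` and `f_high` would be learned network blocks, not fixed `tanh` maps.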
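The abstract names PrefixLM masking and a task-completion objective but does not define either one. The following is a minimal sketch built on two assumptions. First, it uses the common PrefixLM convention: bidirectional attention over the instruction prefix and causal attention over the response. Second, it reads "task-completion" as meaning that the next-token loss is counted only on response tokens. Both readings are assumptions. The helper names `prefix_lm_mask`, `completion_loss_weights`, and `masked_nll` are hypothetical and do not come from the paper.

```python
from typing import List, Sequence


def prefix_lm_mask(prefix_len: int, total_len: int) -> List[List[bool]]:
    """mask[i][j] is True when query position i may attend to key position j.

    Assumed PrefixLM convention: every position sees the whole instruction
    prefix (bidirectional there); response positions additionally see
    earlier response positions (causal).
    """
    if not 0 <= prefix_len <= total_len:
        raise ValueError("prefix_len must lie in [0, total_len]")
    return [
        [j < prefix_len or j <= i for j in range(total_len)]
        for i in range(total_len)
    ]


def completion_loss_weights(prefix_len: int, total_len: int) -> List[float]:
    """Weight for predicting token t + 1 from position t.

    Assumed task-completion objective: only targets inside the response
    (index >= prefix_len) count, so the instruction itself is never a target.
    """
    return [1.0 if t + 1 >= prefix_len else 0.0 for t in range(total_len - 1)]


def masked_nll(target_logprobs: Sequence[float], weights: Sequence[float]) -> float:
    """Mean negative log-likelihood over the weighted (response) targets."""
    if len(target_logprobs) != len(weights):
        raise ValueError("target_logprobs and weights differ in length")
    total = sum(weights)
    if total == 0:
        raise ValueError("no response tokens to score")
    return -sum(lp * w for lp, w in zip(target_logprobs, weights)) / total


if __name__ == "__main__":
    # 3 instruction tokens followed by 2 response tokens.
    for row in prefix_lm_mask(prefix_len=3, total_len=5):
        print("".join("x" if ok else "." for ok in row))
    print(completion_loss_weights(prefix_len=3, total_len=5))
```

Take a 3-token instruction followed by a 2-token response. The first three mask rows print `xxx..`, meaning the instruction attends to itself bidirectionally but not to the response. The last two rows print `xxxx.` and `xxxxx`. The loss weights are `[0.0, 0.0, 1.0, 1.0]`, so only the two response tokens are scored. Setting `prefix_len=0` recovers an ordinary causal mask with loss on every predicted token.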


Related Papers