Fugu-MT Paper Translation (Summary): CosmicFish-HRM: Adaptive Reasoning via Hierarchical Recurrent Mechanisms in Compact Language Models

Paper summary: CosmicFish-HRM: Adaptive Reasoning via Hierarchical Recurrent Mechanisms in Compact Language Models

arxiv url: http://arxiv.org/abs/2605.28919v1
Date: Wed, 27 May 2026 17:59:14 GMT
Status: Translation complete
Last updated in system: 2026-05-30 02:45:55.051462
Title: CosmicFish-HRM: Adaptive Reasoning via Hierarchical Recurrent Mechanisms in Compact Language Models
Authors: Venkat Akhil Lakkapragada
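Abstract summary: This paper proposes CosmicFish-HRM, a compact language model with adaptive reasoning depth. The model learns non-uniform reasoning behavior, allocating different numbers of reasoning steps across tasks and inputs. These results suggest that adaptive reasoning depth may be a promising alternative to relying solely on parameter scale for reasoning capability.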
Reference score (attention score computed independently by this site): 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models have achieved strong reasoning capabilities, though often at the cost of massive parameter counts and expensive inference. In this work, we explore a different direction: adaptive reasoning depth in compact language models. We present CosmicFish-HRM, a compact language model built around a Hierarchical Reasoning Module (HRM) that dynamically allocates computational effort during inference. Instead of applying fixed computation to every input, the model iterates through high-level and low-level reasoning cycles and learns when to halt based on input complexity. CosmicFish-HRM combines this adaptive reasoning core with modern transformer components including Grouped Query Attention, RoPE, and SwiGLU activations. While the additional reasoning infrastructure introduces overhead at small scale, we hypothesize that this tradeoff becomes increasingly favorable as model size grows and the relative cost of the HRM core diminishes. Our results show that the model learns non-uniform reasoning behavior, allocating different numbers of reasoning steps across tasks and inputs. These findings suggest that adaptive reasoning depth may offer a promising alternative to relying solely on parameter scale for reasoning capability.
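The abstract describes a loop that alternates high-level and low-level reasoning cycles and stops once a halting signal fires. The following is a minimal sketch of that control flow only, not the paper's implementation: the abstract does not specify the update rules or the halting mechanism, so the function name `adaptive_reason`, the tanh updates, the sigmoid halting head, and all constants are hypothetical stand-ins for what would be learned parameters.

```python
import math


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def adaptive_reason(x, max_cycles=8, low_steps=2, halt_threshold=0.5,
                    halt_weight=4.0, halt_bias=-2.0):
    """Toy hierarchical recurrent loop with a halting check.

    Assumption: every weight below is a fixed, made-up constant. In a
    trained model these would be learned, and the states would be
    token-level hidden vectors rather than short lists of floats.
    Returns (high_state, cycles_used).
    """
    high = [0.0] * len(x)  # slow, high-level state
    low = [0.0] * len(x)   # fast, low-level state
    cycle = 0
    for cycle in range(1, max_cycles + 1):
        # Low-level cycles: refine the fast state from the input and slow state.
        for _ in range(low_steps):
            low = [math.tanh(0.5 * l + 0.5 * h + xi)
                   for l, h, xi in zip(low, high, x)]
        # High-level update: fold the low-level result into the slow state.
        high = [math.tanh(0.5 * h + l) for h, l in zip(high, low)]
        # Halting head: a scalar probability read off the high-level state.
        mean_abs = sum(abs(h) for h in high) / len(high)
        p_halt = sigmoid(halt_weight * mean_abs + halt_bias)
        if p_halt > halt_threshold:
            break
    return high, cycle


if __name__ == "__main__":
    for example in ([0.0, 0.0], [0.1, 0.1], [2.0, 2.0]):
        _, used = adaptive_reason(example)
        print(example, "->", used, "cycles")
```

In this toy setting the number of cycles depends on the input: an all-zero input never triggers the halting head and runs to `max_cycles`, while a large input halts after the first cycle. That is only meant to illustrate the idea of spending a variable amount of computation per input; it says nothing about which inputs the trained model treats as easy or hard.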

Related papers

Entropy-Gradient Inversion: Moving Toward Internal Mechanism of Large Reasoning Models [47.195670444638715]
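We identify and formally define Entropy-Gradient Inversion, a robust correlation between token entropy and logit gradients. We propose Correlation-Regularized Group Policy Optimization (CorR-PO), which incorporates this inversion signature into reinforcement learning. Experiments on a range of reasoning benchmarks across multiple model scales show that CorR-PO consistently outperforms state-of-the-art baselines.
Paper reference translation (metadata) (2026-05-18T02:41:53Z)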
CiPO: Counterfactual Unlearning for Large Reasoning Models through Iterative Preference Optimization [54.774620283208776]
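Large reasoning models (LRMs) emphasize long chain-of-thought (CoT) reasoning to tackle complex problems. Existing methods either fail to fully remove undesired knowledge from CoT traces or degrade reasoning performance by interfering with the reasoning process. We introduce Counterfactual Unlearning through iterative Preference Optimization (CiPO), a novel framework that redefines unlearning as a targeted intervention on CoT reasoning in LRMs.
Paper reference translation (metadata) (2026-04-17T08:56:36Z)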
To Think or Not To Think, That is The Question for Large Reasoning Models in Theory of Mind Tasks [56.11584171938381]
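Theory of Mind (ToM) assesses whether a model can infer hidden mental states such as beliefs, desires, and intentions. Recent advances in Large Reasoning Models (LRMs) have improved step-by-step reasoning in mathematics and coding. In this work, we conduct a systematic study of nine large language models (LLMs), comparing reasoning models with non-reasoning models.
Paper reference translation (metadata) (2026-02-11T08:16:13Z)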
Reasoning Pattern Alignment Merging for Adaptive Reasoning [48.347817456299104]
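Reasoning Pattern Alignment Merging (RPAM) is a hierarchical model-merging framework based on feature alignment that facilitates query-adaptive reasoning. Experiments on seven widely used reasoning benchmarks show that RPAM substantially reduces inference cost while maintaining strong performance.
Paper reference translation (metadata) (2026-01-07T01:36:39Z)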
Structural Reward Model: Enhancing Interpretability, Efficiency, and Scalability in Reward Modeling [23.919163488129985]
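The Structural Reward Model (SRM) is a modular framework that integrates side branches as auxiliary feature generators. By introducing fine-grained dimensions, the SRM enables interpretable and efficient evaluation, targeted diagnostics, and optimization.
Paper reference translation (metadata) (2025-09-29T18:09:25Z)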
Understanding the Language Model to Solve the Symbolic Multi-Step Reasoning Problem from the Perspective of Buffer Mechanism [68.05754701230039]
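In this work, we construct a symbolic multi-step reasoning task to elucidate the information propagation mechanism in Transformer models. We propose a random-matrix-based algorithm to enhance the model's reasoning ability.
Paper reference translation (metadata) (2024-05-24T07:41:26Z)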

The related papers list is generated automatically from the titles and abstracts of papers on this site.

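This page shows information for the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any outcome arising from its use (including all information and translations).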