Fugu-MT Paper Translation (Summary): ARC-RL: A Reinforcement Learning Playground Inspired by ARC Raiders

Paper summary: ARC-RL: A Reinforcement Learning Playground Inspired by ARC Raiders

arxiv url: http://arxiv.org/abs/2605.19503v1
Date: Tue, 19 May 2026 07:54:40 GMT
Status: Translation complete
In-system update date: 2026-05-20 15:03:09.195573
Title: ARC-RL: A Reinforcement Learning Playground Inspired by ARC Raiders
Title (reference translation): ARC-RL: A Reinforcement Learning Playground Inspired by ARC Raiders
Authors: Carlo Romeo, Andrew D. Bagdanov,
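Abstract summary: ARC-RL is a suite of four continuous-control environments featuring robotic morphologies inspired by ARC Raiders. The four robots share a unified observation template, action convention, simulation cadence, and a single closed-form multi-component reward function. The reward fuses a velocity-tracking tent, a healthy survive bonus, a phase-locked gait-compliance bonus/cost pair, action regularisers, three safety penalties, and a posture anchor.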
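The summary lists the reward's components but not their formulas. The sketch below shows one way such a closed-form, multi-component reward could be assembled. It is an illustration under stated assumptions, not the authors' definition: the tent shape, the phase-locked contact schedule, the choice of the three safety terms, and every weight and default are assumed, and `RewardParams`, `StepInfo`, `tent`, and `reward` are hypothetical names. Per the abstract, only weights and parameters would vary per morphology, which is why they are grouped in one object here.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class RewardParams:
    # Hypothetical per-morphology weights and parameters (NOT the paper's values).
    w_vel: float = 1.0
    vel_width: float = 0.5              # tent half-width, same units as velocity (assumed m/s)
    healthy_bonus: float = 0.2
    w_gait_bonus: float = 0.1
    w_gait_cost: float = 0.1
    duty: float = 0.5                   # fraction of the gait cycle each foot spends in stance
    leg_offsets: Tuple[float, ...] = (0.0, 0.5, 0.0, 0.5, 0.0, 0.5)  # tripod gait, in cycles
    w_action: float = 0.01
    w_action_rate: float = 0.01
    w_safety: Tuple[float, float, float] = (1.0, 1.0, 1.0)
    w_posture: float = 0.05


@dataclass
class StepInfo:
    v_fwd: float                        # measured forward base velocity
    v_cmd: float                        # commanded forward velocity
    healthy: bool                       # e.g. base height/orientation within bounds (assumed)
    phase: float                        # gait clock in [0, 1)
    contacts: Sequence[bool]            # measured foot contacts, one per leg
    action: Sequence[float]
    prev_action: Sequence[float]
    safety_violations: Sequence[float]  # three non-negative violation magnitudes (assumed)
    joint_pos: Sequence[float]
    nominal_pos: Sequence[float]


def tent(error: float, width: float) -> float:
    """Triangular score: 1 at zero error, falling linearly to 0 at |error| >= width."""
    return max(0.0, 1.0 - abs(error) / width)


def reward(s: StepInfo, p: RewardParams) -> float:
    # Velocity tracking via the tent, plus a constant bonus while healthy.
    r = p.w_vel * tent(s.v_fwd - s.v_cmd, p.vel_width)
    r += p.healthy_bonus if s.healthy else 0.0

    # Phase-locked gait compliance: a foot is scheduled in stance when its offset
    # phase falls inside the duty window; matches earn a bonus, mismatches a cost.
    scheduled = [((s.phase + off) % 1.0) < p.duty for off in p.leg_offsets]
    n = len(scheduled)
    matches = sum(int(want == got) for want, got in zip(scheduled, s.contacts))
    r += p.w_gait_bonus * matches / n - p.w_gait_cost * (n - matches) / n

    # Action regularisers: magnitude and rate of change.
    r -= p.w_action * sum(a * a for a in s.action)
    r -= p.w_action_rate * sum((a - b) ** 2 for a, b in zip(s.action, s.prev_action))

    # Three weighted safety penalties, then the posture anchor toward a nominal pose.
    r -= sum(w * v for w, v in zip(p.w_safety, s.safety_violations))
    r -= p.w_posture * sum((q - q0) ** 2 for q, q0 in zip(s.joint_pos, s.nominal_pos))
    return r
```

With perfect tracking, a healthy state, contacts that match the tripod schedule, and zero actions and deviations, this sketch returns `w_vel + healthy_bonus + w_gait_bonus` (1.3 with the assumed defaults). Keeping the bonus and the cost as separate weights lets a morphology reward compliance without penalising mismatches equally.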
Reference score (independently computed attention score): 11.905134977931075
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Reinforcement learning for legged locomotion has matured into a stack of multi-component reward functions and physics-engine benchmarks whose morphologies are uniformly derived from real commercial hardware. Game NPCs, however, are bound by stylistic constraints absent from sim-to-real robotics and routinely take the form of creatures with no real-robot counterpart. We introduce ARC-RL, a suite of four MuJoCo continuous-control environments featuring robotic morphologies inspired by the bestiary of ARC Raiders: the 18-DoF tall hexapod Queen, the 12-DoF armoured hexapod Bastion, the 18-DoF compact hexapod Tick, and the 12-DoF quadruped Leaper. All four robots share a unified observation template, action convention, simulation cadence, and a single closed-form multi-component reward function whose only per-morphology variation lives in a small set of weights and parameters. The reward fuses a velocity-tracking tent, a healthy survive bonus, a phase-locked gait-compliance bonus/cost pair, action regularisers, three safety penalties, and a posture anchor; no motion-capture data enters the reward at any point. We additionally provide hand-crafted Central Pattern Generator demonstrators per morphology, which serve both as fixed expert references and as sources of prior data for offline-to-online training. On this playground, we conduct a controlled empirical study comparing standard online algorithms (SAC, SPEQ, SOPE-EO) and methods augmented with prior data (SACfD, SPEQ-O2O, SOPE), and characterise how each paradigm copes with the playground's morphological diversity and animation-style stylistic constraints.
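The abstract says each morphology ships with a hand-crafted Central Pattern Generator demonstrator but does not give its equations. Below is a minimal open-loop CPG sketch, assuming one sinusoidal oscillator per leg, normalised joint targets in [-1, 1], and a fixed hip/lift joint layout. The leg orderings, amplitudes, and the names `cpg_action`, `TRIPOD`, and `TROT` are hypothetical. Assuming joints are split evenly across legs, the DoF counts imply 3 joints per leg for Queen and Tick (18/6), 2 for Bastion (12/6), and 3 for Leaper (12/4).

```python
import math
from typing import List, Sequence

# Tripod gait for a hexapod: alternate legs share a phase (hypothetical leg ordering).
TRIPOD = (0.0, math.pi, 0.0, math.pi, 0.0, math.pi)
# Trot for a quadruped: diagonal pairs in phase (hypothetical ordering FL, FR, RL, RR).
TROT = (0.0, math.pi, math.pi, 0.0)


def cpg_action(t: float, freq: float, leg_phases: Sequence[float],
               joints_per_leg: int = 3, hip_amp: float = 0.4,
               lift_amp: float = 0.6) -> List[float]:
    """Open-loop CPG: one sinusoidal oscillator per leg mapped to normalised joint targets."""
    if joints_per_leg not in (2, 3):
        raise ValueError("joints_per_leg must be 2 or 3")
    action: List[float] = []
    for phi in leg_phases:
        theta = 2.0 * math.pi * freq * t + phi
        hip = hip_amp * math.sin(theta)              # fore/aft sweep of the hip
        # cos(theta) > 0 exactly when the hip is moving forward, so the foot is
        # lifted during the forward (swing) half of the cycle and planted otherwise.
        lift = lift_amp * max(0.0, math.cos(theta))
        leg = [hip, lift, -lift]                     # third joint folds opposite to the second
        action.extend(leg[:joints_per_leg])
    # Clip to the [-1, 1] range assumed for the normalised action convention.
    return [max(-1.0, min(1.0, a)) for a in action]
```

Stepping `t` at the simulation cadence and feeding these targets to an environment would yield the fixed expert rollouts the abstract describes; because the controller is open-loop, it needs no learning and behaves identically on every episode.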
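Abstract (reference translation): Reinforcement learning for legged locomotion has matured into a stack of multi-component reward functions and physics-engine benchmarks whose morphologies are uniformly derived from real commercial hardware. Game NPCs, however, are bound by stylistic constraints absent from sim-to-real robotics and routinely take the form of creatures with no real-robot counterpart. ARC-RL is a suite of four MuJoCo continuous-control environments featuring robotic morphologies inspired by the bestiary of ARC Raiders: the 18-DoF tall hexapod Queen, the 12-DoF armoured hexapod Bastion, the 18-DoF compact hexapod Tick, and the 12-DoF quadruped Leaper. All four robots share a unified observation template, action convention, simulation cadence, and a single closed-form multi-component reward function. The reward fuses a velocity-tracking tent, a healthy survive bonus, a phase-locked gait-compliance bonus/cost pair, action regularisers, three safety penalties, and a posture anchor. Hand-crafted Central Pattern Generator demonstrators are also provided, serving both as fixed expert references and as sources of prior data for offline-to-online training. On this playground, standard online algorithms (SAC, SPEQ, SOPE-EO) are compared with methods augmented with prior data (SACfD, SPEQ-O2O, SOPE), characterising how each paradigm copes with the playground's morphological diversity and animation-style constraints.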
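The abstract does not say how the prior-data baselines (SACfD, SPEQ-O2O, SOPE) combine CPG transitions with online experience. Demonstration-augmented SAC variants are commonly built by seeding the replay buffer with demonstrations; as one possible scheme, the sketch below keeps prior transitions in a separate pool that is never evicted and draws a fixed fraction of every batch from it. The fixed-ratio mixing, the class `MixedReplayBuffer`, and the parameter `prior_fraction` are hypothetical choices for illustration, not the paper's method.

```python
import random
from collections import deque
from typing import Any, List, Optional, Tuple

# (obs, action, reward, next_obs, done)
Transition = Tuple[Any, Any, float, Any, bool]


class MixedReplayBuffer:
    """Keeps prior (e.g. CPG) transitions separate from online ones and mixes them per batch."""

    def __init__(self, prior: List[Transition], capacity: int,
                 prior_fraction: float = 0.5, seed: Optional[int] = None):
        if not 0.0 <= prior_fraction <= 1.0:
            raise ValueError("prior_fraction must be in [0, 1]")
        self.prior = list(prior)                  # never evicted
        self.online = deque(maxlen=capacity)      # oldest online transitions evicted first
        self.prior_fraction = prior_fraction
        self.rng = random.Random(seed)

    def add(self, transition: Transition) -> None:
        self.online.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        if not self.prior and not self.online:
            raise ValueError("buffer is empty")
        if not self.online:
            n_prior = batch_size                  # offline phase: prior data only
        elif not self.prior:
            n_prior = 0                           # purely online training
        else:
            n_prior = round(batch_size * self.prior_fraction)
        # Sample with replacement from each pool, then shuffle so the learner
        # does not see a fixed prior/online ordering within the batch.
        batch = [self.rng.choice(self.prior) for _ in range(n_prior)]
        batch += [self.rng.choice(self.online) for _ in range(batch_size - n_prior)]
        self.rng.shuffle(batch)
        return batch
```

Before any online step the buffer serves prior data only, which covers an offline pre-training phase; once online transitions arrive, `prior_fraction` controls how strongly the expert data keeps anchoring the updates.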


Related papers