Fugu-MT Paper Translation (Summary): Generalizing from References using a Multi-Task Reference and Goal-Driven RL Framework

Paper summary: Generalizing from References using a Multi-Task Reference and Goal-Driven RL Framework

arxiv url: http://arxiv.org/abs/2602.20375v1
Date: Mon, 23 Feb 2026 21:25:06 GMT
Status: Translation complete
Last updated in system: 2026-03-23 08:17:41.658247
Title: Generalizing from References using a Multi-Task Reference and Goal-Driven RL Framework
Title (reference translation): Generalization from References via a Multi-Task Reference and Goal-Driven RL Framework
Authors: Jiashun Wang, M. Eva Mungai, He Li, Jean Pierre Sleiman, Jessica Hodgins, Farbod Farshidian
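Abstract summary: This work proposes a multi-task reinforcement learning framework for learning humanoid behaviors from human motion. A single goal-conditioned policy is trained jointly on two tasks that share the same observation and action spaces. By co-optimizing these objectives within a shared formulation, the policy acquires structured, human-like motor skills from dense reference supervision.
Reference score (independently computed attention measure): 12.131501436717969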
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Learning agile humanoid behaviors from human motion offers a powerful route to natural, coordinated control, but existing approaches face a persistent trade-off: reference-tracking policies are often brittle outside the demonstration dataset, while purely task-driven Reinforcement Learning (RL) can achieve adaptability at the cost of motion quality. We introduce a unified multi-task RL framework that bridges this gap by treating reference motion as a prior for behavioral shaping rather than a deployment-time constraint. A single goal-conditioned policy is trained jointly on two tasks that share the same observation and action spaces, but differ in their initialization schemes, command spaces, and reward structures: (i) a reference-guided imitation task in which reference trajectories define dense imitation rewards but are not provided as policy inputs, and (ii) a goal-conditioned generalization task in which goals are sampled independently of any reference and where rewards reflect only task success. By co-optimizing these objectives within a shared formulation, the policy acquires structured, human-like motor skills from dense reference supervision while learning to adapt these skills to novel goals and initial conditions. This is achieved without adversarial objectives, explicit trajectory tracking, phase variables, or reference-dependent inference. We evaluate the method on a challenging box-based parkour playground that demands diverse athletic behaviors (e.g., jumping and climbing), and show that the learned controller transfers beyond the reference distribution while preserving motion naturalness. Finally, we demonstrate long-horizon behavior generation by composing multiple learned skills, illustrating the flexibility of the learned policies in complex scenarios.
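Abstract (reference translation): Learning agile humanoid behaviors from human motion offers a powerful route to natural, coordinated control, but existing approaches face a persistent trade-off. This paper proposes a unified multi-task RL framework that bridges this gap by treating reference motion as a prior for behavioral shaping rather than as a deployment-time constraint. A single goal-conditioned policy is trained jointly on two tasks that share the same observation and action spaces but differ in their initialization schemes, command spaces, and reward structures: (i) a reference-guided imitation task in which reference trajectories define dense imitation rewards but are not provided as policy inputs, and (ii) a goal-conditioned generalization task in which goals are sampled independently of any reference and rewards reflect only task success. By co-optimizing these objectives within a shared formulation, the policy acquires structured, human-like motor skills from dense reference supervision while learning to adapt these skills to novel goals and initial conditions. This is achieved without adversarial objectives, explicit trajectory tracking, phase variables, or reference-dependent inference. The method is evaluated on a challenging box-based parkour playground that demands diverse athletic behaviors (e.g., jumping and climbing), and the learned controller is shown to transfer beyond the reference distribution while preserving motion naturalness. Finally, long-horizon behavior generation is demonstrated by composing multiple learned skills, illustrating the flexibility of the learned policies in complex scenarios.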
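
The two-task setup described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the 2-D toy state, the names `Episode`, `sample_episode`, `policy_observation`, and `reward`, the mixing probability `p_imitation`, the exponential imitation kernel, and the choice to command the reference's final state in the imitation task are all assumptions made for illustration. The sketch only shows the structural idea that both tasks share one observation layout, the reference enters through the reward alone, and the generalization task is rewarded only on success.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

State = Tuple[float, float]  # toy 2-D "pose"; a real humanoid state is far richer


@dataclass
class Episode:
    task: str                          # "imitation" or "goal"
    goal: State                        # command visible to the policy
    init_state: State                  # task-specific initialization
    reference: Optional[List[State]]   # used for reward only, never a policy input


def sample_episode(rng: random.Random,
                   references: Sequence[List[State]],
                   p_imitation: float = 0.5) -> Episode:
    """Pick a task, then set goal and initial state according to that task."""
    if rng.random() < p_imitation:
        ref = list(rng.choice(references))
        # Imitation task (assumed scheme): start on the reference and
        # command its final state.
        return Episode("imitation", goal=ref[-1], init_state=ref[0], reference=ref)
    # Generalization task: goal and start are sampled independently of any reference.
    goal = (rng.uniform(-2.0, 2.0), rng.uniform(0.0, 1.5))
    start = (rng.uniform(-2.0, 2.0), 0.0)
    return Episode("goal", goal=goal, init_state=start, reference=None)


def policy_observation(state: State, ep: Episode) -> Tuple[float, ...]:
    """Same observation layout for both tasks: state plus goal, no reference."""
    return (*state, *ep.goal)


def reward(state: State, t: int, ep: Episode, tol: float = 0.1) -> float:
    if ep.task == "imitation":
        # Dense imitation reward: closeness to the reference frame at step t.
        ref_state = ep.reference[min(t, len(ep.reference) - 1)]
        return math.exp(-math.dist(state, ref_state) ** 2)
    # Task reward: success only, with no reference term.
    return 1.0 if math.dist(state, ep.goal) < tol else 0.0
```

In a training loop, episodes drawn from `sample_episode` would be mixed in the same batch, so a single policy receives gradients from both reward types while seeing an identical input format.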

Related papers