Fugu-MT Paper Translation (Summary): Re$^2$MoGen: Open-Vocabulary Motion Generation via LLM Reasoning and Physics-Aware Refinement

Paper overview: Re$^2$MoGen: Open-Vocabulary Motion Generation via LLM Reasoning and Physics-Aware Refinement

arxiv url: http://arxiv.org/abs/2604.17807v1
Date: Mon, 20 Apr 2026 04:59:28 GMT
Status: Translation complete
Last updated in system: 2026-04-21 21:52:52.698546
Title: Re$^2$MoGen: Open-Vocabulary Motion Generation via LLM Reasoning and Physics-Aware Refinement
Title (reference translation): Re$^2$MoGen: Open-vocabulary motion generation via LLM reasoning and physics-aware refinement
Authors: Jiakun Zheng, Ting Xiao, Shiqin Cao, Xinran Li, Zhe Wang, Chenjia Bai,
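Abstract summary: Re$^2$MoGen is a Reasoning and Refinement open-vocabulary Motion Generation framework. It generates an initial motion plan and then refines its physical plausibility through reinforcement learning (RL) post-training. The framework generates semantically consistent and physically plausible motions and achieves state-of-the-art performance in open-vocabulary motion generation.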
Reference score (independently computed attention score): 27.84741874985021
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Text-to-motion (T2M) generation aims to control the behavior of a target character via textual descriptions. Leveraging text-motion paired datasets, existing T2M models have achieved impressive performance in generating high-quality motions within the distribution of their training data. However, their performance deteriorates notably when the motion descriptions differ significantly from the training texts. To address this issue, we propose Re$^2$MoGen, a Reasoning and Refinement open-vocabulary Motion Generation framework that leverages enhanced Large Language Model (LLM) reasoning to generate an initial motion planning and then refine its physical plausibility via reinforcement learning (RL) post-training. Specifically, Re$^2$MoGen consists of three stages: We first employ Monte Carlo tree search to enhance the LLM's reasoning ability in generating reasonable keyframes of the motion based on text prompts, specifying only the root and several key joints' positions to ease the reasoning process. Then, we apply a human pose model as a prior to optimize the full-body poses based on the planned keyframes and use the resulting incomplete motion to supervise fine-tuning a pre-trained motion generator via a dynamic temporal matching objective, enabling spatiotemporal completion. Finally, we use post-training with physics-aware reward to refine motion quality to eliminate physical implausibility in LLM-planned motions. Extensive experiments demonstrate that our framework can generate semantically consistent and physically plausible motions and achieve state-of-the-art performance in open-vocabulary motion generation.
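Abstract (reference translation): Text-to-motion (T2M) generation aims to control the behavior of a target character through textual descriptions. By leveraging paired text-motion datasets, existing T2M models perform well at generating high-quality motions within the distribution of their training data. However, their performance degrades markedly when the motion descriptions differ substantially from the training texts. To address this problem, we propose Re$^2$MoGen, a Reasoning and Refinement open-vocabulary Motion Generation framework that uses enhanced Large Language Model (LLM) reasoning to generate an initial motion plan and then improves its physical plausibility through reinforcement learning (RL) post-training. Specifically, Re$^2$MoGen consists of three stages. First, we use Monte Carlo tree search to strengthen the LLM's ability to reason about plausible motion keyframes from the text prompt, specifying only the positions of the root and a few key joints to make the reasoning easier. Next, we use a human pose model as a prior to optimize full-body poses from the planned keyframes, and use the resulting incomplete motion to supervise the fine-tuning of a pre-trained motion generator through a dynamic temporal matching objective, which enables spatiotemporal completion. Finally, we apply post-training with a physics-aware reward to refine motion quality and eliminate physical implausibility in the LLM-planned motions. Extensive experiments show that our framework generates semantically consistent and physically plausible motions and achieves state-of-the-art performance in open-vocabulary motion generation.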
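
The abstract does not spell out the dynamic temporal matching objective, so the following is only a minimal sketch of one possible reading, not the authors' formulation. It assumes that a short, ordered list of planned keyframe poses is aligned to strictly increasing frames of a longer generated motion so that the total squared pose distance is minimal; that minimal cost could then serve as a supervision signal. The names `pose_distance` and `match_keyframes`, and the flat-list pose representation, are hypothetical.

```python
from math import inf


def pose_distance(a, b):
    """Squared Euclidean distance between two poses given as flat lists of floats."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def match_keyframes(keyframes, motion):
    """Align K ordered keyframes to T >= K motion frames, preserving order.

    Returns (total_cost, frame_indices), where frame_indices[k] is the frame
    matched to keyframe k and the indices are strictly increasing.
    This is an assumed, illustrative objective, not the paper's definition.
    """
    K, T = len(keyframes), len(motion)
    if K == 0 or K > T:
        raise ValueError("need 1 <= len(keyframes) <= len(motion)")

    # cost[k][t]: best cost of matching keyframes 0..k with keyframe k at frame t
    cost = [[inf] * T for _ in range(K)]
    back = [[-1] * T for _ in range(K)]
    for t in range(T):
        cost[0][t] = pose_distance(keyframes[0], motion[t])

    for k in range(1, K):
        best, best_t = inf, -1  # running minimum of cost[k-1][0..t-1]
        for t in range(k, T):
            if cost[k - 1][t - 1] < best:
                best, best_t = cost[k - 1][t - 1], t - 1
            cost[k][t] = best + pose_distance(keyframes[k], motion[t])
            back[k][t] = best_t

    # Pick the cheapest end frame for the last keyframe, then backtrack.
    t = min(range(K - 1, T), key=lambda i: cost[K - 1][i])
    total = cost[K - 1][t]
    path = [t]
    for k in range(K - 1, 0, -1):
        t = back[k][t]
        path.append(t)
    path.reverse()
    return total, path


if __name__ == "__main__":
    # Toy 1-D "poses": three keyframes matched against five motion frames.
    keyframes = [[0.0], [1.0], [2.0]]
    motion = [[0.0], [0.1], [1.0], [1.5], [2.0]]
    print(match_keyframes(keyframes, motion))
```

In an actual fine-tuning setup the same alignment would presumably be computed on tensors so that the matched cost is differentiable with respect to the generated motion; plain Python lists are used here only to keep the sketch self-contained.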

Related papers