Fugu-MT Paper Translation (Abstract): SMooGPT: Stylized Motion Generation using Large Language Models

Paper overview: SMooGPT: Stylized Motion Generation using Large Language Models

arxiv url: http://arxiv.org/abs/2509.04058v1
Date: Thu, 04 Sep 2025 09:41:18 GMT
Status: Translation complete
System update date: 2025-09-05 20:21:10.119523
Title: SMooGPT: Stylized Motion Generation using Large Language Models
Authors: Lei Zhong, Yi Yang, Changjian Li
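Abstract summary: Stylized motion generation is actively studied in computer graphics and has benefited especially from rapid advances in diffusion models. Existing research attempts to address this problem via motion style transfer or conditional motion generation. This paper proposes using body-part text space as an intermediate representation and presents SMooGPT.
Reference score (independently computed attention score): 23.476473154719514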
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Stylized motion generation is actively studied in computer graphics and has benefited especially from rapid advances in diffusion models. The goal of this task is to produce a novel motion that respects both the motion content and the desired motion style, e.g., "walking in a loop like a monkey". Existing research attempts to address this problem via motion style transfer or conditional motion generation. These methods typically embed the motion style into a latent space and guide the motion implicitly in that latent space as well. Despite the progress, they suffer from low interpretability and control, generalize only to a limited extent to new styles, and fail to produce motions other than "walking" because of the strong bias in the public stylization dataset. In this paper, we propose to solve the stylized motion generation problem from a new perspective of reasoning-composition-generation, based on three observations: i) human motion can often be described effectively in natural language in a body-part-centric manner, ii) LLMs exhibit a strong ability to understand and reason about human motion, and iii) human motion is inherently compositional, which facilitates generating new motion content or styles through effective recomposition. We thus propose using body-part text space as an intermediate representation and present SMooGPT, a fine-tuned LLM that acts as a reasoner, composer, and generator when generating the desired stylized motion. Our method operates in the body-part text space with much higher interpretability, enables fine-grained motion control, effectively resolves potential conflicts between motion content and style, and generalizes well to new styles thanks to the open-vocabulary ability of LLMs. Comprehensive experiments and evaluations, together with a user perceptual study, demonstrate the effectiveness of our approach, especially for purely text-driven stylized motion generation.
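The compositional idea in the abstract (describing content and style per body part in text, then recomposing them) can be pictured with a toy sketch. This is not the paper's method: SMooGPT is a fine-tuned LLM, whereas the snippet below is a hand-written merge rule. The body-part list, the `compose` function, the `content_locked` parameter, and the example descriptions are all hypothetical and serve only to illustrate what an interpretable body-part text representation might look like.

```python
# Toy illustration only. SMooGPT itself is a fine-tuned LLM, not a rule-based
# merger; every name and rule here is a hypothetical stand-in.

BODY_PARTS = ("head", "torso", "arms", "legs")


def compose(content: dict[str, str], style: dict[str, str],
            content_locked: frozenset[str] = frozenset({"legs"})) -> dict[str, str]:
    """Merge per-body-part text descriptions of motion content and style.

    Parts listed in `content_locked` keep the content description, so the
    underlying action survives a conflicting style. Every other part takes
    the style description when one exists, otherwise the content one.
    Parts described by neither input are omitted.
    """
    merged: dict[str, str] = {}
    for part in BODY_PARTS:
        c = content.get(part)
        s = style.get(part)
        if part in content_locked and c is not None:
            merged[part] = c   # conflict resolved in favor of content
        elif s is not None:
            merged[part] = s   # style overrides or fills the part
        elif c is not None:
            merged[part] = c   # no style given, keep content
    return merged


# Hypothetical descriptions for "walking in a loop like a monkey".
content = {
    "legs": "step forward along a circular path",
    "arms": "swing naturally at the sides",
}
style = {
    "torso": "hunch forward",
    "arms": "hang low and swing loosely",
    "legs": "bend the knees deeply",
}

result = compose(content, style)
for part, text in result.items():
    print(f"{part}: {text}")
```

Because the merged result is plain text keyed by body part, each decision (here, keeping the walking legs while adopting the stylized arms and torso) can be read and edited directly, which is the kind of interpretability the abstract attributes to working in body-part text space.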

Related papers