Fugu-MT Paper Translation (Abstract): Moving by Looking: Towards Vision-Driven Avatar Motion Generation

Paper overview: Moving by Looking: Towards Vision-Driven Avatar Motion Generation

arxiv url: http://arxiv.org/abs/2509.19259v1
Date: Tue, 23 Sep 2025 17:18:56 GMT
Status: Translation complete
Last updated in system: 2025-09-24 20:41:27.976164
Title: Moving by Looking: Towards Vision-Driven Avatar Motion Generation
Title (reference translation): Moving by Gaze: Towards Vision-Driven Avatar Motion Generation
Authors: Markos Diomataris, Berat Mert Albaba, Giorgio Becherini, Partha Ghosh, Omid Taheri, Michael J. Black
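Abstract summary: CLOPS is the first human avatar that perceives its surroundings and navigates using egocentric vision alone. We train a motion prior model on a large motion capture dataset. A policy is then trained with Q-learning to map egocentric visual inputs to high-level control commands for the motion prior.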
Reference score (independently computed attention score): 43.07045613584429
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The way we perceive the world fundamentally shapes how we move, whether it is how we navigate in a room or how we interact with other humans. Current human motion generation methods neglect this interdependency and use task-specific "perception" that differs radically from that of humans. We argue that the generation of human-like avatar behavior requires human-like perception. Consequently, in this work we present CLOPS, the first human avatar that solely uses egocentric vision to perceive its surroundings and navigate. Using vision as the primary driver of motion, however, gives rise to a significant challenge for training avatars: existing datasets either contain isolated human motion, without the context of a scene, or lack scale. We overcome this challenge by decoupling the learning of low-level motion skills from the learning of high-level control that maps visual input to motion. First, we train a motion prior model on a large motion capture dataset. Then, a policy is trained using Q-learning to map egocentric visual inputs to high-level control commands for the motion prior. Our experiments empirically demonstrate that egocentric vision can give rise to human-like motion characteristics in our avatars. For example, the avatars walk such that they avoid obstacles present in their visual field. These findings suggest that equipping avatars with human-like sensors, particularly egocentric vision, holds promise for training avatars that behave like humans.
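Abstract (reference translation): The way we perceive the world fundamentally shapes how we move, such as how we navigate a room or how we interact with other people. Current human motion generation methods ignore this interdependency and use task-specific "perception" that differs fundamentally from that of humans. We argue that producing human-like behavior requires human-like perception. In this work, we therefore introduce CLOPS, the first human avatar that perceives its surroundings and navigates using egocentric vision alone. Existing datasets either isolate human motion, without the context of a scene, or lack scale. We overcome this challenge by separating the learning of low-level motion skills from the learning of high-level control that maps visual input to motion. First, we train a motion prior model on a large motion capture dataset. Next, we train a policy with Q-learning to map egocentric visual inputs to high-level control commands for the motion prior. Our experiments empirically demonstrate that egocentric vision gives rise to human-like motion characteristics in the avatars. For example, the avatars walk so as to avoid obstacles present in their visual field. These results suggest that equipping avatars with human-like sensors, particularly egocentric vision, holds promise for training avatars that behave like humans.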
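
The abstract states that the high-level policy is trained with Q-learning but gives no details of the observation encoding, action space, reward, or network. As a rough illustration of the Q-learning update only, here is a minimal tabular sketch in Python. Everything in it is an assumption made for illustration and is not taken from the paper: the corridor environment, the names `q_update` and `train_corridor`, and the hyperparameters are all hypothetical. The integer cell index stands in for an egocentric observation, and the two actions stand in for high-level commands sent to a motion prior.

```python
import random


def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step:
    Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)).
    """
    best_next = max(q[next_state])
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
    return q[state][action]


def train_corridor(n_cells=5, episodes=200, epsilon=0.2, seed=0):
    """Train on a hypothetical 1-D corridor (not the paper's environment).

    The agent starts in cell 0 and receives reward 1 on reaching the last cell.
    Actions: 0 = step back (clamped at cell 0), 1 = step forward.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_cells)]
    goal = n_cells - 1
    for _ in range(episodes):
        state = 0
        while state != goal:
            if rng.random() < epsilon:
                # Explore: pick a random action.
                action = rng.randrange(2)
            else:
                # Exploit: pick a greedy action, breaking ties at random.
                best = max(q[state])
                action = rng.choice([a for a in range(2) if q[state][a] == best])
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0
            q_update(q, state, action, reward, next_state)
            state = next_state
    return q


if __name__ == "__main__":
    table = train_corridor()
    for cell, (back, forward) in enumerate(table):
        print(f"cell {cell}: Q(back)={back:.3f}  Q(forward)={forward:.3f}")
```

In this toy setting the learned table should prefer the forward action in every non-goal cell. A vision-driven policy like the one the abstract describes would presumably replace the table with a function approximator over image features, but that design is not specified in the abstract.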

Related papers