Moving by Looking: Towards Vision-Driven Avatar Motion Generation
- URL: http://arxiv.org/abs/2509.19259v1
- Date: Tue, 23 Sep 2025 17:18:56 GMT
- Title: Moving by Looking: Towards Vision-Driven Avatar Motion Generation
- Authors: Markos Diomataris, Berat Mert Albaba, Giorgio Becherini, Partha Ghosh, Omid Taheri, Michael J. Black
- Abstract summary: CLOPS is the first human avatar that solely uses egocentric vision to perceive its surroundings and navigate. We train a motion prior model on a large motion capture dataset. Then, a policy is trained using Q-learning to map egocentric visual inputs to high-level control commands for the motion prior.
- Score: 43.07045613584429
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The way we perceive the world fundamentally shapes how we move, whether it is how we navigate a room or how we interact with other humans. Current human motion generation methods neglect this interdependency and use task-specific "perception" that differs radically from that of humans. We argue that generating human-like avatar behavior requires human-like perception. Consequently, in this work we present CLOPS, the first human avatar that solely uses egocentric vision to perceive its surroundings and navigate. Using vision as the primary driver of motion, however, gives rise to a significant challenge for training avatars: existing datasets either contain isolated human motion, without the context of a scene, or lack scale. We overcome this challenge by decoupling the learning of low-level motion skills from the learning of high-level control that maps visual input to motion. First, we train a motion prior model on a large motion capture dataset. Then, a policy is trained using Q-learning to map egocentric visual inputs to high-level control commands for the motion prior. Our experiments empirically demonstrate that egocentric vision can give rise to human-like motion characteristics in our avatars. For example, the avatars walk such that they avoid obstacles present in their visual field. These findings suggest that equipping avatars with human-like sensors, particularly egocentric vision, holds promise for training avatars that behave like humans.
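The two-stage recipe described in the abstract (a pretrained motion prior plus a Q-learning policy over egocentric images) can be sketched as follows. This is a minimal sketch, not the authors' exact design: the network sizes, the discrete command set, and the DQN-style update are illustrative assumptions.

```python
# Minimal DQN-style sketch (not the authors' exact architecture): a CNN encodes an
# egocentric RGB frame and a Q-head scores a small discrete set of high-level commands
# (e.g., heading/speed bins) that would be fed to a pretrained motion prior.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_COMMANDS = 16  # assumed discretization of the high-level control space

class EgoQNet(nn.Module):
    def __init__(self, num_commands: int = NUM_COMMANDS):
        super().__init__()
        self.encoder = nn.Sequential(              # small CNN over e.g. 64x64 egocentric views
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.q_head = nn.Sequential(nn.LazyLinear(256), nn.ReLU(), nn.Linear(256, num_commands))

    def forward(self, ego_rgb: torch.Tensor) -> torch.Tensor:
        return self.q_head(self.encoder(ego_rgb))   # (B, num_commands) Q-values

def td_update(q_net, target_net, batch, optimizer, gamma=0.99):
    """One temporal-difference step: commands leading to higher future reward get higher Q."""
    obs, act, rew, next_obs, done = batch
    q = q_net(obs).gather(1, act.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        q_next = target_net(next_obs).max(dim=1).values
        target = rew + gamma * (1.0 - done) * q_next
    loss = F.smooth_l1_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch, the selected command would condition the pretrained motion prior, which decodes it into full-body poses; reward terms such as goal progress and obstacle-collision penalties, which would push the policy toward the obstacle-avoiding behavior reported in the abstract, are likewise assumptions.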
Related papers
- EgoAnimate: Generating Human Animations from Egocentric top-down Views [3.035601871864059]
We introduce a pipeline that generates realistic frontal views from occluded top-down images using ControlNet and a Stable Diffusion backbone. This enables generation of avatar motions from minimal input, paving the way for more accessible and generalizable telepresence systems.
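As a rough illustration of this kind of pipeline, the snippet below wires a ControlNet into a Stable Diffusion backbone with the Hugging Face diffusers library; the specific checkpoints, the pose-map conditioning, and the prompt are assumptions for illustration, not the paper's actual configuration.

```python
# Illustrative only: the abstract reports a ControlNet + Stable Diffusion pipeline for
# synthesizing frontal views from occluded top-down egocentric images; the checkpoints
# and conditioning signal below are assumptions, not the authors' setup.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Hypothetical choice: condition on a pose map estimated from the top-down view.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

pose_map = load_image("topdown_pose_estimate.png")   # placeholder conditioning image
frontal = pipe(
    prompt="full-body frontal view of the same person, neutral background",
    image=pose_map,
    num_inference_steps=30,
).images[0]
frontal.save("frontal_view.png")
```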
arXiv Detail & Related papers (2025-07-12T09:59:31Z)
- Emergent Active Perception and Dexterity of Simulated Humanoids from Visual Reinforcement Learning [69.71072181304066]
We introduce Perceptive Dexterous Control (PDC), a framework for vision-driven whole-body control with simulated humanoids. PDC operates solely on egocentric vision for task specification, enabling object search, target placement, and skill selection through visual cues. We show that training from scratch with reinforcement learning can produce emergent behaviors such as active search.
arXiv Detail & Related papers (2025-05-18T07:33:31Z)
- EgoAvatar: Egocentric View-Driven and Photorealistic Full-body Avatars [56.56236652774294]
We propose a person-specific egocentric telepresence approach that jointly models a photoreal digital avatar and drives it from a single egocentric video.
Our experiments demonstrate a clear step towards egocentric and photoreal telepresence as our method outperforms baselines as well as competing methods.
arXiv Detail & Related papers (2024-09-22T22:50:27Z)
- Real-Time Simulated Avatar from Head-Mounted Sensors [70.41580295721525]
We present SimXR, a method for controlling a simulated avatar from information (headset pose and cameras) obtained from AR / VR headsets.
To combine headset poses with camera input, we control a humanoid to track the headset's movement while analyzing input images to decide how the body should move.
When body parts are seen, the movements of the hands and feet are guided by the images; when they are unseen, the laws of physics guide the controller to generate plausible motion.
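The visibility-dependent behavior described above can be illustrated with a schematic blend of image-derived and physics-derived targets; the data structures and the linear blending rule below are assumptions for illustration, not SimXR's actual interface.

```python
# Schematic sketch: blend an image-derived end-effector target with a physics-only
# fallback according to how visible that body part is in the headset cameras.
from dataclasses import dataclass
import numpy as np

@dataclass
class BodyPartTarget:
    position: np.ndarray      # desired 3D position for a hand or foot
    visibility: float         # 0.0 (unseen) .. 1.0 (clearly visible)

def blended_target(image_target: BodyPartTarget, physics_target: np.ndarray) -> np.ndarray:
    """When a part is seen, follow the image estimate; when unseen, fall back to the
    physics controller's plausible target. The linear blend is an assumed choice."""
    w = float(np.clip(image_target.visibility, 0.0, 1.0))
    return w * image_target.position + (1.0 - w) * physics_target

# Example: a left hand that is only partially visible.
seen = BodyPartTarget(position=np.array([0.3, 1.1, 0.4]), visibility=0.7)
fallback = np.array([0.25, 1.0, 0.35])   # hypothetical physics-based prediction
print(blended_target(seen, fallback))
```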
arXiv Detail & Related papers (2024-03-11T16:15:51Z)
- Universal Humanoid Motion Representations for Physics-Based Control [71.46142106079292]
We present a universal motion representation that encompasses a comprehensive range of motor skills for physics-based humanoid control.
We first learn a motion imitator that can imitate all of the human motion in a large, unstructured motion dataset.
We then create our motion representation by distilling skills directly from the imitator.
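A minimal sketch of such distillation, assuming a conditional VAE trained to reproduce the imitator's actions, is shown below; the architecture, dimensions, and loss weights are illustrative assumptions, not the paper's design.

```python
# Hedged sketch of distilling skills into a latent motion representation: the encoder/decoder
# sizes and the VAE formulation are assumptions; the abstract only states that skills are
# distilled from a pretrained imitator into a reusable representation.
import torch
import torch.nn as nn

class SkillVAE(nn.Module):
    def __init__(self, state_dim=256, action_dim=69, latent_dim=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(state_dim + action_dim, 512), nn.ReLU(),
                                 nn.Linear(512, 2 * latent_dim))          # mean and log-variance
        self.dec = nn.Sequential(nn.Linear(state_dim + latent_dim, 512), nn.ReLU(),
                                 nn.Linear(512, action_dim))              # reconstructed action

    def forward(self, state, expert_action):
        mu, logvar = self.enc(torch.cat([state, expert_action], -1)).chunk(2, -1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)           # reparameterization
        recon = self.dec(torch.cat([state, z], -1))
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return recon, kl

def distill_loss(model, state, expert_action, beta=1e-3):
    """Match the imitator's action while keeping the latent well-behaved, so a downstream
    controller can act in the compact latent space instead of raw joint targets."""
    recon, kl = model(state, expert_action)
    return torch.mean((recon - expert_action) ** 2) + beta * kl
```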
arXiv Detail & Related papers (2023-10-06T20:48:43Z)
- Physics-based Motion Retargeting from Sparse Inputs [73.94570049637717]
Commercial AR/VR products consist only of a headset and controllers, providing very limited sensor data of the user's pose.
We introduce a method to retarget motions in real-time from sparse human sensor data to characters of various morphologies.
We show that the avatar poses often match the user surprisingly well, despite having no sensor information of the lower body available.
arXiv Detail & Related papers (2023-07-04T21:57:05Z) - Interaction Replica: Tracking Human-Object Interaction and Scene Changes From Human Motion [48.982957332374866]
Modeling the changes that humans cause in a scene is essential for building digital twins.
Our method combines visual localization of humans in the scene with contact-based reasoning about human-scene interactions from IMU data.
Our code, data and model are available on our project page at http://virtualhumans.mpi-inf.mpg.de/ireplica/.
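As a toy illustration of combining a visually localized hand trajectory with contact reasoning, the sketch below rigidly propagates an object's pose along the hand while contact persists; the rigid-attachment rule and the example numbers are assumptions, not the paper's method.

```python
# Toy sketch: once a contact detector (e.g., driven by IMU cues) flags a grasp, the object
# follows the hand's displacement from visual localization until contact ends.
import numpy as np

def propagate_object_pose(hand_traj, contact, obj_pose0):
    """While in contact, the object follows the hand's translation (rigid-attachment
    assumption); otherwise it stays where it was last placed."""
    obj_traj = [obj_pose0.copy()]
    for t in range(1, len(hand_traj)):
        pose = obj_traj[-1].copy()
        if contact[t]:
            pose[:3] += hand_traj[t] - hand_traj[t - 1]   # apply the hand's displacement
        obj_traj.append(pose)
    return np.stack(obj_traj)

# Example: a hypothetical 1 m push over five frames of contact.
hand = np.linspace([0.0, 0.0, 0.0], [1.0, 0.0, 0.0], 6)
contact = [False, True, True, True, True, True]
print(propagate_object_pose(hand, contact, np.zeros(6))[-1])   # object moved ~1 m in x
```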
arXiv Detail & Related papers (2022-05-05T17:58:06Z)