Kinematics-Guided Reinforcement Learning for Object-Aware 3D Ego-Pose Estimation
- URL: http://arxiv.org/abs/2011.04837v3
- Date: Wed, 9 Dec 2020 03:11:03 GMT
- Title: Kinematics-Guided Reinforcement Learning for Object-Aware 3D Ego-Pose Estimation
- Authors: Zhengyi Luo, Ryo Hachiuma, Ye Yuan, Shun Iwase, Kris M. Kitani
- Abstract summary: We propose a method for incorporating object interaction and human body dynamics into the task of 3D ego-pose estimation.
We use a kinematics model of the human body to represent the entire range of human motion, and a dynamics model of the body to interact with objects inside a physics simulator.
This is the first work to estimate a physically valid 3D full-body interaction sequence with objects from egocentric videos.
- Score: 25.03715978502528
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We propose a method for incorporating object interaction and human body
dynamics into the task of 3D ego-pose estimation using a head-mounted camera.
We use a kinematics model of the human body to represent the entire range of
human motion, and a dynamics model of the body to interact with objects inside
a physics simulator. By bringing together object modeling, kinematics modeling,
and dynamics modeling in a reinforcement learning (RL) framework, we enable
object-aware 3D ego-pose estimation. We devise several representational
innovations through the design of the state and action space to incorporate 3D
scene context and improve pose estimation quality. We also construct a
fine-tuning step to correct the drift and refine the estimated human-object
interaction. This is the first work to estimate a physically valid 3D full-body
interaction sequence with objects (e.g., chairs, boxes, obstacles) from
egocentric videos. Experiments with both controlled and in-the-wild settings
show that our method can successfully extract an object-conditioned 3D ego-pose
sequence that is consistent with the laws of physics.
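The abstract describes a pipeline in which a kinematics model proposes a target pose from egocentric video, and an RL policy drives a dynamics model inside a physics simulator toward that target. The following minimal sketch illustrates that high-level loop only; it is not the authors' implementation, and all names, shapes, the linear stand-in models, and the reward are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N_JOINTS = 23   # hypothetical humanoid joint count
FEAT_DIM = 64   # hypothetical egocentric feature size

# Toy stand-in for a learned kinematics model (a network in practice)
W_kin = rng.standard_normal((N_JOINTS, FEAT_DIM)) * 0.01

def kinematics_target(features):
    """Map egocentric video features to a target joint configuration."""
    return W_kin @ features

def policy(pose, target):
    """Toy policy: corrective action toward the kinematic target
    (the actual method learns this with RL)."""
    return 0.5 * (target - pose)

def dynamics_step(pose, action):
    """Stand-in for one physics-simulator step: integrate the action."""
    return pose + action

def rollout(features_seq):
    """Produce a physically integrated pose sequence from per-frame features."""
    pose = np.zeros(N_JOINTS)
    poses, rewards = [], []
    for feats in features_seq:
        target = kinematics_target(feats)
        action = policy(pose, target)
        pose = dynamics_step(pose, action)
        poses.append(pose.copy())
        # Reward: negative deviation from the kinematic target pose
        rewards.append(-float(np.linalg.norm(pose - target)))
    return np.stack(poses), rewards

# Example: a 10-frame sequence of random egocentric features
feats = rng.standard_normal((10, FEAT_DIM))
poses, rewards = rollout(feats)
print(poses.shape)  # (10, 23)
```

Because the pose is advanced only through the simulated dynamics step, every estimated frame is reachable by the body model, which is the sense in which the paper's output is "consistent with the laws of physics."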
Related papers
- EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting [95.44545809256473]
EgoGaussian is a method capable of simultaneously reconstructing 3D scenes and dynamically tracking 3D object motion from RGB egocentric input alone.
We show significant improvements in terms of both dynamic object and background reconstruction quality compared to the state-of-the-art.
arXiv Detail & Related papers (2024-06-28T10:39:36Z)
- DO3D: Self-supervised Learning of Decomposed Object-aware 3D Motion and Depth from Monocular Videos [76.01906393673897]
We propose a self-supervised method to jointly learn 3D motion and depth from monocular videos.
Our system contains a depth estimation module to predict depth, and a new decomposed object-wise 3D motion (DO3D) estimation module to predict ego-motion and 3D object motion.
Our model delivers superior performance in all evaluated settings.
arXiv Detail & Related papers (2024-03-09T12:22:46Z)
- Full-Body Articulated Human-Object Interaction [61.01135739641217]
CHAIRS is a large-scale motion-captured f-AHOI dataset consisting of 16.2 hours of versatile interactions.
CHAIRS provides 3D meshes of both humans and articulated objects during the entire interactive process.
By learning the geometrical relationships in HOI, we devise the first model that leverages human pose estimation.
arXiv Detail & Related papers (2022-12-20T19:50:54Z)
- Estimating 3D Motion and Forces of Human-Object Interactions from Internet Videos [49.52070710518688]
We introduce a method to reconstruct the 3D motion of a person interacting with an object from a single RGB video.
Our method estimates the 3D poses of the person together with the object pose, the contact positions and the contact forces on the human body.
arXiv Detail & Related papers (2021-11-02T13:40:18Z)
- 3D Neural Scene Representations for Visuomotor Control [78.79583457239836]
We learn models for dynamic 3D scenes purely from 2D visual observations.
A dynamics model, constructed over the learned representation space, enables visuomotor control for challenging manipulation tasks.
arXiv Detail & Related papers (2021-07-08T17:49:37Z)
- Dynamics-Regulated Kinematic Policy for Egocentric Pose Estimation [23.603254270514224]
We propose a method for object-aware 3D egocentric pose estimation that tightly integrates kinematics modeling, dynamics modeling, and scene object information.
We demonstrate for the first time, the ability to estimate physically-plausible 3D human-object interactions using a single wearable camera.
arXiv Detail & Related papers (2021-06-10T17:59:50Z)
- Hindsight for Foresight: Unsupervised Structured Dynamics Models from Physical Interaction [24.72947291987545]
A key challenge for an agent learning to interact with the world is reasoning about the physical properties of objects.
We propose a novel approach for modeling the dynamics of a robot's interactions directly from unlabeled 3D point clouds and images.
arXiv Detail & Related papers (2020-08-02T11:04:49Z)
- Contact and Human Dynamics from Monocular Video [73.47466545178396]
Existing deep models predict approximately accurate 2D and 3D kinematic poses from video, but the estimates contain visible errors.
We present a physics-based method for inferring 3D human motion from video sequences that takes initial 2D and 3D pose estimates as input.
arXiv Detail & Related papers (2020-07-22T21:09:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.