Dynamics-Regulated Kinematic Policy for Egocentric Pose Estimation
- URL: http://arxiv.org/abs/2106.05969v1
- Date: Thu, 10 Jun 2021 17:59:50 GMT
- Title: Dynamics-Regulated Kinematic Policy for Egocentric Pose Estimation
- Authors: Zhengyi Luo, Ryo Hachiuma, Ye Yuan, Kris Kitani
- Abstract summary: We propose a method for object-aware 3D egocentric pose estimation that tightly integrates kinematics modeling, dynamics modeling, and scene object information.
We demonstrate, for the first time, the ability to estimate physically-plausible 3D human-object interactions using a single wearable camera.
- Score: 23.603254270514224
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We propose a method for object-aware 3D egocentric pose estimation that
tightly integrates kinematics modeling, dynamics modeling, and scene object
information. Unlike prior kinematics- or dynamics-based approaches where the two
components are used disjointly, we synergize the two approaches via
dynamics-regulated training. At each timestep, a kinematic model is used to
provide a target pose using video evidence and simulation state. Then, a
prelearned dynamics model attempts to mimic the kinematic pose in a physics
simulator. By comparing the pose instructed by the kinematic model against the
pose generated by the dynamics model, we can use their misalignment to further
improve the kinematic model. By factoring in the 6DoF pose of objects (e.g.,
chairs, boxes) in the scene, we demonstrate, for the first time, the ability to
estimate physically-plausible 3D human-object interactions using a single
wearable camera. We evaluate our egocentric pose estimation method in both
controlled laboratory settings and real-world scenarios.
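The per-timestep loop described in the abstract can be illustrated with a toy, one-dimensional sketch. This is not the authors' implementation: the scalar "pose", the single `gain` parameter, the `MAX_STEP` motion limit, and every function name below are hypothetical stand-ins. The abstract also does not say how the misalignment enters the training objective; here it is simply treated as a squared-error loss and minimized with a finite-difference gradient step.

```python
from typing import List, Tuple

# Hypothetical limit on how far the simulated pose can move in one timestep.
MAX_STEP = 0.2


def kinematic_target(gain: float, video_cue: float, sim_state: float) -> float:
    """Toy kinematic model: propose a target pose from a video cue and the simulation state."""
    return sim_state + gain * (video_cue - sim_state)


def dynamics_step(sim_state: float, target: float) -> float:
    """Toy stand-in for the prelearned dynamics model and simulator:
    move toward the target, but by at most MAX_STEP."""
    delta = max(-MAX_STEP, min(MAX_STEP, target - sim_state))
    return sim_state + delta


def rollout(gain: float, video_cues: List[float],
            init_state: float = 0.0) -> Tuple[List[float], List[float]]:
    """Run the loop: the kinematic model proposes a pose, the dynamics model tries to mimic it."""
    state = init_state
    targets: List[float] = []
    simulated: List[float] = []
    for cue in video_cues:
        target = kinematic_target(gain, cue, state)
        state = dynamics_step(state, target)
        targets.append(target)
        simulated.append(state)
    return targets, simulated


def misalignment(gain: float, video_cues: List[float]) -> float:
    """Mean squared gap between instructed (kinematic) and simulated (dynamics) poses."""
    targets, simulated = rollout(gain, video_cues)
    return sum((t - s) ** 2 for t, s in zip(targets, simulated)) / len(targets)


def train_step(gain: float, video_cues: List[float],
               lr: float = 0.1, eps: float = 1e-4) -> float:
    """One gradient-descent update of the kinematic parameter on the misalignment,
    using a central finite difference in place of backpropagation."""
    grad = (misalignment(gain + eps, video_cues)
            - misalignment(gain - eps, video_cues)) / (2 * eps)
    return gain - lr * grad


# Usage: a kinematic model that asks for more motion than the simulator allows
# is nudged toward targets the dynamics model can actually reach.
cues = [1.0, 1.0, 1.0]
gain_before = 1.0
gain_after = train_step(gain_before, cues)
print(misalignment(gain_before, cues), misalignment(gain_after, cues))
```

In this toy setup, minimizing the misalignment alone would eventually reward proposing no motion at all, so a practical system would need additional training signals; the abstract does not detail them.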
Related papers
- EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting [95.44545809256473]
EgoGaussian is a method capable of simultaneously reconstructing 3D scenes and dynamically tracking 3D object motion from RGB egocentric input alone.
We show significant improvements in terms of both dynamic object and background reconstruction quality compared to the state-of-the-art.
arXiv Detail & Related papers (2024-06-28T10:39:36Z)
- DO3D: Self-supervised Learning of Decomposed Object-aware 3D Motion and Depth from Monocular Videos [76.01906393673897]
We propose a self-supervised method to jointly learn 3D motion and depth from monocular videos.
Our system contains a depth estimation module to predict depth, and a new decomposed object-wise 3D motion (DO3D) estimation module to predict ego-motion and 3D object motion.
Our model delivers superior performance in all evaluated settings.
arXiv Detail & Related papers (2024-03-09T12:22:46Z)
- D&D: Learning Human Dynamics from Dynamic Camera [55.60512353465175]
We present D&D (Learning Human Dynamics from Dynamic Camera), which leverages the laws of physics to reconstruct 3D human motion from in-the-wild videos with a moving camera.
Our approach is entirely neural-based and runs without offline optimization or simulation in physics engines.
arXiv Detail & Related papers (2022-09-19T06:51:02Z)
- Trajectory Optimization for Physics-Based Reconstruction of 3d Human Pose from Monocular Video [31.96672354594643]
We focus on the task of estimating a physically plausible articulated human motion from monocular video.
Existing approaches that do not consider physics often produce temporally inconsistent output with motion artifacts.
We show that our approach achieves competitive results with respect to existing physics-based methods on the Human3.6M benchmark.
arXiv Detail & Related papers (2022-05-24T18:02:49Z)
- Differentiable Dynamics for Articulated 3d Human Motion Reconstruction [29.683633237503116]
We introduce DiffPhy, a differentiable physics-based model for articulated 3d human motion reconstruction from video.
We validate the model by demonstrating that it can accurately reconstruct physically plausible 3d human motion from monocular video.
arXiv Detail & Related papers (2022-05-24T17:58:37Z)
- Attentive and Contrastive Learning for Joint Depth and Motion Field Estimation [76.58256020932312]
Estimating the motion of the camera together with the 3D structure of the scene from a monocular vision system is a complex task.
We present a self-supervised learning framework for 3D object motion field estimation from monocular videos.
arXiv Detail & Related papers (2021-10-13T16:45:01Z)
- Learning Local Recurrent Models for Human Mesh Recovery [50.85467243778406]
We present a new method for video mesh recovery that divides the human mesh into several local parts following the standard skeletal model.
We then model the dynamics of each local part with separate recurrent models, with each model conditioned appropriately based on the known kinematic structure of the human body.
This results in a structure-informed local recurrent learning architecture that can be trained in an end-to-end fashion with available annotations.
arXiv Detail & Related papers (2021-07-27T14:30:33Z)
- SimPoE: Simulated Character Control for 3D Human Pose Estimation [33.194787030240825]
SimPoE is a Simulation-based approach for 3D human Pose Estimation.
It integrates image-based kinematic inference and physics-based dynamics modeling.
Our approach establishes the new state of the art in pose accuracy while ensuring physical plausibility.
arXiv Detail & Related papers (2021-04-01T17:59:50Z)
- Kinematics-Guided Reinforcement Learning for Object-Aware 3D Ego-Pose Estimation [25.03715978502528]
We propose a method for incorporating object interaction and human body dynamics into the task of 3D ego-pose estimation.
We use a kinematics model of the human body to represent the entire range of human motion, and a dynamics model of the body to interact with objects inside a physics simulator.
This is the first work to estimate a physically valid 3D full-body interaction sequence with objects from egocentric videos.
arXiv Detail & Related papers (2020-11-10T00:06:43Z)
- Contact and Human Dynamics from Monocular Video [73.47466545178396]
Existing deep models predict 2D and 3D kinematic poses from video that are approximately accurate, but contain visible errors.
We present a physics-based method for inferring 3D human motion from video sequences that takes initial 2D and 3D pose estimates as input.
arXiv Detail & Related papers (2020-07-22T21:09:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.