Hindsight for Foresight: Unsupervised Structured Dynamics Models from
Physical Interaction
- URL: http://arxiv.org/abs/2008.00456v1
- Date: Sun, 2 Aug 2020 11:04:49 GMT
- Title: Hindsight for Foresight: Unsupervised Structured Dynamics Models from
Physical Interaction
- Authors: Iman Nematollahi and Oier Mees and Lukas Hermann and Wolfram Burgard
- Abstract summary: A key challenge for an agent learning to interact with the world is to reason about the physical properties of objects.
We propose a novel approach for modeling the dynamics of a robot's interactions directly from unlabeled 3D point clouds and images.
- Score: 24.72947291987545
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A key challenge for an agent learning to interact with the world is to reason
about physical properties of objects and to foresee their dynamics under the
effect of applied forces. In order to scale learning through interaction to
many objects and scenes, robots should be able to improve their own performance
from real-world experience without requiring human supervision. To this end, we
propose a novel approach for modeling the dynamics of a robot's interactions
directly from unlabeled 3D point clouds and images. Unlike previous approaches,
our method does not require ground-truth data associations provided by a
tracker or any pre-trained perception network. To learn from unlabeled
real-world interaction data, we enforce consistency of estimated 3D clouds,
actions and 2D images with observed ones. Our joint forward and inverse network
learns to segment a scene into salient object parts and predicts their 3D
motion under the effect of applied actions. Moreover, our object-centric model
outputs action-conditioned 3D scene flow, object masks and 2D optical flow as
emergent properties. Our extensive evaluation both in simulation and with
real-world data demonstrates that our formulation leads to effective,
interpretable models that can be used for visuomotor control and planning.
Videos, code and dataset are available at http://hind4sight.cs.uni-freiburg.de
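The abstract describes a joint forward and inverse dynamics model trained purely with consistency losses on unlabeled interaction data. The sketch below illustrates that general idea in PyTorch with a deliberately simplified stand-in: a PointNet-style global encoder, a forward head that predicts action-conditioned per-point scene flow, and an inverse head that recovers the applied action from consecutive clouds. All names, dimensions, and architectural choices here are assumptions for illustration; the actual hind4sight model additionally segments the scene into salient object parts and enforces 2D image consistency, which this sketch omits.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PointEncoder(nn.Module):
    """PointNet-style encoder: shared per-point MLP followed by max pooling."""

    def __init__(self, feat_dim=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )

    def forward(self, cloud):                      # cloud: (B, N, 3)
        return self.mlp(cloud).max(dim=1).values   # global scene feature: (B, F)


class ForwardInverseModel(nn.Module):
    """Joint forward/inverse dynamics over point clouds (illustrative only)."""

    def __init__(self, feat_dim=128, action_dim=4):
        super().__init__()
        self.encoder = PointEncoder(feat_dim)
        # Forward head: per-point 3D flow conditioned on the scene feature and action.
        self.flow_head = nn.Sequential(
            nn.Linear(3 + feat_dim + action_dim, 128), nn.ReLU(),
            nn.Linear(128, 3),
        )
        # Inverse head: recover the applied action from consecutive scene features.
        self.action_head = nn.Sequential(
            nn.Linear(2 * feat_dim, 128), nn.ReLU(),
            nn.Linear(128, action_dim),
        )

    def predict_next(self, cloud_t, action):
        """Forward model: action-conditioned scene flow and predicted next cloud."""
        ctx = torch.cat([self.encoder(cloud_t), action], dim=-1)   # (B, F + A)
        ctx = ctx.unsqueeze(1).expand(-1, cloud_t.shape[1], -1)    # (B, N, F + A)
        flow = self.flow_head(torch.cat([cloud_t, ctx], dim=-1))   # (B, N, 3)
        return cloud_t + flow, flow

    def predict_action(self, cloud_t, cloud_t1):
        """Inverse model: infer the action that transformed cloud_t into cloud_t1."""
        feats = torch.cat([self.encoder(cloud_t), self.encoder(cloud_t1)], dim=-1)
        return self.action_head(feats)


def chamfer(a, b):
    """Symmetric Chamfer distance between point sets a and b of shape (B, N, 3)."""
    d = torch.cdist(a, b)                                          # (B, N, N)
    return d.min(dim=2).values.mean() + d.min(dim=1).values.mean()


def consistency_loss(model, cloud_t, cloud_t1, action):
    """Self-supervised objective: the predicted next cloud must match the observed
    one, and the inferred action must match the executed one -- no labels needed."""
    pred_cloud_t1, _ = model.predict_next(cloud_t, action)
    pred_action = model.predict_action(cloud_t, cloud_t1)
    return chamfer(pred_cloud_t1, cloud_t1) + F.mse_loss(pred_action, action)
```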
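The abstract also states that the learned model can be used for visuomotor control and planning. Continuing the sketch above, one minimal, hypothetical way to exploit the forward model is random shooting: sample candidate actions, predict the resulting cloud for each, and execute the action whose prediction is closest to a goal cloud under the Chamfer distance. The planner used in the hind4sight paper may differ; this only shows how an action-conditioned scene-flow predictor plugs into planning.

```python
def plan_action(model, cloud_t, goal_cloud, action_dim=4, num_samples=256):
    """Random-shooting planner: sample candidate actions, roll the forward model
    one step, and return the action whose predicted cloud is closest to the goal."""
    with torch.no_grad():
        candidates = torch.randn(num_samples, action_dim)           # (S, A)
        clouds = cloud_t.expand(num_samples, -1, -1)                 # (S, N, 3)
        pred_clouds, _ = model.predict_next(clouds, candidates)      # (S, N, 3)
        goals = goal_cloud.expand(num_samples, -1, -1)               # (S, N, 3)
        d = torch.cdist(pred_clouds, goals)                          # (S, N, N)
        costs = d.min(dim=2).values.mean(dim=1) + d.min(dim=1).values.mean(dim=1)
        return candidates[costs.argmin()]


# Usage (shapes only; a real setup would load trained weights and sensor data):
# model = ForwardInverseModel()
# best_action = plan_action(model, current_cloud[None], goal_cloud[None])
```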
Related papers
- Articulated Object Manipulation using Online Axis Estimation with SAM2-Based Tracking [59.87033229815062]
Articulated object manipulation requires precise object interaction, where the object's axis must be carefully considered.
Previous research employed interactive perception for manipulating articulated objects, but these open-loop approaches often overlook the interaction dynamics.
We present a closed-loop pipeline integrating interactive perception with online axis estimation from segmented 3D point clouds.
arXiv Detail & Related papers (2024-09-24T17:59:56Z) - RoboPack: Learning Tactile-Informed Dynamics Models for Dense Packing [38.97168020979433]
We introduce an approach that combines visual and tactile sensing for robotic manipulation by learning a neural, tactile-informed dynamics model.
Our proposed framework, RoboPack, employs a recurrent graph neural network to estimate object states.
We demonstrate our approach on a real robot equipped with a compliant Soft-Bubble tactile sensor on non-prehensile manipulation and dense packing tasks.
arXiv Detail & Related papers (2024-07-01T16:08:37Z) - EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting [95.44545809256473]
EgoGaussian is a method capable of simultaneously reconstructing 3D scenes and dynamically tracking 3D object motion from RGB egocentric input alone.
We show significant improvements in terms of both dynamic object and background reconstruction quality compared to the state-of-the-art.
arXiv Detail & Related papers (2024-06-28T10:39:36Z) - DO3D: Self-supervised Learning of Decomposed Object-aware 3D Motion and
Depth from Monocular Videos [76.01906393673897]
We propose a self-supervised method to jointly learn 3D motion and depth from monocular videos.
Our system contains a depth estimation module to predict depth, and a new decomposed object-wise 3D motion (DO3D) estimation module to predict ego-motion and 3D object motion.
Our model delivers superior performance in all evaluated settings.
arXiv Detail & Related papers (2024-03-09T12:22:46Z) - Learn to Predict How Humans Manipulate Large-sized Objects from
Interactive Motions [82.90906153293585]
We propose a graph neural network, HO-GCN, to fuse motion data and dynamic descriptors for the prediction task.
We show that the proposed network, which consumes dynamic descriptors, achieves state-of-the-art prediction results and generalizes better to unseen objects.
arXiv Detail & Related papers (2022-06-25T09:55:39Z) - 3D Neural Scene Representations for Visuomotor Control [78.79583457239836]
We learn models for dynamic 3D scenes purely from 2D visual observations.
A dynamics model, constructed over the learned representation space, enables visuomotor control for challenging manipulation tasks.
arXiv Detail & Related papers (2021-07-08T17:49:37Z) - TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To reflect a real-world challenge, we learn an indicator representing whether an estimated body joint is visible or invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art specifically designed for each of the trajectory and pose forecasting tasks.
arXiv Detail & Related papers (2021-04-08T20:01:00Z) - 3D-OES: Viewpoint-Invariant Object-Factorized Environment Simulators [24.181604511269096]
We propose an action-conditioned dynamics model that predicts scene changes caused by object and agent interactions in a viewpoint-invariant 3D neural scene representation space.
In this space, objects do not interfere with one another and their appearance persists over time and across viewpoints.
We show that our model generalizes well across varying numbers and appearances of interacting objects, as well as across camera viewpoints.
arXiv Detail & Related papers (2020-11-12T16:15:52Z) - Learning 3D Dynamic Scene Representations for Robot Manipulation [21.6131570689398]
A 3D scene representation for robot manipulation should capture three key object properties: permanency, completeness, and continuity.
We introduce 3D Dynamic Scene Representation (DSR), a 3D scene representation that simultaneously discovers, tracks, and reconstructs objects and predicts their dynamics.
We propose DSR-Net, which learns to aggregate visual observations over multiple interactions to gradually build and refine DSR.
arXiv Detail & Related papers (2020-11-03T19:23:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.