Learning to Identify Physical Parameters from Video Using Differentiable
Physics
- URL: http://arxiv.org/abs/2009.08292v1
- Date: Thu, 17 Sep 2020 13:36:57 GMT
- Title: Learning to Identify Physical Parameters from Video Using Differentiable
Physics
- Authors: Rama Krishna Kandukuri, Jan Achterhold, Michael Möller, Jörg Stückler
- Abstract summary: We propose a differentiable physics engine within an action-conditional video representation network to learn a physical latent representation.
We demonstrate that our network can learn to encode images and identify physical properties like mass and friction from videos and action sequences.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video representation learning has recently attracted attention in computer
vision due to its applications for activity and scene forecasting or
vision-based planning and control. Video prediction models often learn a latent
representation of video which is encoded from input frames and decoded back
into images. Even when conditioned on actions, purely deep learning based
architectures typically lack a physically interpretable latent space. In this
study, we use a differentiable physics engine within an action-conditional
video representation network to learn a physical latent representation. We
propose supervised and self-supervised learning methods to train our network
and identify physical properties. The latter uses spatial transformers to
decode physical states back into images. The simulation scenarios in our
experiments comprise pushing, sliding and colliding objects, for which we also
analyze the observability of the physical properties. In experiments we
demonstrate that our network can learn to encode images and identify physical
properties like mass and friction from videos and action sequences in the
simulated scenarios. We evaluate the accuracy of our supervised and
self-supervised methods and compare it with a system identification baseline
which directly learns from state trajectories. We also demonstrate the ability
of our method to predict future video frames from input images and actions.
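The core idea of the abstract, identifying physical parameters by backpropagating through a differentiable simulator, can be illustrated with a minimal sketch. This is a hypothetical illustration, not the authors' code: it identifies a kinetic friction coefficient from a 1-D sliding trajectory by differentiating through an explicit-Euler physics step, in the spirit of the paper's system-identification baseline (which learns from state trajectories rather than images).

```python
import torch

def rollout(x0, v0, mu, dt=0.05, steps=40, g=9.81):
    """Simulate a block sliding with kinetic friction; fully differentiable."""
    xs, x, v = [], x0, v0
    for _ in range(steps):
        a = -mu * g * torch.sign(v)           # friction decelerates the block
        v = torch.clamp(v + a * dt, min=0.0)  # block stops, does not reverse
        x = x + v * dt
        xs.append(x)
    return torch.stack(xs)

# Ground-truth trajectory generated with friction coefficient mu = 0.3
true_mu = torch.tensor(0.3)
target = rollout(torch.tensor(0.0), torch.tensor(2.0), true_mu)

# Identify mu by gradient descent through the simulator
mu = torch.tensor(0.1, requires_grad=True)
opt = torch.optim.Adam([mu], lr=0.05)
for _ in range(200):
    opt.zero_grad()
    pred = rollout(torch.tensor(0.0), torch.tensor(2.0), mu)
    loss = torch.mean((pred - target) ** 2)
    loss.backward()
    opt.step()

print(mu.item())  # converges toward the true coefficient
```

In the paper the input is video rather than state trajectories, so an encoder maps images to physical states before the simulator, and spatial transformers decode states back to images for the self-supervised variant; the gradient path through the physics step, however, is the same as sketched here.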
Related papers
- Latent Intuitive Physics: Learning to Transfer Hidden Physics from A 3D Video
We introduce latent intuitive physics, a transfer learning framework for physics simulation.
It can infer hidden properties of fluids from a single 3D video and simulate the observed fluid in novel scenes.
We validate our model in three ways: (i) novel scene simulation with the learned visual-world physics, (ii) future prediction of the observed fluid dynamics, and (iii) supervised particle simulation.
arXiv Detail & Related papers (2024-06-18T16:37:44Z) - Predicting Long-horizon Futures by Conditioning on Geometry and Time [49.86180975196375]
We explore the task of generating future sensor observations conditioned on the past.
We leverage the large-scale pretraining of image diffusion models which can handle multi-modality.
We create a benchmark for video prediction on a diverse set of videos spanning indoor and outdoor scenes.
arXiv Detail & Related papers (2024-04-17T16:56:31Z) - Learning to See Physical Properties with Active Sensing Motor Policies [20.851419392513503]
We present a method for building a vision system that takes the observed terrain as input and predicts its physical properties.
We introduce Active Sensing Motor Policies (ASMP), which are trained to explore locomotion behaviors that increase the accuracy of estimating physical parameters.
The trained system is robust and works even with overhead images captured by a drone despite being trained on data collected by cameras attached to a quadruped robot walking on the ground.
arXiv Detail & Related papers (2023-11-02T17:19:18Z) - 3D-IntPhys: Towards More Generalized 3D-grounded Visual Intuitive
Physics under Challenging Scenes [68.66237114509264]
We present a framework capable of learning 3D-grounded visual intuitive physics models from videos of complex scenes with fluids.
We show our model can make long-horizon future predictions by learning from raw images and significantly outperforms models that do not employ an explicit 3D representation space.
arXiv Detail & Related papers (2023-04-22T19:28:49Z) - Dynamic Visual Reasoning by Learning Differentiable Physics Models from
Video and Language [92.7638697243969]
We propose a unified framework that can jointly learn visual concepts and infer physics models of objects from videos and language.
This is achieved by seamlessly integrating three components: a visual perception module, a concept learner, and a differentiable physics engine.
arXiv Detail & Related papers (2021-10-28T17:59:13Z) - 3D Neural Scene Representations for Visuomotor Control [78.79583457239836]
We learn models for dynamic 3D scenes purely from 2D visual observations.
A dynamics model, constructed over the learned representation space, enables visuomotor control for challenging manipulation tasks.
arXiv Detail & Related papers (2021-07-08T17:49:37Z) - Cross-Identity Motion Transfer for Arbitrary Objects through
Pose-Attentive Video Reassembling [40.20163225821707]
Given a source image and a driving video, our networks animate the subject in the source images according to the motion in the driving video.
In our attention mechanism, dense similarities between the learned keypoints in the source and the driving images are computed.
To reduce the training-testing discrepancy of the self-supervised learning, a novel cross-identity training scheme is additionally introduced.
arXiv Detail & Related papers (2020-07-17T07:21:12Z) - Stillleben: Realistic Scene Synthesis for Deep Learning in Robotics [33.30312206728974]
We describe a synthesis pipeline capable of producing training data for cluttered scene perception tasks.
Our approach arranges object meshes in physically realistic, dense scenes using physics simulation.
Our pipeline can be run online during training of a deep neural network.
arXiv Detail & Related papers (2020-05-12T10:11:00Z) - Occlusion resistant learning of intuitive physics from videos [52.25308231683798]
A key ability for artificial systems is to understand physical interactions between objects and to predict future outcomes of a situation.
This ability, often referred to as intuitive physics, has recently received attention, and several methods have been proposed to learn these physical rules from video sequences.
arXiv Detail & Related papers (2020-04-30T19:35:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.