Learning to Identify Physical Parameters from Video Using Differentiable Physics
- URL: http://arxiv.org/abs/2009.08292v1
- Date: Thu, 17 Sep 2020 13:36:57 GMT
- Title: Learning to Identify Physical Parameters from Video Using Differentiable Physics
- Authors: Rama Krishna Kandukuri, Jan Achterhold, Michael Möller, Jörg Stückler
- Abstract summary: We propose a differentiable physics engine within an action-conditional video representation network to learn a physical latent representation.
We demonstrate that our network can learn to encode images and identify physical properties like mass and friction from videos and action sequences.
- Score: 2.15242029196761
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video representation learning has recently attracted attention in computer
vision due to its applications for activity and scene forecasting or
vision-based planning and control. Video prediction models often learn a latent
representation of video which is encoded from input frames and decoded back
into images. Even when conditioned on actions, purely deep learning based
architectures typically lack a physically interpretable latent space. In this
study, we use a differentiable physics engine within an action-conditional
video representation network to learn a physical latent representation. We
propose supervised and self-supervised learning methods to train our network
and identify physical properties. The latter uses spatial transformers to
decode physical states back into images. The simulation scenarios in our
experiments comprise pushing, sliding and colliding objects, for which we also
analyze the observability of the physical properties. In experiments we
demonstrate that our network can learn to encode images and identify physical
properties like mass and friction from videos and action sequences in the
simulated scenarios. We evaluate the accuracy of our supervised and
self-supervised methods and compare it with a system identification baseline
which directly learns from state trajectories. We also demonstrate the ability
of our method to predict future video frames from input images and actions.
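The core idea, identifying physical parameters by backpropagating a trajectory loss through a differentiable simulation, can be illustrated compactly. The following PyTorch sketch is a toy version of the paper's sliding scenario, not the authors' implementation: the function names, the smooth friction surrogate, and all constants are our own assumptions. A block is pushed with a known, time-varying force, and the unknown mass and friction coefficient are recovered by gradient descent. The varying force matters for observability: under a constant force only a combination of the two parameters would be identifiable, which loosely mirrors the observability analysis mentioned in the abstract.

```python
# Toy sketch (not the authors' implementation) of identifying mass and
# friction by backpropagating through a differentiable physics step.
import torch

G, DT = 9.81, 0.01  # gravity [m/s^2], integration time step [s]

def step(x, v, force, mass, mu):
    # Semi-implicit Euler with Coulomb friction. tanh(v / 1e-2) is a
    # smooth surrogate for sign(v) so gradients stay well-behaved.
    a = (force - mu * mass * G * torch.tanh(v / 1e-2)) / mass
    v = v + DT * a
    return x + DT * v, v

def rollout(forces, mass, mu):
    x = v = torch.zeros(())
    xs = []
    for f in forces:
        x, v = step(x, v, f, mass, mu)
        xs.append(x)
    return torch.stack(xs)

# Known, time-varying pushing force; in the paper the supervision would
# come from (encoded) video frames rather than a simulated ground truth.
forces = 5.0 + 5.0 * torch.sin(torch.linspace(0.0, 3.0, 100))
with torch.no_grad():
    target = rollout(forces, torch.tensor(2.0), torch.tensor(0.3))

# Unknown parameters, kept positive via a log-space parameterization.
log_mass = torch.zeros((), requires_grad=True)
log_mu = torch.zeros((), requires_grad=True)
opt = torch.optim.Adam([log_mass, log_mu], lr=0.05)

for _ in range(500):
    loss = ((rollout(forces, log_mass.exp(), log_mu.exp()) - target) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"mass ~ {log_mass.exp().item():.3f} (true 2.0), "
      f"mu ~ {log_mu.exp().item():.3f} (true 0.3)")
```

In the paper's self-supervised variant, the loss would instead be computed in image space via a spatial-transformer decoder; the parameter-identification mechanism through the differentiable physics step stays the same.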
Related papers
- PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation [29.831214435147583]
We present PhysGen, a novel image-to-video generation method.
It produces a realistic, physically plausible, and temporally consistent video.
Our key insight is to integrate model-based physical simulation with a data-driven video generation process.
arXiv Detail & Related papers (2024-09-27T17:59:57Z)
- Video-Driven Graph Network-Based Simulators [7.687678490751104]
This paper presents a method that can infer a system's physical properties from a short video.
The learned representation is then used within a Graph Network-based Simulator to emulate the trajectories of physical systems.
We demonstrate that the video-derived encodings effectively capture the physical properties of the system and show a linear dependence between some of the encodings and the system's motion (a minimal sketch of this conditioning pattern appears after this list).
arXiv Detail & Related papers (2024-09-10T07:04:48Z)
- Identifying Terrain Physical Parameters from Vision -- Towards Physical-Parameter-Aware Locomotion and Navigation [33.10872127224328]
We propose a cross-modal self-supervised learning framework for vision-based environmental physical parameter estimation.
We train a physical decoder in simulation to predict friction and stiffness from multi-modal input.
The trained network allows the labeling of real-world images with physical parameters in a self-supervised manner to further train a visual network during deployment.
arXiv Detail & Related papers (2024-08-29T14:35:14Z)
- Latent Intuitive Physics: Learning to Transfer Hidden Physics from A 3D Video [58.043569985784806]
We introduce latent intuitive physics, a transfer learning framework for physics simulation.
It can infer hidden properties of fluids from a single 3D video and simulate the observed fluid in novel scenes.
We validate our model in three ways: (i) novel scene simulation with the learned visual-world physics, (ii) future prediction of the observed fluid dynamics, and (iii) supervised particle simulation.
arXiv Detail & Related papers (2024-06-18T16:37:44Z)
- Predicting Long-horizon Futures by Conditioning on Geometry and Time [49.86180975196375]
We explore the task of generating future sensor observations conditioned on the past.
We leverage the large-scale pretraining of image diffusion models which can handle multi-modality.
We create a benchmark for video prediction on a diverse set of videos spanning indoor and outdoor scenes.
arXiv Detail & Related papers (2024-04-17T16:56:31Z)
- Learning to See Physical Properties with Active Sensing Motor Policies [20.851419392513503]
We present a method for building a vision system that takes the observed terrain as input and predicts its physical properties.
We introduce Active Sensing Motor Policies (ASMP), which are trained to explore locomotion behaviors that increase the accuracy of estimating physical parameters.
The trained system is robust and works even with overhead images captured by a drone despite being trained on data collected by cameras attached to a quadruped robot walking on the ground.
arXiv Detail & Related papers (2023-11-02T17:19:18Z)
- 3D-IntPhys: Towards More Generalized 3D-grounded Visual Intuitive Physics under Challenging Scenes [68.66237114509264]
We present a framework capable of learning 3D-grounded visual intuitive physics models from videos of complex scenes with fluids.
We show our model can make long-horizon future predictions by learning from raw images and significantly outperforms models that do not employ an explicit 3D representation space.
arXiv Detail & Related papers (2023-04-22T19:28:49Z)
- Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language [92.7638697243969]
We propose a unified framework that can jointly learn visual concepts and infer physics models of objects from videos and language.
This is achieved by seamlessly integrating three components: a visual perception module, a concept learner, and a differentiable physics engine.
arXiv Detail & Related papers (2021-10-28T17:59:13Z)
- 3D Neural Scene Representations for Visuomotor Control [78.79583457239836]
We learn models for dynamic 3D scenes purely from 2D visual observations.
A dynamics model, constructed over the learned representation space, enables visuomotor control for challenging manipulation tasks.
arXiv Detail & Related papers (2021-07-08T17:49:37Z)
- Occlusion resistant learning of intuitive physics from videos [52.25308231683798]
A key ability for artificial systems is to understand physical interactions between objects and to predict future outcomes of a situation.
This ability, often referred to as intuitive physics, has recently received attention, and several methods have been proposed to learn these physical rules from video sequences.
arXiv Detail & Related papers (2020-04-30T19:35:54Z)
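The Video-Driven Graph Network-Based Simulators entry above pairs a video-derived property encoding with a learned simulator. Below is a minimal single-round message-passing sketch of that conditioning pattern in PyTorch; every module name, feature layout, and hyperparameter is an illustrative assumption, not the paper's actual architecture.

```python
# Minimal sketch of one step of a graph network-based simulator whose
# dynamics are conditioned on a video-derived encoding z (all names and
# shapes are illustrative assumptions, not the paper's API).
import torch
import torch.nn as nn

def mlp(n_in, n_out, hidden=64):
    return nn.Sequential(nn.Linear(n_in, hidden), nn.ReLU(),
                         nn.Linear(hidden, n_out))

class GNSStep(nn.Module):
    # Predicts per-particle 2D accelerations from velocities, relative
    # positions, and a global property encoding z.
    def __init__(self, z_dim=16, hidden=64):
        super().__init__()
        node_dim = 2 + z_dim                      # velocity + encoding
        self.edge_mlp = mlp(2 * node_dim + 2, hidden)
        self.node_mlp = mlp(node_dim + hidden, 2)

    def forward(self, pos, vel, z, radius=0.5, dt=0.01):
        n = pos.shape[0]
        feats = torch.cat([vel, z.expand(n, -1)], dim=-1)
        rel = pos[:, None] - pos[None, :]         # (n, n, 2) offsets
        adj = (rel.norm(dim=-1) < radius) & ~torch.eye(n, dtype=torch.bool)
        src, dst = adj.nonzero(as_tuple=True)
        # One round of message passing: edge messages, summed per receiver.
        msgs = self.edge_mlp(torch.cat([feats[src], feats[dst],
                                        rel[src, dst]], dim=-1))
        agg = torch.zeros(n, msgs.shape[-1]).index_add_(0, dst, msgs)
        acc = self.node_mlp(torch.cat([feats, agg], dim=-1))
        vel = vel + dt * acc
        return pos + dt * vel, vel

# Usage: z would come from a video encoder; here it is random.
model = GNSStep()
pos, vel, z = torch.rand(8, 2), torch.zeros(8, 2), torch.randn(16)
for _ in range(10):                              # roll out 10 steps
    pos, vel = model(pos, vel, z)
```

Because the same z conditions every step of the rollout, gradients of a trajectory loss reach the encoding, which is consistent with the paper's observation that some encoding dimensions vary linearly with the system's motion.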
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.