Learning to Identify Physical Parameters from Video Using Differentiable
Physics
- URL: http://arxiv.org/abs/2009.08292v1
- Date: Thu, 17 Sep 2020 13:36:57 GMT
- Title: Learning to Identify Physical Parameters from Video Using Differentiable
Physics
- Authors: Rama Krishna Kandukuri, Jan Achterhold, Michael Möller, Jörg Stückler
- Abstract summary: We propose a differentiable physics engine within an action-conditional video representation network to learn a physical latent representation.
We demonstrate that our network can learn to encode images and identify physical properties like mass and friction from videos and action sequences.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video representation learning has recently attracted attention in computer
vision due to its applications for activity and scene forecasting or
vision-based planning and control. Video prediction models often learn a latent
representation of video which is encoded from input frames and decoded back
into images. Even when conditioned on actions, purely deep learning based
architectures typically lack a physically interpretable latent space. In this
study, we use a differentiable physics engine within an action-conditional
video representation network to learn a physical latent representation. We
propose supervised and self-supervised learning methods to train our network
and identify physical properties. The latter uses spatial transformers to
decode physical states back into images. The simulation scenarios in our
experiments comprise pushing, sliding and colliding objects, for which we also
analyze the observability of the physical properties. In experiments we
demonstrate that our network can learn to encode images and identify physical
properties like mass and friction from videos and action sequences in the
simulated scenarios. We evaluate the accuracy of our supervised and
self-supervised methods and compare it with a system identification baseline
which directly learns from state trajectories. We also demonstrate the ability
of our method to predict future video frames from input images and actions.
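The core idea of the abstract, identifying physical parameters by backpropagating through a differentiable simulator, can be illustrated with a minimal sketch. This is a hypothetical illustration, not the authors' code: it identifies a kinetic friction coefficient from a 1-D sliding trajectory by differentiating through an explicit-Euler physics step, in the spirit of the paper's system-identification baseline (which learns from state trajectories rather than images).

```python
import torch

def rollout(x0, v0, mu, dt=0.05, steps=40, g=9.81):
    """Simulate a block sliding with kinetic friction; fully differentiable."""
    xs, x, v = [], x0, v0
    for _ in range(steps):
        a = -mu * g * torch.sign(v)           # friction decelerates the block
        v = torch.clamp(v + a * dt, min=0.0)  # block stops, does not reverse
        x = x + v * dt
        xs.append(x)
    return torch.stack(xs)

# Ground-truth trajectory generated with friction coefficient mu = 0.3
true_mu = torch.tensor(0.3)
target = rollout(torch.tensor(0.0), torch.tensor(2.0), true_mu)

# Identify mu by gradient descent through the simulator
mu = torch.tensor(0.1, requires_grad=True)
opt = torch.optim.Adam([mu], lr=0.05)
for _ in range(200):
    opt.zero_grad()
    pred = rollout(torch.tensor(0.0), torch.tensor(2.0), mu)
    loss = torch.mean((pred - target) ** 2)
    loss.backward()
    opt.step()

print(mu.item())  # converges toward the true coefficient
```

In the paper the input is video rather than state trajectories, so an encoder maps images to physical states before the simulator, and spatial transformers decode states back to images for the self-supervised variant; the gradient path through the physics step, however, is the same as sketched here.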
Related papers
- Latent Intuitive Physics: Learning to Transfer Hidden Physics from A 3D Video
We introduce latent intuitive physics, a transfer learning framework for physics simulation.
It can infer hidden properties of fluids from a single 3D video and simulate the observed fluid in novel scenes.
We validate our model in three ways: (i) novel scene simulation with the learned visual-world physics, (ii) future prediction of the observed fluid dynamics, and (iii) supervised particle simulation.
arXiv Detail & Related papers (2024-06-18T16:37:44Z) - Predicting Long-horizon Futures by Conditioning on Geometry and Time [49.86180975196375]
We explore the task of generating future sensor observations conditioned on the past.
We leverage the large-scale pretraining of image diffusion models which can handle multi-modality.
We create a benchmark for video prediction on a diverse set of videos spanning indoor and outdoor scenes.
arXiv Detail & Related papers (2024-04-17T16:56:31Z) - Learning to See Physical Properties with Active Sensing Motor Policies [20.851419392513503]
We present a method for building a vision system that takes the observed terrain as input and predicts its physical properties.
We introduce Active Sensing Motor Policies (ASMP), which are trained to explore locomotion behaviors that increase the accuracy of estimating physical parameters.
The trained system is robust and works even with overhead images captured by a drone despite being trained on data collected by cameras attached to a quadruped robot walking on the ground.
arXiv Detail & Related papers (2023-11-02T17:19:18Z) - 3D-IntPhys: Towards More Generalized 3D-grounded Visual Intuitive
Physics under Challenging Scenes [68.66237114509264]
We present a framework capable of learning 3D-grounded visual intuitive physics models from videos of complex scenes with fluids.
We show our model can make long-horizon future predictions by learning from raw images and significantly outperforms models that do not employ an explicit 3D representation space.
arXiv Detail & Related papers (2023-04-22T19:28:49Z) - Dynamic Visual Reasoning by Learning Differentiable Physics Models from
Video and Language [92.7638697243969]
We propose a unified framework that can jointly learn visual concepts and infer physics models of objects from videos and language.
This is achieved by seamlessly integrating three components: a visual perception module, a concept learner, and a differentiable physics engine.
arXiv Detail & Related papers (2021-10-28T17:59:13Z) - 3D Neural Scene Representations for Visuomotor Control [78.79583457239836]
We learn models for dynamic 3D scenes purely from 2D visual observations.
A dynamics model, constructed over the learned representation space, enables visuomotor control for challenging manipulation tasks.
arXiv Detail & Related papers (2021-07-08T17:49:37Z) - Cross-Identity Motion Transfer for Arbitrary Objects through
Pose-Attentive Video Reassembling [40.20163225821707]
Given a source image and a driving video, our networks animate the subject in the source images according to the motion in the driving video.
In our attention mechanism, dense similarities between the learned keypoints in the source and the driving images are computed.
To reduce the training-testing discrepancy of the self-supervised learning, a novel cross-identity training scheme is additionally introduced.
arXiv Detail & Related papers (2020-07-17T07:21:12Z) - Stillleben: Realistic Scene Synthesis for Deep Learning in Robotics [33.30312206728974]
We describe a synthesis pipeline capable of producing training data for cluttered scene perception tasks.
Our approach arranges object meshes in physically realistic, dense scenes using physics simulation.
Our pipeline can be run online during training of a deep neural network.
arXiv Detail & Related papers (2020-05-12T10:11:00Z) - Occlusion resistant learning of intuitive physics from videos [52.25308231683798]
A key ability for artificial systems is to understand physical interactions between objects and to predict future outcomes of a situation.
This ability, often referred to as intuitive physics, has recently received attention, and several methods have been proposed to learn these physical rules from video sequences.
arXiv Detail & Related papers (2020-04-30T19:35:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.