Filtered-CoPhy: Unsupervised Learning of Counterfactual Physics in Pixel Space
- URL: http://arxiv.org/abs/2202.00368v1
- Date: Tue, 1 Feb 2022 12:18:30 GMT
- Title: Filtered-CoPhy: Unsupervised Learning of Counterfactual Physics in Pixel Space
- Authors: Steeven Janny, Fabien Baradel, Natalia Neverova, Madiha Nadri, Greg Mori, Christian Wolf
- Abstract summary: We present a method for learning causal relationships in high-dimensional data (images, videos).
Our method does not require the knowledge or supervision of any ground truth positions or other object or scene properties.
We introduce a new challenging and carefully designed counterfactual benchmark for predictions in pixel space.
- Score: 43.654464513994164
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning causal relationships in high-dimensional data (images,
videos) is a hard task: such relationships are often defined on low-dimensional
manifolds and must be extracted from complex signals dominated by appearance,
lighting, textures, and spurious correlations in the data. We present a method
for learning counterfactual reasoning of physical processes in pixel space,
which requires predicting the impact of interventions on initial conditions.
Going beyond the identification of structural relationships, we deal with the
challenging problem of forecasting raw video over long horizons. Our method
requires no knowledge or supervision of ground-truth positions or other object
or scene properties. Our model learns and acts on a suitable hybrid latent
representation that combines dense features, sets of 2D keypoints, and an
additional latent vector per keypoint. We show that this captures the dynamics
of physical processes better than purely dense or purely sparse
representations. We introduce a new, challenging, and carefully designed
counterfactual benchmark for prediction in pixel space, and outperform strong
baselines from physics-inspired ML and video prediction.
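As a concrete reading of the hybrid latent representation described above, here is a minimal numpy sketch of a per-frame state combining a dense feature map, a set of 2D keypoints, and one latent vector per keypoint. All names, shapes, and the Gaussian-heatmap rendering are our illustrative assumptions, not details taken from the paper.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class HybridState:
    """One frame's latent state, mirroring the abstract's description.

    Shapes are illustrative assumptions:
      dense:   (C, H, W) feature map for appearance/background
      xy:      (K, 2) keypoint coordinates, normalized to [-1, 1]
      latents: (K, D) one latent vector per keypoint, free to encode
               cues that coordinates alone cannot (e.g. appearance)
    """
    dense: np.ndarray
    xy: np.ndarray
    latents: np.ndarray

def keypoint_heatmaps(xy: np.ndarray, h: int, w: int, sigma: float = 0.1) -> np.ndarray:
    """Render each keypoint as a Gaussian blob, shape (K, h, w).

    A decoder could concatenate these maps with the dense features
    (and broadcast the per-keypoint latents) to reconstruct a frame.
    """
    ys = np.linspace(-1.0, 1.0, h)
    xs = np.linspace(-1.0, 1.0, w)
    gy, gx = np.meshgrid(ys, xs, indexing="ij")   # (h, w) coordinate grids
    dx = gx[None] - xy[:, 0, None, None]          # (K, h, w) x-offsets
    dy = gy[None] - xy[:, 1, None, None]          # (K, h, w) y-offsets
    return np.exp(-(dx ** 2 + dy ** 2) / (2.0 * sigma ** 2))

# Example: 4 keypoints on a 64x64 frame with 8-dim per-keypoint latents.
state = HybridState(
    dense=np.zeros((16, 64, 64)),
    xy=np.random.uniform(-1.0, 1.0, size=(4, 2)),
    latents=np.random.randn(4, 8),
)
print(keypoint_heatmaps(state.xy, 64, 64).shape)  # (4, 64, 64)
```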
Related papers
- 3D-IntPhys: Towards More Generalized 3D-grounded Visual Intuitive Physics under Challenging Scenes [68.66237114509264]
We present a framework capable of learning 3D-grounded visual intuitive physics models from videos of complex scenes with fluids.
We show our model can make long-horizon future predictions by learning from raw images and significantly outperforms models that do not employ an explicit 3D representation space.
arXiv Detail & Related papers (2023-04-22T19:28:49Z)
- Dynamic Latent Separation for Deep Learning [67.62190501599176]
A core problem in machine learning is to learn expressive latent variables for model prediction on complex data.
Here, we develop an approach that improves expressiveness, provides partial interpretation, and is not restricted to specific applications.
arXiv Detail & Related papers (2022-10-07T17:56:53Z)
- Entropy-driven Unsupervised Keypoint Representation Learning in Videos [7.940371647421243]
We present a novel approach for unsupervised learning of meaningful representations from videos.
We argue that the local entropy of pixel neighborhoods and its temporal evolution create valuable intrinsic supervisory signals for learning prominent features (a sketch follows this entry).
Our empirical results show superior performance for our information-driven keypoints, which resolve challenges such as attending to both static and dynamic objects and handling objects that abruptly enter and leave the scene.
arXiv Detail & Related papers (2022-09-30T12:03:52Z)
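To make the intrinsic signal of the entropy-driven paper above concrete, here is a minimal numpy sketch of windowed local entropy and its change between frames. The histogram estimator, bin count, window radius, and all names are our assumptions, not the authors' method.

```python
import numpy as np

def local_entropy(gray: np.ndarray, radius: int = 3, bins: int = 16) -> np.ndarray:
    """Shannon entropy of the intensity histogram around each pixel.

    gray: (H, W) image with values in [0, 1]. High-entropy neighborhoods
    (and ones whose entropy changes over time) mark candidate keypoints.
    """
    h, w = gray.shape
    q = np.clip((gray * bins).astype(int), 0, bins - 1)  # quantize intensities
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            win = q[max(0, i - radius):i + radius + 1,
                    max(0, j - radius):j + radius + 1]
            p = np.bincount(win.ravel(), minlength=bins) / win.size
            p = p[p > 0]                          # ignore empty bins
            out[i, j] = -(p * np.log2(p)).sum()
    return out

# Temporal evolution: entropy change between two consecutive frames.
f0, f1 = np.random.rand(32, 32), np.random.rand(32, 32)
signal = np.abs(local_entropy(f1) - local_entropy(f0))
```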
- Spatio-Temporal Relation Learning for Video Anomaly Detection [35.59510027883497]
Anomaly identification is highly dependent on the relationship between the object and the scene.
In this paper, we propose a Spatial-Temporal Relation Learning framework to tackle the video anomaly detection task.
Experiments are conducted on three public datasets, and the superior performance over the state-of-the-art methods demonstrates the effectiveness of our method.
arXiv Detail & Related papers (2022-09-27T02:19:31Z)
- Learning Dynamic View Synthesis With Few RGBD Cameras [60.36357774688289]
We propose to utilize RGBD cameras to synthesize free-viewpoint videos of dynamic indoor scenes.
We generate point clouds from RGBD frames (this unprojection step is sketched below) and then render them into free-viewpoint videos via neural features.
We introduce a simple Regional Depth-Inpainting module that adaptively inpaints missing depth values to render complete novel views.
arXiv Detail & Related papers (2022-04-22T03:17:35Z)
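The "point clouds from RGBD frames" step in the entry above is standard pinhole back-projection. Here is a minimal numpy sketch under assumed camera intrinsics (fx, fy, cx, cy); this is not the paper's implementation.

```python
import numpy as np

def rgbd_to_pointcloud(rgb: np.ndarray, depth: np.ndarray,
                       fx: float, fy: float, cx: float, cy: float):
    """Back-project an RGBD frame into a colored 3D point cloud.

    rgb: (H, W, 3), depth: (H, W) in metric units.
    Returns (N, 3) points and (N, 3) colors for pixels with valid depth.
    """
    h, w = depth.shape
    v, u = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    z = depth
    x = (u - cx) * z / fx           # pinhole model: X = (u - cx) * Z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    cols = rgb.reshape(-1, 3)
    valid = pts[:, 2] > 0           # drop pixels with missing depth
    return pts[valid], cols[valid]
```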
- Towards an Interpretable Latent Space in Structured Models for Video Prediction [30.080907495461876]
We focus on the task of future frame prediction in video governed by underlying physical dynamics.
We work with object-centric models, i.e., models that explicitly operate on object representations, and propagate a loss in the latent space (see the sketch below).
arXiv Detail & Related papers (2021-07-16T05:37:16Z)
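As a rough illustration of "propagating a loss in the latent space" from the entry above, here is a generic numpy sketch that unrolls a dynamics model over encoder-produced object latents and scores the rollout with an MSE; the loss choice and all names are our assumptions, not the paper's.

```python
import numpy as np

def rollout_latent_loss(z_seq: np.ndarray, dynamics) -> float:
    """Score a dynamics model in latent space rather than pixel space.

    z_seq: (T, N, D) per-object latents from an encoder, one row per frame.
    dynamics: function mapping (N, D) latents at t to latents at t + 1.
    """
    z_pred, loss = z_seq[0], 0.0
    for t in range(1, len(z_seq)):
        z_pred = dynamics(z_pred)                  # predict next latent state
        loss += np.mean((z_pred - z_seq[t]) ** 2)  # match the encoder's latents
    return loss / (len(z_seq) - 1)

# Toy usage with identity dynamics on 3 objects over 5 frames.
z = np.random.randn(5, 3, 8)
print(rollout_latent_loss(z, lambda s: s))
```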
- Occlusion resistant learning of intuitive physics from videos [52.25308231683798]
A key ability for artificial systems is to understand physical interactions between objects and to predict future outcomes of a situation.
This ability, often referred to as intuitive physics, has recently received attention, and several methods have been proposed to learn these physical rules from video sequences.
arXiv Detail & Related papers (2020-04-30T19:35:54Z)
- Learning Depth With Very Sparse Supervision [57.911425589947314]
This paper explores the idea that perception gets coupled to 3D properties of the world via interaction with the environment.
We train a specialized global-local network architecture with what would be available to a robot interacting with the environment.
Experiments on several datasets show that, when ground truth is available even for just one of the image pixels, the proposed network can learn monocular dense depth estimation up to 22.5% more accurately than state-of-the-art approaches.
arXiv Detail & Related papers (2020-03-02T10:44:13Z)
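A masked loss is the usual way to exploit ground truth at only a few pixels, as in the entry above; here is a minimal numpy sketch, where the L1 penalty and all names are our assumptions rather than the paper's.

```python
import numpy as np

def sparse_depth_loss(pred: np.ndarray, gt: np.ndarray, mask: np.ndarray) -> float:
    """L1 error of a dense depth prediction at the sparse supervised pixels.

    pred, gt: (H, W) depth maps; mask: (H, W) bool, True where gt is known.
    Only the (possibly single) labeled pixels contribute to the loss.
    """
    if not mask.any():
        return 0.0
    return float(np.abs(pred[mask] - gt[mask]).mean())

# Example: supervision at a single pixel, as in the abstract's extreme case.
pred, gt = np.random.rand(48, 64), np.random.rand(48, 64)
mask = np.zeros((48, 64), dtype=bool)
mask[10, 20] = True
print(sparse_depth_loss(pred, gt, mask))
```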