Unsupervised Discovery of 3D Physical Objects from Video
- URL: http://arxiv.org/abs/2007.12348v3
- Date: Tue, 23 Mar 2021 02:03:08 GMT
- Title: Unsupervised Discovery of 3D Physical Objects from Video
- Authors: Yilun Du, Kevin Smith, Tomer Ullman, Joshua Tenenbaum, Jiajun Wu
- Abstract summary: We explore how physics, especially object interactions, facilitates disentangling the 3D geometry and position of objects from video in an unsupervised manner.
Our Physical Object Discovery Network (POD-Net) uses both multi-scale pixel cues and physical motion cues to accurately segment observable and partially occluded objects of varying sizes.
- Score: 15.939924306990548
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: We study the problem of unsupervised physical object discovery. While
existing frameworks aim to decompose scenes into 2D segments based on each
object's appearance, we explore how physics, especially object interactions,
facilitates disentangling the 3D geometry and position of objects from video in
an unsupervised manner. Drawing inspiration from developmental psychology, our
Physical Object Discovery Network (POD-Net) uses both multi-scale pixel cues
and physical motion cues to accurately segment observable and partially
occluded objects of varying sizes, and infer properties of those objects. Our
model reliably segments objects in both synthetic and real scenes. The
discovered object properties can also be used to reason about physical events.
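To make the abstract's two cue types concrete, here is a minimal, self-contained sketch of the underlying idea; it is not the authors' POD-Net implementation, and the function names, toy scoring rules, and synthetic data are all illustrative assumptions. A candidate object mask scores highly when the pixels inside it look alike (pixel cue) and move together (physical motion cue):

```python
import numpy as np

def appearance_coherence(frame, mask):
    """Pixel cue: higher when pixels inside the mask share similar color."""
    pixels = frame[mask]                      # (N, 3) RGB values inside the mask
    if pixels.size == 0:
        return 0.0
    return float(1.0 / (1.0 + pixels.std(axis=0).mean()))

def motion_coherence(flow, mask):
    """Motion cue: higher when pixels inside the mask move together."""
    vectors = flow[mask]                      # (N, 2) optical-flow vectors
    if vectors.size == 0:
        return 0.0
    return float(1.0 / (1.0 + vectors.std(axis=0).mean()))

def object_score(frame, flow, mask, w_pixel=0.5, w_motion=0.5):
    """Combine both cues; a learned model would replace these fixed weights."""
    return (w_pixel * appearance_coherence(frame, mask)
            + w_motion * motion_coherence(flow, mask))

# Toy data: a uniform red square translating rigidly over a random background.
rng = np.random.default_rng(0)
frame = rng.uniform(0.0, 1.0, (64, 64, 3))
flow = rng.normal(0.0, 1.0, (64, 64, 2))
frame[10:30, 10:30] = [0.9, 0.2, 0.2]
flow[10:30, 10:30] = [2.0, 0.0]

obj = np.zeros((64, 64), dtype=bool); obj[10:30, 10:30] = True
bg = np.zeros((64, 64), dtype=bool);  bg[40:60, 0:20] = True
print(object_score(frame, flow, obj))   # near 1.0: coherent object
print(object_score(frame, flow, bg))    # lower: incoherent region
```

In the toy usage at the end, the uniformly colored, rigidly translating square scores higher than a random background region; POD-Net itself learns such groupings across scales rather than using hand-set weights.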
Related papers
- Unsupervised Discovery of Object-Centric Neural Fields [21.223170092979498]
We study inferring 3D object-centric scene representations from a single image.
We propose Unsupervised discovery of Object-Centric neural Fields (uOCF).
arXiv Detail & Related papers (2024-02-12T02:16:59Z)
- Seeing Objects in a Cluttered World: Computational Objectness from Motion in Video [0.0]
Perceiving the visually disjoint surfaces of our world as whole objects, physically distinct from the surfaces that overlap them, is fundamental to visual perception.
We present a simple but novel approach to infer objectness from phenomenology without object models.
We show that it delivers robust perception of individual attended objects in cluttered scenes, even with blur and camera shake.
arXiv Detail & Related papers (2024-02-02T03:57:11Z)
- Grounding 3D Object Affordance from 2D Interactions in Images [128.6316708679246]
Grounding 3D object affordance seeks to locate objects' "action possibilities" regions in 3D space.
Humans possess the ability to perceive object affordances in the physical world through demonstration images or videos.
We devise an Interaction-driven 3D Affordance Grounding Network (IAG), which aligns the region feature of objects from different sources.
arXiv Detail & Related papers (2023-03-18T15:37:35Z)
- 3D Object Aided Self-Supervised Monocular Depth Estimation [5.579605877061333]
We propose a new method to address dynamic object movements through monocular 3D object detection.
Specifically, we first detect 3D objects in the images and build the per-pixel correspondence of the dynamic pixels with the detected object pose.
In this way, the depth of every pixel can be learned via a meaningful geometry model (a minimal sketch of this pixel-warping idea appears after this list).
arXiv Detail & Related papers (2022-12-04T08:52:33Z)
- D3D-HOI: Dynamic 3D Human-Object Interactions from Videos [49.38319295373466]
We introduce D3D-HOI: a dataset of monocular videos with ground truth annotations of 3D object pose, shape and part motion during human-object interactions.
Our dataset consists of several common articulated objects captured from diverse real-world scenes and camera viewpoints.
We leverage the estimated 3D human pose for more accurate inference of the object spatial layout and dynamics.
arXiv Detail & Related papers (2021-08-19T00:49:01Z)
- Object Wake-up: 3-D Object Reconstruction, Animation, and in-situ Rendering from a Single Image [58.69732754597448]
Given a picture of a chair, could we extract its 3-D shape, animate its plausible articulations and motions, and render it in-situ in its original image space?
We devise an automated approach to extract and manipulate articulated objects in single images.
arXiv Detail & Related papers (2021-08-05T16:20:12Z)
- Discovering 3D Parts from Image Collections [98.16987919686709]
We tackle the problem of 3D part discovery from only 2D image collections.
Instead of relying on manually annotated parts for supervision, we propose a self-supervised approach.
Our key insight is to learn a novel part shape prior that allows each part to fit an object shape faithfully while constrained to have simple geometry.
arXiv Detail & Related papers (2021-07-28T20:29:16Z)
- Object Properties Inferring from and Transfer for Human Interaction Motions [51.896592493436984]
In this paper, we present a fine-grained action recognition method that learns to infer object properties from human interaction motion alone.
We collect a large number of videos and 3D skeletal motions of the performing actors using an inertial motion capture device.
In particular, we learn to identify the interacting object by estimating its weight, fragility, or delicacy.
arXiv Detail & Related papers (2020-08-20T14:36:34Z)
- Occlusion resistant learning of intuitive physics from videos [52.25308231683798]
A key ability for artificial systems is to understand physical interactions between objects and to predict the future outcome of a situation.
This ability, often referred to as intuitive physics, has recently received attention, and several methods have been proposed to learn these physical rules from video sequences.
arXiv Detail & Related papers (2020-04-30T19:35:54Z)
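As noted above, here is a minimal geometric sketch of the pixel-warping idea from the "3D Object Aided Self-Supervised Monocular Depth Estimation" entry; it is not that paper's code, and the pinhole intrinsics, names, and numbers are illustrative assumptions. A depth hypothesis plus a detected object's rigid motion (R, t) predicts where a dynamic pixel lands in the next frame:

```python
import numpy as np

# Assumed pinhole camera intrinsics (focal length 500 px, center at 320x240).
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])

def warp_dynamic_pixel(u, v, depth, R, t):
    """Back-project pixel (u, v) at the hypothesized depth, apply the
    object's rigid motion, and re-project into the next frame."""
    point = depth * (np.linalg.inv(K) @ np.array([u, v, 1.0]))  # 3D point in camera frame
    moved = R @ point + t                                        # apply object's rigid motion
    uvw = K @ moved                                              # re-project to pixels
    return uvw[:2] / uvw[2]

# Toy usage: an object translating 0.5 m to the right, pixel at 10 m depth.
R = np.eye(3)
t = np.array([0.5, 0.0, 0.0])
print(warp_dynamic_pixel(400.0, 260.0, 10.0, R, t))  # -> [425. 260.]
```

A self-supervised photometric loss then compares the image values at the source pixel and the warped location; only depth hypotheses consistent with the detected object pose keep the two in agreement, which is how a geometry model of this kind constrains depth for dynamic pixels.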
This list is automatically generated from the titles and abstracts of the papers on this site.