SPACE: Unsupervised Object-Oriented Scene Representation via Spatial
Attention and Decomposition
- URL: http://arxiv.org/abs/2001.02407v3
- Date: Sun, 15 Mar 2020 20:21:38 GMT
- Title: SPACE: Unsupervised Object-Oriented Scene Representation via Spatial
Attention and Decomposition
- Authors: Zhixuan Lin, Yi-Fu Wu, Skand Vishwanath Peri, Weihao Sun, Gautam
Singh, Fei Deng, Jindong Jiang, Sungjin Ahn
- Abstract summary: We propose a generative latent variable model, called SPACE, that combines the best of spatial-attention and scene-mixture approaches.
We show through experiments on Atari and 3D-Rooms that SPACE achieves the above properties consistently in comparison to SPAIR, IODINE, and GENESIS.
- Score: 26.42139271058149
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The ability to decompose complex multi-object scenes into meaningful
abstractions like objects is fundamental to achieving higher-level cognition.
Previous approaches to unsupervised object-oriented scene representation
learning are based on either spatial-attention or scene-mixture approaches and
are limited in scalability, which is a main obstacle to modeling real-world
scenes. In this paper, we propose a generative latent variable model, called
SPACE, that provides a unified probabilistic modeling framework that combines
the best of spatial-attention and scene-mixture approaches. SPACE can
explicitly provide factorized object representations for foreground objects
while also decomposing background segments of complex morphology. Previous
models are good at either of these, but not both. SPACE also resolves the
scalability problems of previous methods by incorporating parallel
spatial-attention and thus is applicable to scenes with a large number of
objects without performance degradation. We show through experiments on Atari
and 3D-Rooms that SPACE achieves the above properties consistently in
comparison to SPAIR, IODINE, and GENESIS. Results of our experiments can be
found on our project website: https://sites.google.com/view/space-project-page
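
As a rough illustration of the parallel spatial-attention idea, the sketch below divides the image into a grid of cells and infers each cell's object latents (presence, pose, depth, appearance) in a single forward pass, so inference cost does not grow with the number of objects. This is a minimal PyTorch sketch under assumed names and sizes (`FgGrid`, a 4x4 grid, a 32-dimensional z_what), not the authors' exact architecture; the real model also decodes per-cell glimpses and models the background as a mixture, both omitted here.

```python
# Minimal, illustrative sketch of a SPACE-style parallel foreground branch.
# Module names, grid size, and latent dimensions are assumptions.
import torch
import torch.nn as nn

class FgGrid(nn.Module):
    """Divides the image into a GxG grid of cells; every cell predicts its
    own object latents in one forward pass (parallel spatial attention,
    in contrast to sequential cell-by-cell inference as in SPAIR)."""
    def __init__(self, G=4, z_what_dim=32):
        super().__init__()
        self.z_what_dim = z_what_dim
        self.backbone = nn.Sequential(            # image -> GxG feature map
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(G),
        )
        # Per-cell heads: presence logit, where (sx, sy, tx, ty), depth,
        # and a Gaussian posterior (mean, log-variance) over z_what.
        self.head = nn.Conv2d(128, 1 + 4 + 1 + 2 * z_what_dim, 1)

    def forward(self, x):
        feats = self.head(self.backbone(x))        # (B, D, G, G)
        B, _, G, _ = feats.shape
        feats = feats.permute(0, 2, 3, 1).reshape(B, G * G, -1)
        pres_logit, z_where, z_depth, what_stats = torch.split(
            feats, [1, 4, 1, 2 * self.z_what_dim], dim=-1)
        z_pres = torch.sigmoid(pres_logit)         # relaxed Bernoulli mean
        mu, logvar = what_stats.chunk(2, dim=-1)
        z_what = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparam.
        return z_pres, z_where, z_depth, z_what    # one latent set per cell
```

For example, `FgGrid()(torch.randn(1, 3, 64, 64))` yields 16 per-cell latent sets from a single pass, which is the property that lets inference scale to scenes with many objects.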
Related papers
- Dynamic Scene Understanding through Object-Centric Voxelization and Neural Rendering [57.895846642868904]
We present a 3D generative model named DynaVol-S for dynamic scenes that enables object-centric learning.
The voxelization step infers per-object occupancy probabilities at individual spatial locations.
Our approach integrates 2D semantic features to create 3D semantic grids, representing the scene through multiple disentangled voxel grids.
arXiv Detail & Related papers (2024-07-30T15:33:58Z)
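To picture the per-object occupancy probabilities mentioned above, a toy sketch: each voxel carries unnormalized scores for K object slots, and a softmax over slots turns them into occupancy probabilities that sum to one at every location. Shapes and names below are assumptions for illustration, not DynaVol-S's implementation.

```python
# Toy illustration of per-object occupancy voxelization (assumed shapes):
# each voxel holds scores for K object slots, and a softmax over slots
# yields per-object occupancy probabilities at every spatial location.
import torch

K, D, H, W = 5, 16, 16, 16                 # object slots, voxel grid size
logits = torch.randn(K, D, H, W)           # per-slot occupancy scores
occupancy = torch.softmax(logits, dim=0)   # p(voxel occupied by slot k)
assignment = occupancy.argmax(dim=0)       # hard voxel-to-object map
assert occupancy.sum(dim=0).allclose(torch.ones(D, H, W))
```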
- Object-level Scene Deocclusion [92.39886029550286]
We present a new self-supervised PArallel visible-to-COmplete diffusion framework, named PACO, for object-level scene deocclusion.
To train PACO, we create a large-scale dataset with 500k samples to enable self-supervised learning.
Experiments on COCOA and various real-world scenes demonstrate the superior capability of PACO for scene deocclusion, surpassing the state of the arts by a large margin.
arXiv Detail & Related papers (2024-06-11T20:34:10Z)
- Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
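The PCA localization step in the entry above can be sketched in a few lines: centre the spatial features of a deep feature map, project them onto the first principal component, and threshold the projection into a coarse foreground mask. The feature shape and the simple sign threshold below are assumptions, not the paper's exact pipeline.

```python
# Hedged sketch of PCA-based object localization on a deep feature map:
# the first principal component of patch features often separates the
# salient object from the background.
import numpy as np

def pca_localize(feats):
    """feats: (H, W, C) features from a pretrained backbone (assumed)."""
    H, W, C = feats.shape
    X = feats.reshape(-1, C)
    X = X - X.mean(axis=0, keepdims=True)       # centre the features
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    proj = X @ Vt[0]                            # 1st principal component
    return proj.reshape(H, W) > 0               # sign split: object vs. rest

mask = pca_localize(np.random.randn(14, 14, 384).astype(np.float32))
```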
- Semantic-guided modeling of spatial relation and object co-occurrence for indoor scene recognition [5.083140094792973]
SpaCoNet simultaneously models Spatial relation and Co-occurrence of objects guided by semantic segmentation.
Experimental results on three widely used scene datasets demonstrate the effectiveness and generality of the proposed method.
arXiv Detail & Related papers (2023-05-22T03:04:22Z)
- Discovering Objects that Can Move [55.743225595012966]
We study the problem of object discovery -- separating objects from the background without manual labels.
Existing approaches utilize appearance cues, such as color, texture, and location, to group pixels into object-like regions.
We choose to focus on dynamic objects -- entities that can move independently in the world.
arXiv Detail & Related papers (2022-03-18T21:13:56Z)
- SIMstack: A Generative Shape and Instance Model for Unordered Object Stacks [38.042876641457255]
We propose a depth-conditioned Variational Auto-Encoder (VAE) trained on a dataset of objects stacked under physics simulation.
We formulate instance segmentation as a centre-voting task, which allows for class-agnostic detection and does not require setting the maximum number of objects in the scene.
Our method has practical applications in providing robots some of the ability humans have to make rapid intuitive inferences of partially observed scenes.
arXiv Detail & Related papers (2021-03-30T15:42:43Z)
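The centre-voting formulation summarized above can be illustrated as follows: every foreground pixel casts a vote for its instance centre (its own position plus a predicted offset), and clustering the votes yields instances without fixing their number in advance. The sketch below stands in a simple vote-binning step for whatever clustering the paper actually uses, and assumes the offsets come from a trained network.

```python
# Illustrative centre-voting instance segmentation (assumed inputs):
# foreground pixels vote for their instance centre; each cluster of
# votes becomes one instance, so the object count is not fixed a priori.
import numpy as np

def vote_instances(offsets, fg_mask, bin_size=4):
    """offsets: (H, W, 2) predicted (dy, dx) to the instance centre."""
    ys, xs = np.nonzero(fg_mask)
    votes = np.stack([ys, xs], axis=1) + offsets[ys, xs]  # centre votes
    bins = np.round(votes / bin_size).astype(int)  # quantize votes coarsely
    _, ids = np.unique(bins, axis=0, return_inverse=True)
    labels = np.zeros(fg_mask.shape, dtype=int)    # 0 = background
    labels[ys, xs] = ids + 1                       # one label per vote bin
    return labels
```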
- Robust Instance Segmentation through Reasoning about Multi-Object Occlusion [9.536947328412198]
We propose a deep network for multi-object instance segmentation that is robust to occlusion.
Our work builds on Compositional Networks, which learn a generative model of neural feature activations to locate occluders.
In particular, we obtain feed-forward predictions of the object classes and their instance and occluder segmentations.
arXiv Detail & Related papers (2020-12-03T17:41:55Z)
- Occlusion resistant learning of intuitive physics from videos [52.25308231683798]
A key ability for artificial systems is to understand physical interactions between objects and to predict future outcomes of a situation.
This ability, often referred to as intuitive physics, has recently received attention, and several methods have been proposed to learn these physical rules from video sequences.
arXiv Detail & Related papers (2020-04-30T19:35:54Z)
- Object-Centric Image Generation from Layouts [93.10217725729468]
We develop a layout-to-image-generation method to generate complex scenes with multiple objects.
Our method learns representations of the spatial relationships between objects in the scene, which lead to our model's improved layout fidelity.
We introduce SceneFID, an object-centric adaptation of the popular Fréchet Inception Distance metric, which is better suited for multi-object images.
arXiv Detail & Related papers (2020-03-16T21:40:09Z)
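SceneFID is described as an object-centric adaptation of the Fréchet Inception Distance, which plausibly means computing the usual Fréchet distance between Inception features of real and generated object crops rather than whole images. The sketch below covers only the distance itself; the cropping, the Inception feature extraction, and the (N, D) feature arrays are assumed.

```python
# Frechet distance between two sets of (assumed) per-object-crop Inception
# features; applying it to object crops instead of full images is the
# object-centric reading of SceneFID suggested by the summary.
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(real_feats, fake_feats):
    """real_feats, fake_feats: (N, D) activation arrays."""
    mu_r, mu_f = real_feats.mean(0), fake_feats.mean(0)
    cov_r = np.cov(real_feats, rowvar=False)
    cov_f = np.cov(fake_feats, rowvar=False)
    covmean = sqrtm(cov_r @ cov_f).real          # matrix square root
    return (np.sum((mu_r - mu_f) ** 2)
            + np.trace(cov_r + cov_f - 2.0 * covmean))
```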
This list is automatically generated from the titles and abstracts of the papers on this site.