Visiting the Invisible: Layer-by-Layer Completed Scene Decomposition
- URL: http://arxiv.org/abs/2104.05367v1
- Date: Mon, 12 Apr 2021 11:37:23 GMT
- Title: Visiting the Invisible: Layer-by-Layer Completed Scene Decomposition
- Authors: Chuanxia Zheng, Duy-Son Dao, Guoxian Song, Tat-Jen Cham, Jianfei Cai
- Abstract summary: Existing scene understanding systems mainly focus on recognizing the visible parts of a scene, ignoring the intact appearance of physical objects in the real world.
In this work, we propose a higher-level scene understanding system that tackles both the visible and invisible parts of objects and backgrounds in a given scene.
- Score: 57.088328223220934
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Existing scene understanding systems mainly focus on recognizing the visible parts of a scene, ignoring the intact appearance of physical objects in the real world. Concurrently, image completion aims to create a plausible appearance for invisible regions, but requires a manual mask as input. In this work, we propose a higher-level scene understanding system that tackles both the visible and invisible parts of objects and backgrounds in a given scene. In particular, we build a system that decomposes a scene into individual objects, infers their underlying occlusion relationships, and even automatically learns which occluded parts of the objects need to be completed. To disentangle the occlusion relationships of all objects in a complex scene, we exploit the fact that an unoccluded front object is easy to identify, detect, and segment. Our system interleaves the two tasks of instance segmentation and scene completion through multiple iterations, solving for objects layer by layer. We first provide a thorough evaluation on a new, realistically rendered dataset with ground truths for all invisible regions. To bridge the domain gap to real imagery, where ground truths are unavailable, we then train another model with pseudo-ground-truths generated by our trained synthesis model. We demonstrate results on a wide variety of datasets and show significant improvement over the state of the art.
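The interleaved, layer-by-layer procedure in the abstract can be summarized as a peel-and-complete loop: segment the currently unoccluded objects, remove them, complete the revealed regions, and repeat. The sketch below is a minimal illustration of that loop; the callables `segment_unoccluded` and `complete_regions`, the `max_layers` cap, and the empty-result stopping test are assumptions for exposition, not the authors' actual interface.

```python
def decompose_scene(image, segment_unoccluded, complete_regions, max_layers=10):
    """Peel a scene one occlusion layer at a time (illustrative sketch).

    segment_unoccluded(image) -> list of (mask, instance) pairs for objects
        judged fully visible in the current image.
    complete_regions(image, masks) -> image with the masked regions
        plausibly filled in (scene completion).
    """
    layers = []                    # objects recovered at each occlusion depth
    current = image
    for _ in range(max_layers):
        instances = segment_unoccluded(current)
        if not instances:          # only background remains
            break
        layers.append(instances)
        # Remove this layer's objects and hallucinate what lies behind them,
        # exposing the next layer of previously occluded objects.
        masks = [mask for mask, _ in instances]
        current = complete_regions(current, masks)
    return layers, current         # per-layer instances + completed background
```

Under this reading, the index of the layer in which an object is recovered directly encodes the occlusion ordering, and the pseudo-ground-truth step amounts to running this loop (with models trained on the rendered dataset) over real images and reusing the predicted layers as training targets for a second model.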
Related papers
- In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation [50.79940712523551]
We present lazy visual grounding, a two-stage approach of unsupervised object mask discovery followed by object grounding.
Our model requires no additional training yet shows great performance on five public datasets.
arXiv Detail & Related papers (2024-08-09T09:28:35Z)
- Object-level Scene Deocclusion [92.39886029550286]
We present a new self-supervised PArallel visible-to-COmplete diffusion framework, named PACO, for object-level scene deocclusion.
To train PACO, we create a large-scale dataset with 500k samples to enable self-supervised learning.
Experiments on COCOA and various real-world scenes demonstrate the superior capability of PACO for scene deocclusion, surpassing the state of the art by a large margin.
arXiv Detail & Related papers (2024-06-11T20:34:10Z)
- ViFu: Multiple 360$^\circ$ Objects Reconstruction with Clean Background via Visible Part Fusion [7.8788463395442045]
We propose a method to segment and recover a static, clean background and multiple 360$^\circ$ objects from observations of scenes at different timestamps.
Our basic idea is that, by observing the same set of objects in various arrangements, parts that are invisible in one scene may become visible in others.
arXiv Detail & Related papers (2024-04-15T02:44:23Z)
- DisCoScene: Spatially Disentangled Generative Radiance Fields for Controllable 3D-aware Scene Synthesis [90.32352050266104]
DisCoScene is a 3D-aware generative model for high-quality and controllable scene synthesis.
It disentangles the whole scene into object-centric generative fields by learning on only 2D images with global-local discrimination.
We demonstrate state-of-the-art performance on many scene datasets, including the challenging outdoor dataset.
arXiv Detail & Related papers (2022-12-22T18:59:59Z)
- Scene-level Tracking and Reconstruction without Object Priors [14.068026331380844]
We present the first real-time system capable of tracking and reconstructing, individually, every visible object in a given scene.
Our proposed system can provide the live geometry and deformation of all visible objects in a novel scene in real-time.
arXiv Detail & Related papers (2022-10-07T20:56:14Z)
- Discovering Objects that Can Move [55.743225595012966]
We study the problem of object discovery -- separating objects from the background without manual labels.
Existing approaches utilize appearance cues, such as color, texture, and location, to group pixels into object-like regions.
We choose to focus on dynamic objects -- entities that can move independently in the world.
arXiv Detail & Related papers (2022-03-18T21:13:56Z)
- Unsupervised Object Learning via Common Fate [61.14802390241075]
Learning generative object models from unlabelled videos is a long-standing problem and is required for causal scene modeling.
We decompose this problem into three easier subtasks, and provide candidate solutions for each of them.
We show that our approach allows learning generative models that generalize beyond the occlusions present in the input videos.
arXiv Detail & Related papers (2021-10-13T08:22:04Z)
- Learning Object-Compositional Neural Radiance Field for Editable Scene Rendering [42.37007176376849]
We present a novel neural scene rendering system, which learns an object-compositional neural radiance field and produces realistic rendering for a cluttered, real-world scene.
To survive training in heavily cluttered scenes, we propose a scene-guided training strategy that resolves the 3D-space ambiguity in occluded regions and learns sharp boundaries for each object.
arXiv Detail & Related papers (2021-09-04T11:37:18Z)
- Object-Centric Image Generation with Factored Depths, Locations, and Appearances [30.541425619507184]
We present a generative model of images that explicitly reasons over the set of objects they show.
Our model learns a structured latent representation that separates objects from each other and from the background.
It can be trained from images alone in a purely unsupervised fashion without the need for object masks or depth information.
arXiv Detail & Related papers (2020-04-01T18:00:11Z)