Self-Supervised Scene De-occlusion
- URL: http://arxiv.org/abs/2004.02788v1
- Date: Mon, 6 Apr 2020 16:31:11 GMT
- Title: Self-Supervised Scene De-occlusion
- Authors: Xiaohang Zhan, Xingang Pan, Bo Dai, Ziwei Liu, Dahua Lin, Chen Change Loy
- Abstract summary: This paper investigates the problem of scene de-occlusion, which aims to recover the underlying occlusion ordering and complete the invisible parts of occluded objects.
We make the first attempt to address the problem through a novel and unified framework that recovers hidden scene structures without using ordering or amodal annotations as supervision.
Based on PCNet-M and PCNet-C, we devise a novel inference scheme to accomplish scene de-occlusion, via progressive ordering recovery, amodal completion and content completion.
- Score: 186.89979151728636
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Natural scene understanding is a challenging task, particularly when
encountering images of multiple objects that are partially occluded. This
difficulty arises from varying object ordering and positioning. Existing
scene understanding paradigms are able to parse only the visible parts,
resulting in incomplete and unstructured scene interpretation. In this paper,
we investigate the problem of scene de-occlusion, which aims to recover the
underlying occlusion ordering and complete the invisible parts of occluded
objects. We make the first attempt to address the problem through a novel and
unified framework that recovers hidden scene structures without using ordering
or amodal annotations as supervision. This is achieved via Partial Completion
Network (PCNet)-mask (M) and -content (C), which learn to recover fractions of
object masks and contents, respectively, in a self-supervised manner. Based on
PCNet-M and PCNet-C, we devise a novel inference scheme to accomplish scene
de-occlusion, via progressive ordering recovery, amodal completion and content
completion. Extensive experiments on real-world scenes demonstrate the superior
performance of our approach over existing alternatives. Remarkably, our approach
that is trained in a self-supervised manner achieves comparable results to
fully-supervised methods. The proposed scene de-occlusion framework benefits
many applications, including high-quality and controllable image manipulation
and scene recomposition (see Fig. 1), as well as the conversion of existing
modal mask annotations to amodal mask annotations.
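To make the progressive inference scheme concrete, here is a minimal sketch under assumed interfaces: `pcnet_m` and `pcnet_c` stand in for PCNet-M and PCNet-C, and the pairwise ordering rule (whichever object grows more when completed against its neighbor is the occludee) is a simplified reading of the paper's strategy, not its exact procedure.

```python
import numpy as np

def _touches(a, b):
    # Crude 4-neighborhood adjacency test between two boolean masks.
    # np.roll wraps at image borders, which is tolerable for a sketch.
    grown = a.copy()
    for axis in (0, 1):
        for shift in (1, -1):
            grown |= np.roll(a, shift, axis=axis)
    return bool((grown & b).any())

def scene_deocclusion(image, modal_masks, pcnet_m, pcnet_c):
    """Progressive de-occlusion sketch with hypothetical callables.

    modal_masks: list of HxW bool arrays (visible mask per object).
    pcnet_m(image, target, occluder) -> HxW bool array: assumed to return
        the target mask partially completed behind the occluder region.
    pcnet_c(image, amodal, visible) -> HxWx3 array: assumed to return RGB
        content with the invisible region (amodal minus visible) filled in.
    """
    n = len(modal_masks)

    # 1) Pairwise ordering recovery: if completing object i against
    #    neighbor j grows i's mask more than the reverse, j occludes i.
    occluded_by = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if not _touches(modal_masks[i], modal_masks[j]):
                continue  # non-neighboring pair: order is irrelevant
            grow_i = int(pcnet_m(image, modal_masks[i], modal_masks[j]).sum()) - int(modal_masks[i].sum())
            grow_j = int(pcnet_m(image, modal_masks[j], modal_masks[i]).sum()) - int(modal_masks[j].sum())
            if grow_i > grow_j:
                occluded_by[i].add(j)
            elif grow_j > grow_i:
                occluded_by[j].add(i)

    # 2) Amodal completion: complete each object against its occluders.
    amodal_masks = []
    for i in range(n):
        amodal = modal_masks[i].copy()
        for j in occluded_by[i]:
            amodal |= pcnet_m(image, amodal, modal_masks[j])
        amodal_masks.append(amodal)

    # 3) Content completion: paint the newly revealed invisible regions.
    contents = [pcnet_c(image, am, vm) for am, vm in zip(amodal_masks, modal_masks)]
    return occluded_by, amodal_masks, contents
```

In the paper the recovered pairwise orders form a directed occlusion graph and completion proceeds along it; the per-object loop above compresses that into a single pass for brevity.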
Related papers
- Open-World Amodal Appearance Completion [14.398395372699207]
We introduce Open-World Amodal Appearance Completion, a training-free framework that expands amodal completion capabilities.
Our approach generalizes to arbitrary objects specified by both direct terms and abstract queries.
arXiv Detail & Related papers (2024-11-20T03:45:48Z)
- Object-level Scene Deocclusion [92.39886029550286]
We present a new self-supervised PArallel visible-to-COmplete diffusion framework, named PACO, for object-level scene deocclusion.
To train PACO, we create a large-scale dataset with 500k samples to enable self-supervised learning.
Experiments on COCOA and various real-world scenes demonstrate the superior capability of PACO for scene deocclusion, surpassing the state of the art by a large margin.
arXiv Detail & Related papers (2024-06-11T20:34:10Z)
- MonoMAE: Enhancing Monocular 3D Detection through Depth-Aware Masked Autoencoders [93.87585467898252]
We design MonoMAE, a monocular 3D detector inspired by Masked Autoencoders.
MonoMAE consists of two novel designs. The first is depth-aware masking that selectively masks certain parts of non-occluded object queries.
The second is lightweight query completion that works with the depth-aware masking to learn to reconstruct and complete the masked object queries (a rough sketch of this idea follows this list).
arXiv Detail & Related papers (2024-05-13T12:32:45Z)
- Open-Vocabulary Segmentation with Unpaired Mask-Text Supervision [87.15580604023555]
Unpair-Seg is a novel weakly-supervised open-vocabulary segmentation framework.
It learns from unpaired image-mask and image-text pairs, which can be independently and efficiently collected.
It achieves 14.6% and 19.5% mIoU on the ADE-847 and PASCAL Context-459 datasets.
arXiv Detail & Related papers (2024-02-14T06:01:44Z)
- MM-3DScene: 3D Scene Understanding by Customizing Masked Modeling with Informative-Preserved Reconstruction and Self-Distilled Consistency [120.9499803967496]
We propose a novel informative-preserved reconstruction, which explores local statistics to discover and preserve the representative structured points.
Our method can concentrate on modeling regional geometry and enjoy less ambiguity for masked reconstruction.
By combining informative-preserved reconstruction on masked areas with consistency self-distillation from unmasked areas, we obtain a unified framework called MM-3DScene.
arXiv Detail & Related papers (2022-12-20T01:53:40Z)
- Visiting the Invisible: Layer-by-Layer Completed Scene Decomposition [57.088328223220934]
Existing scene understanding systems mainly focus on recognizing the visible parts of a scene, ignoring the intact appearance of physical objects in the real world.
In this work, we propose a higher-level scene understanding system to tackle both visible and invisible parts of objects and backgrounds in a given scene.
arXiv Detail & Related papers (2021-04-12T11:37:23Z)
- Human De-occlusion: Invisible Perception and Recovery for Humans [26.404444296924243]
We tackle the problem of human de-occlusion which reasons about occluded segmentation masks and invisible appearance content of humans.
In particular, a two-stage framework is proposed to estimate the invisible portions and recover the content inside.
Our method outperforms state-of-the-art techniques in both mask completion and content recovery.
arXiv Detail & Related papers (2021-03-22T05:54:58Z)
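As referenced in the MonoMAE entry above, the following is a heavily hedged sketch of depth-aware masking paired with a lightweight query-completion head. The module name, the depth-to-masking-ratio rule, and the choice to drop whole queries (the paper masks parts of non-occluded queries) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class DepthAwareMaskedQueries(nn.Module):
    # Hypothetical module: simulate occlusion on non-occluded object
    # queries, then learn to complete them from the masked input.
    def __init__(self, dim=256):
        super().__init__()
        # "Lightweight" completion head: a small two-layer MLP here.
        self.completion = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim)
        )

    def forward(self, queries, depths, occluded):
        # queries:  (N, dim) object queries from the detector
        # depths:   (N,)     predicted per-object depth
        # occluded: (N,)     bool, True for already-occluded objects
        # Assumed rule: nearer objects get a higher masking probability.
        ratio = (1.0 - depths / (depths.max() + 1e-6)).clamp(0.1, 0.9)
        drop = (torch.rand_like(depths) < ratio) & ~occluded
        masked = queries * (~drop).unsqueeze(-1)  # zero out dropped queries
        completed = self.completion(masked)
        # Training target: reconstruct the original, unmasked queries.
        if drop.any():
            recon_loss = ((completed[drop] - queries[drop]) ** 2).mean()
        else:
            recon_loss = completed.sum() * 0.0  # keep the graph connected
        return completed, recon_loss
```

A toy call like `DepthAwareMaskedQueries()(torch.randn(5, 256), torch.rand(5), torch.zeros(5, dtype=torch.bool))` exercises the masking-and-completion path.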