Related papers: Object-level Scene Deocclusion

Object-level Scene Deocclusion

URL: http://arxiv.org/abs/2406.07706v1
Date: Tue, 11 Jun 2024 20:34:10 GMT
Title: Object-level Scene Deocclusion
Authors: Zhengzhe Liu, Qing Liu, Chirui Chang, Jianming Zhang, Daniil Pakhomov, Haitian Zheng, Zhe Lin, Daniel Cohen-Or, Chi-Wing Fu,
Abstract summary: We present a new self-supervised PArallel visible-to-COmplete diffusion framework, named PACO, for object-level scene deocclusion. To train PACO, we create a large-scale dataset with 500k samples to enable self-supervised learning. Experiments on COCOA and various real-world scenes demonstrate the superior capability of PACO for scene deocclusion, surpassing the state of the arts by a large margin.
Score: 92.39886029550286
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Deoccluding the hidden portions of objects in a scene is a formidable task, particularly when addressing real-world scenes. In this paper, we present a new self-supervised PArallel visible-to-COmplete diffusion framework, named PACO, a foundation model for object-level scene deocclusion. Leveraging the rich prior of pre-trained models, we first design the parallel variational autoencoder, which produces a full-view feature map that simultaneously encodes multiple complete objects, and the visible-to-complete latent generator, which learns to implicitly predict the full-view feature map from partial-view feature map and text prompts extracted from the incomplete objects in the input image. To train PACO, we create a large-scale dataset with 500k samples to enable self-supervised learning, avoiding tedious annotations of the amodal masks and occluded regions. At inference, we devise a layer-wise deocclusion strategy to improve efficiency while maintaining the deocclusion quality. Extensive experiments on COCOA and various real-world scenes demonstrate the superior capability of PACO for scene deocclusion, surpassing the state of the arts by a large margin. Our method can also be extended to cross-domain scenes and novel categories that are not covered by the training set. Further, we demonstrate the deocclusion applicability of PACO in single-view 3D scene reconstruction and object recomposition.

Related papers

IGFuse: Interactive 3D Gaussian Scene Reconstruction via Multi-Scans Fusion [15.837932667195037]
IGFuse is a novel framework that reconstructs interactive Gaussian scene by fusing observations from multiple scans.<n>Our method constructs segmentation aware Gaussian fields and enforces bi-directional photometric and semantic consistency across scans.<n>IGFuse enables high fidelity rendering and object level scene manipulation without dense observations or complex pipelines.
arXiv Detail & Related papers (2025-08-18T17:59:47Z)
HiScene: Creating Hierarchical 3D Scenes with Isometric View Generation [50.206100327643284]
HiScene is a novel hierarchical framework that bridges the gap between 2D image generation and 3D object generation. We generate 3D content that aligns with 2D representations while maintaining compositional structure.
arXiv Detail & Related papers (2025-04-17T16:33:39Z)
sshELF: Single-Shot Hierarchical Extrapolation of Latent Features for 3D Reconstruction from Sparse-Views [41.73382885439258]
Reconstructing outdoor scenes from outward-facing views poses significant challenges due to minimal view overlap. We propose a fast, single-shot pipeline for unbounded-view 3D scene reconstruction via hierarchal extrapolation. We find that latentELF faithfully reconstructs occluded regions, supports real-time rendering, and provides rich features for downstream applications.
arXiv Detail & Related papers (2025-02-06T18:58:45Z)
Articulate3D: Holistic Understanding of 3D Scenes as Universal Scene Description [56.69740649781989]
3D scene understanding is a long-standing challenge in computer vision and a key component in enabling mixed reality, wearable computing, and embodied AI.<n>We introduce Articulate3D, an expertly curated 3D dataset featuring high-quality manual annotations on 280 indoor scenes.<n>We also present USDNet, a novel unified framework capable of simultaneously predicting part segmentation along with a full specification of motion attributes for articulated objects.
arXiv Detail & Related papers (2024-12-02T11:33:55Z)
ObjectCarver: Semi-automatic segmentation, reconstruction and separation of 3D objects [44.38881095466177]
Implicit neural fields have made remarkable progress in reconstructing 3D surfaces from multiple images. Previous work has attempted to tackle this problem by introducing a framework to train separate signed distance fields. We introduce our method, ObjectCarver, to tackle the problem of object separation from just click input in a single view.
arXiv Detail & Related papers (2024-07-26T22:13:20Z)
Self-supervised 3D Point Cloud Completion via Multi-view Adversarial Learning [61.14132533712537]
We propose MAL-SPC, a framework that effectively leverages both object-level and category-specific geometric similarities to complete missing structures. Our MAL-SPC does not require any 3D complete supervision and only necessitates a single partial point cloud for each object.
arXiv Detail & Related papers (2024-07-13T06:53:39Z)
Mixed Diffusion for 3D Indoor Scene Synthesis [55.94569112629208]
We present MiDiffusion, a novel mixed discrete-continuous diffusion model architecture. We represent a scene layout by a 2D floor plan and a set of objects, each defined by its category, location, size, and orientation. Our experimental results demonstrate that MiDiffusion substantially outperforms state-of-the-art autoregressive and diffusion models in floor-conditioned 3D scene synthesis.
arXiv Detail & Related papers (2024-05-31T17:54:52Z)
Sequential Amodal Segmentation via Cumulative Occlusion Learning [15.729212571002906]
A visual system must be able to segment both the visible and occluded regions of objects, while discerning their occlusion order. We introduce a diffusion model with cumulative occlusion learning designed for sequential amodal segmentation of objects with uncertain categories. This model iteratively refines the prediction using the cumulative mask strategy during diffusion, effectively capturing the uncertainty of invisible regions. It is akin to the human capability for amodal perception, i.e., to decipher the spatial ordering among objects and accurately predict complete contours for occluded objects in densely layered visual scenes.
arXiv Detail & Related papers (2024-05-09T14:17:26Z)
Unsupervised Object Learning via Common Fate [61.14802390241075]
Learning generative object models from unlabelled videos is a long standing problem and required for causal scene modeling. We decompose this problem into three easier subtasks, and provide candidate solutions for each of them. We show that our approach allows learning generative models that generalize beyond the occlusions present in the input videos.
arXiv Detail & Related papers (2021-10-13T08:22:04Z)
Visiting the Invisible: Layer-by-Layer Completed Scene Decomposition [57.088328223220934]
Existing scene understanding systems mainly focus on recognizing the visible parts of a scene, ignoring the intact appearance of physical objects in the real-world. In this work, we propose a higher-level scene understanding system to tackle both visible and invisible parts of objects and backgrounds in a given scene.
arXiv Detail & Related papers (2021-04-12T11:37:23Z)
Self-Supervised Scene De-occlusion [186.89979151728636]
This paper investigates the problem of scene de-occlusion, which aims to recover the underlying occlusion ordering and complete the invisible parts of occluded objects. We make the first attempt to address the problem through a novel and unified framework that recovers hidden scene structures without ordering and amodal annotations as supervisions. Based on PCNet-M and PCNet-C, we devise a novel inference scheme to accomplish scene de-occlusion, via progressive ordering recovery, amodal completion and content completion.
arXiv Detail & Related papers (2020-04-06T16:31:11Z)

This list is automatically generated from the titles and abstracts of the papers in this site.