RIC: Rotate-Inpaint-Complete for Generalizable Scene Reconstruction
- URL: http://arxiv.org/abs/2307.11932v2
- Date: Wed, 4 Oct 2023 22:57:04 GMT
- Title: RIC: Rotate-Inpaint-Complete for Generalizable Scene Reconstruction
- Authors: Isaac Kasahara, Shubham Agrawal, Selim Engin, Nikhil Chavan-Dafle,
Shuran Song, Volkan Isler
- Abstract summary: General scene reconstruction refers to the task of estimating the full 3D geometry and texture of a scene containing previously unseen objects.
In this paper, we present a method for scene reconstruction by structurally breaking the problem into two steps: rendering novel views via inpainting and 2D to 3D scene lifting.
- Score: 43.63574200858472
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: General scene reconstruction refers to the task of estimating the full 3D
geometry and texture of a scene containing previously unseen objects. In many
practical applications such as AR/VR, autonomous navigation, and robotics, only
a single view of the scene may be available, making the scene reconstruction
task challenging. In this paper, we present a method for scene reconstruction
by structurally breaking the problem into two steps: rendering novel views via
inpainting and 2D to 3D scene lifting. Specifically, we leverage the
generalization capability of large visual language models (Dalle-2) to inpaint
the missing areas of scene color images rendered from different views. Next, we
lift these inpainted images to 3D by predicting normals of the inpainted image
and solving for the missing depth values. By predicting normals instead of
depth directly, our method allows for robustness to changes in depth
distributions and scale. With rigorous quantitative evaluation, we show that
our method outperforms multiple baselines while providing generalization to
novel objects and scenes.
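The lifting step described above (predicting normals, then solving for the missing depth values) can be realized as a sparse least-squares integration problem. The paper does not specify its solver, so the sketch below shows one standard formulation under an assumed orthographic camera: normals constrain the depth gradients (dz/dx = -n_x/n_z, dz/dy = -n_y/n_z), and known depths anchor the solution. All names here are illustrative, not the authors' implementation.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.linalg import lsqr

def depth_from_normals(normals, known_depth, known_mask, w_anchor=10.0):
    """Recover a dense depth map from per-pixel surface normals.

    normals:     (H, W, 3) unit normals, camera-facing (n_z > 0).
    known_depth: (H, W) depth values, used where known_mask is True.
    known_mask:  (H, W) bool mask of pixels with trusted depth.

    Orthographic assumption: dz/dx = -n_x/n_z and dz/dy = -n_y/n_z.
    We match finite-difference depth gradients to these targets in a
    least-squares sense, with weighted anchors at the known pixels.
    """
    H, W, _ = normals.shape
    idx = lambda r, c: r * W + c
    nz = np.clip(normals[..., 2], 1e-6, None)
    gx = -normals[..., 0] / nz  # target horizontal depth gradient
    gy = -normals[..., 1] / nz  # target vertical depth gradient

    rows, cols, vals, b = [], [], [], []
    eq = 0
    # Horizontal gradient equations: z[r, c+1] - z[r, c] = gx[r, c]
    for r in range(H):
        for c in range(W - 1):
            rows += [eq, eq]; cols += [idx(r, c + 1), idx(r, c)]
            vals += [1.0, -1.0]; b.append(gx[r, c]); eq += 1
    # Vertical gradient equations: z[r+1, c] - z[r, c] = gy[r, c]
    for r in range(H - 1):
        for c in range(W):
            rows += [eq, eq]; cols += [idx(r + 1, c), idx(r, c)]
            vals += [1.0, -1.0]; b.append(gy[r, c]); eq += 1
    # Anchor equations at pixels with known depth
    for r in range(H):
        for c in range(W):
            if known_mask[r, c]:
                rows.append(eq); cols.append(idx(r, c)); vals.append(w_anchor)
                b.append(w_anchor * known_depth[r, c]); eq += 1

    A = coo_matrix((vals, (rows, cols)), shape=(eq, H * W)).tocsr()
    z = lsqr(A, np.asarray(b))[0]
    return z.reshape(H, W)
```

Because only gradients are constrained, the absolute scale comes entirely from the anchor pixels; this is why predicting normals rather than depth is robust to shifts in depth distribution and scale.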
Related papers
- InstaInpaint: Instant 3D-Scene Inpainting with Masked Large Reconstruction Model [46.67494008720215]
InstaInpaint is a framework that produces 3D-scene inpainting from a 2D inpainting proposal within 0.4 seconds. We analyze and identify several key designs that improve generalization, textural consistency, and geometric correctness. InstaInpaint achieves a 1000x speed-up over prior methods while maintaining state-of-the-art performance across two standard benchmarks.
arXiv Detail & Related papers (2025-06-12T17:59:55Z)
- Scene4U: Hierarchical Layered 3D Scene Reconstruction from Single Panoramic Image for Your Immerse Exploration [18.23983135970619]
We propose Scene4U, a novel layered 3D scene reconstruction framework that operates on a single panoramic image.
Specifically, Scene4U integrates an open-vocabulary segmentation model with a large language model to decompose a real panorama into multiple layers.
We then employ a diffusion-based layered repair module to restore occluded regions using visual cues and depth information, generating a hierarchical representation of the scene.
Scene4U outperforms state-of-the-art methods, improving by 24.24% in LPIPS and 24.40% in BRISQUE, while also achieving the fastest training speed.
arXiv Detail & Related papers (2025-04-01T03:17:24Z)
- Diorama: Unleashing Zero-shot Single-view 3D Indoor Scene Modeling [27.577720075952225]
We present Diorama, the first zero-shot open-world system that holistically models 3D scenes from single-view RGB observations.
We show the feasibility of our approach by decomposing the problem into subtasks and introduce robust, generalizable solutions to each.
arXiv Detail & Related papers (2024-11-29T06:19:04Z)
- Invisible Stitch: Generating Smooth 3D Scenes with Depth Inpainting [75.7154104065613]
We introduce a novel depth completion model, trained via teacher distillation and self-training to learn the 3D fusion process.
We also introduce a new benchmarking scheme for scene generation methods that is based on ground truth geometry.
arXiv Detail & Related papers (2024-04-30T17:59:40Z)
- Generalizable 3D Scene Reconstruction via Divide and Conquer from a Single View [5.222115919729418]
Single-view 3D reconstruction is currently approached from two dominant perspectives.
We propose a hybrid method following a divide-and-conquer strategy.
We first process the scene holistically, extracting depth and semantic information.
We then leverage a single-shot object-level method for the detailed reconstruction of individual components.
arXiv Detail & Related papers (2024-04-04T12:58:46Z)
- Behind the Veil: Enhanced Indoor 3D Scene Reconstruction with Occluded Surfaces Completion [15.444301186927142]
We present a novel indoor 3D reconstruction method with occluded surface completion, given a sequence of depth readings.
Our method tackles the task of completing the occluded scene surfaces, resulting in a complete 3D scene mesh.
We evaluate the proposed method on the 3D Completed Room Scene (3D-CRS) and iTHOR datasets.
arXiv Detail & Related papers (2024-04-03T21:18:27Z)
- NeRFiller: Completing Scenes via Generative 3D Inpainting [113.18181179986172]
We propose NeRFiller, an approach that completes missing portions of a 3D capture via generative 3D inpainting.
In contrast to related works, we focus on completing scenes rather than deleting foreground objects.
arXiv Detail & Related papers (2023-12-07T18:59:41Z)
- O$^2$-Recon: Completing 3D Reconstruction of Occluded Objects in the Scene with a Pre-trained 2D Diffusion Model [28.372289119872764]
Occlusion is a common issue in 3D reconstruction from RGB-D videos, often blocking the complete reconstruction of objects.
We propose a novel framework, empowered by a 2D diffusion-based in-painting model, to reconstruct complete surfaces for the hidden parts of objects.
arXiv Detail & Related papers (2023-08-18T14:38:31Z)
- SPIn-NeRF: Multiview Segmentation and Perceptual Inpainting with Neural Radiance Fields [26.296017756560467]
In 3D, solutions must be consistent across multiple views and geometrically valid.
We propose a novel 3D inpainting method that addresses these challenges.
We first demonstrate the superiority of our approach on multiview segmentation, comparing to NeRF-based methods and 2D segmentation approaches.
arXiv Detail & Related papers (2022-11-22T13:14:50Z)
- PhotoScene: Photorealistic Material and Lighting Transfer for Indoor Scenes [84.66946637534089]
PhotoScene is a framework that takes input image(s) of a scene and builds a photorealistic digital twin with high-quality materials and similar lighting.
We model scene materials using procedural material graphs; such graphs represent photorealistic and resolution-independent materials.
We evaluate our technique on objects and layout reconstructions from ScanNet, SUN RGB-D and stock photographs, and demonstrate that our method reconstructs high-quality, fully relightable 3D scenes.
arXiv Detail & Related papers (2022-07-02T06:52:44Z)
- Recognizing Scenes from Novel Viewpoints [99.90914180489456]
Humans can perceive scenes in 3D from a handful of 2D views. For AI agents, the ability to recognize a scene from any viewpoint given only a few images enables them to efficiently interact with the scene and its objects.
We propose a model which takes as input a few RGB images of a new scene and recognizes the scene from novel viewpoints by segmenting it into semantic categories.
arXiv Detail & Related papers (2021-12-02T18:59:40Z)
- Weakly Supervised Learning of Multi-Object 3D Scene Decompositions Using Deep Shape Priors [69.02332607843569]
PriSMONet is a novel approach for learning Multi-Object 3D scene decomposition and representations from single images.
A recurrent encoder regresses a latent representation of 3D shape, pose and texture of each object from an input RGB image.
We evaluate the accuracy of our model in inferring 3D scene layout, demonstrate its generative capabilities, assess its generalization to real images, and point out benefits of the learned representation.
arXiv Detail & Related papers (2020-10-08T14:49:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.