Multi-Plane Program Induction with 3D Box Priors
- URL: http://arxiv.org/abs/2011.10007v2
- Date: Sun, 22 Nov 2020 19:13:03 GMT
- Title: Multi-Plane Program Induction with 3D Box Priors
- Authors: Yikai Li, Jiayuan Mao, Xiuming Zhang, William T. Freeman, Joshua B. Tenenbaum, Noah Snavely, Jiajun Wu
- Abstract summary: We present Box Program Induction (BPI), which infers a program-like scene representation from a single image.
BPI simultaneously models repeated structure on multiple 2D planes, the 3D position and orientation of the planes, and camera parameters.
It uses neural networks to infer visual cues such as vanishing points and wireframe lines, which guide a search-based algorithm to find the program that best explains the image.
- Score: 110.6726150681556
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider two important aspects in understanding and editing images:
modeling regular, program-like texture or patterns in 2D planes, and 3D posing
of these planes in the scene. Unlike prior work on image-based program
synthesis, which assumes the image contains a single visible 2D plane, we
present Box Program Induction (BPI), which infers a program-like scene
representation that simultaneously models repeated structure on multiple 2D
planes, the 3D position and orientation of the planes, and camera parameters,
all from a single image. Our model assumes a box prior, i.e., that the image
captures either an inner view or an outer view of a box in 3D. It uses neural
networks to infer visual cues such as vanishing points and wireframe lines to
guide a search-based algorithm to find the program that best explains the
image. Such a holistic, structured scene representation enables 3D-aware
interactive image editing operations such as inpainting missing pixels,
changing camera parameters, and extrapolating the image contents.
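To make this concrete, below is a minimal sketch of what a box program and a cue-guided search over candidate programs might look like. All names, shapes, and the scoring function are hypothetical illustrations; the paper's actual DSL, neural cue detectors, and search procedure differ in detail.

```python
# Hypothetical sketch of a BPI-style "box program" and cue-guided search.
# The scoring function is a toy stand-in for the paper's objective.
from dataclasses import dataclass
from itertools import product
from typing import List

import numpy as np


@dataclass
class PlaneProgram:
    """Repeated 2D structure on one face of the box (e.g., a grid of tiles)."""
    repeat_x: int          # pattern repetitions along the plane's x-axis
    repeat_y: int          # pattern repetitions along the plane's y-axis
    normal: np.ndarray     # plane orientation in camera coordinates
    offset: float          # signed distance of the plane from the origin


@dataclass
class BoxProgram:
    """Holistic scene representation: posed planes plus camera parameters."""
    planes: List[PlaneProgram]
    focal_length: float
    inner_view: bool       # box prior: camera inside (inner) or outside (outer)


def cue_score(program: BoxProgram, vanishing_points: np.ndarray) -> float:
    """Toy objective: reward programs whose plane orientations agree with
    the detected vanishing-point cues."""
    dirs = vanishing_points / np.linalg.norm(vanishing_points, axis=1,
                                             keepdims=True)
    return sum(float(np.max(np.abs(dirs @ p.normal))) for p in program.planes)


def search_best_program(vanishing_points: np.ndarray) -> BoxProgram:
    """Enumerate a small grid of candidate programs and keep the best one --
    the cue-guided search, reduced to brute force for illustration."""
    axes = [np.eye(3)[i] for i in range(3)]          # candidate plane normals
    best, best_score = None, -np.inf
    for normals in product(axes, repeat=2):          # two visible planes
        for rx, ry, f in product([2, 3, 4], [2, 3, 4], [400.0, 800.0]):
            planes = [PlaneProgram(rx, ry, n, offset=1.0) for n in normals]
            cand = BoxProgram(planes, focal_length=f, inner_view=True)
            s = cue_score(cand, vanishing_points)
            if s > best_score:
                best, best_score = cand, s
    return best


if __name__ == "__main__":
    # Fake cues standing in for the neural detectors' outputs.
    vps = np.array([[1.0, 0.0, 0.1], [0.0, 1.0, 0.05], [0.0, 0.1, 1.0]])
    prog = search_best_program(vps)
    print(f"best program: {len(prog.planes)} planes, f={prog.focal_length}")
```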
Related papers
- 3D Congealing: 3D-Aware Image Alignment in the Wild [44.254247801001675]
3D Congealing is the problem of 3D-aware alignment for 2D images capturing semantically similar objects.
We introduce a general framework that tackles the task without assuming shape templates, poses, or any camera parameters.
Our framework can be used for various tasks such as correspondence matching, pose estimation, and image editing.
arXiv Detail & Related papers (2024-04-02T17:32:12Z)
- RoSI: Recovering 3D Shape Interiors from Few Articulation Images [20.430308190444737]
We present a learning framework to recover the shape interiors of existing 3D models with only their exteriors from multi-view and multi-articulation images.
Our neural architecture is trained in a category-agnostic manner and comprises a motion-aware multi-view analysis phase.
In addition, our method also predicts part articulations and is able to realize and even extrapolate the captured motions on the target 3D object.
arXiv Detail & Related papers (2023-04-13T08:45:26Z)
- Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction [84.94140661523956]
We propose a tri-perspective view (TPV) representation which accompanies BEV with two additional perpendicular planes.
We model each point in the 3D space by summing its projected features on the three planes.
Experiments show that our model trained with sparse supervision effectively predicts the semantic occupancy for all voxels.
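A minimal sketch of the three-plane lookup described above, with toy random planes and integer indexing standing in for the paper's learned planes and interpolated sampling:

```python
# Hypothetical TPV-style feature lookup: a 3D point is described by the sum
# of its projections onto three mutually perpendicular feature planes.
import numpy as np

H = W = D = 16   # spatial extent of the toy scene volume
C = 8            # feature channels per plane cell
tpv_hw = np.random.randn(H, W, C)   # top-down plane (BEV)
tpv_dh = np.random.randn(D, H, C)   # side plane
tpv_wd = np.random.randn(W, D, C)   # front plane

def tpv_feature(x: int, y: int, z: int) -> np.ndarray:
    """Describe point (x, y, z) by summing its projections on the 3 planes."""
    return tpv_hw[x, y] + tpv_dh[z, x] + tpv_wd[y, z]

print(tpv_feature(3, 7, 5).shape)   # one C-dimensional feature -> (8,)
```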
arXiv Detail & Related papers (2023-02-15T17:58:10Z)
- SSR-2D: Semantic 3D Scene Reconstruction from 2D Images [54.46126685716471]
In this work, we explore a central 3D scene modeling task, namely, semantic scene reconstruction without using any 3D annotations.
The key idea of our approach is to design a trainable model that employs both incomplete 3D reconstructions and their corresponding source RGB-D images.
Our method achieves state-of-the-art semantic scene completion performance on two large-scale benchmark datasets, MatterPort3D and ScanNet.
arXiv Detail & Related papers (2023-02-07T17:47:52Z)
- ONeRF: Unsupervised 3D Object Segmentation from Multiple Views [59.445957699136564]
ONeRF is a method that automatically segments and reconstructs object instances in 3D from multi-view RGB images without any additional manual annotations.
The segmented 3D objects are represented using separate Neural Radiance Fields (NeRFs) which allow for various 3D scene editing and novel view rendering.
arXiv Detail & Related papers (2022-11-22T06:19:37Z)
- Neural Groundplans: Persistent Neural Scene Representations from a Single Image [90.04272671464238]
We present a method to map 2D image observations of a scene to a persistent 3D scene representation.
We propose conditional neural groundplans as persistent and memory-efficient scene representations.
arXiv Detail & Related papers (2022-07-22T17:41:24Z)
- Learning Ego 3D Representation as Ray Tracing [42.400505280851114]
We present a novel end-to-end architecture for ego 3D representation learning from unconstrained camera views.
Inspired by the ray tracing principle, we design a polarized grid of "imaginary eyes" as the learnable ego 3D representation.
We show that our model outperforms all state-of-the-art alternatives significantly.
arXiv Detail & Related papers (2022-06-08T17:55:50Z)
- Towards Panoptic 3D Parsing for Single Image in the Wild [35.98539308998578]
This paper presents an integrated system that performs holistic image segmentation, object detection, instance segmentation, depth estimation, and object instance 3D reconstruction for indoor and outdoor scenes from a single RGB image.
Our proposed panoptic 3D parsing framework points to a promising direction in computer vision.
It can be applied to various applications, including autonomous driving, mapping, robotics, design, computer graphics, human-computer interaction, and augmented reality.
arXiv Detail & Related papers (2021-11-04T17:45:04Z)
- Bidirectional Projection Network for Cross Dimension Scene Understanding [69.29443390126805]
We present a bidirectional projection network (BPNet) for joint 2D and 3D reasoning in an end-to-end manner.
Via the bidirectional projection module (BPM), complementary 2D and 3D information can interact with each other at multiple architectural levels.
Our BPNet achieves top performance on the ScanNetV2 benchmark for both 2D and 3D semantic segmentation.
arXiv Detail & Related papers (2021-03-26T08:31:39Z)
- GRF: Learning a General Radiance Field for 3D Representation and Rendering [4.709764624933227]
We present a simple yet powerful neural network that implicitly represents and renders 3D objects and scenes only from 2D observations.
The network models 3D geometries as a general radiance field, which takes a set of 2D images with camera poses and intrinsics as input.
Our method can generate high-quality and realistic novel views for novel objects, unseen categories and challenging real-world scenes.
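A minimal sketch of the per-point aggregation such a radiance field implies: project a 3D query into each posed view and pool the local 2D features. The shapes, the pinhole `project` helper, and mean pooling are hypothetical simplifications of the paper's learned aggregation and radiance head.

```python
# Hypothetical GRF-style feature gathering for one 3D query point.
import numpy as np

def project(point: np.ndarray, K: np.ndarray, R: np.ndarray, t: np.ndarray):
    """Pinhole projection of a world-space point into pixel coordinates."""
    cam = R @ point + t          # world -> camera frame
    uv = K @ cam                 # camera -> image plane
    return uv[:2] / uv[2]        # perspective divide

def grf_feature(point: np.ndarray, views) -> np.ndarray:
    """views: list of (feature_map, K, R, t) tuples; returns the mean of the
    2D features sampled at the point's projection in every view."""
    feats = []
    for fmap, K, R, t in views:
        u, v = project(point, K, R, t)
        ui = int(np.clip(np.rint(u), 0, fmap.shape[1] - 1))
        vi = int(np.clip(np.rint(v), 0, fmap.shape[0] - 1))
        feats.append(fmap[vi, ui])   # nearest-neighbor feature sample
    return np.mean(feats, axis=0)    # a radiance MLP would consume this

# Toy usage: two 32x32 views with 8-channel features and identity rotation.
K = np.array([[20.0, 0.0, 16.0], [0.0, 20.0, 16.0], [0.0, 0.0, 1.0]])
views = [(np.random.randn(32, 32, 8), K, np.eye(3), np.array([0.0, 0.0, 4.0]))
         for _ in range(2)]
print(grf_feature(np.array([0.1, -0.2, 0.0]), views).shape)  # -> (8,)
```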
arXiv Detail & Related papers (2020-10-09T14:21:43Z)