Perspective Plane Program Induction from a Single Image
- URL: http://arxiv.org/abs/2006.14708v1
- Date: Thu, 25 Jun 2020 21:18:58 GMT
- Title: Perspective Plane Program Induction from a Single Image
- Authors: Yikai Li, Jiayuan Mao, Xiuming Zhang, William T. Freeman, Joshua B.
Tenenbaum, Jiajun Wu
- Abstract summary: We study the inverse graphics problem of inferring a holistic representation for natural images.
We formulate this problem as jointly finding the camera pose and scene structure that best describe the input image.
Our proposed framework, Perspective Plane Program Induction (P3I), combines search-based and gradient-based algorithms to efficiently solve the problem.
- Score: 85.28956922100305
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the inverse graphics problem of inferring a holistic representation
for natural images. Given an input image, our goal is to induce a
neuro-symbolic, program-like representation that jointly models camera poses,
object locations, and global scene structures. Such high-level, holistic scene
representations further facilitate low-level image manipulation tasks such as
inpainting. We formulate this problem as jointly finding the camera pose and
scene structure that best describe the input image. The benefits of such joint
inference are two-fold: scene regularity serves as a new cue for perspective
correction, and in turn, correct perspective correction leads to a simplified
scene structure, similar to how the correct shape leads to the most regular
texture in shape from texture. Our proposed framework, Perspective Plane
Program Induction (P3I), combines search-based and gradient-based algorithms to
efficiently solve the problem. P3I outperforms a set of baselines on a
collection of Internet images, across tasks including camera pose estimation,
global structure inference, and down-stream image manipulation tasks.
Related papers
- Single-view 3D Scene Reconstruction with High-fidelity Shape and Texture [47.44029968307207]
We propose a novel framework for simultaneous high-fidelity recovery of object shapes and textures from single-view images.
Our approach utilizes the proposed Single-view neural implicit Shape and Radiance field (SSR) representations to leverage both explicit 3D shape supervision and volume rendering.
A distinctive feature of our framework is its ability to generate fine-grained textured meshes while seamlessly integrating rendering capabilities into the single-view 3D reconstruction model.
arXiv Detail & Related papers (2023-11-01T11:46:15Z) - PanoContext-Former: Panoramic Total Scene Understanding with a
Transformer [37.51637352106841]
Panoramic image enables deeper understanding and more holistic perception of $360circ$ surrounding environment.
In this paper, we propose a novel method using depth prior for holistic indoor scene understanding.
In addition, we introduce a real-world dataset for scene understanding, including photo-realistic panoramas, high-fidelity depth images, accurately annotated room layouts, and oriented object bounding boxes and shapes.
arXiv Detail & Related papers (2023-05-21T16:20:57Z) - Self-Supervised Image Representation Learning with Geometric Set
Consistency [50.12720780102395]
We propose a method for self-supervised image representation learning under the guidance of 3D geometric consistency.
Specifically, we introduce 3D geometric consistency into a contrastive learning framework to enforce the feature consistency within image views.
arXiv Detail & Related papers (2022-03-29T08:57:33Z) - DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene
Context Graph and Relation-based Optimization [66.25948693095604]
We propose a novel method for panoramic 3D scene understanding which recovers the 3D room layout and the shape, pose, position, and semantic category for each object from a single full-view panorama image.
Experiments demonstrate that our method outperforms existing methods on panoramic scene understanding in terms of both geometry accuracy and object arrangement.
arXiv Detail & Related papers (2021-08-24T13:55:29Z) - Weakly Supervised Learning of Multi-Object 3D Scene Decompositions Using
Deep Shape Priors [69.02332607843569]
PriSMONet is a novel approach for learning Multi-Object 3D scene decomposition and representations from single images.
A recurrent encoder regresses a latent representation of 3D shape, pose and texture of each object from an input RGB image.
We evaluate the accuracy of our model in inferring 3D scene layout, demonstrate its generative capabilities, assess its generalization to real images, and point out benefits of the learned representation.
arXiv Detail & Related papers (2020-10-08T14:49:23Z) - Generating Person Images with Appearance-aware Pose Stylizer [66.44220388377596]
We present a novel end-to-end framework to generate realistic person images based on given person poses and appearances.
The core of our framework is a novel generator called Appearance-aware Pose Stylizer (APS) which generates human images by coupling the target pose with the conditioned person appearance progressively.
arXiv Detail & Related papers (2020-07-17T15:58:05Z) - 3D Scene Geometry-Aware Constraint for Camera Localization with Deep
Learning [11.599633757222406]
Recently end-to-end approaches based on convolutional neural network have been much studied to achieve or even exceed 3D-geometry based traditional methods.
In this work, we propose a compact network for absolute camera pose regression.
Inspired from those traditional methods, a 3D scene geometry-aware constraint is also introduced by exploiting all available information including motion, depth and image contents.
arXiv Detail & Related papers (2020-05-13T04:15:14Z) - Learning Pose-invariant 3D Object Reconstruction from Single-view Images [61.98279201609436]
In this paper, we explore a more realistic setup of learning 3D shape from only single-view images.
The major difficulty lies in insufficient constraints that can be provided by single view images.
We propose an effective adversarial domain confusion method to learn pose-disentangled compact shape space.
arXiv Detail & Related papers (2020-04-03T02:47:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.