Recognizing Scenes from Novel Viewpoints
- URL: http://arxiv.org/abs/2112.01520v1
- Date: Thu, 2 Dec 2021 18:59:40 GMT
- Title: Recognizing Scenes from Novel Viewpoints
- Authors: Shengyi Qian, Alexander Kirillov, Nikhila Ravi, Devendra Singh Chaplot, Justin Johnson, David F. Fouhey, Georgia Gkioxari
- Abstract summary: Humans can perceive scenes in 3D from a handful of 2D views. For AI agents, the ability to recognize a scene from any viewpoint given only a few images enables them to efficiently interact with the scene and its objects.
We propose a model which takes as input a few RGB images of a new scene and recognizes the scene from novel viewpoints by segmenting it into semantic categories.
- Score: 99.90914180489456
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Humans can perceive scenes in 3D from a handful of 2D views. For AI agents,
the ability to recognize a scene from any viewpoint given only a few images
enables them to efficiently interact with the scene and its objects. In this
work, we attempt to endow machines with this ability. We propose a model which
takes as input a few RGB images of a new scene and recognizes the scene from
novel viewpoints by segmenting it into semantic categories. All this without
access to the RGB images from those views. We pair 2D scene recognition with an
implicit 3D representation and learn from multi-view 2D annotations of hundreds
of scenes without any 3D supervision beyond camera poses. We experiment on
challenging datasets and demonstrate our model's ability to jointly capture
semantics and geometry of novel scenes with diverse layouts, object types and
shapes.
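As an illustration of the approach described in the abstract, the sketch below pairs a pooled 2D image feature with an implicit 3D field whose per-point semantic logits are volume-rendered at a held-out camera pose and supervised with ordinary 2D cross-entropy. All module names, shapes, and layer sizes are assumptions made for this example, not the authors' actual architecture.

```python
# Minimal, illustrative sketch (not the paper's exact model): a 2D scene feature
# conditions an implicit 3D field; per-point class logits are alpha-composited
# along rays from a novel camera pose and supervised by 2D semantic labels only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ImplicitSemanticField(nn.Module):
    """Maps a 3D point (conditioned on a scene feature) to density + class logits."""

    def __init__(self, feat_dim=64, num_classes=21, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.density = nn.Linear(hidden, 1)
        self.logits = nn.Linear(hidden, num_classes)

    def forward(self, points, scene_feat):
        # points: (R, S, 3) samples along R rays; scene_feat: (feat_dim,)
        feat = scene_feat.expand(*points.shape[:-1], -1)
        h = self.mlp(torch.cat([points, feat], dim=-1))
        return F.softplus(self.density(h)).squeeze(-1), self.logits(h)


def render_semantics(field, scene_feat, rays_o, rays_d, near=0.1, far=5.0, samples=32):
    """Alpha-composite per-point class logits along each ray (NeRF-style weights)."""
    t = torch.linspace(near, far, samples)                           # (S,)
    points = rays_o[:, None] + rays_d[:, None] * t[None, :, None]    # (R, S, 3)
    sigma, logits = field(points, scene_feat)                        # (R, S), (R, S, C)
    delta = (far - near) / samples
    alpha = 1.0 - torch.exp(-sigma * delta)                          # (R, S)
    trans = torch.cumprod(torch.cat(
        [torch.ones_like(alpha[:, :1]), 1.0 - alpha + 1e-10], dim=-1), dim=-1)[:, :-1]
    weights = alpha * trans                                          # (R, S)
    return (weights[..., None] * logits).sum(dim=1)                  # (R, C) per-ray logits


if __name__ == "__main__":
    # Hypothetical training step: scene_feat stands in for pooled 2D features of the
    # input views; rays come from a novel camera pose; labels are 2D annotations.
    field = ImplicitSemanticField()
    scene_feat = torch.randn(64)
    rays_o = torch.zeros(1024, 3)
    rays_d = F.normalize(torch.randn(1024, 3), dim=-1)
    labels = torch.randint(0, 21, (1024,))
    logits = render_semantics(field, scene_feat, rays_o, rays_d)
    loss = F.cross_entropy(logits, labels)
    loss.backward()
```

In a setup like this, the only 3D signal is the camera pose used to cast the rays; the supervision itself stays purely 2D, matching the training regime the abstract describes.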
Related papers
- Lift3D: Zero-Shot Lifting of Any 2D Vision Model to 3D [95.14469865815768]
2D vision models can be used for semantic segmentation, style transfer or scene editing, enabled by large-scale 2D image datasets.
However, extending a single 2D vision operator like scene editing to 3D typically requires a highly creative method specialized to that task.
In this paper, we propose Lift3D, which is trained to predict unseen views of the feature spaces generated by a few visual models.
We even outperform state-of-the-art methods specialized for the task in question.
arXiv Detail & Related papers (2024-03-27T18:13:16Z)
- Sat2Scene: 3D Urban Scene Generation from Satellite Images with Diffusion [77.34078223594686]
We propose a novel architecture for direct 3D scene generation by introducing diffusion models into 3D sparse representations and combining them with neural rendering techniques.
Specifically, our approach first generates texture colors at the point level for a given geometry using a 3D diffusion model, and then transforms them into a scene representation in a feed-forward manner.
Experiments on two city-scale datasets show that our model demonstrates proficiency in generating photo-realistic street-view image sequences and cross-view urban scenes from satellite imagery.
arXiv Detail & Related papers (2024-01-19T16:15:37Z)
- Viewpoint Textual Inversion: Discovering Scene Representations and 3D View Control in 2D Diffusion Models [4.036372578802888]
We show that certain 3D scene representations are encoded in the text embedding space of models like Stable Diffusion.
We exploit these 3D scene representations for 3D vision tasks, namely view-controlled text-to-image generation and novel view synthesis from a single image.
arXiv Detail & Related papers (2023-09-14T18:52:16Z)
- SceneDreamer: Unbounded 3D Scene Generation from 2D Image Collections [49.802462165826554]
We present SceneDreamer, an unconditional generative model for unbounded 3D scenes.
Our framework is learned from in-the-wild 2D image collections only, without any 3D annotations.
arXiv Detail & Related papers (2023-02-02T18:59:16Z)
- One-Shot Neural Fields for 3D Object Understanding [112.32255680399399]
We present a unified and compact scene representation for robotics.
Each object in the scene is depicted by a latent code capturing geometry and appearance.
This representation can be decoded for various tasks such as novel view rendering, 3D reconstruction, and stable grasp prediction.
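A minimal sketch of the per-object latent-code idea described just above; the SDF/RGB decoder, layer sizes, and names are assumptions for illustration, not the paper's actual model.

```python
# Illustrative only: one compact latent vector per object is decoded into geometry
# (signed distance) and appearance (RGB) at queried 3D points; the same code could
# feed downstream heads such as novel view rendering or grasp prediction.
import torch
import torch.nn as nn


class LatentObjectDecoder(nn.Module):
    def __init__(self, code_dim=128, hidden=256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(code_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.sdf_head = nn.Linear(hidden, 1)    # geometry: signed distance
        self.rgb_head = nn.Linear(hidden, 3)    # appearance: color

    def forward(self, code, points):
        # code: (code_dim,), points: (N, 3)
        h = self.trunk(torch.cat([code.expand(points.shape[0], -1), points], dim=-1))
        return self.sdf_head(h).squeeze(-1), torch.sigmoid(self.rgb_head(h))


decoder = LatentObjectDecoder()
object_code = torch.randn(128)                 # one latent code captures one object
query_points = torch.rand(4096, 3) - 0.5
sdf, rgb = decoder(object_code, query_points)  # reusable across downstream tasks
```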
arXiv Detail & Related papers (2022-10-21T17:33:14Z)
- Neural Groundplans: Persistent Neural Scene Representations from a Single Image [90.04272671464238]
We present a method to map 2D image observations of a scene to a persistent 3D scene representation.
We propose conditional neural groundplans as persistent and memory-efficient scene representations.
arXiv Detail & Related papers (2022-07-22T17:41:24Z)
- Curiosity-driven 3D Scene Structure from Single-image Self-supervision [22.527696847086574]
Previous work has demonstrated learning isolated 3D objects from 2D-only self-supervision.
Here we set out to extend this to entire 3D scenes made out of multiple objects, including their location, orientation and type.
The resulting system converts 2D images of different virtual or real environments into complete 3D scenes, learned only from 2D images of those scenes.
arXiv Detail & Related papers (2020-12-02T14:17:16Z)
- Indoor Scene Recognition in 3D [26.974703983293093]
Existing approaches attempt to classify the scene based on 2D images or 2.5D range images.
Here, we study scene recognition from 3D point cloud (or voxel) data.
We show that it greatly outperforms methods based on 2D bird's-eye views.
arXiv Detail & Related papers (2020-02-28T15:47:09Z)
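As a concrete, hypothetical illustration of classifying a scene directly from 3D point cloud data, as in the entry above, here is a PointNet-style sketch with assumed layer sizes; it is not the paper's model.

```python
# Illustrative only: per-point MLP features, an order-invariant max pool over the
# point cloud, and a linear head over scene categories (e.g. kitchen vs. office).
import torch
import torch.nn as nn


class PointCloudSceneClassifier(nn.Module):
    def __init__(self, num_scene_classes=10):
        super().__init__()
        self.point_mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 256), nn.ReLU(),
        )
        self.head = nn.Linear(256, num_scene_classes)

    def forward(self, points):
        # points: (B, N, 3) -> per-point features -> global max pool -> scene logits
        per_point = self.point_mlp(points)           # (B, N, 256)
        global_feat = per_point.max(dim=1).values    # (B, 256)
        return self.head(global_feat)                # (B, num_scene_classes)


model = PointCloudSceneClassifier()
scene_points = torch.rand(2, 2048, 3)    # two scenes, 2048 points each
scene_logits = model(scene_points)
```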
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.