Indoor Scene Recognition in 3D
- URL: http://arxiv.org/abs/2002.12819v2
- Date: Thu, 2 Jul 2020 21:25:18 GMT
- Title: Indoor Scene Recognition in 3D
- Authors: Shengyu Huang, Mikhail Usvyatsov and Konrad Schindler
- Abstract summary: Existing approaches attempt to classify the scene based on 2D images or 2.5D range images.
Here, we study scene recognition from 3D point cloud (or voxel) data.
We show that it greatly outperforms methods based on 2D bird's-eye views.
- Score: 26.974703983293093
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recognising in what type of environment one is located is an important
perception task. For instance, for a robot operating indoors it is helpful
to be aware whether it is in a kitchen, a hallway or a bedroom. Existing
approaches attempt to classify the scene based on 2D images or 2.5D range
images. Here, we study scene recognition from 3D point cloud (or voxel) data,
and show that it greatly outperforms methods based on 2D bird's-eye views.
Moreover, we advocate multi-task learning as a way of improving scene
recognition, building on the fact that the scene type is highly correlated with
the objects in the scene, and therefore with its semantic segmentation into
different object classes. In a series of ablation studies, we show that
successful scene recognition is not just the recognition of individual objects
unique to some scene type (such as a bathtub), but depends on several different
cues, including coarse 3D geometry, colour, and the (implicit) distribution of
object categories. Moreover, we demonstrate that surprisingly sparse 3D data is
sufficient to classify indoor scenes with good accuracy.
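A minimal sketch of the multi-task setup described above, assuming a PyTorch implementation: a shared 3D convolutional backbone over a coloured voxel grid feeds both a global scene-classification head and a per-voxel semantic-segmentation head, and the two cross-entropy losses are summed. All layer sizes, class counts, the input channel layout, and the loss weight are illustrative assumptions, not the authors' actual architecture.

```python
# Illustrative multi-task model: scene recognition + semantic segmentation.
# This is a hedged sketch, NOT the architecture from the paper; all sizes,
# channel layouts, and class counts below are assumptions.
import torch
import torch.nn as nn

class MultiTaskSceneNet(nn.Module):
    def __init__(self, in_channels=4, num_scene_types=10, num_object_classes=20):
        super().__init__()
        # Shared encoder over a voxel grid (e.g. occupancy + RGB channels).
        self.backbone = nn.Sequential(
            nn.Conv3d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        # Head 1: one scene-type label for the whole scene.
        self.scene_head = nn.Sequential(
            nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(64, num_scene_types)
        )
        # Head 2: semantic logits per (downsampled) voxel.
        self.seg_head = nn.Conv3d(64, num_object_classes, kernel_size=1)

    def forward(self, voxels):
        feat = self.backbone(voxels)  # (B, 64, D/2, H/2, W/2)
        return self.scene_head(feat), self.seg_head(feat)

def multitask_loss(scene_logits, scene_label, seg_logits, seg_labels, w=0.5):
    # Segmentation acts as an auxiliary task regularising scene recognition;
    # seg_labels must be given at the (downsampled) feature resolution here.
    ce = nn.CrossEntropyLoss()
    return ce(scene_logits, scene_label) + w * ce(seg_logits, seg_labels)
```

The joint loss exploits the correlation the abstract points out: the scene type is strongly tied to which object classes appear, so segmentation supervision provides a useful auxiliary signal for the classifier.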
Related papers
- 3D Feature Distillation with Object-Centric Priors [9.626027459292926]
2D vision-language models such as CLIP have been widely popularized due to their impressive capabilities for open-vocabulary grounding in 2D images.
Recent works aim to elevate 2D CLIP features to 3D via feature distillation, but either learn neural fields that are scene-specific or focus on indoor room scan data.
We show that our method reconstructs 3D CLIP features with improved grounding capacity and spatial consistency.
arXiv Detail & Related papers (2024-06-26T20:16:49Z)
- Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers [65.51132104404051]
We introduce the use of object identifiers and object-centric representations to interact with scenes at the object level.
Our model significantly outperforms existing methods on benchmarks including ScanRefer, Multi3DRefer, Scan2Cap, ScanQA, and SQA3D.
arXiv Detail & Related papers (2023-12-13T14:27:45Z)
- SceneDreamer: Unbounded 3D Scene Generation from 2D Image Collections [49.802462165826554]
We present SceneDreamer, an unconditional generative model for unbounded 3D scenes.
Our framework is learned from in-the-wild 2D image collections only, without any 3D annotations.
arXiv Detail & Related papers (2023-02-02T18:59:16Z)
- OpenScene: 3D Scene Understanding with Open Vocabularies [73.1411930820683]
Traditional 3D scene understanding approaches rely on labeled 3D datasets to train a model for a single task with supervision.
We propose OpenScene, an alternative approach where a model predicts dense features for 3D scene points that are co-embedded with text and image pixels in CLIP feature space.
This zero-shot approach enables task-agnostic training and open-vocabulary queries (see the query-step sketch after this list).
arXiv Detail & Related papers (2022-11-28T18:58:36Z)
- Neural Groundplans: Persistent Neural Scene Representations from a Single Image [90.04272671464238]
We present a method to map 2D image observations of a scene to a persistent 3D scene representation.
We propose conditional neural groundplans as persistent and memory-efficient scene representations.
arXiv Detail & Related papers (2022-07-22T17:41:24Z)
- HyperDet3D: Learning a Scene-conditioned 3D Object Detector [154.84798451437032]
We propose HyperDet3D to explore scene-conditioned prior knowledge for 3D object detection.
Our HyperDet3D achieves state-of-the-art results on the 3D object detection benchmark of the ScanNet and SUN RGB-D datasets.
arXiv Detail & Related papers (2022-04-12T07:57:58Z)
- Recognizing Scenes from Novel Viewpoints [99.90914180489456]
Humans can perceive scenes in 3D from a handful of 2D views. For AI agents, the ability to recognize a scene from any viewpoint given only a few images enables them to efficiently interact with the scene and its objects.
We propose a model which takes as input a few RGB images of a new scene and recognizes the scene from novel viewpoints by segmenting it into semantic categories.
arXiv Detail & Related papers (2021-12-02T18:59:40Z)
- Indoor Scene Generation from a Collection of Semantic-Segmented Depth Images [18.24156991697044]
We present a method for creating 3D indoor scenes with a generative model learned from semantic-segmented depth images.
Given a room with a specified size, our method automatically generates 3D objects in a room from a randomly sampled latent code.
Compared to existing methods, our method not only substantially reduces the workload of modeling and acquiring 3D scenes for training, but also produces better object shapes.
arXiv Detail & Related papers (2021-08-20T06:22:49Z)
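As referenced in the OpenScene entry above, here is a minimal, hypothetical sketch of an open-vocabulary query against per-point features that are assumed to already live in CLIP space. The function name, tensor shapes, and similarity threshold are illustrative assumptions, not OpenScene's actual API.

```python
# Hedged sketch of a zero-shot, open-vocabulary point query in CLIP space.
# Assumes point_features were produced by a model co-embedded with CLIP and
# that text_embedding comes from a CLIP text encoder; both are hypothetical
# inputs here, not OpenScene's real interface.
import torch
import torch.nn.functional as F

def query_points(point_features: torch.Tensor,   # (N, D) per-point features
                 text_embedding: torch.Tensor,   # (D,) CLIP embedding of a query
                 threshold: float = 0.25) -> torch.Tensor:
    # Cosine similarity between every point feature and the text query.
    sims = F.cosine_similarity(point_features, text_embedding.unsqueeze(0), dim=1)
    return sims > threshold  # boolean mask of points matching the query
```

A free-form query such as "sofa" would first be embedded with a CLIP text encoder to obtain text_embedding; the returned mask then selects the matching points, with no task-specific training.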