360BEV: Panoramic Semantic Mapping for Indoor Bird's-Eye View
- URL: http://arxiv.org/abs/2303.11910v4
- Date: Mon, 4 Sep 2023 18:17:27 GMT
- Title: 360BEV: Panoramic Semantic Mapping for Indoor Bird's-Eye View
- Authors: Zhifeng Teng, Jiaming Zhang, Kailun Yang, Kunyu Peng, Hao Shi, Simon
Reiß, Ke Cao, Rainer Stiefelhagen
- Abstract summary: Bird's-eye-view (BEV) perception is restricted when using a narrow Field of View (FoV) alone.
The 360BEV task is established for the first time to achieve holistic representations of indoor scenes in a top-down view.
- Score: 38.10346176323481
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Seeing only a tiny part of the whole is not knowing the full circumstance.
Bird's-eye-view (BEV) perception, a process of obtaining allocentric maps from
egocentric views, is restricted when using a narrow Field of View (FoV) alone.
In this work, mapping from 360° panoramas to BEV semantics, the 360BEV
task, is established for the first time to achieve holistic representations of
indoor scenes in a top-down view. Instead of relying on narrow-FoV image
sequences, a panoramic image with depth information is sufficient to generate a
holistic BEV semantic map. To benchmark 360BEV, we present two indoor datasets,
360BEV-Matterport and 360BEV-Stanford, both of which include egocentric
panoramic images and semantic segmentation labels, as well as allocentric
semantic maps. Besides delving deep into different mapping paradigms, we
propose a dedicated solution for panoramic semantic mapping, namely 360Mapper.
Through extensive experiments, our methods achieve 44.32% and 45.78% in mIoU on
both datasets respectively, surpassing previous counterparts with gains of
+7.60% and +9.70% in mIoU. Code and datasets are available at the project page:
https://jamycheung.github.io/360BEV.html.
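The abstract's claim that a single panoramic image plus depth suffices to populate an allocentric map comes down to a geometric splat: each equirectangular pixel is lifted to a 3D point via its depth and viewing ray, then dropped onto a top-down grid. The sketch below illustrates that projection with hard per-pixel labels in NumPy; the grid size, cell resolution, height cutoff, and last-write conflict rule are illustrative assumptions, not 360Mapper's design, which aggregates learned features instead.

```python
import numpy as np

def panorama_to_bev(depth, semantics, grid_size=200, cell_m=0.05, max_height=2.0):
    """Splat an equirectangular depth + per-pixel semantic map onto a
    top-down (allocentric) grid. Illustrative only: grid size, cell
    resolution, and height filtering are assumed values, not the paper's."""
    H, W = depth.shape
    v, u = np.mgrid[0:H, 0:W].astype(np.float32)

    # Equirectangular pixel -> spherical angles (longitude, latitude).
    lon = (u / W - 0.5) * 2.0 * np.pi
    lat = (0.5 - v / H) * np.pi

    # Spherical -> camera-centred 3D points (y is up).
    x = depth * np.cos(lat) * np.sin(lon)
    y = depth * np.sin(lat)
    z = depth * np.cos(lat) * np.cos(lon)

    # Keep points with valid depth that lie below an assumed ceiling height.
    valid = (depth > 0) & (y < max_height)

    # 3D point -> BEV cell index, with the camera at the grid centre.
    col = np.round(x / cell_m).astype(int) + grid_size // 2
    row = np.round(z / cell_m).astype(int) + grid_size // 2
    inside = valid & (row >= 0) & (row < grid_size) & (col >= 0) & (col < grid_size)

    bev = np.zeros((grid_size, grid_size), dtype=semantics.dtype)
    bev[row[inside], col[inside]] = semantics[inside]  # last write wins per cell
    return bev
```

With learned features instead of hard labels, the natural analogue of the last-write rule here would be pooling all features that fall into a cell; the sketch only shows why a panorama with depth is geometrically sufficient to fill a top-down map.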
Related papers
- VQ-Map: Bird's-Eye-View Map Layout Estimation in Tokenized Discrete Space via Vector Quantization [108.68014173017583]
Bird's-eye-view (BEV) map layout estimation requires an accurate and full understanding of the semantics for the environmental elements around the ego car.
We propose to utilize a generative model similar to the Vector Quantized-Variational AutoEncoder (VQ-VAE) to acquire prior knowledge for the high-level BEV semantics in the tokenized discrete space.
Thanks to the obtained BEV tokens, accompanied by a codebook embedding that encapsulates the semantics of different BEV elements in the ground-truth maps, we are able to directly align the sparse backbone image features with the obtained BEV tokens.
arXiv Detail & Related papers (2024-11-03T16:09:47Z)
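The "tokenized discrete space" mentioned above is the standard VQ-VAE quantization step: continuous features are snapped to their nearest entry in a learned codebook, and the resulting indices act as discrete BEV tokens. A minimal NumPy sketch of that lookup follows; the codebook size and feature dimension are arbitrary placeholders, not values from VQ-Map.

```python
import numpy as np

def quantize(features, codebook):
    """Assign each feature vector to its nearest codebook entry (L2 distance),
    returning discrete token ids and the quantized vectors.
    features: (N, D), codebook: (K, D); sizes are illustrative."""
    # Squared distances between every feature and every code: shape (N, K).
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    tokens = d.argmin(axis=1)            # discrete BEV token ids
    return tokens, codebook[tokens]      # quantized (decodable) features

rng = np.random.default_rng(0)
codebook = rng.normal(size=(512, 64))    # K=512 codes, D=64 (assumed)
feats = rng.normal(size=(100, 64))
tokens, quantized = quantize(feats, codebook)
```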
- OneBEV: Using One Panoramic Image for Bird's-Eye-View Semantic Mapping [25.801868221496473]
OneBEV is a novel BEV semantic mapping approach using merely a single panoramic image as input.
A distortion-aware module termed Mamba View Transformation (MVT) is specifically designed to handle the spatial distortions in panoramas.
This work advances BEV semantic mapping in autonomous driving, paving the way for more advanced and reliable autonomous systems.
arXiv Detail & Related papers (2024-09-20T21:33:53Z)
- LetsMap: Unsupervised Representation Learning for Semantic BEV Mapping [23.366388601110913]
We propose the first unsupervised representation learning approach to generate semantic BEV maps from a monocular frontal view (FV) image in a label-efficient manner.
Our approach pretrains the network to independently reason about scene geometry and scene semantics using two disjoint neural pathways in an unsupervised manner.
We achieve label-free pretraining by exploiting spatial and temporal consistency of FV images to learn scene geometry while relying on a novel temporal masked autoencoder formulation to encode the scene representation.
arXiv Detail & Related papers (2024-05-29T08:03:36Z)
- See360: Novel Panoramic View Interpolation [24.965259708297932]
See360 is a versatile and efficient framework for 360° panoramic view interpolation using latent space viewpoint estimation.
We show that the proposed method is generic enough to achieve real-time rendering of arbitrary views for four datasets.
arXiv Detail & Related papers (2024-01-07T09:17:32Z)
- Bird's-Eye-View Scene Graph for Vision-Language Navigation [85.72725920024578]
Vision-language navigation (VLN) requires an agent to navigate 3D environments following human instructions.
We present a BEV Scene Graph (BSG), which leverages multi-step BEV representations to encode scene layouts and geometric cues of the indoor environment.
Based on BSG, the agent predicts a local BEV grid-level decision score and a global graph-level decision score, combined with a sub-view selection score on panoramic views.
arXiv Detail & Related papers (2023-08-09T07:48:20Z)
- SA-BEV: Generating Semantic-Aware Bird's-Eye-View Feature for Multi-view 3D Object Detection [46.92706423094971]
We propose Semantic-Aware BEV Pooling (SA-BEVPool), which can filter out background information according to the semantic segmentation of image features.
We also propose BEV-Paste, an effective data augmentation strategy that closely matches the semantic-aware BEV features.
Experiments on nuScenes show that SA-BEV achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-07-21T10:28:19Z)
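The core of the pooling idea above is that pixels predicted as background are discarded before their image features are accumulated into BEV cells. A rough NumPy sketch under assumed conventions follows; the hard foreground threshold and mean pooling are illustrative choices, not SA-BEVPool's actual definition.

```python
import numpy as np

def semantic_aware_pool(cam_feats, fg_prob, cell_idx, num_cells, thresh=0.5):
    """Drop pixels whose predicted foreground probability is below a threshold,
    then average the surviving image features per BEV cell.
    cam_feats: (N, C) per-pixel features, fg_prob: (N,) foreground scores,
    cell_idx: (N,) precomputed BEV cell index for each pixel."""
    keep = fg_prob >= thresh                 # semantic filtering of background
    feats, cells = cam_feats[keep], cell_idx[keep]

    C = cam_feats.shape[1]
    bev = np.zeros((num_cells, C))
    counts = np.zeros(num_cells)
    np.add.at(bev, cells, feats)             # sum kept features per cell
    np.add.at(counts, cells, 1.0)
    nonempty = counts > 0
    bev[nonempty] /= counts[nonempty, None]  # mean over kept pixels
    return bev
```

In practice the foreground scores would come from the semantic segmentation of the image features, as the abstract describes.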
- Bird's-Eye-View Panoptic Segmentation Using Monocular Frontal View Images [4.449481309681663]
We present the first end-to-end learning approach for directly predicting dense panoptic segmentation maps in the bird's-eye view (BEV).
Our architecture follows the top-down paradigm and incorporates a novel dense transformer module.
We derive a mathematical formulation for the sensitivity of the FV-BEV transformation which allows us to intelligently weight pixels in the BEV space.
arXiv Detail & Related papers (2021-08-06T17:59:11Z)
- Understanding Bird's-Eye View Semantic HD-Maps Using an Onboard Monocular Camera [110.83289076967895]
We study scene understanding in the form of online estimation of semantic bird's-eye-view HD-maps using the video input from a single onboard camera.
In our experiments, we demonstrate that the considered aspects are complementary to each other for HD-map understanding.
arXiv Detail & Related papers (2020-12-05T14:39:14Z)
- Visual Question Answering on 360° Images [96.00046925811515]
VQA 360° is a novel task of visual question answering on 360° images.
We collect the first VQA 360 dataset, containing around 17,000 real-world image-question-answer triplets for a variety of question types.
arXiv Detail & Related papers (2020-01-10T08:18:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.