GitNet: Geometric Prior-based Transformation for Birds-Eye-View Segmentation
- URL: http://arxiv.org/abs/2204.07733v1
- Date: Sat, 16 Apr 2022 06:46:45 GMT
- Title: GitNet: Geometric Prior-based Transformation for Birds-Eye-View Segmentation
- Authors: Shi Gong, Xiaoqing Ye, Xiao Tan, Jingdong Wang, Errui Ding, Yu Zhou, Xiang Bai
- Abstract summary: Birds-eye-view (BEV) semantic segmentation is critical for autonomous driving.
We present a novel two-stage Geometry Prior-based Transformation framework named GitNet.
- Score: 105.19949897812494
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Birds-eye-view (BEV) semantic segmentation is critical for autonomous driving for its powerful spatial representation ability. Estimating BEV semantic maps from monocular images is challenging because of the spatial gap between the two views: the model must implicitly realize both the perspective-to-BEV transformation and the segmentation. We present a novel two-stage Geometry Prior-based Transformation framework named GitNet, consisting of (i) geometry-guided pre-alignment and (ii) a ray-based transformer. In the first stage, we decouple BEV segmentation into perspective-image segmentation followed by geometric prior-based mapping; projecting the BEV semantic labels onto the image plane provides explicit supervision for learning visibility-aware features, and a learnable geometry translates them into BEV space. In the second stage, the pre-aligned coarse BEV features are further deformed by ray-based transformers to take visibility knowledge into account. GitNet achieves leading performance on the challenging nuScenes and Argoverse datasets. The code will be publicly available.
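To make the geometric prior-based mapping concrete, below is a minimal sketch of the classic flat-ground projection such a pre-alignment builds on: each BEV cell is projected into the image with the camera intrinsics and the perspective features are sampled bilinearly. Everything here (the function name, shapes, and the fixed flat-ground assumption) is illustrative; GitNet learns its geometry rather than fixing it.

```python
# Minimal sketch of geometry-prior-based pre-alignment via inverse perspective
# mapping (IPM): BEV ground-plane cells are projected into the image and the
# perspective features are sampled bilinearly. Illustrative only.
import torch
import torch.nn.functional as F

def ipm_prealign(img_feat, K, cam_height=1.5, bev_size=(200, 200), bev_range=50.0):
    """img_feat: (B, C, H, W) perspective features; K: (3, 3) intrinsics
    scaled to the feature resolution. Flat ground and a front-facing camera
    are assumed."""
    B, C, H, W = img_feat.shape
    zs = torch.linspace(1.0, bev_range, bev_size[0])                 # forward distance (m)
    xs = torch.linspace(-bev_range / 2, bev_range / 2, bev_size[1])  # lateral offset (m)
    z, x = torch.meshgrid(zs, xs, indexing="ij")
    y = torch.full_like(x, cam_height)              # ground plane below the camera
    pts = torch.stack([x, y, z], dim=-1).reshape(-1, 3)
    uvw = pts @ K.T                                 # pinhole projection
    uv = uvw[:, :2] / uvw[:, 2:3].clamp(min=1e-5)
    grid = torch.stack([uv[:, 0] / (W - 1) * 2 - 1,  # normalize to [-1, 1]
                        uv[:, 1] / (H - 1) * 2 - 1], dim=-1)
    grid = grid.view(1, *bev_size, 2).expand(B, -1, -1, -1)
    return F.grid_sample(img_feat, grid, align_corners=True)  # (B, C, *bev_size)
```

In GitNet the analogous mapping is additionally supervised by projecting the BEV labels onto the image plane, so the sampled features are already visibility-aware.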
Related papers
- Improving Bird's Eye View Semantic Segmentation by Task Decomposition [42.57351039508863]
We decompose the original BEV segmentation task into two stages, namely BEV map reconstruction and RGB-BEV feature alignment.
Separating perception and generation into distinct steps reduces the complexity of the task and equips the model to handle intricate and challenging scenes effectively.
arXiv Detail & Related papers (2024-04-02T13:19:45Z)
- DA-BEV: Unsupervised Domain Adaptation for Bird's Eye View Perception [104.87876441265593]
Camera-only Bird's Eye View (BEV) has demonstrated great potential in environment perception in a 3D space.
Unsupervised domain-adaptive BEV, which enables effective learning from various unlabelled target data, is far under-explored.
We design DA-BEV, the first domain-adaptive camera-only BEV framework; it addresses these challenges by exploiting the complementary nature of image-view features and BEV features.
arXiv Detail & Related papers (2024-01-13T04:21:24Z)
- U-BEV: Height-aware Bird's-Eye-View Segmentation and Neural Map-based Relocalization [81.76044207714637]
Relocalization is essential for intelligent vehicles when GPS reception is insufficient or sensor-based localization fails.
Recent advances in Bird's-Eye-View (BEV) segmentation allow for accurate estimation of local scene appearance.
This paper presents U-BEV, a U-Net inspired architecture that extends the current state-of-the-art by allowing the BEV to reason about the scene on multiple height layers before flattening the BEV features.
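As a rough illustration of the multiple-height-layer idea (the shapes and the softmax fusion below are assumptions, not U-BEV's actual design):

```python
# Toy sketch: per-height BEV feature slices are collapsed into a single BEV
# plane by a learned, height-wise weighted sum before segmentation.
import torch

layers = torch.randn(4, 64, 100, 100)        # (num_heights, C, Hb, Wb), assumed
height_logits = torch.randn(4, 1, 100, 100)  # per-layer importance, assumed
weights = torch.softmax(height_logits, dim=0)
bev_feat = (weights * layers).sum(dim=0)     # flattened BEV feature, (C, Hb, Wb)
```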
arXiv Detail & Related papers (2023-10-20T18:57:38Z)
- Semi-Supervised Learning for Visual Bird's Eye View Semantic Segmentation [16.3996408206659]
We present a novel semi-supervised framework for visual BEV semantic segmentation that boosts performance by exploiting unlabeled images during training.
A consistency loss that makes full use of the unlabeled data is then proposed to constrain the model not only on the semantic predictions but also on the BEV features.
Experiments on the nuScenes and Argoverse datasets show that our framework can effectively improve prediction accuracy.
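A hedged sketch of the kind of consistency loss described above: a student network sees an augmented view of an unlabeled image and is constrained to match a teacher on both the BEV semantic prediction and the intermediate BEV feature. The names and the mean-teacher setup are assumptions, not the paper's exact recipe.

```python
# Sketch of a two-term consistency loss on unlabeled data: KL divergence on
# BEV semantic predictions plus MSE on the BEV feature maps.
import torch
import torch.nn.functional as F

def consistency_loss(student_logits, teacher_logits, student_feat, teacher_feat,
                     w_sem=1.0, w_feat=0.1):
    """All tensors are BEV-shaped: logits (B, K, H, W), features (B, C, H, W)."""
    teacher_logits = teacher_logits.detach()  # no gradients through the teacher
    teacher_feat = teacher_feat.detach()
    # semantic consistency: KL between student and teacher class distributions
    sem = F.kl_div(F.log_softmax(student_logits, dim=1),
                   F.softmax(teacher_logits, dim=1), reduction="batchmean")
    # feature consistency: plain MSE on the BEV feature maps
    feat = F.mse_loss(student_feat, teacher_feat)
    return w_sem * sem + w_feat * feat
```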
arXiv Detail & Related papers (2023-08-28T12:23:36Z)
- Bird's-Eye-View Scene Graph for Vision-Language Navigation [85.72725920024578]
Vision-language navigation (VLN) requires an agent to navigate 3D environments following human instructions.
We present a BEV Scene Graph (BSG), which leverages multi-step BEV representations to encode scene layouts and geometric cues of the indoor environment.
Based on BSG, the agent predicts a local BEV grid-level decision score and a global graph-level decision score, combined with a sub-view selection score on panoramic views.
arXiv Detail & Related papers (2023-08-09T07:48:20Z)
- FB-BEV: BEV Representation from Forward-Backward View Transformations [131.11787050205697]
We propose a novel View Transformation Module (VTM) for Bird's-Eye-View (BEV) representation.
We instantiate the proposed module with FB-BEV, which achieves a new state-of-the-art result of 62.4% NDS on the nuScenes test set.
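A rough sketch of a forward-backward view transformation in the spirit described above (names, shapes, and the precomputed index/grid inputs are all assumptions): a forward pass splats image features into BEV using a predicted depth distribution, and a backward pass lets each BEV cell re-sample the image to refine cells the forward pass left sparse.

```python
# Forward pass: lift-splat style scatter of depth-weighted image features.
# Backward pass: BEV cells sample image features at their projected pixels.
import torch
import torch.nn.functional as F

def forward_projection(img_feat, depth_prob, bev_index, bev_cells):
    """img_feat: (C, N) features for N (pixel, depth-bin) samples;
    depth_prob: (N,) probability of each sample's depth bin;
    bev_index: (N,) flattened BEV cell index of each lifted 3D sample."""
    bev = torch.zeros(img_feat.shape[0], bev_cells)
    bev.index_add_(1, bev_index, img_feat * depth_prob)  # splat into BEV
    return bev  # (C, bev_cells)

def backward_projection(img_feat_map, bev_uv_grid):
    """img_feat_map: (B, C, H, W); bev_uv_grid: (B, Hb, Wb, 2) in [-1, 1],
    the image locations each BEV cell projects to."""
    return F.grid_sample(img_feat_map, bev_uv_grid, align_corners=False)
```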
arXiv Detail & Related papers (2023-08-04T10:26:55Z)
- Geometric-aware Pretraining for Vision-centric 3D Object Detection [77.7979088689944]
We propose a novel geometric-aware pretraining framework called GAPretrain.
GAPretrain serves as a plug-and-play solution that can be flexibly applied to multiple state-of-the-art detectors.
We achieve 46.2 mAP and 55.5 NDS on the nuScenes val set using the BEVFormer method, with a gain of 2.7 and 2.1 points, respectively.
arXiv Detail & Related papers (2023-04-06T14:33:05Z)
- ViT-BEVSeg: A Hierarchical Transformer Network for Monocular Birds-Eye-View Segmentation [2.70519393940262]
We evaluate the use of vision transformers (ViT) as a backbone architecture to generate Bird's-Eye-View (BEV) maps.
Our network architecture, ViT-BEVSeg, employs standard vision transformers to generate a multi-scale representation of the input image.
We evaluate our approach on the nuScenes dataset demonstrating a considerable improvement relative to state-of-the-art approaches.
arXiv Detail & Related papers (2022-05-31T10:18:36Z)
- Bird's-Eye-View Panoptic Segmentation Using Monocular Frontal View Images [4.449481309681663]
We present the first end-to-end learning approach for predicting dense panoptic segmentation maps directly in the Bird's-Eye-View (BEV).
Our architecture follows the top-down paradigm and incorporates a novel dense transformer module.
We derive a mathematical formulation for the sensitivity of the FV-BEV transformation which allows us to intelligently weight pixels in the BEV space.
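One illustrative form of such a sensitivity (the paper derives its own formulation; this sketch assumes a flat ground plane, camera height $h$, focal length $f$, and principal row $v_0$): a ground point at depth $z$ projects to image row

$$v = v_0 + \frac{fh}{z} \;\;\Longrightarrow\;\; \left|\frac{\partial z}{\partial v}\right| = \frac{fh}{(v - v_0)^2} = \frac{z^2}{fh},$$

so a one-pixel error in the image displaces the BEV estimate quadratically more with distance, suggesting weights like $w(z) \propto fh / z^2$ for far-away BEV cells.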
arXiv Detail & Related papers (2021-08-06T17:59:11Z)
- BEV-Seg: Bird's Eye View Semantic Segmentation Using Geometry and Semantic Point Cloud [21.29622194272066]
We focus on bird's-eye-view semantic segmentation, a task that predicts pixel-wise semantic labels in BEV from side RGB images.
There are two main challenges to this task: the view transformation from side view to bird's eye view, as well as transfer learning to unseen domains.
Our novel two-stage perception pipeline explicitly predicts pixel depths and efficiently combines them with pixel semantics.
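A simplified sketch of that two-stage idea (all names and the camera-frame grid below are assumed): predicted per-pixel depths lift per-pixel semantics into a 3D point cloud, which is then flattened onto a BEV grid.

```python
# Lift per-pixel semantics to 3D with predicted depth, then scatter onto a
# BEV grid. Cell conflicts resolve as "last write wins" in this toy version.
import torch

def semantics_to_bev(depth, sem_label, K_inv, bev_res=0.5, bev_cells=200):
    """depth: (H, W) metric depth; sem_label: (H, W) int labels; K_inv: (3, 3)
    inverse intrinsics."""
    H, W = depth.shape
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([u.float(), v.float(), torch.ones(H, W)], dim=-1)
    pts = (pix.reshape(-1, 3) @ K_inv.T) * depth.reshape(-1, 1)  # back-project
    # discretize camera-frame (x, z) onto the BEV grid
    gx = (pts[:, 0] / bev_res + bev_cells // 2).long()
    gz = (pts[:, 2] / bev_res).long()
    valid = (gx >= 0) & (gx < bev_cells) & (gz >= 0) & (gz < bev_cells)
    bev = torch.zeros(bev_cells, bev_cells, dtype=sem_label.dtype)
    bev[gz[valid], gx[valid]] = sem_label.reshape(-1)[valid]
    return bev  # (bev_cells, bev_cells) semantic BEV map
```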
arXiv Detail & Related papers (2020-06-19T23:30:11Z)