BEV-Seg: Bird's Eye View Semantic Segmentation Using Geometry and
Semantic Point Cloud
- URL: http://arxiv.org/abs/2006.11436v2
- Date: Tue, 23 Jun 2020 16:45:07 GMT
- Title: BEV-Seg: Bird's Eye View Semantic Segmentation Using Geometry and
Semantic Point Cloud
- Authors: Mong H. Ng, Kaahan Radia, Jianfei Chen, Dequan Wang, Ionel Gog, and
Joseph E. Gonzalez
- Abstract summary: We focus on bird's eye semantic segmentation, a task that predicts pixel-wise semantic segmentation in BEV from side RGB images.
There are two main challenges to this task: the view transformation from side view to bird's eye view, as well as transfer learning to unseen domains.
Our novel 2-staged perception pipeline explicitly predicts pixel depths and combines them with pixel semantics in an efficient manner.
- Score: 21.29622194272066
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bird's-eye-view (BEV) is a powerful and widely adopted representation for
road scenes that captures surrounding objects and their spatial locations,
along with overall context in the scene. In this work, we focus on bird's eye
semantic segmentation, a task that predicts pixel-wise semantic segmentation in
BEV from side RGB images. This task is made possible by simulators such as
Carla, which allow for cheap data collection, arbitrary camera placements, and
supervision in ways otherwise not possible in the real world. There are two
main challenges to this task: the view transformation from side view to bird's
eye view, as well as transfer learning to unseen domains. Existing work
transforms between views through fully connected layers and handles domain
transfer via GANs, which suffers from a lack of depth reasoning and performance degradation
across domains. Our novel 2-staged perception pipeline explicitly predicts
pixel depths and combines them with pixel semantics in an efficient manner,
allowing the model to leverage depth information to infer objects' spatial
locations in the BEV. In addition, we enable transfer learning by abstracting
high-level geometric features and predicting an intermediate representation
that is common across different domains. We publish a new dataset called
BEVSEG-Carla and show that our approach improves state-of-the-art by 24% mIoU
and performs well when transferred to a new domain.
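
A minimal numpy sketch of the geometry stage described above, assuming a pinhole camera with known intrinsics K: predicted per-pixel depth and semantics are lifted to a 3D point cloud and splatted onto a top-down grid. The depth/semantic networks, grid extents, and last-write-wins cell aggregation are illustrative assumptions, not the paper's released code.

    import numpy as np

    def semantic_point_cloud_to_bev(depth, semantics, K,
                                    bev_range=50.0, bev_res=0.25):
        """depth: (H, W) metric depth; semantics: (H, W) class ids;
        K: (3, 3) pinhole intrinsics. Returns a BEV class-id grid."""
        H, W = depth.shape
        # Pixel grid in homogeneous coordinates, flattened row-major.
        u, v = np.meshgrid(np.arange(W), np.arange(H))
        pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
        # Back-project: X = depth * K^-1 [u, v, 1]^T
        # (camera frame: x right, y down, z forward).
        pts = (pix @ np.linalg.inv(K).T) * depth.reshape(-1, 1)
        # Top-down grid over (x, z); height (y) is dropped by the BEV.
        nx, nz = int(2 * bev_range / bev_res), int(bev_range / bev_res)
        bev = np.zeros((nz, nx), dtype=semantics.dtype)
        gx = np.floor((pts[:, 0] + bev_range) / bev_res).astype(int)
        gz = np.floor(pts[:, 2] / bev_res).astype(int)
        ok = (gx >= 0) & (gx < nx) & (gz >= 0) & (gz < nz)
        # Last write wins per cell; a real pipeline would aggregate
        # (e.g. majority vote, or keep the nearest point).
        bev[gz[ok], gx[ok]] = semantics.reshape(-1)[ok]
        return bev
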
Related papers
- Semi-Supervised Learning for Visual Bird's Eye View Semantic
Segmentation [16.3996408206659]
We present a novel semi-supervised framework for visual BEV semantic segmentation that boosts performance by exploiting unlabeled images during training.
A consistency loss that makes full use of the unlabeled data is then proposed to constrain the model not only on the semantic prediction but also on the BEV feature.
Experiments on the nuScenes and Argoverse datasets show that our framework can effectively improve prediction accuracy.
arXiv Detail & Related papers (2023-08-28T12:23:36Z)
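
A hedged PyTorch sketch of the kind of consistency objective this summary describes: a teacher (e.g. an EMA copy of the student) sees clean unlabeled images, the student sees a perturbed view, and the two are pulled together on both the BEV semantic prediction and the BEV feature. The model interface (returning a feature and logits) and the noise augmentation are assumptions, not the paper's implementation.

    import torch
    import torch.nn.functional as F

    def consistency_loss(student, teacher, unlabeled, w_feat=0.5):
        # Teacher targets are computed without gradients.
        with torch.no_grad():
            t_feat, t_logits = teacher(unlabeled)
        # Stand-in augmentation; real pipelines use stronger perturbations.
        s_feat, s_logits = student(unlabeled + 0.05 * torch.randn_like(unlabeled))
        # Consistency on the semantic prediction (soft targets)...
        loss_sem = F.kl_div(F.log_softmax(s_logits, dim=1),
                            F.softmax(t_logits, dim=1),
                            reduction="batchmean")
        # ...and on the intermediate BEV feature itself.
        return loss_sem + w_feat * F.mse_loss(s_feat, t_feat)
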
- Geometric-aware Pretraining for Vision-centric 3D Object Detection [77.7979088689944]
We propose a novel geometric-aware pretraining framework called GAPretrain.
GAPretrain serves as a plug-and-play solution that can be flexibly applied to multiple state-of-the-art detectors.
We achieve 46.2 mAP and 55.5 NDS on the nuScenes val set using the BEVFormer method, with a gain of 2.7 and 2.1 points, respectively.
arXiv Detail & Related papers (2023-04-06T14:33:05Z)
- Vision Transformers: From Semantic Segmentation to Dense Prediction [139.15562023284187]
We explore the global context learning potentials of vision transformers (ViTs) for dense visual prediction.
Our motivation is that through learning global context at full receptive field layer by layer, ViTs may capture stronger long-range dependency information.
We formulate a family of Hierarchical Local-Global (HLG) Transformers, characterized by local attention within windows and global attention across windows in a pyramidal architecture.
arXiv Detail & Related papers (2022-07-19T15:49:35Z)
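
The local/global split in the HLG summary above can be illustrated with a short PyTorch sketch: tokens attend inside non-overlapping windows, then mean-pooled window summaries attend across windows and are broadcast back. The window size, pooling, and residual broadcast are illustrative choices rather than the paper's exact block.

    import torch
    import torch.nn as nn

    class LocalGlobalBlock(nn.Module):
        """Toy local/global attention split (not the exact HLG block)."""
        def __init__(self, dim, window, heads=4):
            super().__init__()
            self.window = window
            self.local_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.global_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

        def forward(self, x):                  # x: (B, N, C), N divisible by window
            B, N, C = x.shape
            w = self.window
            # Local attention inside each window.
            xw = x.reshape(B * N // w, w, C)
            xw, _ = self.local_attn(xw, xw, xw)
            x = xw.reshape(B, N, C)
            # Global attention across one summary token per window.
            s = x.reshape(B, N // w, w, C).mean(dim=2)
            g, _ = self.global_attn(s, s, s)
            # Residual broadcast of the globally mixed summaries.
            return x + g.repeat_interleave(w, dim=1)

    y = LocalGlobalBlock(dim=64, window=16)(torch.randn(2, 256, 64))
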
- GitNet: Geometric Prior-based Transformation for Birds-Eye-View Segmentation [105.19949897812494]
Birds-eye-view (BEV) semantic segmentation is critical for autonomous driving.
We present a novel two-stage Geometry Prior-based Transformation framework named GitNet.
arXiv Detail & Related papers (2022-04-16T06:46:45Z)
- Structured Bird's-Eye-View Traffic Scene Understanding from Onboard Images [128.881857704338]
We study the problem of extracting a directed graph representing the local road network in BEV coordinates, from a single onboard camera image.
We show that the method can be extended to detect dynamic objects on the BEV plane.
We validate our approach against powerful baselines and show that our network achieves superior performance.
arXiv Detail & Related papers (2021-10-05T12:40:33Z)
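
The directed-graph output described in the entry above can be pictured with a minimal data structure (purely hypothetical, not the paper's format): nodes are control points in BEV metres, directed edges are drivable connections such as lane-centerline segments.

    from dataclasses import dataclass, field

    @dataclass
    class BEVRoadGraph:
        nodes: dict = field(default_factory=dict)  # id -> (x, z) in BEV metres
        edges: set = field(default_factory=set)    # directed (src, dst) pairs

        def add_segment(self, src, src_xz, dst, dst_xz):
            self.nodes[src], self.nodes[dst] = src_xz, dst_xz
            self.edges.add((src, dst))

        def successors(self, node):
            return [d for s, d in self.edges if s == node]

    g = BEVRoadGraph()
    g.add_segment("a", (0.0, 5.0), "b", (0.0, 15.0))   # straight ahead
    g.add_segment("b", (0.0, 15.0), "c", (4.0, 20.0))  # right branch
    assert g.successors("b") == ["c"]
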
- PIT: Position-Invariant Transform for Cross-FoV Domain Adaptation [53.428312630479816]
We observe that the Field of View (FoV) gap induces noticeable instance appearance differences between the source and target domains.
Motivated by these observations, we propose the Position-Invariant Transform (PIT) to better align images in different domains.
arXiv Detail & Related papers (2021-08-16T15:16:47Z)
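
A simplified, separable numpy sketch in the spirit of PIT: resample a pinhole image so that output pixel coordinates are proportional to viewing angles, removing the FoV-dependent stretching near the image borders. The actual method projects onto a spherical surface; the single focal length f and nearest-neighbour sampling here are simplifying assumptions.

    import numpy as np

    def pit_like_warp(img, f):
        """img: (H, W, 3) pinhole image; f: focal length in pixels."""
        H, W, _ = img.shape
        cy, cx = (H - 1) / 2.0, (W - 1) / 2.0
        ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
        # Output coordinates measured as angles from the optical axis.
        theta, phi = (xs - cx) / f, (ys - cy) / f
        # Inverse map to pinhole pixels: u = f*tan(theta), v = f*tan(phi).
        u = np.clip(f * np.tan(theta) + cx, 0, W - 1).astype(int)
        v = np.clip(f * np.tan(phi) + cy, 0, H - 1).astype(int)
        return img[v, u]
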
- Unsupervised Domain Adaptation for Video Semantic Segmentation [91.30558794056054]
Unsupervised Domain Adaptation for semantic segmentation has gained immense popularity since it can transfer knowledge from simulation to real.
In this work, we present a new video extension of this task, namely Unsupervised Domain Adaptation for Video Semantic Segmentation.
We show that our proposals significantly outperform previous image-based UDA methods both on image-level (mIoU) and video-level (VPQ) evaluation metrics.
arXiv Detail & Related papers (2021-07-23T07:18:20Z)
- Combining Semantic Guidance and Deep Reinforcement Learning For Generating Human Level Paintings [22.889059874754242]
Generation of stroke-based non-photorealistic imagery is an important problem in the computer vision community.
Previous methods have been limited to datasets with little variation in position, scale and saliency of the foreground object.
We propose a Semantic Guidance pipeline that includes a bi-level painting procedure for learning the distinction between foreground and background brush strokes at training time.
arXiv Detail & Related papers (2020-11-25T09:00:04Z)
- A Sim2Real Deep Learning Approach for the Transformation of Images from Multiple Vehicle-Mounted Cameras to a Semantically Segmented Image in Bird's Eye View [0.0]
Distances can be more easily estimated when the camera perspective is transformed to a bird's eye view (BEV).
This paper describes a methodology to obtain a corrected 360° BEV image given images from multiple vehicle-mounted cameras.
The neural network approach does not rely on manually labeled data, but is trained on a synthetic dataset in such a way that it generalizes well to real-world data.
arXiv Detail & Related papers (2020-05-08T14:54:13Z)
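
For contrast with the learned approach above, the classical route to a BEV is inverse perspective mapping: a homography that assumes a flat ground plane, so anything above the plane (cars, poles) gets smeared; that is exactly the error the learned Sim2Real model corrects. A minimal OpenCV sketch, with hypothetical point correspondences standing in for an offline calibration:

    import cv2
    import numpy as np

    def ipm_bev(img, src_pts, dst_pts, bev_size=(400, 400)):
        """Warp one camera image to a flat-ground BEV. src_pts: four
        ground-plane points in the image; dst_pts: their known top-down
        positions in BEV pixels."""
        Hmat, _ = cv2.findHomography(np.float32(src_pts), np.float32(dst_pts))
        return cv2.warpPerspective(img, Hmat, bev_size)

    # Hypothetical calibration for one forward camera; a 360° surround
    # view would warp each camera with its own homography and blend.
    front = np.zeros((1080, 1920, 3), np.uint8)
    src = [(420, 650), (860, 650), (1100, 980), (180, 980)]
    dst = [(150, 100), (250, 100), (250, 300), (150, 300)]
    bev = ipm_bev(front, src, dst)
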