BEVControl: Accurately Controlling Street-view Elements with
Multi-perspective Consistency via BEV Sketch Layout
- URL: http://arxiv.org/abs/2308.01661v4
- Date: Sat, 23 Sep 2023 06:59:18 GMT
- Authors: Kairui Yang, Enhui Ma, Jibin Peng, Qing Guo, Di Lin, Kaicheng Yu
- Abstract summary: We propose a two-stage generative method, dubbed BEVControl, that can generate accurate foreground and background contents.
Our experiments show that our BEVControl surpasses the state-of-the-art method, BEVGen, by a significant margin.
- Score: 17.389444754562252
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Using synthesized images to boost the performance of perception models is a
long-standing research challenge in computer vision. It becomes more prominent in
visual-centric autonomous driving systems with multi-view cameras, as some
long-tail scenarios can never be collected. Guided by the BEV segmentation
layouts, the existing generative networks seem to synthesize photo-realistic
street-view images when evaluated solely on scene-level metrics. However, once
zoomed in, they usually fail to produce accurate foreground and background
details, such as object heading. To this end, we propose a two-stage generative method,
dubbed BEVControl, that can generate accurate foreground and background
contents. In contrast to segmentation-like input, it also supports sketch style
input, which is more flexible for humans to edit. In addition, we propose a
comprehensive multi-level evaluation protocol to fairly compare the quality of
the generated scene, foreground object, and background geometry. Our extensive
experiments show that our BEVControl surpasses the state-of-the-art method,
BEVGen, by a significant margin, from 5.89 to 26.80 on foreground segmentation
mIoU. In addition, we show that training the downstream perception model on
images generated by BEVControl yields an average improvement of 1.29 in NDS
score.
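To make the NDS figure above concrete, here is a minimal sketch of how the nuScenes Detection Score is computed: it combines mAP (weight 5) with five true-positive error terms (mATE, mASE, mAOE, mAVE, mAAE), each clipped at 1. The metric values in the example are illustrative, not taken from the paper.

```python
def nds(mAP, tp_errors):
    """nuScenes Detection Score.

    NDS = (5 * mAP + sum(1 - min(1, err) for the 5 TP errors)) / 10

    tp_errors: the five true-positive error metrics
    (mATE, mASE, mAOE, mAVE, mAAE); each >= 0, lower is better.
    """
    assert len(tp_errors) == 5
    return (5.0 * mAP + sum(1.0 - min(1.0, e) for e in tp_errors)) / 10.0

# Illustrative numbers only: mAP of 0.40 with typical-looking TP errors.
score = nds(0.40, [0.60, 0.27, 0.55, 0.90, 0.20])
print(round(100 * score, 2))  # NDS on the 0-100 scale used in leaderboards
```

On this 0-100 scale, a perfect detector (mAP = 1, all TP errors = 0) scores 100, so a gain of 1.29 NDS corresponds to 1.29 points of that range.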
Related papers
- Improving Bird's Eye View Semantic Segmentation by Task Decomposition [42.57351039508863]
We decompose the original BEV segmentation task into two stages, namely BEV map reconstruction and RGB-BEV feature alignment.
Our approach simplifies the complexity of combining perception and generation into distinct steps, equipping the model to handle intricate and challenging scenes effectively.
arXiv Detail & Related papers (2024-04-02T13:19:45Z) - CLIP-BEVFormer: Enhancing Multi-View Image-Based BEV Detector with Ground Truth Flow [20.550935390111686]
We introduce CLIP-BEVFormer, a novel approach to enhance the multi-view image-derived BEV backbones with ground truth information flow.
We conduct extensive experiments on the challenging nuScenes dataset and showcase significant and consistent improvements over the SOTA.
arXiv Detail & Related papers (2024-03-13T19:21:03Z) - DA-BEV: Unsupervised Domain Adaptation for Bird's Eye View Perception [111.13119809216313]
Camera-only Bird's Eye View (BEV) has demonstrated great potential in environment perception in a 3D space.
Unsupervised domain adaptive BEV, which learns effectively from various unlabelled target data, is far under-explored.
We design DA-BEV, the first domain adaptive camera-only BEV framework that addresses domain adaptive BEV challenges by exploiting the complementary nature of image-view features and BEV features.
arXiv Detail & Related papers (2024-01-13T04:21:24Z) - Multi-camera Bird's Eye View Perception for Autonomous Driving [17.834495597639805]
It is essential to produce perception outputs in 3D to enable the spatial reasoning of other agents and structures.
The most basic approach to achieving the desired BEV representation from a camera image is Inverse Perspective Mapping (IPM), which assumes a flat ground surface.
More recent approaches use deep neural networks to output directly in BEV space.
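As a side note on the IPM baseline mentioned in this summary: the flat-ground assumption reduces image-to-BEV mapping to a planar homography between the ground plane z = 0 and the image. The sketch below is illustrative only; the intrinsics, camera pose, and helper names are assumptions, not taken from the paper.

```python
import numpy as np

def ipm_homography(K, R, t):
    """Homography from the ground plane z=0 (world) to the image.

    K: 3x3 intrinsics; R, t: world-to-camera rotation and translation.
    For points on z=0, [u, v, 1]^T ~ K [r1 r2 t] [X, Y, 1]^T.
    """
    return K @ np.column_stack([R[:, 0], R[:, 1], t])

def pixel_to_ground(H, u, v):
    """Back-project an image pixel to (X, Y) on the ground plane."""
    g = np.linalg.inv(H) @ np.array([u, v, 1.0])
    return g[:2] / g[2]

# Example camera: world x forward, y left, z up; camera mounted 1.5 m
# above the ground, looking straight ahead (no pitch or roll).
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
R = np.array([[0.0, -1.0, 0.0],   # camera x = -world y
              [0.0, 0.0, -1.0],   # camera y = -world z
              [1.0, 0.0, 0.0]])   # camera z =  world x (optical axis)
t = -R @ np.array([0.0, 0.0, 1.5])  # camera center at (0, 0, 1.5)

H = ipm_homography(K, R, t)
X, Y = pixel_to_ground(H, u=640.0, v=560.0)
# A pixel 200 px below the principal point lands a few metres ahead on the
# ground; points off the true plane (vehicles, poles) are distorted by IPM.
print(X, Y)
```

This fragility off the ground plane is precisely why the learned BEV transformations surveyed in this paper replace plain IPM.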
arXiv Detail & Related papers (2023-09-16T19:12:05Z) - FB-BEV: BEV Representation from Forward-Backward View Transformations [131.11787050205697]
We propose a novel View Transformation Module (VTM) for Bird-Eye-View (BEV) representation.
We instantiate the proposed module with FB-BEV, which achieves a new state-of-the-art result of 62.4% NDS on the nuScenes test set.
arXiv Detail & Related papers (2023-08-04T10:26:55Z) - From a Bird's Eye View to See: Joint Camera and Subject Registration without the Camera Calibration [20.733451121484993]
We tackle a new problem of multi-view camera and subject registration in the bird's eye view (BEV) without pre-given camera calibration.
This is a very challenging problem since its only input is several RGB images from different first-person views (FPVs) for a multi-person scene.
We propose an end-to-end framework solving this problem, whose main idea can be divided into the following parts.
arXiv Detail & Related papers (2022-12-19T08:31:08Z) - BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View
Recognition via Perspective Supervision [101.36648828734646]
We present a novel bird's-eye-view (BEV) detector with perspective supervision, which converges faster and better suits modern image backbones.
The proposed method is verified with a wide spectrum of traditional and modern image backbones and achieves new SoTA results on the large-scale nuScenes dataset.
arXiv Detail & Related papers (2022-11-18T18:59:48Z) - Delving into the Devils of Bird's-eye-view Perception: A Review,
Evaluation and Recipe [115.31507979199564]
Learning powerful representations in bird's-eye-view (BEV) for perception tasks is trending and drawing extensive attention both from industry and academia.
As sensor configurations get more complex, integrating multi-source information from different sensors and representing features in a unified view are of vital importance.
The core problems for BEV perception lie in (a) how to reconstruct the lost 3D information via view transformation from perspective view to BEV; (b) how to acquire ground truth annotations in BEV grid; and (c) how to adapt and generalize algorithms as sensor configurations vary across different scenarios.
arXiv Detail & Related papers (2022-09-12T15:29:13Z) - GitNet: Geometric Prior-based Transformation for Birds-Eye-View
Segmentation [105.19949897812494]
Birds-eye-view (BEV) semantic segmentation is critical for autonomous driving.
We present a novel two-stage Geometry Prior-based Transformation framework named GitNet.
arXiv Detail & Related papers (2022-04-16T06:46:45Z) - BEVSegFormer: Bird's Eye View Semantic Segmentation From Arbitrary
Camera Rigs [3.5728676902207988]
We present an effective transformer-based method for BEV semantic segmentation from arbitrary camera rigs.
Specifically, our method first encodes image features from arbitrary cameras with a shared backbone.
An efficient multi-camera deformable attention unit is designed to carry out the BEV-to-image view transformation.
arXiv Detail & Related papers (2022-03-08T12:39:51Z) - ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for
Image Recognition and Beyond [76.35955924137986]
We propose a Vision Transformer Advanced by Exploring intrinsic IB from convolutions, i.e., ViTAE.
ViTAE has several spatial pyramid reduction modules to downsample and embed the input image into tokens with rich multi-scale context.
We obtain the state-of-the-art classification performance, i.e., 88.5% Top-1 classification accuracy on ImageNet validation set and the best 91.2% Top-1 accuracy on ImageNet real validation set.
arXiv Detail & Related papers (2022-02-21T10:40:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.