OccTransformer: Improving BEVFormer for 3D camera-only occupancy prediction
- URL: http://arxiv.org/abs/2402.18140v1
- Date: Wed, 28 Feb 2024 08:03:34 GMT
- Title: OccTransformer: Improving BEVFormer for 3D camera-only occupancy prediction
- Authors: Jian Liu, Sipeng Zhang, Chuixin Kong, Wenyuan Zhang, Yuhang Wu, Yikang Ding, Borun Xu, Ruibo Ming, Donglai Wei, Xianming Liu
- Abstract summary: "OccTransformer" is our solution for the 3D occupancy prediction track of the autonomous driving challenge at CVPR 2023.
Our method builds upon the strong baseline BEVFormer and improves its performance through several simple yet effective techniques.
Using these methods, our solution achieved 49.23 mIoU on the 3D occupancy prediction track of the autonomous driving challenge.
- Score: 32.17406995216123
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This technical report presents our solution, "OccTransformer", for the 3D occupancy prediction track of the autonomous driving challenge at CVPR 2023. Our method builds upon the strong baseline BEVFormer and improves its performance through several simple yet effective techniques. First, we employed data augmentation to increase the diversity of the training data and improve the model's generalization ability. Second, we used a strong image backbone to extract more informative features from the input data. Third, we incorporated a 3D UNet head to better capture the spatial information of the scene. Fourth, we added additional loss functions to better optimize the model. We also used an ensemble approach with the occupancy models BEVDet and SurroundOcc to further improve performance. Most importantly, we integrated the 3D detection model StreamPETR to enhance the model's ability to detect objects in the scene. Using these methods, our solution achieved 49.23 mIoU on the 3D occupancy prediction track of the autonomous driving challenge.
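The report does not say how the predictions of the BEVFormer-based model, BEVDet, and SurroundOcc are fused. A minimal sketch, assuming simple per-voxel averaging of class probabilities; the function name, grid size, and class count below are illustrative assumptions, not from the paper:

```python
import torch
import torch.nn.functional as F

def ensemble_occupancy(logits_list, weights=None):
    """Fuse per-voxel class logits from several occupancy models by
    weighted averaging of their softmax probabilities.

    logits_list: list of (X, Y, Z, num_classes) tensors, one per model,
                 predicted over the same voxel grid.
    weights:     optional per-model weights (defaults to uniform).
    """
    if weights is None:
        weights = [1.0 / len(logits_list)] * len(logits_list)
    probs = sum(w * F.softmax(l, dim=-1) for w, l in zip(weights, logits_list))
    return probs.argmax(dim=-1)  # (X, Y, Z) semantic label per voxel

# Illustrative usage: three models on a 200x200x16 grid with 18 classes
preds = ensemble_occupancy([torch.randn(200, 200, 16, 18) for _ in range(3)])
```

Probability averaging is only one plausible fusion rule; challenge solutions often weight models by validation mIoU or vote per class, and the report does not specify which variant was used.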
Related papers
- WidthFormer: Toward Efficient Transformer-based BEV View Transformation [21.10523575080856]
WidthFormer is a transformer-based module to compute Bird's-Eye-View (BEV) representations from multi-view cameras for real-time autonomous-driving applications.
We first introduce a novel 3D positional encoding mechanism capable of accurately encapsulating 3D geometric information.
We then develop two modules to compensate for potential information loss due to feature compression.
arXiv Detail & Related papers (2024-01-08T11:50:23Z)
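The summary above does not specify WidthFormer's encoding. For reference, a minimal sketch of a generic sinusoidal positional encoding extended to 3D points; the function name and the normalization convention are assumptions, not from the paper:

```python
import torch

def sinusoidal_3d_encoding(points, num_feats=64, temperature=10000.0):
    """Classic Transformer sin/cos positional encoding applied per axis.

    points: (N, 3) tensor of (x, y, z) coordinates normalized to [0, 1].
    Returns an (N, 3 * num_feats) embedding with num_feats sin/cos
    features per axis.
    """
    dim_t = torch.arange(num_feats // 2, dtype=torch.float32)
    freqs = temperature ** (2 * dim_t / num_feats)             # (num_feats/2,)
    per_axis = points.unsqueeze(-1) / freqs                    # (N, 3, num_feats/2)
    enc = torch.cat([per_axis.sin(), per_axis.cos()], dim=-1)  # (N, 3, num_feats)
    return enc.flatten(1)                                      # (N, 3 * num_feats)
```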
- An Efficient Wide-Range Pseudo-3D Vehicle Detection Using A Single Camera [10.573423265001706]
This paper proposes a novel wide-range Pseudo-3D Vehicle Detection method based on images from a single camera.
To detect pseudo-3D objects, our model adopts specifically designed detection heads.
A joint constraint loss combining both the object box and SPL is designed for model training, improving the efficiency, stability, and prediction accuracy of the model.
arXiv Detail & Related papers (2023-09-15T12:50:09Z)
- FB-OCC: 3D Occupancy Prediction based on Forward-Backward View Transformation [79.41536932037822]
The proposal builds upon FB-BEV, a cutting-edge camera-based bird's-eye-view perception design using forward-backward projection.
These designs and optimizations result in a state-of-the-art mIoU of 54.19% on the nuScenes dataset, ranking 1st place in the challenge track.
arXiv Detail & Related papers (2023-07-04T05:55:54Z)
- Collaboration Helps Camera Overtake LiDAR in 3D Detection [49.58433319402405]
Camera-only 3D detection provides a simple solution for localizing objects in 3D space compared to LiDAR-based detection systems.
Our proposed collaborative camera-only 3D detection (CoCa3D) enables agents to share complementary information with each other through communication.
Results show that CoCa3D improves previous SOTA performance by 44.21% on DAIR-V2X, 30.60% on OPV2V+, and 12.59% on CoPerception-UAVs+ for AP@70.
arXiv Detail & Related papers (2023-03-23T03:50:41Z)
- 3D Data Augmentation for Driving Scenes on Camera [50.41413053812315]
We propose a 3D data augmentation approach termed Drive-3DAug, aiming at augmenting the driving scenes on camera in the 3D space.
We first utilize Neural Radiance Field (NeRF) to reconstruct the 3D models of background and foreground objects.
Augmented driving scenes are then obtained by placing the 3D objects, with adapted location and orientation, in pre-defined valid regions of the backgrounds.
arXiv Detail & Related papers (2023-03-18T05:51:05Z)
- T3VIP: Transformation-based 3D Video Prediction [49.178585201673364]
We propose a 3D video prediction (T3VIP) approach that explicitly models the 3D motion by decomposing a scene into its object parts.
Our model is fully unsupervised and captures the nature of the real world; observational cues in the image and point cloud domains constitute its learning signals.
To the best of our knowledge, our model is the first generative model that provides an RGB-D video prediction of the future for a static camera.
arXiv Detail & Related papers (2022-09-19T15:01:09Z)
- Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images.
Our approach is fully automatic without any human interaction.
We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z)
- PerMO: Perceiving More at Once from a Single Image for Autonomous Driving [76.35684439949094]
We present a novel approach to detect, segment, and reconstruct complete textured 3D models of vehicles from a single image.
Our approach combines the strengths of deep learning and the elegance of traditional techniques.
We have integrated these algorithms with an autonomous driving system.
arXiv Detail & Related papers (2020-07-16T05:02:45Z)
- Improving 3D Object Detection through Progressive Population Based Augmentation [91.56261177665762]
We present the first attempt to automate the design of data augmentation policies for 3D object detection.
We introduce the Progressive Population Based Augmentation (PPBA) algorithm, which learns to optimize augmentation strategies by narrowing down the search space and adopting the best parameters discovered in previous iterations.
We find that PPBA may be up to 10x more data-efficient than baseline 3D detection models without augmentation, highlighting that 3D detection models may achieve competitive accuracy with far fewer labeled examples.
arXiv Detail & Related papers (2020-04-02T05:57:02Z)
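The entry above describes PPBA only at a high level. A schematic sketch of the described loop, i.e. a population search that mutates only a narrowed subset of augmentation ops per iteration while reusing the best parameters found so far; `train_and_evaluate` is a hypothetical stand-in for a full train/eval run:

```python
import random

def ppba_search(param_space, train_and_evaluate,
                pop_size=8, iters=5, ops_per_iter=2):
    """Schematic Progressive Population Based Augmentation loop.

    param_space:        dict mapping augmentation op name -> candidate values.
    train_and_evaluate: hypothetical callable(params) -> validation score.
    """
    population = [{op: random.choice(vals) for op, vals in param_space.items()}
                  for _ in range(pop_size)]
    best, best_score = None, float("-inf")
    for _ in range(iters):
        for params in population:
            score = train_and_evaluate(params)
            if score > best_score:
                best, best_score = params, score
        # Narrow the search space: mutate only a few ops per iteration,
        # starting from the best parameters discovered so far.
        focus = random.sample(sorted(param_space),
                              min(ops_per_iter, len(param_space)))
        population = [
            {**best, **{op: random.choice(param_space[op]) for op in focus}}
            for _ in range(pop_size)
        ]
    return best, best_score
```

This sketches the search pattern only; the actual PPBA schedule, mutation rules, and evaluation metric are those of the paper.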
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.