FlashOcc: Fast and Memory-Efficient Occupancy Prediction via
Channel-to-Height Plugin
- URL: http://arxiv.org/abs/2311.12058v1
- Date: Sat, 18 Nov 2023 15:28:09 GMT
- Title: FlashOcc: Fast and Memory-Efficient Occupancy Prediction via
Channel-to-Height Plugin
- Authors: Zichen Yu, Changyong Shu, Jiajun Deng, Kangjie Lu, Zongdai Liu,
Jiangyong Yu, Dawei Yang, Hui Li, Yan Chen
- Abstract summary: FlashOCC consolidates rapid and memory-efficient occupancy prediction.
Channel-to-height transformation is introduced to lift the output logits from the BEV into the 3D space.
Results substantiate the superiority of our plug-and-play paradigm over previous state-of-the-art methods.
- Score: 32.172269679513285
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Given the capability of mitigating the long-tail deficiencies and
intricate-shaped absence prevalent in 3D object detection, occupancy prediction
has become a pivotal component in autonomous driving systems. However, the
processing of three-dimensional voxel-level representations inevitably
introduces large overhead in both memory and computation, obstructing the
deployment of to-date occupancy prediction approaches. In contrast to the trend
of making the model larger and more complicated, we argue that a desirable
framework should be deployment-friendly to diverse chips while maintaining high
precision. To this end, we propose a plug-and-play paradigm, namely FlashOCC,
to consolidate rapid and memory-efficient occupancy prediction while
maintaining high precision. Particularly, our FlashOCC makes two improvements
based on the contemporary voxel-level occupancy prediction approaches. Firstly,
the features are kept in the BEV, enabling the employment of efficient 2D
convolutional layers for feature extraction. Secondly, a channel-to-height
transformation is introduced to lift the output logits from the BEV into the 3D
space. We apply the FlashOCC to diverse occupancy prediction baselines on the
challenging Occ3D-nuScenes benchmarks and conduct extensive experiments to
validate the effectiveness. The results substantiate the superiority of our
plug-and-play paradigm over previous state-of-the-art methods in terms of
precision, runtime efficiency, and memory costs, demonstrating its potential
for deployment. The code will be made available.
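To make the channel-to-height step concrete, below is a minimal PyTorch-style sketch of the reshape described in the abstract. The layer names, the channel ordering, and the concrete sizes (Z=16 height bins, 18 classes, a 200x200 BEV grid) are illustrative assumptions, not the authors' released implementation.

    # Minimal sketch of the channel-to-height idea (assumed shapes and names).
    import torch
    import torch.nn as nn

    class ChannelToHeight(nn.Module):
        """Lift BEV logits (B, Z*K, H, W) into a voxel grid (B, K, Z, H, W)
        by reinterpreting the channel dimension as height bins times classes."""
        def __init__(self, num_height_bins: int, num_classes: int):
            super().__init__()
            self.z = num_height_bins
            self.k = num_classes

        def forward(self, bev_logits: torch.Tensor) -> torch.Tensor:
            b, c, h, w = bev_logits.shape
            assert c == self.z * self.k, "channels must equal Z * num_classes"
            # (B, Z*K, H, W) -> (B, K, Z, H, W): a pure reshape, no 3D convolutions
            return bev_logits.view(b, self.k, self.z, h, w)

    # Toy usage: a 2D convolutional BEV head followed by the reshape.
    bev_feat = torch.randn(1, 256, 200, 200)        # BEV features from a 2D encoder
    head = nn.Conv2d(256, 16 * 18, kernel_size=1)   # Z=16 height bins, K=18 classes (assumed)
    occ = ChannelToHeight(num_height_bins=16, num_classes=18)(head(bev_feat))
    print(occ.shape)                                # torch.Size([1, 18, 16, 200, 200])

Because the lift is a plain reshape over channels produced by 2D convolutions, no 3D convolutions or voxel-level feature maps are required until the final logits, which is where the memory and runtime savings come from.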
Related papers
- Spatiotemporal Decoupling for Efficient Vision-Based Occupancy Forecasting [18.933451243989452]
Existing 3D occupancy forecasting approaches struggle to predict plausible spatial details for movable objects.
We propose a novel vision-based paradigm to explicitly tackle the bias and achieve both effective and efficient 3D OCF.
We develop an efficient multi-head network, EfficientOCF, to achieve 3D OCF with our devised temporally decoupled representation.
arXiv Detail & Related papers (2024-11-21T14:27:15Z) - ALOcc: Adaptive Lifting-based 3D Semantic Occupancy and Cost Volume-based Flow Prediction [89.89610257714006]
Existing methods prioritize higher accuracy to cater to the demands of 3D occupancy and flow prediction tasks.
We introduce a series of targeted improvements for 3D semantic occupancy prediction and flow estimation.
Our framework, named ALOcc, achieves an optimal tradeoff between speed and accuracy.
arXiv Detail & Related papers (2024-11-12T11:32:56Z) - OccLoff: Learning Optimized Feature Fusion for 3D Occupancy Prediction [5.285847977231642]
3D semantic occupancy prediction is crucial for ensuring safety in autonomous driving.
Existing fusion-based occupancy methods typically involve performing a 2D-to-3D view transformation on image features.
We propose OccLoff, a framework that Learns to optimize Feature Fusion for 3D occupancy prediction.
arXiv Detail & Related papers (2024-11-06T06:34:27Z) - PCF-Lift: Panoptic Lifting by Probabilistic Contrastive Fusion [80.79938369319152]
We design a new pipeline coined PCF-Lift based on our Probabilistic Contrastive Fusion (PCF).
Our PCF-Lift significantly outperforms state-of-the-art methods on widely used benchmarks, including the ScanNet dataset and the Messy Room dataset (4.4% improvement in scene-level PQ).
arXiv Detail & Related papers (2024-10-14T16:06:59Z) - UltimateDO: An Efficient Framework to Marry Occupancy Prediction with 3D Object Detection via Channel2height [2.975860548186652]
Occupancy prediction and 3D object detection are two standard tasks in modern autonomous driving systems.
We propose UltimateDO, a method that achieves fast 3D object detection and occupancy prediction.
arXiv Detail & Related papers (2024-09-17T13:14:13Z) - OPUS: Occupancy Prediction Using a Sparse Set [64.60854562502523]
We present a framework to simultaneously predict occupied locations and classes using a set of learnable queries.
OPUS incorporates a suite of non-trivial strategies to enhance model performance.
Our lightest model achieves superior RayIoU on the Occ3D-nuScenes dataset at nearly 2x the FPS, while our heaviest model surpasses previous best results by 6.1 RayIoU.
arXiv Detail & Related papers (2024-09-14T07:44:22Z) - 4D Contrastive Superflows are Dense 3D Representation Learners [62.433137130087445]
We introduce SuperFlow, a novel framework designed to harness consecutive LiDAR-camera pairs for establishing pretraining objectives.
To further boost learning efficiency, we incorporate a plug-and-play view consistency module that enhances alignment of the knowledge distilled from camera views.
arXiv Detail & Related papers (2024-07-08T17:59:54Z) - AdaOcc: Adaptive Forward View Transformation and Flow Modeling for 3D Occupancy and Flow Prediction [56.72301849123049]
We present our solution for the Vision-Centric 3D Occupancy and Flow Prediction track in the nuScenes Open-Occ dataset challenge at CVPR 2024.
Our innovative approach involves a dual-stage framework that enhances 3D occupancy and flow predictions by incorporating adaptive forward view transformation and flow modeling.
Our method combines regression with classification to address scale variations in different scenes, and leverages predicted flow to warp current voxel features to future frames, guided by future frame ground truth.
arXiv Detail & Related papers (2024-07-01T16:32:15Z) - BEV-IO: Enhancing Bird's-Eye-View 3D Detection with Instance Occupancy [58.92659367605442]
We present BEV-IO, a new 3D detection paradigm to enhance BEV representation with instance occupancy information.
We show that BEV-IO can outperform state-of-the-art methods while only adding a negligible increase in parameters and computational overhead.
arXiv Detail & Related papers (2023-05-26T11:16:12Z)