3D Occupancy Prediction with Low-Resolution Queries via Prototype-aware View Transformation
- URL: http://arxiv.org/abs/2503.15185v1
- Date: Wed, 19 Mar 2025 13:14:57 GMT
- Title: 3D Occupancy Prediction with Low-Resolution Queries via Prototype-aware View Transformation
- Authors: Gyeongrok Oh, Sungjune Kim, Heeju Ko, Hyung-gun Chi, Jinkyu Kim, Dongwook Lee, Daehyun Ji, Sungjoon Choi, Sujin Jang, Sangpil Kim,
- Abstract summary: We introduce ProtoOcc, a novel occupancy network that leverages prototypes of clustered image segments in view transformation to enhance low-resolution context.<n>In particular, the mapping of 2D prototypes onto 3D voxel queries encodes high-level visual geometries and complements the loss of spatial information from reduced query resolutions.<n>We show that ProtoOcc achieves competitive performance against the baselines even with 75% reduced voxel resolution.
- Score: 16.69186493462387
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The resolution of voxel queries significantly influences the quality of view transformation in camera-based 3D occupancy prediction. However, computational constraints and the practical necessity for real-time deployment require smaller query resolutions, which inevitably leads to an information loss. Therefore, it is essential to encode and preserve rich visual details within limited query sizes while ensuring a comprehensive representation of 3D occupancy. To this end, we introduce ProtoOcc, a novel occupancy network that leverages prototypes of clustered image segments in view transformation to enhance low-resolution context. In particular, the mapping of 2D prototypes onto 3D voxel queries encodes high-level visual geometries and complements the loss of spatial information from reduced query resolutions. Additionally, we design a multi-perspective decoding strategy to efficiently disentangle the densely compressed visual cues into a high-dimensional 3D occupancy scene. Experimental results on both Occ3D and SemanticKITTI benchmarks demonstrate the effectiveness of the proposed method, showing clear improvements over the baselines. More importantly, ProtoOcc achieves competitive performance against the baselines even with 75\% reduced voxel resolution.
Related papers
- AdaOcc: Adaptive-Resolution Occupancy Prediction [20.0994984349065]
We introduce AdaOcc, a novel adaptive-resolution, multi-modal prediction approach.
Our method integrates object-centric 3D reconstruction and holistic occupancy prediction within a single framework.
In close-range scenarios, we surpass previous baselines by over 13% in IOU, and over 40% in Hausdorff distance.
arXiv Detail & Related papers (2024-08-24T03:46:25Z) - HIVE: HIerarchical Volume Encoding for Neural Implicit Surface Reconstruction [37.00102816748563]
We introduce a volume encoding to explicitly encode the spatial information.
High-resolution volumes capture the high-frequency geometry details.
Low-resolution volumes enforce the spatial consistency to keep the shape smooth.
This hierarchical volume encoding could be appended to any implicit surface reconstruction method as a plug-and-play module.
arXiv Detail & Related papers (2024-08-03T06:34:20Z) - GEOcc: Geometrically Enhanced 3D Occupancy Network with Implicit-Explicit Depth Fusion and Contextual Self-Supervision [49.839374549646884]
This paper presents GEOcc, a Geometric-Enhanced Occupancy network tailored for vision-only surround-view perception.
Our approach achieves State-Of-The-Art performance on the Occ3D-nuScenes dataset with the least image resolution needed and the most weightless image backbone.
arXiv Detail & Related papers (2024-05-17T07:31:20Z) - FastOcc: Accelerating 3D Occupancy Prediction by Fusing the 2D
Bird's-Eye View and Perspective View [46.81548000021799]
In autonomous driving, 3D occupancy prediction outputs voxel-wise status and semantic labels for more comprehensive understandings of 3D scenes.
Recent researchers have extensively explored various aspects of this task, including view transformation techniques, ground-truth label generation, and elaborate network design.
A new method, dubbed FastOcc, is proposed to accelerate the model while keeping its accuracy.
Experiments on the Occ3D-nuScenes benchmark demonstrate that our FastOcc achieves a fast inference speed.
arXiv Detail & Related papers (2024-03-05T07:01:53Z) - COTR: Compact Occupancy TRansformer for Vision-based 3D Occupancy Prediction [60.87168562615171]
The autonomous driving community has shown significant interest in 3D occupancy prediction.
We propose Compact Occupancy TRansformer (COTR) with a geometry-aware occupancy encoder and a semantic-aware group decoder.
COTR outperforms baselines with a relative improvement of 8%-15%.
arXiv Detail & Related papers (2023-12-04T14:23:18Z) - Low-Resolution Self-Attention for Semantic Segmentation [93.30597515880079]
We introduce the Low-Resolution Self-Attention (LRSA) mechanism to capture global context at a significantly reduced computational cost.<n>Our approach involves computing self-attention in a fixed low-resolution space regardless of the input image's resolution.<n>We demonstrate the effectiveness of our LRSA approach by building the LRFormer, a vision transformer with an encoder-decoder structure.
arXiv Detail & Related papers (2023-10-08T06:10:09Z) - Towards Model Generalization for Monocular 3D Object Detection [57.25828870799331]
We present an effective unified camera-generalized paradigm (CGP) for Mono3D object detection.
We also propose the 2D-3D geometry-consistent object scaling strategy (GCOS) to bridge the gap via an instance-level augment.
Our method called DGMono3D achieves remarkable performance on all evaluated datasets and surpasses the SoTA unsupervised domain adaptation scheme.
arXiv Detail & Related papers (2022-05-23T23:05:07Z) - Improving 3D Object Detection with Channel-wise Transformer [58.668922561622466]
We propose a two-stage 3D object detection framework (CT3D) with minimal hand-crafted design.
CT3D simultaneously performs proposal-aware embedding and channel-wise context aggregation.
It achieves the AP of 81.77% in the moderate car category on the KITTI test 3D detection benchmark.
arXiv Detail & Related papers (2021-08-23T02:03:40Z) - Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.
Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them, however, the probability of effective samples is relatively small in the 3D space.
We propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3d parameter changed in each step.
This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
arXiv Detail & Related papers (2020-08-31T17:10:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.