SliceSemOcc: Vertical Slice Based Multimodal 3D Semantic Occupancy Representation
- URL: http://arxiv.org/abs/2509.03999v1
- Date: Thu, 04 Sep 2025 08:27:54 GMT
- Title: SliceSemOcc: Vertical Slice Based Multimodal 3D Semantic Occupancy Representation
- Authors: Han Huang, Han Sun, Ningzhong Liu, Huiyu Zhou, Jiaquan Shen
- Abstract summary: SliceSemOcc is a novel vertical slice based multimodal framework for 3D semantic occupancy representation. We propose the SEAttention3D module, which preserves height-wise resolution through average pooling and assigns dynamic channel attention weights to each height layer.
- Score: 26.38332949554491
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Driven by autonomous driving's demands for precise 3D perception, 3D semantic occupancy prediction has become a pivotal research topic. Unlike bird's-eye-view (BEV) methods, which restrict scene representation to a 2D plane, occupancy prediction leverages a complete 3D voxel grid to model spatial structures in all dimensions, thereby capturing semantic variations along the vertical axis. However, most existing approaches overlook height-axis information when processing voxel features, and conventional SENet-style channel attention assigns uniform weights across all height layers, limiting its ability to emphasize features at different heights. To address these limitations, we propose SliceSemOcc, a novel vertical slice based multimodal framework for 3D semantic occupancy representation. Specifically, we extract voxel features along the height axis using both global and local vertical slices. Then, a global-local fusion module adaptively reconciles fine-grained spatial details with holistic contextual information. Furthermore, we propose the SEAttention3D module, which preserves height-wise resolution through average pooling and assigns dynamic channel attention weights to each height layer. Extensive experiments on the nuScenes-SurroundOcc and nuScenes-OpenOccupancy datasets verify that our method significantly enhances mean IoU, achieving especially pronounced gains on most small-object categories. Detailed ablation studies further validate the effectiveness of the proposed SliceSemOcc framework.
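The SEAttention3D idea described in the abstract (spatial pooling over the ground plane only, so each height layer keeps its own descriptor and receives its own channel weights) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the bottleneck MLP, the reduction ratio `r`, and the tensor layout `(C, Z, H, W)` are all assumptions for the sketch.

```python
import numpy as np

def se_attention_3d(voxels, w1, w2):
    """Height-preserving channel attention (sketch of the SEAttention3D idea).

    voxels: (C, Z, H, W) voxel features, where Z is the height axis.
    Unlike SENet-style attention, which would pool over all of (Z, H, W)
    and gate every height layer with the same weights, here we pool only
    over the ground plane (H, W), so each of the Z height layers gets its
    own per-channel attention weights.
    """
    # Squeeze: average-pool over the ground plane only -> one (Z, C) descriptor
    desc = voxels.mean(axis=(2, 3)).T                 # (Z, C)
    # Excite: small bottleneck MLP shared across height layers (assumed form)
    hidden = np.maximum(desc @ w1, 0.0)               # ReLU, (Z, C // r)
    weights = 1.0 / (1.0 + np.exp(-(hidden @ w2)))    # sigmoid gates, (Z, C)
    # Rescale: each height layer is gated by its own channel weights
    return voxels * weights.T[:, :, None, None]       # (C, Z, H, W)

# Toy usage with random features and MLP weights
rng = np.random.default_rng(0)
C, Z, H, W, r = 8, 4, 16, 16, 2
feats = rng.standard_normal((C, Z, H, W))
w1 = rng.standard_normal((C, C // r)) * 0.1
w2 = rng.standard_normal((C // r, C)) * 0.1
out = se_attention_3d(feats, w1, w2)
```

Because the sigmoid gates lie in (0, 1), the module can only attenuate features, never amplify them; the key difference from plain SENet is that the gate tensor has an explicit height dimension.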
Related papers
- Multi-Resolution Alignment for Voxel Sparsity in Camera-Based 3D Semantic Scene Completion [52.959716866316604]
Camera-based 3D semantic scene completion (SSC) offers a cost-effective solution for assessing the geometric occupancy and semantic labels of each voxel in the surrounding 3D scene with image inputs. Existing methods face the challenge of voxel sparsity, as a large portion of voxels in autonomous driving scenarios are empty. We propose a Multi-Resolution Alignment (MRA) approach to mitigate voxel sparsity in camera-based 3D semantic scene completion.
arXiv Detail & Related papers (2026-02-03T10:46:51Z) - HD$^2$-SSC: High-Dimension High-Density Semantic Scene Completion for Autonomous Driving [52.959716866316604]
Camera-based 3D semantic scene completion (SSC) plays a crucial role in autonomous driving. Existing SSC methods suffer from the inherent input-output dimension gap and annotation-reality density gap. We propose a corresponding High-Dimension High-Density Semantic Scene Completion framework with expanded pixel semantics and refined voxel occupancies.
arXiv Detail & Related papers (2025-11-11T07:24:35Z) - DGOcc: Depth-aware Global Query-based Network for Monocular 3D Occupancy Prediction [17.38916914453357]
Predicting the 3D occupancy of large-scale outdoor scenes from 2D images is ill-posed and resource-intensive. We present DGOcc, a depth-aware global query-based network for monocular 3D occupancy prediction. The proposed method achieves the best performance on monocular semantic occupancy prediction while reducing GPU and time overhead.
arXiv Detail & Related papers (2025-04-10T07:44:55Z) - GaussRender: Learning 3D Occupancy with Gaussian Rendering [86.89653628311565]
GaussRender is a module that improves 3D occupancy learning by enforcing projective consistency. Our method penalizes 3D configurations that produce inconsistent 2D projections, thereby enforcing a more coherent 3D structure.
arXiv Detail & Related papers (2025-02-07T16:07:51Z) - SliceOcc: Indoor 3D Semantic Occupancy Prediction with Vertical Slice Representation [50.420711084672966]
We present SliceOcc, an RGB camera-based model specifically tailored for indoor 3D semantic occupancy prediction. Experimental results on the EmbodiedScan dataset demonstrate that SliceOcc achieves a mIoU of 15.45% across 81 indoor categories.
arXiv Detail & Related papers (2025-01-28T03:41:24Z) - Camera-based 3D Semantic Scene Completion with Sparse Guidance Network [18.415854443539786]
We propose a camera-based semantic scene completion framework called SGN.
SGN propagates semantics from semantic-aware seed voxels to the whole scene based on spatial geometry cues.
Our experimental results demonstrate the superiority of our SGN over existing state-of-the-art methods.
arXiv Detail & Related papers (2023-12-10T04:17:27Z) - PointOcc: Cylindrical Tri-Perspective View for Point-based 3D Semantic Occupancy Prediction [72.75478398447396]
We propose a cylindrical tri-perspective view to represent point clouds effectively and comprehensively.
Considering the distance distribution of LiDAR point clouds, we construct the tri-perspective view in the cylindrical coordinate system.
We employ spatial group pooling to maintain structural details during projection and adopt 2D backbones to efficiently process each TPV plane.
arXiv Detail & Related papers (2023-08-31T17:57:17Z) - SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving [98.74706005223685]
3D scene understanding plays a vital role in vision-based autonomous driving.
We propose a SurroundOcc method to predict the 3D occupancy with multi-camera images.
arXiv Detail & Related papers (2023-03-16T17:59:08Z) - 3D Sketch-aware Semantic Scene Completion via Semi-supervised Structure Prior [50.73148041205675]
The goal of the Semantic Scene Completion (SSC) task is to simultaneously predict a completed 3D voxel representation of volumetric occupancy and semantic labels of objects in the scene from a single-view observation.
We propose to devise a new geometry-based strategy to embed depth information with low-resolution voxel representation.
Our proposed geometric embedding works better than the depth feature learning of conventional SSC frameworks.
arXiv Detail & Related papers (2020-03-31T09:33:46Z) - OccuSeg: Occupancy-aware 3D Instance Segmentation [39.71517989569514]
OccuSeg is an occupancy-aware 3D instance segmentation scheme.
"3D occupancy size" denotes the number of voxels occupied by each instance.
It achieves state-of-the-art performance on three real-world datasets.
arXiv Detail & Related papers (2020-03-14T02:48:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.