SliceSemOcc: Vertical Slice Based Multimodal 3D Semantic Occupancy Representation
- URL: http://arxiv.org/abs/2509.03999v1
- Date: Thu, 04 Sep 2025 08:27:54 GMT
- Title: SliceSemOcc: Vertical Slice Based Multimodal 3D Semantic Occupancy Representation
- Authors: Han Huang, Han Sun, Ningzhong Liu, Huiyu Zhou, Jiaquan Shen
- Abstract summary: SliceSemOcc is a novel vertical slice based multimodal framework for 3D semantic occupancy representation. We propose the SEAttention3D module, which preserves height-wise resolution through average pooling and assigns dynamic channel attention weights to each height layer.
- Score: 26.38332949554491
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Driven by autonomous driving's demands for precise 3D perception, 3D semantic occupancy prediction has become a pivotal research topic. Unlike bird's-eye-view (BEV) methods, which restrict scene representation to a 2D plane, occupancy prediction leverages a complete 3D voxel grid to model spatial structures in all dimensions, thereby capturing semantic variations along the vertical axis. However, most existing approaches overlook height-axis information when processing voxel features, and conventional SENet-style channel attention assigns uniform weights across all height layers, limiting its ability to emphasize features at different heights. To address these limitations, we propose SliceSemOcc, a novel vertical slice based multimodal framework for 3D semantic occupancy representation. Specifically, we extract voxel features along the height axis using both global and local vertical slices. Then, a global-local fusion module adaptively reconciles fine-grained spatial details with holistic contextual information. Furthermore, we propose the SEAttention3D module, which preserves height-wise resolution through average pooling and assigns dynamic channel attention weights to each height layer. Extensive experiments on the nuScenes-SurroundOcc and nuScenes-OpenOccupancy datasets verify that our method significantly enhances mean IoU, achieving especially pronounced gains on most small-object categories. Detailed ablation studies further validate the effectiveness of the proposed SliceSemOcc framework.
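The SEAttention3D idea described in the abstract (spatial pooling over the ground plane only, so each height layer keeps its own descriptor and receives its own channel weights) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the bottleneck MLP, the reduction ratio `r`, and the tensor layout `(C, Z, H, W)` are all assumptions for the sketch.

```python
import numpy as np

def se_attention_3d(voxels, w1, w2):
    """Height-preserving channel attention (sketch of the SEAttention3D idea).

    voxels: (C, Z, H, W) voxel features, where Z is the height axis.
    Unlike SENet-style attention, which would pool over all of (Z, H, W)
    and gate every height layer with the same weights, here we pool only
    over the ground plane (H, W), so each of the Z height layers gets its
    own per-channel attention weights.
    """
    # Squeeze: average-pool over the ground plane only -> one (Z, C) descriptor
    desc = voxels.mean(axis=(2, 3)).T                 # (Z, C)
    # Excite: small bottleneck MLP shared across height layers (assumed form)
    hidden = np.maximum(desc @ w1, 0.0)               # ReLU, (Z, C // r)
    weights = 1.0 / (1.0 + np.exp(-(hidden @ w2)))    # sigmoid gates, (Z, C)
    # Rescale: each height layer is gated by its own channel weights
    return voxels * weights.T[:, :, None, None]       # (C, Z, H, W)

# Toy usage with random features and MLP weights
rng = np.random.default_rng(0)
C, Z, H, W, r = 8, 4, 16, 16, 2
feats = rng.standard_normal((C, Z, H, W))
w1 = rng.standard_normal((C, C // r)) * 0.1
w2 = rng.standard_normal((C // r, C)) * 0.1
out = se_attention_3d(feats, w1, w2)
```

Because the sigmoid gates lie in (0, 1), the module can only attenuate features, never amplify them; the key difference from plain SENet is that the gate tensor has an explicit height dimension.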
Related papers
- Multi-Resolution Alignment for Voxel Sparsity in Camera-Based 3D Semantic Scene Completion [52.959716866316604]
Camera-based 3D semantic scene completion (SSC) offers a cost-effective solution for assessing the geometric occupancy and semantic labels of each voxel in the surrounding 3D scene with image inputs. Existing methods face the challenge of voxel sparsity, as a large portion of voxels in autonomous driving scenarios are empty. We propose a Multi-Resolution Alignment (MRA) approach to mitigate voxel sparsity in camera-based 3D semantic scene completion.
arXiv Detail & Related papers (2026-02-03T10:46:51Z) - HD$^2$-SSC: High-Dimension High-Density Semantic Scene Completion for Autonomous Driving [52.959716866316604]
Camera-based 3D semantic scene completion (SSC) plays a crucial role in autonomous driving. Existing SSC methods suffer from the inherent input-output dimension gap and annotation-reality density gap. We propose a corresponding High-Dimension High-Density Semantic Scene Completion framework with expanded pixel semantics and refined voxel occupancies.
arXiv Detail & Related papers (2025-11-11T07:24:35Z) - DGOcc: Depth-aware Global Query-based Network for Monocular 3D Occupancy Prediction [17.38916914453357]
Predicting the 3D occupancy of large-scale outdoor scenes from 2D images is ill-posed and resource-intensive. We present DGOcc, a depth-aware global query-based network for monocular 3D occupancy prediction. The proposed method achieves the best performance on monocular semantic occupancy prediction while reducing GPU and time overhead.
arXiv Detail & Related papers (2025-04-10T07:44:55Z) - GaussRender: Learning 3D Occupancy with Gaussian Rendering [86.89653628311565]
GaussRender is a module that improves 3D occupancy learning by enforcing projective consistency. Our method penalizes 3D configurations that produce inconsistent 2D projections, thereby enforcing a more coherent 3D structure.
arXiv Detail & Related papers (2025-02-07T16:07:51Z) - SliceOcc: Indoor 3D Semantic Occupancy Prediction with Vertical Slice Representation [50.420711084672966]
We present SliceOcc, an RGB camera-based model specifically tailored for indoor 3D semantic occupancy prediction. Experimental results on the EmbodiedScan dataset demonstrate that SliceOcc achieves a mIoU of 15.45% across 81 indoor categories.
arXiv Detail & Related papers (2025-01-28T03:41:24Z) - Camera-based 3D Semantic Scene Completion with Sparse Guidance Network [18.415854443539786]
We propose a camera-based semantic scene completion framework called SGN.
SGN propagates semantics from semantic-aware seed voxels to the whole scene based on spatial geometry cues.
Our experimental results demonstrate the superiority of our SGN over existing state-of-the-art methods.
arXiv Detail & Related papers (2023-12-10T04:17:27Z) - PointOcc: Cylindrical Tri-Perspective View for Point-based 3D Semantic Occupancy Prediction [72.75478398447396]
We propose a cylindrical tri-perspective view to represent point clouds effectively and comprehensively.
Considering the distance distribution of LiDAR point clouds, we construct the tri-perspective view in the cylindrical coordinate system.
We employ spatial group pooling to maintain structural details during projection and adopt 2D backbones to efficiently process each TPV plane.
arXiv Detail & Related papers (2023-08-31T17:57:17Z) - SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving [98.74706005223685]
3D scene understanding plays a vital role in vision-based autonomous driving.
We propose a SurroundOcc method to predict the 3D occupancy with multi-camera images.
arXiv Detail & Related papers (2023-03-16T17:59:08Z) - 3D Sketch-aware Semantic Scene Completion via Semi-supervised Structure Prior [50.73148041205675]
The goal of the Semantic Scene Completion (SSC) task is to simultaneously predict a completed 3D voxel representation of volumetric occupancy and semantic labels of objects in the scene from a single-view observation.
We propose to devise a new geometry-based strategy to embed depth information with low-resolution voxel representation.
Our proposed geometric embedding works better than the depth feature learning of conventional SSC frameworks.
arXiv Detail & Related papers (2020-03-31T09:33:46Z) - OccuSeg: Occupancy-aware 3D Instance Segmentation [39.71517989569514]
OccuSeg is an occupancy-aware 3D instance segmentation scheme.
"3D occupancy size" denotes the number of voxels occupied by each instance.
It achieves state-of-the-art performance on three real-world datasets.
arXiv Detail & Related papers (2020-03-14T02:48:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.