Related papers: OccLoff: Learning Optimized Feature Fusion for 3D Occupancy Prediction

OccLoff: Learning Optimized Feature Fusion for 3D Occupancy Prediction

URL: http://arxiv.org/abs/2411.03696v1
Date: Wed, 06 Nov 2024 06:34:27 GMT
Title: OccLoff: Learning Optimized Feature Fusion for 3D Occupancy Prediction
Authors: Ji Zhang, Yiran Ding, Zixin Liu,
Abstract summary: 3D semantic occupancy prediction is crucial for ensuring the safety in autonomous driving. Existing fusion-based occupancy methods typically involve performing a 2D-to-3D view transformation on image features. We propose OccLoff, a framework that Learns to optimize Feature Fusion for 3D occupancy prediction.
Score: 5.285847977231642
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: 3D semantic occupancy prediction is crucial for finely representing the surrounding environment, which is essential for ensuring the safety in autonomous driving. Existing fusion-based occupancy methods typically involve performing a 2D-to-3D view transformation on image features, followed by computationally intensive 3D operations to fuse these with LiDAR features, leading to high computational costs and reduced accuracy. Moreover, current research on occupancy prediction predominantly focuses on designing specific network architectures, often tailored to particular models, with limited attention given to the more fundamental aspect of semantic feature learning. This gap hinders the development of more transferable methods that could enhance the performance of various occupancy models. To address these challenges, we propose OccLoff, a framework that Learns to Optimize Feature Fusion for 3D occupancy prediction. Specifically, we introduce a sparse fusion encoder with entropy masks that directly fuses 3D and 2D features, improving model accuracy while reducing computational overhead. Additionally, we propose a transferable proxy-based loss function and an adaptive hard sample weighting algorithm, which enhance the performance of several state-of-the-art methods. Extensive evaluations on the nuScenes and SemanticKITTI benchmarks demonstrate the superiority of our framework, and ablation studies confirm the effectiveness of each proposed module.

Related papers

TGP: Two-modal occupancy prediction with 3D Gaussian and sparse points for 3D Environment Awareness [13.68631587423815]
3D semantic occupancy has rapidly become a research focus in the fields of robotics and autonomous driving environment perception. Existing occupancy prediction tasks are modeled using voxel or point cloud-based approaches. We propose a dual-modal prediction method based on 3D Gaussian sets and sparse points, which balances both spatial location and volumetric structural information.
arXiv Detail & Related papers (2025-03-13T01:35:04Z)
FLARES: Fast and Accurate LiDAR Multi-Range Semantic Segmentation [52.89847760590189]
3D scene understanding is a critical yet challenging task in autonomous driving. Recent methods leverage the range-view representation to improve processing efficiency. We re-design the workflow for range-view-based LiDAR semantic segmentation.
arXiv Detail & Related papers (2025-02-13T12:39:26Z)
MR-Occ: Efficient Camera-LiDAR 3D Semantic Occupancy Prediction Using Hierarchical Multi-Resolution Voxel Representation [8.113965240054506]
We propose MR-Occ, a novel approach for camera-LiDAR fusion-based 3D semantic occupancy prediction. HVFR improves performance by enhancing features for critical voxels, reducing computational cost. MOD introduces an occluded' class to better handle regions obscured from sensor view, improving accuracy. PVF-Net leverages densified LiDAR features to effectively fuse camera and LiDAR data through a deformable attention mechanism.
arXiv Detail & Related papers (2024-12-29T14:39:21Z)
A Lesson in Splats: Teacher-Guided Diffusion for 3D Gaussian Splats Generation with 2D Supervision [65.33043028101471]
We introduce a diffusion model for Gaussian Splats, SplatDiffusion, to enable generation of three-dimensional structures from single images. Existing methods rely on deterministic, feed-forward predictions, which limit their ability to handle the inherent ambiguity of 3D inference from 2D data.
arXiv Detail & Related papers (2024-12-01T00:29:57Z)
ALOcc: Adaptive Lifting-based 3D Semantic Occupancy and Cost Volume-based Flow Prediction [89.89610257714006]
Existing methods prioritize higher accuracy to cater to the demands of these tasks. We introduce a series of targeted improvements for 3D semantic occupancy prediction and flow estimation. Our purelytemporalal architecture framework, named ALOcc, achieves an optimal tradeoff between speed and accuracy.
arXiv Detail & Related papers (2024-11-12T11:32:56Z)
OPUS: Occupancy Prediction Using a Sparse Set [64.60854562502523]
We present a framework to simultaneously predict occupied locations and classes using a set of learnable queries. OPUS incorporates a suite of non-trivial strategies to enhance model performance. Our lightest model achieves superior RayIoU on the Occ3D-nuScenes dataset at near 2x FPS, while our heaviest model surpasses previous best results by 6.1 RayIoU.
arXiv Detail & Related papers (2024-09-14T07:44:22Z)
4D Contrastive Superflows are Dense 3D Representation Learners [62.433137130087445]
We introduce SuperFlow, a novel framework designed to harness consecutive LiDAR-camera pairs for establishing pretraining objectives. To further boost learning efficiency, we incorporate a plug-and-play view consistency module that enhances alignment of the knowledge distilled from camera views.
arXiv Detail & Related papers (2024-07-08T17:59:54Z)
DM3D: Distortion-Minimized Weight Pruning for Lossless 3D Object Detection [42.07920565812081]
We propose a novel post-training weight pruning scheme for 3D object detection. It determines redundant parameters in the pretrained model that lead to minimal distortion in both locality and confidence. This framework aims to minimize detection distortion of network output to maximally maintain detection precision.
arXiv Detail & Related papers (2024-07-02T09:33:32Z)
UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues [55.69339788566899]
UPose3D is a novel approach for multi-view 3D human pose estimation. It improves robustness and flexibility without requiring direct 3D annotations.
arXiv Detail & Related papers (2024-04-23T00:18:00Z)
OccFusion: Depth Estimation Free Multi-sensor Fusion for 3D Occupancy Prediction [5.285847977231642]
3D occupancy prediction based on multi-sensor fusion,crucial for a reliable autonomous driving system. Previous fusion-based 3D occupancy predictions relied on depth estimation for processing 2D image features. We propose OccFusion, a depth estimation free multi-modal fusion framework.
arXiv Detail & Related papers (2024-03-08T14:07:37Z)
FILP-3D: Enhancing 3D Few-shot Class-incremental Learning with Pre-trained Vision-Language Models [62.663113296987085]
Few-shot class-incremental learning aims to mitigate the catastrophic forgetting issue when a model is incrementally trained on limited data. We introduce two novel components: the Redundant Feature Eliminator (RFE) and the Spatial Noise Compensator (SNC) Considering the imbalance in existing 3D datasets, we also propose new evaluation metrics that offer a more nuanced assessment of a 3D FSCIL model.
arXiv Detail & Related papers (2023-12-28T14:52:07Z)
FlashOcc: Fast and Memory-Efficient Occupancy Prediction via Channel-to-Height Plugin [32.172269679513285]
FlashOCC consolidates rapid and memory-efficient occupancy prediction. Channel-to-height transformation is introduced to lift the output logits from the BEV into the 3D space. Results substantiate the superiority of our plug-and-play paradigm over previous state-of-the-art methods.
arXiv Detail & Related papers (2023-11-18T15:28:09Z)
3D Harmonic Loss: Towards Task-consistent and Time-friendly 3D Object Detection on Edge for Intelligent Transportation System [28.55894241049706]
We propose a 3D harmonic loss function to relieve the pointcloud based inconsistent predictions. Our proposed method considerably improves the performance than benchmark models. Our code is open-source and publicly available.
arXiv Detail & Related papers (2022-11-07T10:11:48Z)
Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose Estimation [70.32536356351706]
We introduce MRP-Net that constitutes a common deep network backbone with two output heads subscribing to two diverse configurations. We derive suitable measures to quantify prediction uncertainty at both pose and joint level. We present a comprehensive evaluation of the proposed approach and demonstrate state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2022-03-29T07:14:58Z)

This list is automatically generated from the titles and abstracts of the papers in this site.