EmbodiedOcc++: Boosting Embodied 3D Occupancy Prediction with Plane Regularization and Uncertainty Sampler
- URL: http://arxiv.org/abs/2504.09540v1
- Date: Sun, 13 Apr 2025 12:10:49 GMT
- Title: EmbodiedOcc++: Boosting Embodied 3D Occupancy Prediction with Plane Regularization and Uncertainty Sampler
- Authors: Hao Wang, Xiaobao Wei, Xiaoan Zhang, Jianing Li, Chengyu Bai, Ying Li, Ming Lu, Wenzhao Zheng, Shanghang Zhang
- Abstract summary: This paper introduces EmbodiedOcc++, enhancing the original framework with two key innovations: a Geometry-guided Refinement Module (GRM) that constrains Gaussian updates through plane regularization, and a Semantic-aware Uncertainty Sampler (SUS) that enables more effective updates in overlapping regions between consecutive frames. Experiments on the EmbodiedOcc-ScanNet benchmark demonstrate that EmbodiedOcc++ achieves state-of-the-art performance across different settings.
- Score: 43.277357306520145
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Online 3D occupancy prediction provides a comprehensive spatial understanding of embodied environments. While the innovative EmbodiedOcc framework utilizes 3D semantic Gaussians for progressive indoor occupancy prediction, it overlooks the geometric characteristics of indoor environments, which are dominated by planar structures. This paper introduces EmbodiedOcc++, enhancing the original framework with two key innovations: a Geometry-guided Refinement Module (GRM) that constrains Gaussian updates through plane regularization, and a Semantic-aware Uncertainty Sampler (SUS) that enables more effective updates in overlapping regions between consecutive frames. GRM regularizes the position update to align with surface normals. It determines the adaptive regularization weight using curvature-based and depth-based constraints, allowing semantic Gaussians to align accurately with planar surfaces while adapting in complex regions. To effectively improve geometric consistency across different views, SUS adaptively selects proper Gaussians to update. Comprehensive experiments on the EmbodiedOcc-ScanNet benchmark demonstrate that EmbodiedOcc++ achieves state-of-the-art performance across different settings. Our method demonstrates improved edge accuracy and retains more geometric details while ensuring computational efficiency, which is essential for online embodied perception. The code will be released at: https://github.com/PKUHaoWang/EmbodiedOcc2.
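To make the two abstract-level mechanisms concrete, here is a minimal, hypothetical sketch of how they might be realized. Function names, tensor shapes, and the exact weighting formula are illustrative assumptions, not taken from the paper: the proposed position update of each semantic Gaussian is split into components normal and tangential to the estimated local plane, the normal component is damped by an adaptive weight derived from curvature and depth, and Gaussians with the highest semantic uncertainty are selected for update in overlapping regions.

```python
import torch

def plane_regularized_update(positions, deltas, normals, curvature, depth,
                             curv_scale=10.0, depth_scale=5.0):
    """Hypothetical GRM-style update (shapes: positions/deltas/normals (N, 3),
    curvature/depth (N,)). Not the authors' implementation."""
    # Adaptive weight in [0, 1]: regularize strongly where the local surface is
    # flat (low curvature) and the observation is close (reliable depth).
    w = torch.exp(-curv_scale * curvature) * torch.exp(-depth / depth_scale)

    # Decompose the proposed update into normal and tangential components.
    d_normal = (deltas * normals).sum(dim=1, keepdim=True) * normals
    d_tangent = deltas - d_normal

    # Suppress off-plane motion in proportion to the weight.
    return positions + d_tangent + (1.0 - w).unsqueeze(1) * d_normal


def semantic_uncertainty_select(class_probs, top_k):
    """Hypothetical SUS-style sampler: pick the Gaussians whose per-class
    prediction is most uncertain (highest entropy) for the next update."""
    entropy = -(class_probs * torch.log(class_probs + 1e-8)).sum(dim=1)
    return torch.topk(entropy, top_k).indices
```

In this reading, a high weight freezes a Gaussian onto its supporting plane, while low-confidence or high-curvature regions keep the full learned update; the entropy-based selection restricts refinement to the Gaussians that are still semantically ambiguous across overlapping frames.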
Related papers
- TGP: Two-modal occupancy prediction with 3D Gaussian and sparse points for 3D Environment Awareness [13.68631587423815]
3D semantic occupancy has rapidly become a research focus in the fields of robotics and autonomous driving environment perception.
Existing occupancy prediction tasks are modeled using voxel or point cloud-based approaches.
We propose a dual-modal prediction method based on 3D Gaussian sets and sparse points, which balances both spatial location and volumetric structural information.
arXiv Detail & Related papers (2025-03-13T01:35:04Z)
- CDGS: Confidence-Aware Depth Regularization for 3D Gaussian Splatting [5.8678184183132265]
CDGS is a confidence-aware depth regularization approach developed to enhance 3DGS.
We leverage multi-cue confidence maps of monocular depth estimation and sparse Structure-from-Motion depth to adaptively adjust depth supervision.
Our method demonstrates improved geometric detail preservation in early training stages and achieves competitive performance in both NVS quality and geometric accuracy.
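As a hedged illustration of the idea in this entry (the cue fusion and the loss form are assumptions, not CDGS's actual implementation), confidence-aware depth regularization can be expressed as a per-pixel, confidence-weighted penalty between the depth rendered from the Gaussian scene and a monocular or SfM depth prior:

```python
import torch

def confidence_weighted_depth_loss(rendered_depth, prior_depth, confidence):
    """Hypothetical confidence-aware depth loss in the spirit of CDGS.

    rendered_depth : (H, W) depth rendered from the Gaussian scene
    prior_depth    : (H, W) monocular or SfM depth prior (0 where missing)
    confidence     : (H, W) per-pixel confidence in [0, 1] fusing several cues
    """
    valid = prior_depth > 0                      # SfM priors are typically sparse
    residual = (rendered_depth - prior_depth).abs()
    return (confidence[valid] * residual[valid]).mean()
```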
arXiv Detail & Related papers (2025-02-20T16:12:13Z)
- EmbodiedOcc: Embodied 3D Occupancy Prediction for Vision-based Online Scene Understanding [63.99937807085461]
3D occupancy prediction provides a comprehensive description of the surrounding scenes.
Most existing methods focus on offline perception from one or a few views.
We formulate an embodied 3D occupancy prediction task to target this practical scenario and propose a Gaussian-based EmbodiedOcc framework to accomplish it.
arXiv Detail & Related papers (2024-12-05T17:57:09Z)
- AGS-Mesh: Adaptive Gaussian Splatting and Meshing with Geometric Priors for Indoor Room Reconstruction Using Smartphones [19.429461194706786]
We propose an approach for joint surface depth and normal refinement of Gaussian Splatting methods for accurate 3D reconstruction of indoor scenes.
Our filtering strategy and optimization design demonstrate significant improvements in both mesh estimation and novel-view synthesis.
arXiv Detail & Related papers (2024-11-28T17:04:32Z)
- Robust 3D Semantic Occupancy Prediction with Calibration-free Spatial Transformation [32.50849425431012]
For autonomous cars equipped with multiple cameras and LiDAR, it is critical to aggregate multi-sensor information into a unified 3D space for accurate and robust predictions.
Recent methods are mainly built on the 2D-to-3D transformation that relies on sensor calibration to project the 2D image information into the 3D space.
In this work, we propose a calibration-free spatial transformation based on vanilla attention to implicitly model the spatial correspondence.
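A rough, hypothetical sketch of such a calibration-free 2D-to-3D transformation (module name, shapes, and hyperparameters are assumptions, not taken from the paper): learnable 3D queries cross-attend to flattened multi-camera features, so the spatial correspondence is learned rather than obtained from camera intrinsics and extrinsics.

```python
import torch
import torch.nn as nn

class CalibrationFreeFusion(nn.Module):
    """Hypothetical calibration-free 2D-to-3D fusion via vanilla attention."""

    def __init__(self, num_queries=2048, dim=256, heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim))
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, image_feats):
        # image_feats: (B, num_cams * H * W, dim) flattened camera features
        q = self.queries.unsqueeze(0).expand(image_feats.size(0), -1, -1)
        fused, _ = self.attn(q, image_feats, image_feats)
        return fused  # (B, num_queries, dim) features for a 3D occupancy head
```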
arXiv Detail & Related papers (2024-11-19T02:40:42Z)
- OPUS: Occupancy Prediction Using a Sparse Set [64.60854562502523]
We present a framework to simultaneously predict occupied locations and classes using a set of learnable queries.
OPUS incorporates a suite of non-trivial strategies to enhance model performance.
Our lightest model achieves superior RayIoU on the Occ3D-nuScenes dataset at nearly 2x the FPS, while our heaviest model surpasses previous best results by 6.1 RayIoU.
arXiv Detail & Related papers (2024-09-14T07:44:22Z)
- Real-Time 3D Occupancy Prediction via Geometric-Semantic Disentanglement [8.592248643229675]
Occupancy prediction plays a pivotal role in autonomous driving (AD).
Existing methods often incur high computational costs, which contradicts the real-time demands of AD.
We propose a Geometric-Semantic Dual-Branch Network (GSDBN) with a hybrid BEV-Voxel representation.
arXiv Detail & Related papers (2024-07-18T04:46:13Z)
- GOI: Find 3D Gaussians of Interest with an Optimizable Open-vocabulary Semantic-space Hyperplane [53.388937705785025]
3D open-vocabulary scene understanding is crucial for advancing augmented reality and robotic applications.
We introduce GOI, a framework that integrates semantic features from 2D vision-language foundation models into 3D Gaussian Splatting (3DGS).
Our method treats the feature selection process as a hyperplane division within the feature space, retaining only features that are highly relevant to the query.
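A minimal sketch of the hyperplane-selection idea (the function, normalization, and thresholding below are assumptions, not the authors' implementation): keep only those per-Gaussian semantic features lying on the positive side of a hyperplane defined by the open-vocabulary query embedding.

```python
import torch
import torch.nn.functional as F

def hyperplane_select(gaussian_feats, query_embedding, bias=0.0):
    """Hypothetical hyperplane-style selection of relevant 3D Gaussians.

    gaussian_feats  : (N, D) per-Gaussian semantic features
    query_embedding : (D,)   query embedding from a vision-language model
    bias            : scalar hyperplane offset (tunable or optimizable)
    """
    feats = F.normalize(gaussian_feats, dim=-1)
    query = F.normalize(query_embedding, dim=0)
    scores = feats @ query + bias                  # signed distance to the hyperplane
    return torch.nonzero(scores > 0).squeeze(-1)   # indices of Gaussians of interest
```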
arXiv Detail & Related papers (2024-05-27T18:57:18Z)
- Volumetric Semantically Consistent 3D Panoptic Mapping [77.13446499924977]
We introduce an online 2D-to-3D semantic instance mapping algorithm aimed at generating semantic 3D maps suitable for autonomous agents in unstructured environments.
It introduces novel ways of integrating semantic prediction confidence during mapping, producing semantic and instance-consistent 3D regions.
The proposed method achieves accuracy superior to the state of the art on public large-scale datasets, improving on a number of widely used metrics.
arXiv Detail & Related papers (2023-09-26T08:03:10Z)
- Geometry-Contrastive Transformer for Generalized 3D Pose Transfer [95.56457218144983]
The intuition of this work is to perceive the geometric inconsistency between the given meshes using the self-attention mechanism.
We propose a novel geometry-contrastive Transformer that efficiently perceives global geometric inconsistencies in 3D structure.
We present a latent isometric regularization module together with a novel semi-synthesized dataset for the cross-dataset 3D pose transfer task.
arXiv Detail & Related papers (2021-12-14T13:14:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.