Related papers: MergeOcc: Bridge the Domain Gap between Different LiDARs for Robust Occupancy Prediction

URL: http://arxiv.org/abs/2403.08512v2
Date: Mon, 19 Aug 2024 02:46:26 GMT
Title: MergeOcc: Bridge the Domain Gap between Different LiDARs for Robust Occupancy Prediction
Authors: Zikun Xu, Jianqiang Wang, Shaobing Xu
Abstract summary: MergeOcc is developed to simultaneously handle different LiDARs by leveraging multiple datasets. The effectiveness of MergeOcc is validated through experiments on two prominent datasets for autonomous vehicles.
Score: 8.993992124170624
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: LiDAR-based 3D occupancy prediction has evolved rapidly alongside the emergence of large datasets. Nevertheless, the potential of existing diverse datasets remains underutilized because they are typically used in isolation. Models trained on a specific dataset often suffer considerable performance degradation when deployed to real-world scenarios or datasets involving disparate LiDARs. This paper aims to develop a generalized model, called MergeOcc, that handles different LiDARs simultaneously by leveraging multiple datasets. The gaps among LiDAR datasets primarily manifest in geometric disparities and semantic inconsistencies. Thus, MergeOcc incorporates a novel model featuring a geometric realignment module and a semantic label mapping module to enable multi-dataset training (MDT). The effectiveness of MergeOcc is validated through experiments on two prominent datasets for autonomous vehicles: OpenOccupancy-nuScenes and SemanticKITTI. The results demonstrate its enhanced robustness and remarkable performance across both types of LiDARs, outperforming several SOTA multi-modality methods. Notably, despite using an identical model architecture and hyper-parameter set, MergeOcc significantly surpasses the baseline due to its exposure to more diverse data. MergeOcc is considered the first cross-dataset 3D occupancy prediction pipeline that effectively bridges the domain gap for seamless deployment across heterogeneous platforms.
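Illustration (not taken from the paper): the abstract names a semantic label mapping module but does not describe its design. The sketch below only shows the general idea of remapping dataset-specific class IDs into one shared label space so that samples from two datasets can be trained together. All class names, dataset keys, and the `remap_labels` helper are hypothetical assumptions, not MergeOcc's actual taxonomy or API.

```python
# Hypothetical unified label space (illustrative only, not from the paper).
UNIFIED = {"ignore": 0, "vehicle": 1, "pedestrian": 2, "road": 3, "vegetation": 4}

# Hypothetical per-dataset tables: raw class ID -> unified class name.
# Two raw classes may collapse into one unified class (IDs 1 and 2 in dataset_a).
DATASET_TO_UNIFIED = {
    "dataset_a": {0: "ignore", 1: "vehicle", 2: "vehicle", 3: "pedestrian", 4: "road"},
    "dataset_b": {0: "ignore", 1: "vehicle", 2: "pedestrian", 3: "road", 4: "vegetation"},
}


def remap_labels(labels, dataset):
    """Map raw per-voxel class IDs of one dataset into the unified label space.

    Raw IDs with no entry in the table fall back to the "ignore" class.
    """
    table = DATASET_TO_UNIFIED[dataset]
    return [UNIFIED[table.get(raw_id, "ignore")] for raw_id in labels]


if __name__ == "__main__":
    print(remap_labels([1, 2, 3, 4], "dataset_a"))
    print(remap_labels([4, 0], "dataset_b"))
```

Sending unmapped IDs to an "ignore" class is one common design choice; a real pipeline might instead raise an error or mask those voxels out of the loss.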

Related papers

TUM2TWIN: Introducing the Large-Scale Multimodal Urban Digital Twin Benchmark Dataset [90.97440987655084]
Urban Digital Twins (UDTs) have become essential for managing cities and integrating complex, heterogeneous data from diverse sources. To address these challenges, we introduce the first comprehensive multimodal Urban Digital Twin benchmark dataset: TUM2TWIN. This dataset includes georeferenced, semantically aligned 3D models and networks along with various terrestrial, mobile, aerial, and satellite observations boasting 32 data subsets over roughly 100,000 $m^2$ and currently 767 GB of data.
arXiv Detail & Related papers (2025-05-12T09:48:32Z)
LargeAD: Large-Scale Cross-Sensor Data Pretraining for Autonomous Driving [52.83707400688378]
LargeAD is a versatile and scalable framework designed for large-scale 3D pretraining across diverse real-world driving datasets. Our framework leverages VFMs to extract semantically rich superpixels from 2D images, which are aligned with LiDAR point clouds to generate high-quality contrastive samples. Our approach delivers significant performance improvements over state-of-the-art methods in both linear probing and fine-tuning tasks for both LiDAR-based segmentation and object detection.
arXiv Detail & Related papers (2025-01-07T18:59:59Z)
Uni$^2$Det: Unified and Universal Framework for Prompt-Guided Multi-dataset 3D Detection [64.08296187555095]
Uni$^2$Det is a framework for unified and universal multi-dataset training on 3D detection. We introduce multi-stage prompting modules for multi-dataset 3D detection. Results on zero-shot cross-dataset transfer validate the generalization capability of our proposed method.
arXiv Detail & Related papers (2024-09-30T17:57:50Z)
Multi-Space Alignments Towards Universal LiDAR Segmentation [50.992103482269016]
M3Net is a one-of-a-kind framework for fulfilling multi-task, multi-dataset, multi-modality LiDAR segmentation. We first combine large-scale driving datasets acquired by different types of sensors from diverse scenes. We then conduct alignments in three spaces, namely data, feature, and label spaces, during the training.
arXiv Detail & Related papers (2024-05-02T17:59:57Z)
An improved tabular data generator with VAE-GMM integration [9.4491536689161]
We propose a novel Variational Autoencoder (VAE)-based model that addresses limitations of current approaches. Inspired by the TVAE model, our approach incorporates a Bayesian Gaussian Mixture model (BGM) within the VAE architecture. We thoroughly validate our model on three real-world datasets with mixed data types, including two medically relevant ones.
arXiv Detail & Related papers (2024-04-12T12:31:06Z)
Distribution-Aware Data Expansion with Diffusion Models [55.979857976023695]
We propose DistDiff, a training-free data expansion framework based on the distribution-aware diffusion model. DistDiff consistently enhances accuracy across a diverse range of datasets compared to models trained solely on original data.
arXiv Detail & Related papers (2024-03-11T14:07:53Z)
DGInStyle: Domain-Generalizable Semantic Segmentation with Image Diffusion Models and Stylized Semantic Control [68.14798033899955]
Large, pretrained latent diffusion models (LDMs) have demonstrated an extraordinary ability to generate creative content. However, are they usable as large-scale data generators, e.g., to improve tasks in the perception stack, like semantic segmentation? We investigate this question in the context of autonomous driving, and answer it with a resounding "yes".
arXiv Detail & Related papers (2023-12-05T18:34:12Z)
Towards Large-scale 3D Representation Learning with Multi-dataset Point Prompt Training [44.790636524264]
Point Prompt Training is a novel framework for multi-dataset synergistic learning in the context of 3D representation learning. It can overcome the negative transfer associated with synergistic learning and produce generalizable representations. It achieves state-of-the-art performance on each dataset using a single weight-shared model with supervised multi-dataset training.
arXiv Detail & Related papers (2023-08-18T17:59:57Z)
MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training [58.07391711548269]
This paper introduces the Masked Voxel Jigsaw and Reconstruction (MV-JAR) method for LiDAR-based self-supervised pre-training.
arXiv Detail & Related papers (2023-03-23T17:59:02Z)
Shared Manifold Learning Using a Triplet Network for Multiple Sensor Translation and Fusion with Missing Data [2.452410403088629]
We propose a Contrastive learning based MultiModal Alignment Network (CoMMANet) to align data from different sensors into a shared and discriminative manifold. The proposed architecture uses a multimodal triplet autoencoder to cluster the latent space in such a way that samples of the same classes from each heterogeneous modality are mapped close to each other.
arXiv Detail & Related papers (2022-10-25T20:22:09Z)
AVIDA: Alternating method for Visualizing and Integrating Data [1.6637373649145604]
AVIDA is a framework for simultaneously performing data alignment and dimension reduction. We show that AVIDA correctly aligns high-dimensional datasets without common features. In general applications, other methods can be used for the alignment and dimension reduction modules.
arXiv Detail & Related papers (2022-05-31T22:36:10Z)
Manifold Topology Divergence: a Framework for Comparing Data Manifolds [109.0784952256104]
We develop a framework for comparing data manifolds, aimed at the evaluation of deep generative models. Based on the Cross-Barcode, we introduce the Manifold Topology Divergence score (MTop-Divergence). We demonstrate that the MTop-Divergence accurately detects various degrees of mode-dropping, intra-mode collapse, mode invention, and image disturbance.
arXiv Detail & Related papers (2021-06-08T00:30:43Z)
Multimodal Remote Sensing Benchmark Datasets for Land Cover Classification with A Shared and Specific Feature Learning Model [36.993630058695345]
We propose a shared and specific feature learning (S2FL) model to decompose multimodal RS data into modality-shared and modality-specific components. To better assess multimodal baselines and the newly proposed S2FL model, three multimodal RS benchmark datasets are released and used for land cover classification: Houston2013 (hyperspectral and multispectral data), Berlin (hyperspectral and synthetic aperture radar (SAR) data), and Augsburg (hyperspectral, SAR, and digital surface model (DSM) data).
arXiv Detail & Related papers (2021-05-21T08:14:21Z)

This list is automatically generated from the titles and abstracts of the papers on this site.