Neural Rendering for Sensor Adaptation in 3D Object Detection
- URL: http://arxiv.org/abs/2508.12695v1
- Date: Mon, 18 Aug 2025 07:53:45 GMT
- Title: Neural Rendering for Sensor Adaptation in 3D Object Detection
- Authors: Felix Embacher, David Holtz, Jonas Uhrig, Marius Cordts, Markus Enzweiler
- Abstract summary: We investigate the impact of the cross-sensor domain gap on state-of-the-art 3D object detectors. We show that model architectures based on a dense Bird's Eye View (BEV) representation with backward projection, such as BEVFormer, are the most robust against varying sensor configurations. We propose a novel data-driven sensor adaptation pipeline based on neural rendering, which can transform entire datasets to match different camera sensor setups.
- Score: 3.10688583550805
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Autonomous vehicles often have varying camera sensor setups, which is inevitable due to restricted placement options for different vehicle types. Training a perception model on one particular setup and evaluating it on a new, different sensor setup reveals the so-called cross-sensor domain gap, typically leading to a degradation in accuracy. In this paper, we investigate the impact of the cross-sensor domain gap on state-of-the-art 3D object detectors. To this end, we introduce CamShift, a dataset inspired by nuScenes and created in CARLA to specifically simulate the domain gap between subcompact vehicles and sport utility vehicles (SUVs). Using CamShift, we demonstrate significant cross-sensor performance degradation, identify robustness dependencies on model architecture, and propose a data-driven solution to mitigate the effect. On the one hand, we show that model architectures based on a dense Bird's Eye View (BEV) representation with backward projection, such as BEVFormer, are the most robust against varying sensor configurations. On the other hand, we propose a novel data-driven sensor adaptation pipeline based on neural rendering, which can transform entire datasets to match different camera sensor setups. Applying this approach improves performance across all investigated 3D object detectors, mitigating the cross-sensor domain gap by a large margin and reducing the need for new data collection by enabling efficient data reusability across vehicles with different sensor setups. The CamShift dataset and the sensor adaptation benchmark are available at https://dmholtz.github.io/camshift/.
Related papers
- Domain Adaptation for Different Sensor Configurations in 3D Object Detection [1.4566410781522745]
We address domain adaptation across different sensor configurations in 3D object detection. We propose two techniques: Downstream Fine-tuning and Partial Layer Fine-tuning. Our findings provide a practical and scalable solution for adapting 3D object detection models to diverse vehicle platforms.
arXiv Detail & Related papers (2025-09-04T23:54:25Z)
- Investigating Domain Gaps for Indoor 3D Object Detection [60.55242233729081]
We consider the task of adapting indoor 3D object detectors from one dataset to another. In this paper, we present a benchmark with ScanNet, SUN RGB-D and 3D Front datasets, as well as our newly proposed large-scale datasets ProcTHOR-OD and ProcFront. We conduct experiments on different adaptation scenarios including synthetic-to-real adaptation, point cloud quality adaptation, layout adaptation and instance feature adaptation, analyzing the impact of different domain gaps on 3D object detectors.
arXiv Detail & Related papers (2025-08-24T16:34:19Z)
- OccCylindrical: Multi-Modal Fusion with Cylindrical Representation for 3D Semantic Occupancy Prediction [9.099401529072324]
We propose OccCylindrical, which merges and refines the different modality features under cylindrical coordinates. Our method preserves more fine-grained geometric detail, leading to better performance. Experiments conducted on the nuScenes dataset, including challenging rainy and nighttime scenarios, confirm our approach's effectiveness and state-of-the-art performance.
arXiv Detail & Related papers (2025-05-06T08:12:31Z)
- ACROSS: A Deformation-Based Cross-Modal Representation for Robotic Tactile Perception [1.5566524830295307]
ACROSS is a framework for translating data between tactile sensors by exploiting sensor deformation information. We transfer the tactile signals of a BioTac sensor to DIGIT tactile images.
arXiv Detail & Related papers (2024-11-13T11:29:14Z)
- Adaptive Domain Learning for Cross-domain Image Denoising [57.4030317607274]
We present a novel adaptive domain learning scheme for cross-domain image denoising.
We use existing data from different sensors (source domain) plus a small amount of data from the new sensor (target domain).
The ADL training scheme automatically removes the data in the source domain that are harmful to fine-tuning a model for the target domain.
Also, we introduce a modulation module that incorporates sensor-specific information (sensor type and ISO) to understand input data for image denoising.
arXiv Detail & Related papers (2024-11-03T08:08:26Z)
- Detect Closer Surfaces that can be Seen: New Modeling and Evaluation in Cross-domain 3D Object Detection [7.464834150824093]
We propose two metrics to measure a 3D object detection model's ability to detect the surfaces closer to the sensor on the ego vehicle.
We also propose a refinement head, named EdgeHead, to guide models to focus more on the learnable closer surfaces.
arXiv Detail & Related papers (2024-07-04T17:06:16Z)
- Cross-Cluster Shifting for Efficient and Effective 3D Object Detection in Autonomous Driving [69.20604395205248]
We present a new 3D point-based detector model, named Shift-SSD, for precise 3D object detection in autonomous driving.
We introduce an intriguing Cross-Cluster Shifting operation to unleash the representation capacity of the point-based detector.
We conduct extensive experiments on the KITTI and nuScenes datasets, and the results demonstrate the state-of-the-art performance of Shift-SSD.
arXiv Detail & Related papers (2024-03-10T10:36:32Z)
- Towards Viewpoint Robustness in Bird's Eye View Segmentation [85.99907496019972]
We study how AV perception models are affected by changes in camera viewpoint.
Small changes to the pitch, yaw, depth, or height of the camera at inference time lead to large drops in performance.
We introduce a technique for novel view synthesis and use it to transform collected data to the viewpoint of target rigs.
arXiv Detail & Related papers (2023-09-11T02:10:07Z)
- Multi-Modal 3D Object Detection by Box Matching [109.43430123791684]
We propose a novel Fusion network by Box Matching (FBMNet) for multi-modal 3D detection.
With the learned assignments between 3D and 2D object proposals, the fusion for detection can be effectively performed by combining their ROI features.
arXiv Detail & Related papers (2023-05-12T18:08:51Z)
- 3D-VField: Learning to Adversarially Deform Point Clouds for Robust 3D Object Detection [111.32054128362427]
In safety-critical settings, robustness on out-of-distribution and long-tail samples is fundamental to circumvent dangerous issues.
We substantially improve the generalization of 3D object detectors to out-of-domain data by taking into account deformed point clouds during training.
We propose and share the open-source CrashD: a synthetic dataset of realistic damaged and rare cars.
arXiv Detail & Related papers (2021-12-09T08:50:54Z)
- Radar Voxel Fusion for 3D Object Detection [0.0]
This paper develops a low-level sensor fusion network for 3D object detection.
The radar sensor fusion proves especially beneficial in inclement conditions such as rain and night scenes.
arXiv Detail & Related papers (2021-06-26T20:34:12Z)
- Learning Camera Miscalibration Detection [83.38916296044394]
This paper focuses on a data-driven approach to learn the detection of miscalibration in vision sensors, specifically RGB cameras.
Our contributions include a proposed miscalibration metric for RGB cameras and a novel semi-synthetic dataset generation pipeline based on this metric.
By training a deep convolutional neural network, we demonstrate the effectiveness of our pipeline in identifying whether a recalibration of the camera's intrinsic parameters is required.
arXiv Detail & Related papers (2020-05-24T10:32:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.