RCDN: Towards Robust Camera-Insensitivity Collaborative Perception via Dynamic Feature-based 3D Neural Modeling
- URL: http://arxiv.org/abs/2405.16868v2
- Date: Sat, 15 Mar 2025 06:27:08 GMT
- Title: RCDN: Towards Robust Camera-Insensitivity Collaborative Perception via Dynamic Feature-based 3D Neural Modeling
- Authors: Tianhang Wang, Fan Lu, Zehan Zheng, Zhijun Li, Guang Chen, Changjun Jiang,
- Abstract summary: We introduce a new robust camera-insensitivity problem: how to overcome the issues caused by the failed camera perspectives?<n>We propose RCDN, a Robust Camera-insensitivity collaborative perception with a novel Dynamic feature-based 3D Neural modeling mechanism.
- Score: 13.980022113881697
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Collaborative perception is dedicated to tackling the constraints of single-agent perception, such as occlusions, based on the multiple agents' multi-view sensor inputs. However, most existing works assume an ideal condition that all agents' multi-view cameras are continuously available. In reality, cameras may be highly noisy, obscured or even failed during the collaboration. In this work, we introduce a new robust camera-insensitivity problem: how to overcome the issues caused by the failed camera perspectives, while stabilizing high collaborative performance with low calibration cost? To address above problems, we propose RCDN, a Robust Camera-insensitivity collaborative perception with a novel Dynamic feature-based 3D Neural modeling mechanism. The key intuition of RCDN is to construct collaborative neural rendering field representations to recover failed perceptual messages sent by multiple agents. To better model collaborative neural rendering field, RCDN first establishes a geometry BEV feature based time-invariant static field with other agents via fast hash grid modeling. Based on the static background field, the proposed time-varying dynamic field can model corresponding motion vectors for foregrounds with appropriate positions. To validate RCDN, we create OPV2V-N, a new large-scale dataset with manual labelling under different camera failed scenarios. Extensive experiments conducted on OPV2V-N show that RCDN can be ported to other baselines and improve their robustness in extreme camera-insensitivity settings.
Related papers
- FRAME: Floor-aligned Representation for Avatar Motion from Egocentric Video [52.33896173943054]
Egocentric motion capture with a head-mounted body-facing stereo camera is crucial for VR and AR applications.
Existing methods rely on synthetic pretraining and struggle to generate smooth and accurate predictions in real-world settings.
We propose FRAME, a simple yet effective architecture that combines device pose and camera feeds for state-of-the-art body pose prediction.
arXiv Detail & Related papers (2025-03-29T14:26:06Z) - Boost 3D Reconstruction using Diffusion-based Monocular Camera Calibration [34.18403601269181]
DM-Calib is a diffusion-based approach for estimating pinhole camera intrinsic parameters from a single input image.
We introduce a new image-based representation, termed Camera Image, which losslessly encodes the numerical camera intrinsics.
By fine-tuning a stable diffusion model to generate a Camera Image from a single RGB input, we can extract camera intrinsics via a RANSAC operation.
arXiv Detail & Related papers (2024-11-26T09:04:37Z) - DVPE: Divided View Position Embedding for Multi-View 3D Object Detection [7.791229698270439]
Current research faces challenges in balancing between receptive fields and reducing interference when aggregating multi-view features.
This paper proposes a divided view method, in which features are modeled globally via the visibility crossattention mechanism, but interact only with partial features in a divided local virtual space.
Our framework, named DVPE, achieves state-of-the-art performance (57.2% mAP and 64.5% NDS) on the nuScenes test set.
arXiv Detail & Related papers (2024-07-24T02:44:41Z) - Learning Robust Multi-Scale Representation for Neural Radiance Fields
from Unposed Images [65.41966114373373]
We present an improved solution to the neural image-based rendering problem in computer vision.
The proposed approach could synthesize a realistic image of the scene from a novel viewpoint at test time.
arXiv Detail & Related papers (2023-11-08T08:18:23Z) - Collaboration Helps Camera Overtake LiDAR in 3D Detection [49.58433319402405]
Camera-only 3D detection provides a simple solution for localizing objects in 3D space compared to LiDAR-based detection systems.
Our proposed collaborative camera-only 3D detection (CoCa3D) enables agents to share complementary information with each other through communication.
Results show that CoCa3D improves previous SOTA performances by 44.21% on DAIR-V2X, 30.60% on OPV2V+, 12.59% on CoPerception-UAVs+ for AP@70.
arXiv Detail & Related papers (2023-03-23T03:50:41Z) - Robustifying the Multi-Scale Representation of Neural Radiance Fields [86.69338893753886]
We present a robust multi-scale neural radiance fields representation approach to overcome both real-world imaging issues.
Our method handles multi-scale imaging effects and camera-pose estimation problems with NeRF-inspired approaches.
We demonstrate, with examples, that for an accurate neural representation of an object from day-to-day acquired multi-view images, it is crucial to have precise camera-pose estimates.
arXiv Detail & Related papers (2022-10-09T11:46:45Z) - Simple and Effective Synthesis of Indoor 3D Scenes [78.95697556834536]
We study the problem of immersive 3D indoor scenes from one or more images.
Our aim is to generate high-resolution images and videos from novel viewpoints.
We propose an image-to-image GAN that maps directly from reprojections of incomplete point clouds to full high-resolution RGB-D images.
arXiv Detail & Related papers (2022-04-06T17:54:46Z) - Camera-Conditioned Stable Feature Generation for Isolated Camera
Supervised Person Re-IDentification [24.63519986072777]
Cross-camera images could be unavailable under the ISolated Camera Supervised setting, e.g., a surveillance system deployed across distant scenes.
A new pipeline is introduced by synthesizing the cross-camera samples in the feature space for model training.
Experiments on two ISCS person Re-ID datasets demonstrate the superiority of our CCSFG to the competitors.
arXiv Detail & Related papers (2022-03-29T03:10:24Z) - Stereo Neural Vernier Caliper [57.187088191829886]
We propose a new object-centric framework for learning-based stereo 3D object detection.
We tackle a problem of how to predict a refined update given an initial 3D cuboid guess.
Our approach achieves state-of-the-art performance on the KITTI benchmark.
arXiv Detail & Related papers (2022-03-21T14:36:07Z) - MetaPose: Fast 3D Pose from Multiple Views without 3D Supervision [72.5863451123577]
We show how to train a neural model that can perform accurate 3D pose and camera estimation.
Our method outperforms both classical bundle adjustment and weakly-supervised monocular 3D baselines.
arXiv Detail & Related papers (2021-08-10T18:39:56Z) - View Invariant Human Body Detection and Pose Estimation from Multiple
Depth Sensors [0.7080990243618376]
We propose an end-to-end multi-person 3D pose estimation network, Point R-CNN, using multiple point cloud sources.
We conduct extensive experiments to simulate challenging real world cases, such as individual camera failures, various target appearances, and complex cluttered scenes.
In the meantime, we show our end-to-end network greatly outperforms cascaded state-of-the-art models.
arXiv Detail & Related papers (2020-05-08T19:06:28Z) - siaNMS: Non-Maximum Suppression with Siamese Networks for Multi-Camera
3D Object Detection [65.03384167873564]
A siamese network is integrated into the pipeline of a well-known 3D object detector approach.
associations are exploited to enhance the 3D box regression of the object.
The experimental evaluation on the nuScenes dataset shows that the proposed method outperforms traditional NMS approaches.
arXiv Detail & Related papers (2020-02-19T15:32:38Z) - A Bayesian Filter for Multi-view 3D Multi-object Tracking with Occlusion
Handling [2.824395407508717]
The proposed algorithm has a linear complexity in the total number of detections across the cameras.
It operates in the 3D world frame, and provides 3D trajectory estimates of the objects.
The proposed algorithm is evaluated on the latest WILDTRACKS dataset, and demonstrated to work in very crowded scenes.
arXiv Detail & Related papers (2020-01-13T09:34:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.