WARM-3D: A Weakly-Supervised Sim2Real Domain Adaptation Framework for Roadside Monocular 3D Object Detection
- URL: http://arxiv.org/abs/2407.20818v1
- Date: Tue, 30 Jul 2024 13:32:34 GMT
- Title: WARM-3D: A Weakly-Supervised Sim2Real Domain Adaptation Framework for Roadside Monocular 3D Object Detection
- Authors: Xingcheng Zhou, Deyu Fu, Walter Zimmer, Mingyu Liu, Venkatnarayanan Lakshminarasimhan, Leah Strand, Alois C. Knoll
- Abstract summary: Existing roadside perception systems are limited by the absence of publicly available, large-scale, high-quality 3D datasets.
We introduce the TUMTraf Synthetic Dataset, offering a diverse and substantial collection of high-quality 3D data.
We also present WARM-3D, a framework to aid the Sim2Real domain transfer for roadside monocular 3D detection.
- Score: 23.07748567871443
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing roadside perception systems are limited by the absence of publicly available, large-scale, high-quality 3D datasets. Exploring the use of cost-effective, extensive synthetic datasets offers a viable solution to tackle this challenge and enhance the performance of roadside monocular 3D detection. In this study, we introduce the TUMTraf Synthetic Dataset, offering a diverse and substantial collection of high-quality 3D data to augment scarce real-world datasets. In addition, we present WARM-3D, a concise yet effective framework to aid the Sim2Real domain transfer for roadside monocular 3D detection. Our method leverages cheap synthetic datasets and 2D labels from an off-the-shelf 2D detector for weak supervision. We show that WARM-3D significantly enhances performance, achieving a +12.40% increase in mAP 3D over the baseline with only pseudo-2D supervision. With 2D GT as weak labels, WARM-3D even reaches performance close to the Oracle baseline. Moreover, WARM-3D improves 3D detectors' ability to recognize unseen samples across various real-world environments, highlighting its potential for practical applications.
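The core weak-supervision idea described in the abstract — projecting predicted 3D boxes into the image plane and comparing them against 2D (pseudo-)labels — can be sketched roughly as follows. This is an illustrative sketch, not the paper's actual implementation: the camera intrinsics, the axis-aligned box format (rotation omitted), and the plain L1 loss are all assumptions made for brevity.

```python
import numpy as np

def box3d_corners(center, size):
    """8 corners of an axis-aligned 3D box (camera coordinates, metres).
    Rotation is omitted for brevity; a real detector would apply yaw here."""
    signs = np.array([[sx, sy, sz] for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)])
    return np.asarray(center) + 0.5 * signs * np.asarray(size)

def project_to_2d_box(corners, K):
    """Project 3D corners with pinhole intrinsics K, take the tight 2D box."""
    uvw = corners @ K.T                    # (8, 3) homogeneous image points
    uv = uvw[:, :2] / uvw[:, 2:3]          # perspective divide
    return np.array([uv[:, 0].min(), uv[:, 1].min(),
                     uv[:, 0].max(), uv[:, 1].max()])

def weak_2d_loss(pred_center, pred_size, gt_box2d, K):
    """L1 gap between the projected 3D prediction and a 2D (pseudo-)label."""
    proj = project_to_2d_box(box3d_corners(pred_center, pred_size), K)
    return float(np.abs(proj - np.asarray(gt_box2d)).mean())

# Toy example: a 4.5 m x 1.5 m x 1.8 m vehicle 20 m in front of the camera.
K = np.array([[1000.0, 0.0, 960.0],
              [0.0, 1000.0, 540.0],
              [0.0, 0.0, 1.0]])
box2d = project_to_2d_box(box3d_corners((0.0, 1.0, 20.0), (4.5, 1.5, 1.8)), K)
# A 3D prediction that projects exactly onto the 2D label incurs zero loss.
loss = weak_2d_loss((0.0, 1.0, 20.0), (4.5, 1.5, 1.8), box2d, K)
```

Minimizing such a projection loss gives the 3D detector a supervision signal from 2D-only labels, which is why a cheap off-the-shelf 2D detector suffices as the weak teacher.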
Related papers
- SurgTrack: CAD-Free 3D Tracking of Real-world Surgical Instruments [21.536823332387993]
Vision-based surgical navigation has received increasing attention due to its non-invasive, cost-effective, and flexible advantages.
In particular, a critical element of the vision-based navigation system is tracking surgical instruments.
We propose the SurgTrack, a two-stage 3D instrument tracking method for CAD-free and robust real-world applications.
arXiv Detail & Related papers (2024-09-04T10:29:59Z)
- Semi-supervised 3D Semantic Scene Completion with 2D Vision Foundation Model Guidance [11.090775523892074]
We introduce a novel semi-supervised framework to alleviate the dependency on densely annotated data.
Our approach leverages 2D foundation models to generate essential 3D scene geometric and semantic cues.
Our method achieves up to 85% of the fully-supervised performance using only 10% labeled data.
arXiv Detail & Related papers (2024-08-21T12:13:18Z)
- Sparse Points to Dense Clouds: Enhancing 3D Detection with Limited LiDAR Data [68.18735997052265]
We propose a balanced approach that combines the advantages of monocular and point cloud-based 3D detection.
Our method requires only a small number of 3D points, which can be obtained from a low-cost, low-resolution sensor.
The accuracy of 3D detection improves by 20% compared to state-of-the-art monocular detection methods.
arXiv Detail & Related papers (2024-04-10T03:54:53Z)
- Every Dataset Counts: Scaling up Monocular 3D Object Detection with Joint Datasets Training [9.272389295055271]
This study investigates the pipeline for training a monocular 3D object detection model on a diverse collection of 3D and 2D datasets.
The proposed framework comprises three components: (1) a robust monocular 3D model capable of functioning across various camera settings, (2) a selective-training strategy to accommodate datasets with differing class annotations, and (3) a pseudo 3D training approach using 2D labels to enhance detection performance in scenes containing only 2D labels.
arXiv Detail & Related papers (2023-10-02T06:17:24Z)
- Real3D-AD: A Dataset of Point Cloud Anomaly Detection [75.56719157477661]
We introduce Real3D-AD, a challenging high-precision point cloud anomaly detection dataset.
With 1,254 high-resolution 3D items, each comprising forty thousand to millions of points, Real3D-AD is the largest dataset for high-precision 3D industrial anomaly detection.
We present a comprehensive benchmark for Real3D-AD, revealing the absence of baseline methods for high-precision point cloud anomaly detection.
arXiv Detail & Related papers (2023-09-23T00:43:38Z)
- 3D Data Augmentation for Driving Scenes on Camera [50.41413053812315]
We propose a 3D data augmentation approach termed Drive-3DAug, which augments camera-based driving scenes in 3D space.
We first utilize Neural Radiance Field (NeRF) to reconstruct the 3D models of background and foreground objects.
Then, augmented driving scenes can be obtained by placing the 3D objects with adapted location and orientation at the pre-defined valid region of backgrounds.
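The placement step described above — inserting reconstructed 3D objects only within a pre-defined valid region of the background — can be sketched as rejection sampling over a top-down grid. The grid representation, cell size, and minimum-gap threshold here are assumptions for illustration, not Drive-3DAug's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_placements(valid_mask, cell_size, n_objects, min_gap):
    """Sample non-overlapping (x, z, yaw) poses on the ground plane.
    valid_mask[i, j] marks drivable cells of a top-down grid (illustrative)."""
    ii, jj = np.nonzero(valid_mask)
    # Grid indices -> metric ground-plane coordinates.
    candidates = np.stack([jj * cell_size, ii * cell_size], axis=1).astype(float)
    placed = []
    for idx in rng.permutation(len(candidates)):
        x, z = candidates[idx]
        # Reject candidates too close to an already-placed object.
        if all(np.hypot(x - px, z - pz) >= min_gap for px, pz, _ in placed):
            placed.append((x, z, float(rng.uniform(-np.pi, np.pi))))
        if len(placed) == n_objects:
            break
    return placed

# 10 m x 10 m grid at 1 m resolution with a drivable strip in the middle.
mask = np.zeros((10, 10), dtype=bool)
mask[4:6, :] = True
poses = sample_placements(mask, cell_size=1.0, n_objects=3, min_gap=2.0)
```

Each accepted pose would then seed a NeRF-rendered object in the augmented scene; the rejection test keeps inserted objects from overlapping one another.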
arXiv Detail & Related papers (2023-03-18T05:51:05Z)
- HUM3DIL: Semi-supervised Multi-modal 3D Human Pose Estimation for Autonomous Driving [95.42203932627102]
3D human pose estimation is an emerging technology, which can enable the autonomous vehicle to perceive and understand the subtle and complex behaviors of pedestrians.
Our method efficiently makes use of these complementary camera and LiDAR signals in a semi-supervised fashion and outperforms existing methods by a large margin.
Specifically, we embed LiDAR points into pixel-aligned multi-modal features, which we pass through a sequence of Transformer refinement stages.
arXiv Detail & Related papers (2022-12-15T11:15:14Z)
- Aerial Monocular 3D Object Detection [46.26215100532241]
This work proposes a dual-view detection system named DVDET to achieve aerial monocular object detection in both the 2D image space and the 3D physical space.
To address the dataset challenge, we propose a new large-scale simulation dataset named AM3D-Sim, generated by the co-simulation of AirSIM and CARLA, and a new real-world aerial dataset named AM3D-Real, collected by DJI Matrice 300 RTK.
arXiv Detail & Related papers (2022-08-08T08:32:56Z)
- Domain Adaptive 3D Pose Augmentation for In-the-wild Human Mesh Recovery [32.73513554145019]
Domain Adaptive 3D Pose Augmentation (DAPA) is a data augmentation method that enhances the model's generalization ability in in-the-wild scenarios.
We show quantitatively that finetuning with DAPA effectively improves results on the 3DPW and AGORA benchmarks.
arXiv Detail & Related papers (2022-06-21T15:02:31Z)
- Homography Loss for Monocular 3D Object Detection [54.04870007473932]
A differentiable loss function, termed Homography Loss, is proposed that exploits both 2D and 3D information.
Our method outperforms other state-of-the-art methods by a large margin on the KITTI 3D dataset.
arXiv Detail & Related papers (2022-04-02T03:48:03Z)
- Exemplar Fine-Tuning for 3D Human Model Fitting Towards In-the-Wild 3D Human Pose Estimation [107.07047303858664]
Large-scale human datasets with 3D ground-truth annotations are difficult to obtain in the wild.
We address this problem by augmenting existing 2D datasets with high-quality 3D pose fits.
The resulting annotations are sufficient to train 3D pose regressor networks from scratch that outperform the current state-of-the-art on in-the-wild benchmarks.
arXiv Detail & Related papers (2020-04-07T20:21:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.