Related papers: OBMO: One Bounding Box Multiple Objects for Monocular 3D Object Detection

OBMO: One Bounding Box Multiple Objects for Monocular 3D Object Detection

URL: http://arxiv.org/abs/2212.10049v2
Date: Tue, 20 Feb 2024 08:07:46 GMT
Title: OBMO: One Bounding Box Multiple Objects for Monocular 3D Object Detection
Authors: Chenxi Huang, Tong He, Haidong Ren, Wenxiao Wang, Binbin Lin, Deng Cai
Abstract summary: monocular 3D object detection has attracted much attention due to its simple configuration. In this paper, we find that the ill-posed nature of monocular imagery can lead to depth ambiguity. We propose a plug-and-play module, underlineOne underlineBounding Box underlineMultiple underlineObjects (OBMO)
Score: 24.9579490539696
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Compared to typical multi-sensor systems, monocular 3D object detection has attracted much attention due to its simple configuration. However, there is still a significant gap between LiDAR-based and monocular-based methods. In this paper, we find that the ill-posed nature of monocular imagery can lead to depth ambiguity. Specifically, objects with different depths can appear with the same bounding boxes and similar visual features in the 2D image. Unfortunately, the network cannot accurately distinguish different depths from such non-discriminative visual features, resulting in unstable depth training. To facilitate depth learning, we propose a simple yet effective plug-and-play module, \underline{O}ne \underline{B}ounding Box \underline{M}ultiple \underline{O}bjects (OBMO). Concretely, we add a set of suitable pseudo labels by shifting the 3D bounding box along the viewing frustum. To constrain the pseudo-3D labels to be reasonable, we carefully design two label scoring strategies to represent their quality. In contrast to the original hard depth labels, such soft pseudo labels with quality scores allow the network to learn a reasonable depth range, boosting training stability and thus improving final performance. Extensive experiments on KITTI and Waymo benchmarks show that our method significantly improves state-of-the-art monocular 3D detectors by a significant margin (The improvements under the moderate setting on KITTI validation set are $\mathbf{1.82\sim 10.91\%}$ \textbf{mAP in BEV} and $\mathbf{1.18\sim 9.36\%}$ \textbf{mAP in 3D}). Codes have been released at \url{https://github.com/mrsempress/OBMO}.

Related papers

Bayesian Self-Training for Semi-Supervised 3D Segmentation [59.544558398992386]
3D segmentation is a core problem in computer vision. densely labeling 3D point clouds to employ fully-supervised training remains too labor intensive and expensive. Semi-supervised training provides a more practical alternative, where only a small set of labeled data is given, accompanied by a larger unlabeled set.
arXiv Detail & Related papers (2024-09-12T14:54:31Z)
MonoCD: Monocular 3D Object Detection with Complementary Depths [9.186673054867866]
Depth estimation is an essential but challenging subtask of monocular 3D object detection. We propose to increase the complementarity of depths with two novel designs. Experiments on the KITTI benchmark demonstrate that our method achieves state-of-the-art performance without introducing extra data.
arXiv Detail & Related papers (2024-04-04T03:30:49Z)
Decoupled Pseudo-labeling for Semi-Supervised Monocular 3D Object Detection [108.672972439282]
We introduce a novel decoupled pseudo-labeling (DPL) approach for SSM3OD. Our approach features a Decoupled Pseudo-label Generation (DPG) module, designed to efficiently generate pseudo-labels. We also present a DepthGradient Projection (DGP) module to mitigate optimization conflicts caused by noisy depth supervision of pseudo-labels.
arXiv Detail & Related papers (2024-03-26T05:12:18Z)
Weakly Supervised 3D Object Detection via Multi-Level Visual Guidance [72.6809373191638]
We propose a framework to study how to leverage constraints between 2D and 3D domains without requiring any 3D labels. Specifically, we design a feature-level constraint to align LiDAR and image features based on object-aware regions. Second, the output-level constraint is developed to enforce the overlap between 2D and projected 3D box estimations. Third, the training-level constraint is utilized by producing accurate and consistent 3D pseudo-labels that align with the visual data.
arXiv Detail & Related papers (2023-12-12T18:57:25Z)
Weakly Supervised Monocular 3D Object Detection using Multi-View Projection and Direction Consistency [78.76508318592552]
Monocular 3D object detection has become a mainstream approach in automatic driving for its easy application. Most current methods still rely on 3D point cloud data for labeling the ground truths used in the training phase. We propose a new weakly supervised monocular 3D objection detection method, which can train the model with only 2D labels marked on images.
arXiv Detail & Related papers (2023-03-15T15:14:00Z)
Monocular 3D Object Detection with Depth from Motion [74.29588921594853]
We take advantage of camera ego-motion for accurate object depth estimation and detection. Our framework, named Depth from Motion (DfM), then uses the established geometry to lift 2D image features to the 3D space and detects 3D objects thereon. Our framework outperforms state-of-the-art methods by a large margin on the KITTI benchmark.
arXiv Detail & Related papers (2022-07-26T15:48:46Z)
DetMatch: Two Teachers are Better Than One for Joint 2D and 3D Semi-Supervised Object Detection [29.722784254501768]
DetMatch is a flexible framework for joint semi-supervised learning on 2D and 3D modalities. By identifying objects detected in both sensors, our pipeline generates a cleaner, more robust set of pseudo-labels. We leverage the richer semantics of RGB images to rectify incorrect 3D class predictions and improve localization of 3D boxes.
arXiv Detail & Related papers (2022-03-17T17:58:00Z)
WeakM3D: Towards Weakly Supervised Monocular 3D Object Detection [29.616568669869206]
Existing monocular 3D detection methods rely on manually annotated 3D box labels on the LiDAR point clouds. In this paper, we explore the weakly supervised monocular 3D detection. Specifically, we first detect 2D boxes on the image. Then, we adopt the generated 2D boxes to select corresponding RoI LiDAR points as the weak supervision. This network is learned by minimizing our newly-proposed 3D alignment loss between the 3D box estimates and the corresponding RoI LiDAR points.
arXiv Detail & Related papers (2022-03-16T00:37:08Z)
Progressive Coordinate Transforms for Monocular 3D Object Detection [52.00071336733109]
We propose a novel and lightweight approach, dubbed em Progressive Coordinate Transforms (PCT) to facilitate learning coordinate representations. In this paper, we propose a novel and lightweight approach, dubbed em Progressive Coordinate Transforms (PCT) to facilitate learning coordinate representations.
arXiv Detail & Related papers (2021-08-12T15:22:33Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.