OBMO: One Bounding Box Multiple Objects for Monocular 3D Object
Detection
- URL: http://arxiv.org/abs/2212.10049v2
- Date: Tue, 20 Feb 2024 08:07:46 GMT
- Title: OBMO: One Bounding Box Multiple Objects for Monocular 3D Object
Detection
- Authors: Chenxi Huang, Tong He, Haidong Ren, Wenxiao Wang, Binbin Lin, Deng Cai
- Abstract summary: monocular 3D object detection has attracted much attention due to its simple configuration.
In this paper, we find that the ill-posed nature of monocular imagery can lead to depth ambiguity.
We propose a plug-and-play module, underlineOne underlineBounding Box underlineMultiple underlineObjects (OBMO)
- Score: 24.9579490539696
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Compared to typical multi-sensor systems, monocular 3D object detection has
attracted much attention due to its simple configuration. However, there is
still a significant gap between LiDAR-based and monocular-based methods. In
this paper, we find that the ill-posed nature of monocular imagery can lead to
depth ambiguity. Specifically, objects with different depths can appear with
the same bounding boxes and similar visual features in the 2D image.
Unfortunately, the network cannot accurately distinguish different depths from
such non-discriminative visual features, resulting in unstable depth training.
To facilitate depth learning, we propose a simple yet effective plug-and-play
module, \underline{O}ne \underline{B}ounding Box \underline{M}ultiple
\underline{O}bjects (OBMO). Concretely, we add a set of suitable pseudo labels
by shifting the 3D bounding box along the viewing frustum. To constrain the
pseudo-3D labels to be reasonable, we carefully design two label scoring
strategies to represent their quality. In contrast to the original hard depth
labels, such soft pseudo labels with quality scores allow the network to learn
a reasonable depth range, boosting training stability and thus improving final
performance. Extensive experiments on KITTI and Waymo benchmarks show that our
method significantly improves state-of-the-art monocular 3D detectors by a
significant margin (The improvements under the moderate setting on KITTI
validation set are $\mathbf{1.82\sim 10.91\%}$ \textbf{mAP in BEV} and
$\mathbf{1.18\sim 9.36\%}$ \textbf{mAP in 3D}). Codes have been released at
\url{https://github.com/mrsempress/OBMO}.
Related papers
- Bayesian Self-Training for Semi-Supervised 3D Segmentation [59.544558398992386]
3D segmentation is a core problem in computer vision.
densely labeling 3D point clouds to employ fully-supervised training remains too labor intensive and expensive.
Semi-supervised training provides a more practical alternative, where only a small set of labeled data is given, accompanied by a larger unlabeled set.
arXiv Detail & Related papers (2024-09-12T14:54:31Z) - MonoCD: Monocular 3D Object Detection with Complementary Depths [9.186673054867866]
Depth estimation is an essential but challenging subtask of monocular 3D object detection.
We propose to increase the complementarity of depths with two novel designs.
Experiments on the KITTI benchmark demonstrate that our method achieves state-of-the-art performance without introducing extra data.
arXiv Detail & Related papers (2024-04-04T03:30:49Z) - Decoupled Pseudo-labeling for Semi-Supervised Monocular 3D Object Detection [108.672972439282]
We introduce a novel decoupled pseudo-labeling (DPL) approach for SSM3OD.
Our approach features a Decoupled Pseudo-label Generation (DPG) module, designed to efficiently generate pseudo-labels.
We also present a DepthGradient Projection (DGP) module to mitigate optimization conflicts caused by noisy depth supervision of pseudo-labels.
arXiv Detail & Related papers (2024-03-26T05:12:18Z) - Weakly Supervised 3D Object Detection via Multi-Level Visual Guidance [72.6809373191638]
We propose a framework to study how to leverage constraints between 2D and 3D domains without requiring any 3D labels.
Specifically, we design a feature-level constraint to align LiDAR and image features based on object-aware regions.
Second, the output-level constraint is developed to enforce the overlap between 2D and projected 3D box estimations.
Third, the training-level constraint is utilized by producing accurate and consistent 3D pseudo-labels that align with the visual data.
arXiv Detail & Related papers (2023-12-12T18:57:25Z) - Weakly Supervised Monocular 3D Object Detection using Multi-View
Projection and Direction Consistency [78.76508318592552]
Monocular 3D object detection has become a mainstream approach in automatic driving for its easy application.
Most current methods still rely on 3D point cloud data for labeling the ground truths used in the training phase.
We propose a new weakly supervised monocular 3D objection detection method, which can train the model with only 2D labels marked on images.
arXiv Detail & Related papers (2023-03-15T15:14:00Z) - Monocular 3D Object Detection with Depth from Motion [74.29588921594853]
We take advantage of camera ego-motion for accurate object depth estimation and detection.
Our framework, named Depth from Motion (DfM), then uses the established geometry to lift 2D image features to the 3D space and detects 3D objects thereon.
Our framework outperforms state-of-the-art methods by a large margin on the KITTI benchmark.
arXiv Detail & Related papers (2022-07-26T15:48:46Z) - DetMatch: Two Teachers are Better Than One for Joint 2D and 3D
Semi-Supervised Object Detection [29.722784254501768]
DetMatch is a flexible framework for joint semi-supervised learning on 2D and 3D modalities.
By identifying objects detected in both sensors, our pipeline generates a cleaner, more robust set of pseudo-labels.
We leverage the richer semantics of RGB images to rectify incorrect 3D class predictions and improve localization of 3D boxes.
arXiv Detail & Related papers (2022-03-17T17:58:00Z) - WeakM3D: Towards Weakly Supervised Monocular 3D Object Detection [29.616568669869206]
Existing monocular 3D detection methods rely on manually annotated 3D box labels on the LiDAR point clouds.
In this paper, we explore the weakly supervised monocular 3D detection. Specifically, we first detect 2D boxes on the image. Then, we adopt the generated 2D boxes to select corresponding RoI LiDAR points as the weak supervision.
This network is learned by minimizing our newly-proposed 3D alignment loss between the 3D box estimates and the corresponding RoI LiDAR points.
arXiv Detail & Related papers (2022-03-16T00:37:08Z) - Progressive Coordinate Transforms for Monocular 3D Object Detection [52.00071336733109]
We propose a novel and lightweight approach, dubbed em Progressive Coordinate Transforms (PCT) to facilitate learning coordinate representations.
In this paper, we propose a novel and lightweight approach, dubbed em Progressive Coordinate Transforms (PCT) to facilitate learning coordinate representations.
arXiv Detail & Related papers (2021-08-12T15:22:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.