Learning Auxiliary Monocular Contexts Helps Monocular 3D Object
Detection
- URL: http://arxiv.org/abs/2112.04628v1
- Date: Thu, 9 Dec 2021 00:05:34 GMT
- Title: Learning Auxiliary Monocular Contexts Helps Monocular 3D Object
Detection
- Authors: Xianpeng Liu, Nan Xue, Tianfu Wu
- Abstract summary: Monocular 3D object detection aims to localize 3D bounding boxes in a single input 2D image.
This paper proposes a simple yet effective formulation for monocular 3D object detection without exploiting any extra information.
It presents the MonoCon method which learns Monocular Contexts, as auxiliary tasks in training, to help monocular 3D object detection.
- Score: 15.185462008629848
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Monocular 3D object detection aims to localize 3D bounding boxes
in a single input 2D image. It is a highly challenging problem and remains
open, especially when no extra information (e.g., depth, LiDAR and/or
multi-frames) can be leveraged in training and/or inference. This paper
proposes a simple yet
effective formulation for monocular 3D object detection without exploiting any
extra information. It presents the MonoCon method which learns Monocular
Contexts, as auxiliary tasks in training, to help monocular 3D object
detection. The key idea is that with the annotated 3D bounding boxes of objects
in an image, there is a rich set of well-posed projected 2D supervision signals
available in training, such as the projected corner keypoints and their
associated offset vectors with respect to the center of 2D bounding box, which
should be exploited as auxiliary tasks in training. The proposed MonoCon is
motivated, at a high level, by the Cramér-Wold theorem in measure theory. In
implementation, it utilizes a very simple end-to-end design to justify the
effectiveness of learning auxiliary monocular contexts, which consists of three
components: a Deep Neural Network (DNN) based feature backbone, a number of
regression head branches for learning the essential parameters used in the 3D
bounding box prediction, and a number of regression head branches for learning
auxiliary contexts. After training, the auxiliary context regression branches
are discarded for better inference efficiency. In experiments, the proposed
MonoCon is tested on the KITTI benchmark (car, pedestrian and cyclist). It
outperforms all prior art on the leaderboard for the car category and obtains
comparable accuracy for pedestrian and cyclist. Thanks to its simple design,
the proposed MonoCon also achieves the fastest inference speed in the
comparison, at 38.7 fps.
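To make the key idea concrete, the following is a minimal sketch (not the authors' code) of how such auxiliary 2D targets can be derived from an annotated 3D box. The box parametrization, the KITTI-like intrinsics, and all helper names are illustrative assumptions:

```python
import numpy as np

def box3d_corners(bottom_center, dims, yaw):
    """Return the 8 corners of a 3D box in camera coordinates (y down).
    bottom_center: (x, y, z); dims: (l, h, w); yaw: heading angle.
    This parametrization is an assumption loosely following KITTI."""
    l, h, w = dims
    # Corner offsets in the object frame (up is negative y).
    x = np.array([ l,  l, -l, -l,  l,  l, -l, -l]) / 2.0
    y = np.array([ 0,  0,  0,  0, -h, -h, -h, -h], dtype=float)
    z = np.array([ w, -w, -w,  w,  w, -w, -w,  w]) / 2.0
    R = np.array([[ np.cos(yaw), 0.0, np.sin(yaw)],
                  [ 0.0,         1.0, 0.0        ],
                  [-np.sin(yaw), 0.0, np.cos(yaw)]])
    return (R @ np.vstack([x, y, z])).T + np.asarray(bottom_center)

def auxiliary_targets(bottom_center, dims, yaw, K):
    """Project the 3D corners into the image and express them as offset
    vectors from the center of the enclosing 2D box -- the kind of
    projected 2D supervision signal MonoCon exploits as auxiliary tasks."""
    corners = box3d_corners(bottom_center, dims, yaw)     # (8, 3)
    uvw = (K @ corners.T).T                               # pinhole projection
    keypoints = uvw[:, :2] / uvw[:, 2:3]                  # (8, 2) in pixels
    box2d_center = (keypoints.min(0) + keypoints.max(0)) / 2.0
    offsets = keypoints - box2d_center                    # (8, 2) offsets
    return keypoints, offsets

K = np.array([[721.5,   0.0, 609.6],     # KITTI-like intrinsics (assumed)
              [  0.0, 721.5, 172.9],
              [  0.0,   0.0,   1.0]])
kps, offs = auxiliary_targets((1.8, 1.5, 12.0), (3.9, 1.6, 1.7), 0.3, K)
```

Since these targets come for free from existing 3D box annotations, and their regression branches are dropped after training, they add supervision without any inference-time cost.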
Related papers
- Bayesian Self-Training for Semi-Supervised 3D Segmentation [59.544558398992386]
3D segmentation is a core problem in computer vision.
However, densely labeling 3D point clouds for fully-supervised training remains too labor-intensive and expensive.
Semi-supervised training provides a more practical alternative, where only a small set of labeled data is given, accompanied by a larger unlabeled set.
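As a hedged illustration of the generic self-training loop described here (not necessarily the paper's exact Bayesian recipe), pseudo-labels for the unlabeled set can be filtered by an uncertainty estimate such as Monte-Carlo dropout; the model, shapes, and threshold below are placeholders:

```python
import torch

@torch.no_grad()
def pseudo_label(model, points, n_samples=10, max_entropy=0.5):
    """Predict pseudo-labels for unlabeled points and keep only the
    low-entropy (confident) ones. Uncertainty comes from averaging
    several stochastic forward passes with dropout left active."""
    model.train()  # keep dropout on so each pass is a posterior sample
    probs = torch.stack([model(points).softmax(dim=-1)
                         for _ in range(n_samples)]).mean(dim=0)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1)
    keep = entropy < max_entropy            # boolean mask over points
    return probs.argmax(dim=-1)[keep], keep
```

Training then alternates between the small labeled set and whatever pseudo-labels survive the filter.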
arXiv Detail & Related papers (2024-09-12T14:54:31Z)
- MonoSKD: General Distillation Framework for Monocular 3D Object Detection via Spearman Correlation Coefficient [11.48914285491747]
Existing monocular 3D detection knowledge distillation methods usually project the LiDAR signal onto the image plane and train the teacher network accordingly.
We propose MonoSKD, a novel Knowledge Distillation framework for Monocular 3D detection based on Spearman correlation coefficient.
Our framework achieves state-of-the-art performance as of submission, with no additional inference computational cost.
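For reference, the Spearman coefficient is simply the Pearson correlation computed on ranks, which makes the distillation target insensitive to the absolute scale of teacher and student responses. The NumPy sketch below shows the plain (non-differentiable) statistic for intuition; an actual training loss would need a differentiable ranking relaxation:

```python
import numpy as np

def spearman(x, y):
    """Spearman rank correlation between two 1-D score vectors, e.g.
    a student's and a teacher's responses over the same regions."""
    rx = np.argsort(np.argsort(x)).astype(float)   # ranks of x
    ry = np.argsort(np.argsort(y)).astype(float)   # ranks of y
    rx -= rx.mean()
    ry -= ry.mean()
    return (rx @ ry) / (np.linalg.norm(rx) * np.linalg.norm(ry) + 1e-12)

student = np.array([0.2, 0.9, 0.4, 0.7])
teacher = np.array([0.1, 1.2, 0.3, 0.8])
print(spearman(student, teacher))   # 1.0: identical orderings, scales differ
```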
arXiv Detail & Related papers (2023-10-17T14:48:02Z)
- NeurOCS: Neural NOCS Supervision for Monocular 3D Object Localization [80.3424839706698]
We present NeurOCS, a framework that uses instance masks and 3D boxes as input to learn 3D object shapes by means of differentiable rendering.
Our approach rests on insights in learning a category-level shape prior directly from real driving scenes.
We make critical design choices to learn object coordinates more effectively from an object-centric view.
arXiv Detail & Related papers (2023-05-28T16:18:41Z)
- Weakly Supervised Monocular 3D Object Detection using Multi-View Projection and Direction Consistency [78.76508318592552]
Monocular 3D object detection has become a mainstream approach in autonomous driving because it is easy to deploy.
Most current methods still rely on 3D point cloud data for labeling the ground truths used in the training phase.
We propose a new weakly supervised monocular 3D object detection method, which can train the model with only 2D labels marked on images.
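A minimal sketch of the kind of weak supervision signal this enables: a predicted 3D box is projected into the image and its footprint is compared against the human-drawn 2D label. The loss form below is an illustrative assumption, not the paper's exact objective:

```python
import numpy as np

def projection_consistency_loss(uv_corners, box2d):
    """L1 gap between the image footprint of a predicted 3D box and an
    annotated 2D box. uv_corners: (8, 2) projected 3D-box corners;
    box2d: (x1, y1, x2, y2) ground-truth 2D label in pixels."""
    pred = np.concatenate([uv_corners.min(axis=0), uv_corners.max(axis=0)])
    return np.abs(pred - np.asarray(box2d, dtype=float)).mean()
```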
arXiv Detail & Related papers (2023-03-15T15:14:00Z)
- Attention-Based Depth Distillation with 3D-Aware Positional Encoding for Monocular 3D Object Detection [10.84784828447741]
ADD is an Attention-based Depth knowledge Distillation framework with 3D-aware positional encoding.
Thanks to our teacher design, our framework is seamless, domain-gap free, easily implementable, and compatible with object-wise ground-truth depth.
We implement our framework on three representative monocular detectors, and we achieve state-of-the-art performance with no additional inference computational cost.
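As a purely illustrative sketch of the general mechanism named in the title, student features can attend over teacher depth features that carry a positional encoding; the wiring, dimensions, and encoding below are assumptions, not ADD's actual design:

```python
import torch.nn as nn

class DepthDistillAttention(nn.Module):
    """Toy attention block: student features (queries) attend to teacher
    depth features (keys/values) augmented with a positional encoding."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, student, teacher, pos):
        # student, teacher, pos: (batch, tokens, dim) flattened feature maps
        out, _ = self.attn(query=student, key=teacher + pos, value=teacher)
        return out
```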
arXiv Detail & Related papers (2022-11-30T06:39:25Z)
- Monocular 3D Object Detection with Depth from Motion [74.29588921594853]
We take advantage of camera ego-motion for accurate object depth estimation and detection.
Our framework, named Depth from Motion (DfM), then uses the established geometry to lift 2D image features to the 3D space and detects 3D objects thereon.
Our framework outperforms state-of-the-art methods by a large margin on the KITTI benchmark.
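For geometric intuition (a toy case, not DfM's full pipeline): with known ego-motion the two frames form a temporal stereo pair, so depth follows from triangulation. The sketch below assumes a pure sideways translation, where the relation reduces to the stereo identity depth = f * B / disparity:

```python
def depth_from_motion(u1, u2, focal_px, baseline_m):
    """Triangulate depth for a point seen at image column u1 in frame t
    and u2 in frame t+1, after the camera moved baseline_m sideways."""
    disparity = u1 - u2          # pixel shift induced by the motion
    if disparity <= 0:
        raise ValueError("expected positive disparity for this motion")
    return focal_px * baseline_m / disparity

print(depth_from_motion(u1=650.0, u2=640.0, focal_px=721.5,
                        baseline_m=0.2))   # ~14.4 m
```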
arXiv Detail & Related papers (2022-07-26T15:48:46Z)
- Homography Loss for Monocular 3D Object Detection [54.04870007473932]
A differentiable loss function, termed the Homography Loss, is proposed that exploits both 2D and 3D information to achieve this goal.
Our method yields the best performance, outperforming the other state-of-the-art methods by a large margin on the KITTI 3D datasets.
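As background on why a homography is a natural bridge between the 2D and 3D information: all points on the road plane map to the image through a single 3x3 matrix, so an object's image footprint pins down its position on the ground. The sketch below builds that plane-to-image homography from assumed intrinsics and camera height; it illustrates the geometry, not the paper's loss itself:

```python
import numpy as np

def ground_plane_homography(K, R, t):
    """Homography taking ground-plane coordinates (X, Z, 1), with Y = 0
    on the road, to homogeneous pixels: for coplanar points the full
    projection K [R | t] collapses to the 3x3 matrix K [r1 r3 t]."""
    return K @ np.column_stack([R[:, 0], R[:, 2], t])

K = np.array([[721.5,   0.0, 609.6],
              [  0.0, 721.5, 172.9],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)
t = np.array([0.0, 1.65, 0.0])          # camera ~1.65 m above the road
H = ground_plane_homography(K, R, t)
uvw = H @ np.array([2.0, 10.0, 1.0])    # ground point at X=2 m, Z=10 m
print(uvw[:2] / uvw[2])                 # its pixel location, ~(754, 292)
```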
arXiv Detail & Related papers (2022-04-02T03:48:03Z)
- Learning Geometry-Guided Depth via Projective Modeling for Monocular 3D Object Detection [70.71934539556916]
We learn geometry-guided depth estimation with projective modeling to advance monocular 3D object detection.
Specifically, a principled geometry formula with projective modeling of 2D and 3D depth predictions in the monocular 3D object detection network is devised.
Our method remarkably improves the detection performance of the state-of-the-art monocular method without extra data, by 2.80% in the moderate test setting.
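The projective relation underlying this kind of geometry-guided depth is the pinhole similar-triangles identity: an object of physical height H that appears h pixels tall under focal length f lies at depth z = f * H / h. A worked example with assumed KITTI-like numbers:

```python
def depth_from_height(focal_px, height_3d_m, height_2d_px):
    """Pinhole similar triangles: z = f * H / h."""
    return focal_px * height_3d_m / height_2d_px

# A 1.5 m tall car imaged 45 px tall with a 721.5 px focal length:
print(depth_from_height(721.5, 1.5, 45.0))   # ~24.05 m depth
```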
arXiv Detail & Related papers (2021-07-29T12:30:39Z)
- MonoGRNet: A General Framework for Monocular 3D Object Detection [23.59839921644492]
We propose MonoGRNet for amodal 3D object detection from a monocular image via geometric reasoning.
MonoGRNet decomposes the monocular 3D object detection task into four sub-tasks including 2D object detection, instance-level depth estimation, projected 3D center estimation and local corner regression.
Experiments are conducted on KITTI, Cityscapes and MS COCO datasets.
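Structurally, such a decomposition maps onto a shared backbone with one lightweight head per sub-task. The PyTorch sketch below shows that wiring only; channel counts, head outputs, and the stand-in backbone are placeholders, not MonoGRNet's actual design:

```python
import torch.nn as nn

class FourTaskHeads(nn.Module):
    """Toy multi-head layout mirroring the four sub-tasks: 2D detection,
    instance-level depth, projected 3D center, and local corners."""
    def __init__(self, c=256):
        super().__init__()
        self.backbone = nn.Conv2d(3, c, 3, padding=1)  # stand-in backbone
        self.box2d    = nn.Conv2d(c, 4, 1)   # 2D box (x, y, w, h)
        self.depth    = nn.Conv2d(c, 1, 1)   # instance-level depth
        self.center3d = nn.Conv2d(c, 2, 1)   # projected 3D center (u, v)
        self.corners  = nn.Conv2d(c, 24, 1)  # 8 local corners x 3 coords

    def forward(self, x):
        f = self.backbone(x).relu()
        return {"box2d": self.box2d(f), "depth": self.depth(f),
                "center3d": self.center3d(f), "corners": self.corners(f)}
```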
arXiv Detail & Related papers (2021-04-18T10:07:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.