FCOS3D: Fully Convolutional One-Stage Monocular 3D Object Detection
- URL: http://arxiv.org/abs/2104.10956v1
- Date: Thu, 22 Apr 2021 09:35:35 GMT
- Title: FCOS3D: Fully Convolutional One-Stage Monocular 3D Object Detection
- Authors: Tai Wang, Xinge Zhu, Jiangmiao Pang, Dahua Lin
- Abstract summary: It is non-trivial to make a generally adapted 2D detector work in this 3D task.
In this technical report, we study this problem with a practice built on a fully convolutional single-stage detector.
Our solution achieves 1st place among all vision-only methods in the nuScenes 3D detection challenge at NeurIPS 2020.
- Score: 78.00922683083776
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Monocular 3D object detection is an important task for autonomous driving
considering its advantage of low cost. It is much more challenging than the
conventional 2D case due to its inherently ill-posed nature, mainly reflected
in the lack of depth information. Recent progress on 2D detection offers
opportunities to better solve this problem. However, it is non-trivial to make
a generally adapted 2D detector work in this 3D task. In this technical
report, we study this problem with a practice built on a fully convolutional
single-stage detector and propose a general framework, FCOS3D. Specifically,
we first transform the commonly defined 7-DoF 3D targets to the image domain
and decouple them into 2D and 3D attributes. Then the objects are distributed
to different feature levels according to their 2D scales and assigned only by
the projected 3D center during training. Furthermore, the center-ness is
redefined with a 2D Gaussian distribution based on the 3D center to fit the
3D target formulation. All of this makes the framework simple yet effective,
getting rid of any 2D detection or 2D-3D correspondence priors. Our solution
achieves 1st place among all vision-only methods in the nuScenes 3D detection
challenge at NeurIPS 2020. Code and models are
released at https://github.com/open-mmlab/mmdetection3d.
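To make the target formulation concrete, here is a minimal NumPy sketch (not the released mmdetection3d code) of the two steps the abstract describes: projecting a 3D box center into the image with the camera intrinsics, and redefining center-ness as a 2D Gaussian around that projected center. The stride normalization and the decay factor alpha are illustrative assumptions.

```python
import numpy as np

def project_center(center_3d, K):
    """Project a 3D box center (camera coordinates, z > 0) onto the
    image plane with the pinhole model and intrinsic matrix K (3x3)."""
    x, y, z = center_3d
    u = K[0, 0] * x / z + K[0, 2]
    v = K[1, 1] * y / z + K[1, 2]
    return np.array([u, v]), z  # projected 2D center and its depth

def gaussian_centerness(locations, center_2d, stride, alpha=2.5):
    """Center-ness target as a 2D Gaussian around the projected 3D center.
    locations: (N, 2) pixel coordinates of feature-map points.
    Offsets are normalized by the feature level's stride; alpha is an
    assumed decay hyper-parameter controlling how quickly the target
    falls off away from the center."""
    offsets = (locations - np.asarray(center_2d)) / stride
    return np.exp(-alpha * (offsets ** 2).sum(axis=1))
```

Under this formulation, a feature-map point's training target depends only on its distance to the projected 3D center, which is how the framework can dispense with 2D detection and 2D-3D correspondence priors.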
Related papers
- General Geometry-aware Weakly Supervised 3D Object Detection [62.26729317523975]
A unified framework is developed for learning 3D object detectors from RGB images and associated 2D boxes.
Experiments on KITTI and SUN-RGBD datasets demonstrate that our method yields surprisingly high-quality 3D bounding boxes with only 2D annotation.
arXiv Detail & Related papers (2024-07-18T17:52:08Z)
- Any2Point: Empowering Any-modality Large Models for Efficient 3D Understanding [83.63231467746598]
We introduce Any2Point, a parameter-efficient method to empower any-modality large models (vision, language, audio) for 3D understanding.
We propose a 3D-to-any (1D or 2D) virtual projection strategy that correlates the input 3D points with the original 1D or 2D positions within the source modality.
arXiv Detail & Related papers (2024-04-11T17:59:45Z)
- Enhancing 3D Object Detection with 2D Detection-Guided Query Anchors [6.3557174349423455]
We present a novel query generation approach termed QAF2D, which infers 3D query anchors from 2D detection results.
The largest improvement that QAF2D brings on the nuScenes validation subset is 2.3% NDS and 2.7% mAP.
arXiv Detail & Related papers (2024-03-10T04:38:27Z)
- Homography Loss for Monocular 3D Object Detection [54.04870007473932]
A differentiable loss function, termed Homography Loss, is proposed to exploit both 2D and 3D information.
Our method outperforms the other state-of-the-art methods by a large margin on the KITTI 3D dataset.
arXiv Detail & Related papers (2022-04-02T03:48:03Z)
- AutoShape: Real-Time Shape-Aware Monocular 3D Object Detection [15.244852122106634]
We propose an approach for incorporating shape-aware 2D/3D constraints into the 3D detection framework.
Specifically, we employ a deep neural network to learn distinctive 2D keypoints in the 2D image domain.
To generate the ground truth of 2D/3D keypoints, an automatic model-fitting approach is proposed.
arXiv Detail & Related papers (2021-08-25T08:50:06Z)
- FGR: Frustum-Aware Geometric Reasoning for Weakly Supervised 3D Vehicle Detection [81.79171905308827]
We propose frustum-aware geometric reasoning (FGR) to detect vehicles in point clouds without any 3D annotations.
Our method consists of two stages: coarse 3D segmentation and 3D bounding box estimation.
It is able to accurately detect objects in 3D space with only 2D bounding boxes and sparse point clouds.
arXiv Detail & Related papers (2021-05-17T07:29:55Z)
- 3D-to-2D Distillation for Indoor Scene Parsing [78.36781565047656]
We present a new approach that leverages 3D features extracted from a large-scale 3D data repository to enhance 2D features extracted from RGB images.
First, we distill 3D knowledge from a pretrained 3D network to supervise a 2D network to learn simulated 3D features from 2D features during training.
Second, we design a two-stage dimension normalization scheme to calibrate the 2D and 3D features for better integration.
Third, we design a semantic-aware adversarial training model to extend our framework for training with unpaired 3D data.
arXiv Detail & Related papers (2021-04-06T02:22:24Z)
- Learning to Predict the 3D Layout of a Scene [0.3867363075280544]
We propose a method that only uses a single RGB image, thus enabling applications in devices or vehicles that do not have LiDAR sensors.
We use the KITTI dataset for training, which consists of street traffic scenes with class labels, 2D bounding boxes and 3D annotations with seven degrees of freedom.
We achieve a mean average precision of 47.3% for moderately difficult data, measured at a 3D intersection-over-union threshold of 70% as required by the official KITTI benchmark, outperforming previous state-of-the-art single-RGB methods by a large margin (see the sketch below).
arXiv Detail & Related papers (2020-11-19T17:23:30Z)
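For context on the 70% threshold above: a prediction counts as a true positive when its 3D intersection-over-union with a ground-truth box reaches 0.7. Below is a minimal sketch for axis-aligned boxes; the official KITTI evaluation uses yaw-rotated boxes, so this simplification is for illustration only.

```python
import numpy as np

def iou_3d_axis_aligned(box_a, box_b):
    """3D IoU for axis-aligned boxes (xmin, ymin, zmin, xmax, ymax, zmax).
    KITTI actually evaluates yaw-rotated boxes; ignoring rotation here
    keeps the sketch short."""
    lo = np.maximum(box_a[:3], box_b[:3])
    hi = np.minimum(box_a[3:], box_b[3:])
    inter = np.prod(np.clip(hi - lo, 0.0, None))  # overlap volume
    vol_a = np.prod(box_a[3:] - box_a[:3])
    vol_b = np.prod(box_b[3:] - box_b[:3])
    return inter / (vol_a + vol_b - inter + 1e-9)

# Two car-sized boxes shifted by 10 cm: IoU is about 0.9, well above 0.7.
print(iou_3d_axis_aligned(np.array([0.0, 0.0, 0.0, 2.0, 1.5, 4.0]),
                          np.array([0.1, 0.0, 0.0, 2.1, 1.5, 4.0])))
```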
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all content) and is not responsible for any consequences arising from its use.