Anchor Distance for 3D Multi-Object Distance Estimation from 2D Single Shot
- URL: http://arxiv.org/abs/2101.10399v2
- Date: Tue, 16 Feb 2021 17:57:39 GMT
- Title: Anchor Distance for 3D Multi-Object Distance Estimation from 2D Single Shot
- Authors: Hyeonwoo Yu and Jean Oh
- Abstract summary: We present a real time approach for estimating the distances to multiple objects in a scene using only a single-shot image.
We let the predictors capture a distance prior through anchor distances and train the network based on that distance.
The proposed method runs at about 30 FPS and achieves the lowest RMSE among existing methods.
- Score: 15.815583594196488
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Visual perception of the objects in a 3D environment is a key to successful
performance in autonomous driving and simultaneous localization and mapping
(SLAM). In this paper, we present a real time approach for estimating the
distances to multiple objects in a scene using only a single-shot image. Given
a 2D Bounding Box (BBox) and object parameters, a 3D distance to the object can
be calculated directly using 3D reprojection; however, such methods are prone
to significant errors because an error from the 2D detection can be amplified
in 3D (a minimal sketch follows the abstract). In addition, it is challenging
to apply such methods to a real-time system due to the computational burden. In
the case of traditional multi-object detection methods, existing works have
been developed for specific tasks such as object segmentation or 2D BBox
regression. These methods introduce the concept of anchor BBox for elaborate 2D
BBox estimation, and predictors are specialized and trained for specific 2D
BBoxes. In order to estimate the distances to the 3D objects from a single 2D
image, we introduce the notion of anchor distance based on an object's
location and propose a method that applies the anchor distance to the
multi-object detector structure. We let each predictor capture a distance prior
through its anchor distance and train the network based on that distance, so
that each predictor specializes in objects located within a specific distance
range (a sketch of this assignment follows the abstract). By propagating the
distance prior to the predictors via the anchor distances, precise distance
estimation and real-time execution become feasible simultaneously. The proposed
method runs at about 30 FPS and achieves the lowest RMSE among existing
methods.
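As a concrete illustration of the reprojection pitfall the abstract warns about: under a pinhole camera model, an object of known physical height H that spans h pixels in the image lies at depth Z = f * H / h for focal length f in pixels. A minimal sketch with hypothetical numbers (not taken from the paper):

```python
# Minimal sketch: distance from a 2D BBox via pinhole reprojection, and how a
# small 2D detection error is amplified in 3D. All values are illustrative.

def reprojection_distance(focal_px: float, object_height_m: float,
                          bbox_height_px: float) -> float:
    """Depth Z = f * H / h under a pinhole camera model."""
    return focal_px * object_height_m / bbox_height_px

focal_px = 720.0    # assumed focal length in pixels
car_height_m = 1.5  # assumed physical height of a car

z_true = reprojection_distance(focal_px, car_height_m, bbox_height_px=27.0)
z_noisy = reprojection_distance(focal_px, car_height_m, bbox_height_px=24.0)
print(f"clean box:        {z_true:.1f} m")   # 40.0 m
print(f"with a 3 px slip: {z_noisy:.1f} m")  # 45.0 m, a 12.5% distance error
```

Because Z is inversely proportional to the box height, the same pixel-level error produces a larger distance error for distant objects, whose boxes are small; this is the amplification effect described above.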
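The anchor-distance idea parallels anchor BBoxes: each predictor is tied to a prior distance and handles objects near it. Below is a minimal sketch of one plausible assignment and residual encoding; the anchor values and the log-residual parameterization are illustrative assumptions, not the paper's exact scheme.

```python
# Hypothetical sketch of anchor-distance assignment, in the spirit of anchor
# BBoxes: each predictor owns a prior ("anchor") distance, and a ground-truth
# object is matched to the nearest anchor so that each predictor specializes
# in a distance range. Anchors and the log-residual target are assumptions.
import math

ANCHOR_DISTANCES_M = [5.0, 15.0, 30.0, 60.0]  # one prior per predictor (assumed)

def assign_predictor(gt_distance_m: float) -> int:
    """Index of the predictor whose anchor distance is closest to the object."""
    return min(range(len(ANCHOR_DISTANCES_M)),
               key=lambda i: abs(ANCHOR_DISTANCES_M[i] - gt_distance_m))

def regression_target(gt_distance_m: float, predictor_idx: int) -> float:
    """Residual the matched predictor is trained to regress (log ratio)."""
    return math.log(gt_distance_m / ANCHOR_DISTANCES_M[predictor_idx])

def decode_distance(pred_residual: float, predictor_idx: int) -> float:
    """Invert the residual at inference time to recover a metric distance."""
    return ANCHOR_DISTANCES_M[predictor_idx] * math.exp(pred_residual)

idx = assign_predictor(22.0)                       # -> 1, the anchor at 15.0 m
t = regression_target(22.0, idx)                   # log(22 / 15)
assert abs(decode_distance(t, idx) - 22.0) < 1e-9  # round-trips exactly
```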
Related papers
- Improving Distant 3D Object Detection Using 2D Box Supervision [97.80225758259147]
We propose LR3D, a framework that learns to recover the missing depth of distant objects.
Our framework is general and could benefit a wide range of 3D detection methods.
arXiv Detail & Related papers (2024-03-14T09:54:31Z)
- Tracking by 3D Model Estimation of Unknown Objects in Videos [122.56499878291916]
We argue that this representation is limited and instead propose to guide and improve 2D tracking with an explicit object representation.
Our representation tackles a complex long-term dense correspondence problem between all 3D points on the object for all video frames.
The proposed optimization minimizes a novel loss function to estimate the best 3D shape, texture, and 6DoF pose.
arXiv Detail & Related papers (2023-04-13T11:32:36Z)
- Homography Loss for Monocular 3D Object Detection [54.04870007473932]
A differentiable loss function, termed Homography Loss, which exploits both 2D and 3D information, is proposed to achieve the goal.
Our method outperforms the other state-of-the-art methods by a large margin on the KITTI 3D dataset.
arXiv Detail & Related papers (2022-04-02T03:48:03Z)
- OSOP: A Multi-Stage One Shot Object Pose Estimation Framework [35.89334617258322]
We present a novel one-shot method for object detection and 6 DoF pose estimation that does not require training on target objects.
At test time, it takes as input a target image and a textured 3D query model.
We evaluate the method on LineMOD, Occlusion, Homebrewed, YCB-V and TLESS datasets.
arXiv Detail & Related papers (2022-03-29T13:12:00Z)
- Absolute distance prediction based on deep learning object detection and monocular depth estimation models [10.563101152143817]
This paper presents a deep learning framework that consists of two deep networks for depth estimation and object detection using a single image.
The proposed framework is promising, yielding an accuracy of 96% and an RMSE of 0.203 for the absolute distance (a simple fusion sketch follows this entry).
arXiv Detail & Related papers (2021-11-02T16:29:13Z)
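One straightforward way to combine the two networks described in the entry above is to read the predicted depth map inside each detected box and take a robust statistic such as the median; this fusion rule is assumed here for illustration and is not necessarily the paper's exact method.

```python
# Hedged sketch: fusing a 2D detector with monocular depth estimation by
# taking the median predicted depth inside each box. Illustrative only.
import numpy as np

def box_distance(depth_map: np.ndarray, box_xyxy: tuple[int, int, int, int]) -> float:
    """Median depth inside an (x1, y1, x2, y2) box; robust to outlier pixels."""
    x1, y1, x2, y2 = box_xyxy
    return float(np.median(depth_map[y1:y2, x1:x2]))

depth = np.full((480, 640), 12.0, dtype=np.float32)  # dummy depth map in meters
depth[100:200, 300:400] = 8.5                        # a nearer object region
print(box_distance(depth, (300, 100, 400, 200)))     # -> 8.5
```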
- DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries [43.02373021724797]
We introduce a framework for multi-camera 3D object detection.
Our method manipulates predictions directly in 3D space.
We achieve state-of-the-art performance on the nuScenes autonomous driving benchmark.
arXiv Detail & Related papers (2021-10-13T17:59:35Z)
- Geometry-based Distance Decomposition for Monocular 3D Object Detection [48.63934632884799]
We propose a novel geometry-based distance decomposition to recover the distance by its factors.
The decomposition factors the distance of objects into the most representative and stable variables.
Our method directly predicts 3D bounding boxes from RGB images with a compact architecture.
arXiv Detail & Related papers (2021-04-08T13:57:30Z)
- Monocular Quasi-Dense 3D Object Tracking [99.51683944057191]
A reliable and accurate 3D tracking framework is essential for predicting future locations of surrounding objects and planning the observer's actions in numerous applications such as autonomous driving.
We propose a framework that can effectively associate moving objects over time and estimate their full 3D bounding box information from a sequence of 2D images captured on a moving platform.
arXiv Detail & Related papers (2021-03-12T15:30:02Z)
- PLUME: Efficient 3D Object Detection from Stereo Images [95.31278688164646]
Existing methods tackle the problem in two steps: depth estimation is performed first, a pseudo-LiDAR point cloud representation is computed from the depth estimates, and then object detection is performed in 3D space (a sketch of the back-projection step follows this entry).
We propose a model that unifies these two tasks in the same metric space.
Our approach achieves state-of-the-art performance on the challenging KITTI benchmark, with significantly reduced inference time compared with existing methods.
arXiv Detail & Related papers (2021-01-17T05:11:38Z)
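The two-step pipeline that PLUME unifies starts by back-projecting every depth pixel into a 3D point, yielding a pseudo-LiDAR cloud for a downstream 3D detector. A minimal sketch of that conversion under assumed pinhole intrinsics (fx, fy, cx, cy are hypothetical values):

```python
# Minimal sketch of the pseudo-LiDAR step: back-project each depth pixel into
# a 3D point using assumed pinhole intrinsics. A 3D detector would then
# consume the resulting point cloud.
import numpy as np

def depth_to_pseudo_lidar(depth: np.ndarray, fx: float, fy: float,
                          cx: float, cy: float) -> np.ndarray:
    """Back-project an (H, W) depth map (meters) into an (H*W, 3) point cloud."""
    h, w = depth.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth.reshape(-1)
    x = (us.reshape(-1) - cx) * z / fx
    y = (vs.reshape(-1) - cy) * z / fy
    return np.stack([x, y, z], axis=1)

depth = np.random.uniform(5.0, 50.0, size=(8, 8)).astype(np.float32)
cloud = depth_to_pseudo_lidar(depth, fx=720.0, fy=720.0, cx=4.0, cy=4.0)
print(cloud.shape)  # (64, 3)
```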
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.