Kinematic 3D Object Detection in Monocular Video
- URL: http://arxiv.org/abs/2007.09548v1
- Date: Sun, 19 Jul 2020 01:15:12 GMT
- Title: Kinematic 3D Object Detection in Monocular Video
- Authors: Garrick Brazil, Gerard Pons-Moll, Xiaoming Liu, Bernt Schiele
- Abstract summary: We propose a novel method for monocular video-based 3D object detection which carefully leverages kinematic motion to improve precision of 3D localization.
We achieve state-of-the-art performance on monocular 3D object detection and the Bird's Eye View tasks within the KITTI self-driving dataset.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Perceiving the physical world in 3D is fundamental for self-driving
applications. Although temporal motion is an invaluable resource to human
vision for detection, tracking, and depth perception, such features have not
been thoroughly utilized in modern 3D object detectors. In this work, we
propose a novel method for monocular video-based 3D object detection which
carefully leverages kinematic motion to improve precision of 3D localization.
Specifically, we first propose a novel decomposition of object orientation as
well as a self-balancing 3D confidence. We show that both components are
critical to enable our kinematic model to work effectively. Collectively, using
only a single model, we efficiently leverage 3D kinematics from monocular
videos to improve the overall localization precision in 3D object detection
while also producing useful by-products of scene dynamics (ego-motion and
per-object velocity). We achieve state-of-the-art performance on monocular 3D
object detection and the Bird's Eye View tasks within the KITTI self-driving
dataset.
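The abstract's core idea of leveraging kinematic motion to stabilize per-frame 3D localization can be illustrated with a constant-velocity Kalman filter over an object's depth. This is a minimal, hypothetical sketch for intuition only; the paper's actual model also involves orientation decomposition and self-balancing 3D confidence, which are not reproduced here.

```python
# Minimal constant-velocity Kalman filter over an object's depth z.
# Hypothetical illustration of fusing noisy per-frame 3D detections
# with a kinematic motion prior; NOT the paper's actual formulation.

def kalman_depth(measurements, meas_var=1.0, accel_var=0.01, dt=1.0):
    """Filter noisy depth measurements assuming constant velocity."""
    z, v = measurements[0], 0.0           # state: depth and velocity
    P = [[meas_var, 0.0], [0.0, 1.0]]     # state covariance
    out = [z]
    for m in measurements[1:]:
        # Predict: z advances by v*dt; covariance inflates with process noise.
        z = z + v * dt
        P = [[P[0][0] + dt * (P[1][0] + P[0][1]) + dt * dt * P[1][1] + accel_var,
              P[0][1] + dt * P[1][1]],
             [P[1][0] + dt * P[1][1],
              P[1][1] + accel_var]]
        # Update with the new depth measurement m.
        S = P[0][0] + meas_var             # innovation covariance
        K0, K1 = P[0][0] / S, P[1][0] / S  # Kalman gain
        y = m - z                          # innovation (residual)
        z, v = z + K0 * y, v + K1 * y
        P = [[(1 - K0) * P[0][0], (1 - K0) * P[0][1]],
             [P[1][0] - K1 * P[0][0], P[1][1] - K1 * P[0][1]]]
        out.append(z)
    return out
```

As a by-product, the filtered velocity `v` gives a per-object speed estimate, mirroring the scene-dynamics outputs mentioned in the abstract.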
Related papers
- DO3D: Self-supervised Learning of Decomposed Object-aware 3D Motion and
Depth from Monocular Videos [76.01906393673897]
We propose a self-supervised method to jointly learn 3D motion and depth from monocular videos.
Our system contains a depth estimation module to predict depth, and a new decomposed object-wise 3D motion (DO3D) estimation module to predict ego-motion and 3D object motion.
Our model delivers superior performance in all evaluated settings.
arXiv Detail & Related papers (2024-03-09T12:22:46Z) - Delving into Motion-Aware Matching for Monocular 3D Object Tracking [81.68608983602581]
We find that the motion cue of objects along different time frames is critical in 3D multi-object tracking.
We propose MoMA-M3T, a framework that mainly consists of three motion-aware components.
We conduct extensive experiments on the nuScenes and KITTI datasets to demonstrate our MoMA-M3T achieves competitive performance against state-of-the-art methods.
arXiv Detail & Related papers (2023-08-22T17:53:58Z) - AutoDecoding Latent 3D Diffusion Models [95.7279510847827]
We present a novel approach to the generation of static and articulated 3D assets that has a 3D autodecoder at its core.
The 3D autodecoder framework embeds properties learned from the target dataset in the latent space.
We then identify the appropriate intermediate volumetric latent space, and introduce robust normalization and de-normalization operations.
arXiv Detail & Related papers (2023-07-07T17:59:14Z) - NeurOCS: Neural NOCS Supervision for Monocular 3D Object Localization [80.3424839706698]
We present NeurOCS, a framework that uses instance masks and 3D boxes as input to learn 3D object shapes by means of differentiable rendering.
Our approach rests on insights in learning a category-level shape prior directly from real driving scenes.
We make critical design choices to learn object coordinates more effectively from an object-centric view.
arXiv Detail & Related papers (2023-05-28T16:18:41Z) - 3D Object Aided Self-Supervised Monocular Depth Estimation [5.579605877061333]
We propose a new method to address dynamic object movements through monocular 3D object detection.
Specifically, we first detect 3D objects in the images and build the per-pixel correspondence of the dynamic pixels with the detected object pose.
In this way, the depth of every pixel can be learned via a meaningful geometry model.
arXiv Detail & Related papers (2022-12-04T08:52:33Z) - TripletTrack: 3D Object Tracking using Triplet Embeddings and LSTM [0.0]
3D object tracking is a critical task in autonomous driving systems.
In this paper we investigate the use of triplet embeddings in combination with motion representations for 3D object tracking.
arXiv Detail & Related papers (2022-10-28T15:23:50Z) - Monocular Quasi-Dense 3D Object Tracking [99.51683944057191]
A reliable and accurate 3D tracking framework is essential for predicting future locations of surrounding objects and planning the observer's actions in numerous applications such as autonomous driving.
We propose a framework that can effectively associate moving objects over time and estimate their full 3D bounding box information from a sequence of 2D images captured on a moving platform.
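Associating moving objects over time, as this abstract describes, can be sketched with a simple greedy matcher on 3D box centers. This is an illustrative toy under assumed inputs (lists of center coordinates); the paper itself learns a quasi-dense appearance similarity rather than using raw center distance.

```python
# Toy cross-frame association by 3D center distance (greedy matching).
# Illustrative only; the actual method learns appearance similarity.
import math

def associate(tracks, detections, max_dist=2.0):
    """Match previous-frame track centers to new detection centers.

    tracks, detections: lists of (x, y, z) box centers.
    Returns a list of (track_idx, det_idx) pairs.
    """
    pairs = []
    for ti, t in enumerate(tracks):
        for di, d in enumerate(detections):
            dist = math.dist(t, d)
            if dist <= max_dist:           # gate out implausible matches
                pairs.append((dist, ti, di))
    pairs.sort()                           # take the closest pairs first
    matches, used_t, used_d = [], set(), set()
    for dist, ti, di in pairs:
        if ti not in used_t and di not in used_d:
            matches.append((ti, di))
            used_t.add(ti)
            used_d.add(di)
    return matches
```

Unmatched detections would spawn new tracks, and unmatched tracks would age out; a real tracker replaces the distance cost with a learned similarity and an optimal assignment solver.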
arXiv Detail & Related papers (2021-03-12T15:30:02Z) - Seeing by haptic glance: reinforcement learning-based 3D object recognition [31.80213713136647]
Humans are able to conduct 3D recognition through a limited number of haptic contacts between the target object and their fingers, without seeing the object.
This capability is defined as a 'haptic glance' in cognitive neuroscience.
Most of the existing 3D recognition models were developed based on dense 3D data.
In many real-life use cases, where robots are used to collect 3D data by haptic exploration, only a limited number of 3D points could be collected.
A novel reinforcement learning-based framework is proposed, in which the haptic exploration procedure is optimized jointly with the 3D recognition objective using actively collected 3D points.
arXiv Detail & Related papers (2021-02-15T15:38:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided (including all content) and is not responsible for any consequences.