Exploring the Capabilities and Limits of 3D Monocular Object Detection -- A Study on Simulation and Real World Data
- URL: http://arxiv.org/abs/2005.07424v1
- Date: Fri, 15 May 2020 09:05:17 GMT
- Title: Exploring the Capabilities and Limits of 3D Monocular Object Detection -- A Study on Simulation and Real World Data
- Authors: Felix Nobis, Fabian Brunhuber, Simon Janssen, Johannes Betz and Markus Lienkamp
- Abstract summary: 3D object detection based on monocular camera data is a key enabler for autonomous driving.
Recent deep learning methods show promising results to recover depth information from single images.
In this paper, we evaluate the performance of a 3D object detection pipeline which is parameterizable with different depth estimation configurations.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D object detection based on monocular camera data is a key enabler for
autonomous driving. The task, however, is ill-posed due to the lack of depth
information in 2D images. Recent deep learning methods show promising results
to recover depth information from single images by learning priors about the
environment. Several competing strategies tackle this problem. In addition to
the network design, the major difference of these competing approaches lies in
using supervised or self-supervised optimization loss functions, which require
different data and ground truth information. In this paper, we evaluate the
performance of a 3D object detection pipeline which is parameterizable with
different depth estimation configurations. We implement a simple distance
calculation approach based on camera intrinsics and 2D bounding box size, a
self-supervised, and a supervised learning approach for depth estimation.
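The bounding-box distance baseline described above rests on the pinhole camera model. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes an undistorted image, focal lengths in pixels, and a hypothetical per-class real-world object height (`object_height_m`); the paper may use box width or a different size prior. By similar triangles, an object of height H at depth Z spans h = f_y * H / Z pixels, so Z = f_y * H / h. The names `Intrinsics`, `depth_from_box_height` and `backproject` are illustrative only.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Intrinsics:
    """Pinhole intrinsics in pixels (assumes an undistorted image)."""
    fx: float
    fy: float
    cx: float
    cy: float


def depth_from_box_height(box_height_px: float, object_height_m: float,
                          k: Intrinsics) -> float:
    """Depth Z along the optical axis from similar triangles: Z = f_y * H / h."""
    if box_height_px <= 0:
        raise ValueError("box height must be positive")
    return k.fy * object_height_m / box_height_px


def backproject(u: float, v: float, depth: float,
                k: Intrinsics) -> tuple[float, float, float]:
    """Lift pixel (u, v) at depth Z into camera coordinates (X, Y, Z)."""
    x = (u - k.cx) * depth / k.fx
    y = (v - k.cy) * depth / k.fy
    return (x, y, depth)


if __name__ == "__main__":
    # Hypothetical camera and object-height prior, for illustration only.
    k = Intrinsics(fx=1000.0, fy=1000.0, cx=640.0, cy=360.0)
    z = depth_from_box_height(box_height_px=100.0, object_height_m=1.5, k=k)
    # Simplification: treat the 2D box centre as the projected object centre.
    point = backproject(u=740.0, v=360.0, depth=z, k=k)
    # Euclidean range differs from Z for objects off the optical axis.
    print(z, point, math.dist((0.0, 0.0, 0.0), point))
```

Note that Z is depth along the optical axis, not the Euclidean range. Because Z is proportional to H / h, a wrong height prior or a truncated or occluded 2D box scales the distance error directly.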
  Ground truth depth information cannot be recorded reliably in real-world
scenarios. This shifts our training focus to simulation data. In simulation,
labeling and ground truth generation can be automated. We evaluate the
detection pipeline on simulator data and a real world sequence from an
autonomous vehicle on a race track. The benefit of simulation training for
real-world application is investigated. Advantages and drawbacks of the different
depth estimation strategies are discussed.
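Because simulation provides dense, exact depth, the competing depth estimators can be scored against it pixel by pixel. The sketch below computes three error measures that are common in the monocular-depth literature (absolute relative error, RMSE and the δ < 1.25 threshold accuracy); whether these match the metrics used in this paper is an assumption. It also assumes, as in many datasets, that a ground-truth value of 0 marks a missing measurement. The function name `depth_metrics` is hypothetical.

```python
import math
from typing import Sequence


def depth_metrics(pred: Sequence[float], gt: Sequence[float],
                  min_depth: float = 1e-3) -> dict[str, float]:
    """AbsRel, RMSE and delta<1.25 accuracy over pixels with valid ground truth."""
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same length")
    # Keep only pixels with valid ground truth; clamp predictions to stay positive.
    pairs = [(max(p, min_depth), g) for p, g in zip(pred, gt) if g > min_depth]
    if not pairs:
        raise ValueError("no valid ground-truth pixels")
    n = len(pairs)
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    # Fraction of pixels whose ratio to ground truth (either way) is below 1.25.
    delta1 = sum(1 for p, g in pairs if max(p / g, g / p) < 1.25) / n
    return {"abs_rel": abs_rel, "rmse": rmse, "delta1": delta1}


if __name__ == "__main__":
    # Toy values: the last pixel has no ground truth (0.0) and is ignored.
    print(depth_metrics(pred=[10.0, 20.0, 30.0, 5.0], gt=[10.0, 25.0, 30.0, 0.0]))
```

Reporting AbsRel next to RMSE is informative for driving scenes: RMSE is dominated by large absolute errors on distant objects, while AbsRel normalizes each error by the true depth.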

Related papers
- Stable-Sim2Real: Exploring Simulation of Real-Captured 3D Data with Two-Stage Depth Diffusion [16.720863475636328]
 3D data simulation aims to bridge the gap between simulated and real-captured 3D data. Most 3D data simulation methods inject predefined physical priors but struggle to capture the full complexity of real data. This work explores a new solution path, called Stable-Sim2Real, based on a novel two-stage depth diffusion model.
 arXiv (2025-07-31T12:08:16Z)
- Inverse Neural Rendering for Explainable Multi-Object Tracking [35.072142773300655]
 We recast 3D multi-object tracking from RGB cameras as an Inverse Rendering (IR) problem.
We optimize an image loss over generative latent spaces that inherently disentangle shape and appearance properties.
We validate the generalization and scaling capabilities of our method by learning the generative prior exclusively from synthetic data.
 arXiv (2024-04-18T17:37:53Z)
- Motion Degeneracy in Self-supervised Learning of Elevation Angle Estimation for 2D Forward-Looking Sonar [4.683630397028384]
 This study aims to realize stable self-supervised learning of elevation angle estimation without pretraining using synthetic images.
We first analyze the motion field of 2D forward-looking sonar, which is related to the main supervision signal.
 arXiv (2023-07-30T08:06:11Z)
- FSNet: Redesign Self-Supervised MonoDepth for Full-Scale Depth Prediction for Autonomous Driving [18.02943016671203]
 This study proposes a comprehensive self-supervised framework for accurate scale-aware depth prediction on autonomous driving scenes.
In particular, we introduce a Full-Scale depth prediction network named FSNet.
With FSNet, robots and vehicles with only one well-calibrated camera can collect sequences of training image frames and camera poses, and infer accurate 3D depths of the environment without extra labeling work or 3D data.
 arXiv (2023-04-21T03:17:04Z)
- ALSO: Automotive Lidar Self-supervision by Occupancy estimation [70.70557577874155]
 We propose a new self-supervised method for pre-training the backbone of deep perception models operating on point clouds.
The core idea is to train the model on a pretext task which is the reconstruction of the surface on which the 3D points are sampled.
The intuition is that if the network is able to reconstruct the scene surface, given only sparse input points, then it probably also captures some fragments of semantic information.
 arXiv (2022-12-12T13:10:19Z)
- Normal Transformer: Extracting Surface Geometry from LiDAR Points Enhanced by Visual Semantics [7.507853813361308]
 We introduce a multi-modal technique that leverages 3D point clouds and 2D colour images obtained from LiDAR and camera sensors for surface normal estimation.
We present a novel transformer-based neural network architecture that proficiently fuses visual semantic and 3D geometric information.
It has been verified that the proposed model can learn from a simulated 3D environment that mimics a traffic scene.
 arXiv (2022-11-19T03:55:09Z)
- Towards Multimodal Multitask Scene Understanding Models for Indoor Mobile Agents [49.904531485843464]
 In this paper, we discuss the main challenge: insufficient, or even no, labeled data for real-world indoor environments.
We describe MMISM (Multi-modality input Multi-task output Indoor Scene understanding Model) to tackle the above challenges.
 MMISM considers RGB images as well as sparse Lidar points as inputs and 3D object detection, depth completion, human pose estimation, and semantic segmentation as output tasks.
We show that MMISM performs on par with or even better than single-task models.
 arXiv (2022-09-27T04:49:19Z)
- 3D Object Detection with a Self-supervised Lidar Scene Flow Backbone [10.341296683155973]
 We propose using a self-supervised training strategy to learn a general point cloud backbone model for downstream 3D vision tasks.
Our main contribution leverages learned flow and motion representations and combines a self-supervised backbone with a 3D detection head.
 Experiments on KITTI and nuScenes benchmarks show that the proposed self-supervised pre-training increases 3D detection performance significantly.
 arXiv (2022-05-02T07:53:29Z)
- RealNet: Combining Optimized Object Detection with Information Fusion Depth Estimation Co-Design Method on IoT [2.9275056713717285]
 We propose a co-design method combining the model-streamlined recognition algorithm, the depth estimation algorithm, and information fusion.
The method proposed in this paper is suitable for mobile platforms with strict real-time requirements.
 arXiv (2022-04-24T08:35:55Z)
- Homography Loss for Monocular 3D Object Detection [54.04870007473932]
 A differentiable loss function, termed Homography Loss, is proposed to achieve this goal; it exploits both 2D and 3D information.
Our method outperforms other state-of-the-art methods by a large margin on the KITTI 3D datasets.
 arXiv (2022-04-02T03:48:03Z)
- Towards Scale Consistent Monocular Visual Odometry by Learning from the Virtual World [83.36195426897768]
 We propose VRVO, a novel framework for retrieving the absolute scale from virtual data.
We first train a scale-aware disparity network using both monocular real images and stereo virtual data.
The resulting scale-consistent disparities are then integrated with a direct VO system.
 arXiv (2022-03-11T01:51:54Z)
- Probabilistic and Geometric Depth: Detecting Objects in Perspective [78.00922683083776]
 3D object detection is an important capability needed in various practical applications such as driver assistance systems.
Monocular 3D detection, as an economical solution compared to conventional settings relying on binocular vision or LiDAR, has drawn increasing attention recently but still yields unsatisfactory results.
This paper first presents a systematic study on this problem and observes that the current monocular 3D detection problem can be simplified as an instance depth estimation problem.
 arXiv (2021-07-29T16:30:33Z)
- SimAug: Learning Robust Representations from Simulation for Trajectory Prediction [78.91518036949918]
 We propose a novel approach to learn robust representation through augmenting the simulation training data.
We show that SimAug achieves promising results on three real-world benchmarks using zero real training data.
 arXiv (2020-04-04T21:22:01Z)