Monocular 3D Object Detection using Multi-Stage Approaches with
Attention and Slicing aided hyper inference
- URL: http://arxiv.org/abs/2212.11804v1
- Date: Thu, 22 Dec 2022 15:36:07 GMT
- Title: Monocular 3D Object Detection using Multi-Stage Approaches with
Attention and Slicing aided hyper inference
- Authors: Abonia Sojasingarayar, Ashish Patel
- Abstract summary: 3D object detection is vital as it would enable us to capture objects' sizes, orientation, and position in the world.
We would be able to use this 3D detection in real-world applications such as Augmented Reality (AR), self-driving cars, and robotics.
- Score: 0.0
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: 3D object detection is vital as it would enable us to capture objects' sizes,
orientation, and position in the world. As a result, we would be able to use
this 3D detection in real-world applications such as Augmented Reality (AR),
self-driving cars, and robotics which perceive the world the same way we do as
humans. Monocular 3D Object Detection is the task of drawing a 3D bounding box
around each object in a single 2D RGB image. It is a localization task performed
without any extra information such as depth, other sensors, or multiple images. Monocular
3D object detection is an important yet challenging task. Beyond the
significant progress in image-based 2D object detection, 3D understanding of
real-world objects is an open challenge that has not been explored extensively
thus far. In addition to the most closely related studies.
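To make the task definition above concrete, here is a minimal sketch of how a 3D bounding box (size, orientation, position) relates to a single 2D image. It is not the authors' method. It assumes a pinhole camera with hypothetical intrinsics `fx`, `fy`, `u0`, `v0`, camera coordinates with x right, y down, z forward, and a box that rotates only about the vertical axis (yaw); the function names are illustrative.

```python
import math
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def box_corners_3d(center: Point3D, dims: Tuple[float, float, float], yaw: float) -> List[Point3D]:
    """Return the 8 corners of a 3D box in camera coordinates.

    Assumptions (not from the paper): dims = (height, width, length),
    and the box rotates only about the vertical (y) axis by `yaw` radians.
    """
    cx, cy, cz = center
    h, w, l = dims
    corners = []
    for dx in (l / 2, -l / 2):
        for dy in (h / 2, -h / 2):
            for dz in (w / 2, -w / 2):
                # Rotate the corner offset about the y axis, then translate.
                x = math.cos(yaw) * dx + math.sin(yaw) * dz
                z = -math.sin(yaw) * dx + math.cos(yaw) * dz
                corners.append((cx + x, cy + dy, cz + z))
    return corners


def project(point: Point3D, fx: float, fy: float, u0: float, v0: float) -> Tuple[float, float]:
    """Project a 3D camera-frame point to pixel coordinates (pinhole model)."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (fx * x / z + u0, fy * y / z + v0)


if __name__ == "__main__":
    # A car-sized box 10 m in front of the camera (illustrative values).
    corners = box_corners_3d(center=(0.0, 0.0, 10.0), dims=(2.0, 2.0, 4.0), yaw=0.0)
    for corner in corners:
        print(project(corner, fx=1000.0, fy=1000.0, u0=500.0, v0=500.0))
```

A monocular detector has to solve the inverse problem: recover the center, dimensions, and yaw from pixels alone, which is why the missing depth makes the task hard.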
Related papers
- Counting Stacked Objects from Multi-View Images [57.68870743111393]
We propose a novel 3D counting approach that decomposes the task into two complementary subproblems.
By combining geometric reconstruction and deep learning-based depth analysis, our method can accurately count identical objects within containers.
We validate our 3D Counting pipeline on diverse real-world and large-scale synthetic datasets.
arXiv Detail & Related papers (2024-11-28T13:51:16Z)
- Improving Distant 3D Object Detection Using 2D Box Supervision [97.80225758259147]
We propose LR3D, a framework that learns to recover the missing depth of distant objects.
Our framework is general and could benefit a wide range of 3D detection methods.
arXiv Detail & Related papers (2024-03-14T09:54:31Z)
- SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving [98.74706005223685]
3D scene understanding plays a vital role in vision-based autonomous driving.
We propose a SurroundOcc method to predict the 3D occupancy with multi-camera images.
arXiv Detail & Related papers (2023-03-16T17:59:08Z)
- OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation [107.71752592196138]
We propose OmniObject3D, a large vocabulary 3D object dataset with massive high-quality real-scanned 3D objects.
It comprises 6,000 scanned objects in 190 daily categories, sharing common classes with popular 2D datasets.
Each 3D object is captured with both 2D and 3D sensors, providing textured meshes, point clouds, multiview rendered images, and multiple real-captured videos.
arXiv Detail & Related papers (2023-01-18T18:14:18Z)
- 3D Object Aided Self-Supervised Monocular Depth Estimation [5.579605877061333]
We propose a new method to address dynamic object movements through monocular 3D object detection.
Specifically, we first detect 3D objects in the images and build the per-pixel correspondence of the dynamic pixels with the detected object pose.
In this way, the depth of every pixel can be learned via a meaningful geometry model.
arXiv Detail & Related papers (2022-12-04T08:52:33Z)
- TANDEM3D: Active Tactile Exploration for 3D Object Recognition [16.548376556543015]
We propose TANDEM3D, a method that applies a co-training framework for 3D object recognition with tactile signals.
TANDEM3D is based on a novel encoder that builds 3D object representation from contact positions and normals using PointNet++.
Our method is trained entirely in simulation and validated with real-world experiments.
arXiv Detail & Related papers (2022-09-19T05:54:26Z)
- Aerial Monocular 3D Object Detection [67.20369963664314]
DVDET is proposed to achieve aerial monocular 3D object detection in both the 2D image space and the 3D physical space.
To address the severe view deformation issue, we propose a novel trainable geo-deformable transformation module.
To encourage more researchers to investigate this area, we will release the dataset and related code.
arXiv Detail & Related papers (2022-08-08T08:32:56Z)
- Graph-DETR3D: Rethinking Overlapping Regions for Multi-View 3D Object Detection [17.526914782562528]
We propose Graph-DETR3D to automatically aggregate multi-view imagery information through graph structure learning (GSL).
Our best model achieves 49.5 NDS on the nuScenes test leaderboard, setting a new state of the art among published image-view 3D object detectors.
arXiv Detail & Related papers (2022-04-25T12:10:34Z)
- FCOS3D: Fully Convolutional One-Stage Monocular 3D Object Detection [78.00922683083776]
It is non-trivial to adapt a general 2D detector to work in this 3D task.
In this technical report, we study this problem with a practice built on a fully convolutional single-stage detector.
Our solution achieves 1st place out of all the vision-only methods in the nuScenes 3D detection challenge of NeurIPS 2020.
arXiv Detail & Related papers (2021-04-22T09:35:35Z)