MonoNext: A 3D Monocular Object Detection with ConvNext
- URL: http://arxiv.org/abs/2308.00596v1
- Date: Tue, 1 Aug 2023 15:15:40 GMT
- Title: MonoNext: A 3D Monocular Object Detection with ConvNext
- Authors: Marcelo Eduardo Pederiva, José Mario De Martino and Alessandro Zimmer
- Abstract summary: This paper introduces a new Multi-Tasking Learning approach called MonoNext for 3D Object Detection.
MonoNext employs a straightforward approach based on the ConvNext network and requires only 3D bounding box data.
In our experiments with the KITTI dataset, MonoNext achieved high precision and performance comparable to state-of-the-art approaches.
- Score: 69.33657875725747
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Autonomous driving perception tasks rely heavily on cameras as the primary
sensor for Object Detection, Semantic Segmentation, Instance Segmentation, and
Object Tracking. However, RGB images captured by cameras lack depth
information, which poses a significant challenge in 3D detection tasks. To
supplement this missing data, mapping sensors such as LIDAR and RADAR are used
for accurate 3D Object Detection. Despite their high accuracy, multi-sensor
models are expensive and computationally demanding. In
contrast, Monocular 3D Object Detection models are becoming increasingly
popular, offering a faster, cheaper, and easier-to-implement solution for 3D
detections. This paper introduces a different Multi-Tasking Learning approach
called MonoNext that utilizes a spatial grid to map objects in the scene.
MonoNext employs a straightforward approach based on the ConvNext network and
requires only 3D bounding box annotated data. In our experiments with the KITTI
dataset, MonoNext achieved high precision and performance comparable to
state-of-the-art approaches. Furthermore, with additional training data,
MonoNext improved further and achieved higher accuracy.
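The abstract gives only a high-level picture of the architecture. Below is a minimal sketch, assuming a ConvNeXt-Tiny backbone from torchvision and an illustrative set of per-grid-cell regression targets (class confidence, 3D-center offset, depth, dimensions, and yaw); the actual MonoNext design may differ, so treat this as an illustration of the spatial-grid idea rather than the authors' implementation.

```python
# Hedged sketch (not the authors' code): a ConvNeXt-Tiny backbone with simple
# multi-task heads that predict a coarse spatial grid of 3D box parameters.
# Channel counts, grid resolution, and the exact regression targets are assumptions.
import torch
import torch.nn as nn
from torchvision.models import convnext_tiny


class MonoNextSketch(nn.Module):
    def __init__(self, num_classes: int = 3):
        super().__init__()
        # ConvNeXt-Tiny feature extractor (stride-32 feature map, 768 channels).
        self.backbone = convnext_tiny(weights=None).features

        def head(out_ch: int) -> nn.Sequential:
            # One small convolutional head per task, all sharing the same grid.
            return nn.Sequential(
                nn.Conv2d(768, 256, 3, padding=1), nn.GELU(),
                nn.Conv2d(256, out_ch, 1),
            )

        self.cls_head = head(num_classes)   # per-cell class confidence
        self.offset_head = head(2)          # 3D-center offset within the cell
        self.depth_head = head(1)           # distance of the object from the camera
        self.dim_head = head(3)             # object height, width, length
        self.yaw_head = head(2)             # orientation encoded as (sin, cos)

    def forward(self, image: torch.Tensor) -> dict:
        feat = self.backbone(image)
        return {
            "cls": self.cls_head(feat).sigmoid(),
            "offset": self.offset_head(feat),
            "depth": self.depth_head(feat),
            "dims": self.dim_head(feat),
            "yaw": self.yaw_head(feat),
        }


if __name__ == "__main__":
    model = MonoNextSketch()
    x = torch.randn(1, 3, 384, 1280)        # KITTI-like input resolution
    out = model(x)
    print({k: tuple(v.shape) for k, v in out.items()})  # 12x40 grid of predictions
```

For a 384x1280 input this produces a 12x40 grid of per-cell predictions; decoding those cells into metric 3D boxes would additionally require the camera intrinsics, as in the geometry sketch after the related-papers list below.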
Related papers
- Sparse Points to Dense Clouds: Enhancing 3D Detection with Limited LiDAR Data [68.18735997052265]
We propose a balanced approach that combines the advantages of monocular and point cloud-based 3D detection.
Our method requires only a small number of 3D points, which can be obtained from a low-cost, low-resolution sensor.
The accuracy of 3D detection improves by 20% compared to the state-of-the-art monocular detection methods.
arXiv Detail & Related papers (2024-04-10T03:54:53Z) - M&M3D: Multi-Dataset Training and Efficient Network for Multi-view 3D
Object Detection [2.5158048364984564]
I proposed a network structure for multi-view 3D object detection using camera-only data and a Bird's-Eye-View map.
My work is motivated by a current key challenge: domain adaptation and visual data transfer.
My study uses 3D information as the available semantic information and blends 2D multi-view image features into the visual-language transfer design.
arXiv Detail & Related papers (2023-11-02T04:28:51Z) - Paint and Distill: Boosting 3D Object Detection with Semantic Passing
Network [70.53093934205057]
3D object detection task from lidar or camera sensors is essential for autonomous driving.
We propose a novel semantic passing framework, named SPNet, to boost the performance of existing lidar-based 3D detection models.
arXiv Detail & Related papers (2022-07-12T12:35:34Z) - A Lightweight and Detector-free 3D Single Object Tracker on Point Clouds [50.54083964183614]
It is non-trivial to perform accurate target-specific detection since the point cloud of objects in raw LiDAR scans is usually sparse and incomplete.
We propose DMT, a Detector-free Motion prediction based 3D Tracking network that entirely removes the need for complicated 3D detectors.
arXiv Detail & Related papers (2022-03-08T17:49:07Z) - Progressive Coordinate Transforms for Monocular 3D Object Detection [52.00071336733109]
We propose a novel and lightweight approach, dubbed Progressive Coordinate Transforms (PCT), to facilitate learning coordinate representations.
arXiv Detail & Related papers (2021-08-12T15:22:33Z) - Monocular Quasi-Dense 3D Object Tracking [99.51683944057191]
A reliable and accurate 3D tracking framework is essential for predicting future locations of surrounding objects and planning the observer's actions in numerous applications such as autonomous driving.
We propose a framework that can effectively associate moving objects over time and estimate their full 3D bounding box information from a sequence of 2D images captured on a moving platform.
arXiv Detail & Related papers (2021-03-12T15:30:02Z) - Ground-aware Monocular 3D Object Detection for Autonomous Driving [6.5702792909006735]
Estimating the 3D position and orientation of objects in the environment with a single RGB camera is a challenging task for low-cost urban autonomous driving and mobile robots.
Most of the existing algorithms are based on the geometric constraints in 2D-3D correspondence, which stems from generic 6D object pose estimation.
We introduce a novel neural network module to fully utilize such application-specific priors in the framework of deep learning.
arXiv Detail & Related papers (2021-02-01T08:18:24Z) - Single-Shot 3D Detection of Vehicles from Monocular RGB Images via
Geometry Constrained Keypoints in Real-Time [6.82446891805815]
We propose a novel 3D single-shot object detection method for detecting vehicles in monocular RGB images.
Our approach lifts 2D detections to 3D space by predicting additional regression and classification parameters (see the pinhole-geometry sketch below).
We test our approach on different datasets for autonomous driving and evaluate it using the challenging KITTI 3D Object Detection and the novel nuScenes Object Detection benchmarks.
arXiv Detail & Related papers (2020-06-23T15:10:19Z)