GrooMeD-NMS: Grouped Mathematically Differentiable NMS for Monocular 3D
Object Detection
- URL: http://arxiv.org/abs/2103.17202v1
- Date: Wed, 31 Mar 2021 16:29:50 GMT
- Title: GrooMeD-NMS: Grouped Mathematically Differentiable NMS for Monocular 3D
Object Detection
- Authors: Abhinav Kumar, Garrick Brazil and Xiaoming Liu
- Abstract summary: We present and integrate GrooMeD-NMS -- a novel Grouped Mathematically Differentiable NMS for monocular 3D object detection.
GrooMeD-NMS addresses the mismatch between training and inference pipelines.
It achieves state-of-the-art monocular 3D object detection results on the KITTI benchmark dataset.
- Score: 25.313894069303718
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern 3D object detectors have immensely benefited from the end-to-end
learning idea. However, most of them use a post-processing algorithm called
Non-Maximal Suppression (NMS) only during inference. While there were attempts
to include NMS in the training pipeline for tasks such as 2D object detection,
they have been less widely adopted due to a non-mathematical expression of the
NMS. In this paper, we present and integrate GrooMeD-NMS -- a novel Grouped
Mathematically Differentiable NMS for monocular 3D object detection, such that
the network is trained end-to-end with a loss on the boxes after NMS. We first
formulate NMS as a matrix operation and then group and mask the boxes in an
unsupervised manner to obtain a simple closed-form expression of the NMS.
GrooMeD-NMS addresses the mismatch between training and inference pipelines
and, therefore, forces the network to select the best 3D box in a
differentiable manner. As a result, GrooMeD-NMS achieves state-of-the-art
monocular 3D object detection results on the KITTI benchmark dataset, performing
comparably to monocular video-based methods. Code and models are available at
https://github.com/abhi1kumar/groomed_nms
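For context, the inference-only post-processing step that the abstract refers to can be sketched as follows. This is classical greedy NMS on axis-aligned 2D boxes, shown only as a minimal illustration; it is not GrooMeD-NMS and is not taken from the authors' code. The names `iou` and `greedy_nms`, the box layout `(x1, y1, x2, y2)`, and the 0.5 threshold are assumptions made for this example.

```python
from typing import List, Sequence

# Assumed box layout for this sketch: (x1, y1, x2, y2), axis-aligned, 2D.
Box = Sequence[float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: Sequence[Box], scores: Sequence[float],
               iou_threshold: float = 0.5) -> List[int]:
    """Classical greedy NMS: return indices of kept boxes, best score first."""
    # Visit boxes from highest to lowest score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Hard keep/discard decision: drop every remaining box that overlaps
        # the selected one by more than the threshold.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores, iou_threshold=0.5)
print(kept)  # [0, 2]
```

The second box overlaps the first with an IoU of 81/119 (about 0.68), so it is suppressed. The sort and the hard keep/discard loop are what make this procedure awkward to place inside a training loss, which is the train/inference mismatch the abstract describes.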
Related papers
- EmbodiedSAM: Online Segment Any 3D Thing in Real Time [61.2321497708998]
Embodied tasks require the agent to fully understand 3D scenes simultaneously with its exploration.
An online, real-time, fine-grained and highly-generalized 3D perception model is desperately needed.
arXiv Detail & Related papers (2024-08-21T17:57:06Z)
- UniDepth: Universal Monocular Metric Depth Estimation [81.80512457953903]
We propose a new model, UniDepth, capable of reconstructing metric 3D scenes from solely single images across domains.
Our model exploits a pseudo-spherical output representation, which disentangles camera and depth representations.
Thorough evaluations on ten datasets in a zero-shot regime consistently demonstrate the superior performance of UniDepth.
arXiv Detail & Related papers (2024-03-27T18:06:31Z)
- Fuzzy-NMS: Improving 3D Object Detection with Fuzzy Classification in NMS [19.452760776980472]
Non-maximum suppression (NMS) is an essential post-processing module used in many 3D object detection frameworks.
We introduce fuzzy learning into NMS and propose a novel generalized Fuzzy-NMS module to achieve finer candidate bounding box filtering.
arXiv Detail & Related papers (2023-10-21T09:09:03Z)
- Detection Selection Algorithm: A Likelihood based Optimization Method to Perform Post Processing for Object Detection [1.7188280334580197]
In object detection, post-processing methods like Non-maximum Suppression (NMS) are widely used.
In order to find the exact number of objects and their labels in the image, we propose a post-processing method called Detection Selection Algorithm (DSA).
DSA greedily selects a subset of detected bounding boxes, together with full object reconstructions that give the interpretation of the whole image with highest likelihood.
arXiv Detail & Related papers (2022-12-12T05:15:18Z)
- Learning Auxiliary Monocular Contexts Helps Monocular 3D Object Detection [15.185462008629848]
Monocular 3D object detection aims to localize 3D bounding boxes in an input single 2D image.
This paper proposes a simple yet effective formulation for monocular 3D object detection without exploiting any extra information.
It presents the MonoCon method which learns Monocular Contexts, as auxiliary tasks in training, to help monocular 3D object detection.
arXiv Detail & Related papers (2021-12-09T00:05:34Z)
- FGR: Frustum-Aware Geometric Reasoning for Weakly Supervised 3D Vehicle Detection [81.79171905308827]
We propose frustum-aware geometric reasoning (FGR) to detect vehicles in point clouds without any 3D annotations.
Our method consists of two stages: coarse 3D segmentation and 3D bounding box estimation.
It is able to accurately detect objects in 3D space with only 2D bounding boxes and sparse point clouds.
arXiv Detail & Related papers (2021-05-17T07:29:55Z)
- FCOS3D: Fully Convolutional One-Stage Monocular 3D Object Detection [78.00922683083776]
It is non-trivial to make a general adapted 2D detector work in this 3D task.
In this technical report, we study this problem with a practice built on fully convolutional single-stage detector.
Our solution achieves 1st place out of all the vision-only methods in the nuScenes 3D detection challenge of NeurIPS 2020.
arXiv Detail & Related papers (2021-04-22T09:35:35Z)
- End-to-End Object Detection with Fully Convolutional Network [71.56728221604158]
We introduce a Prediction-aware One-To-One (POTO) label assignment for classification to enable end-to-end detection.
A simple 3D Max Filtering (3DMF) is proposed to utilize the multi-scale features and improve the discriminability of convolutions in the local region.
Our end-to-end framework achieves competitive performance against many state-of-the-art detectors with NMS on COCO and CrowdHuman datasets.
arXiv Detail & Related papers (2020-12-07T09:14:55Z)
- Visibility Guided NMS: Efficient Boosting of Amodal Object Detection in Crowded Traffic Scenes [7.998326245039892]
Modern 2D object detection frameworks predict multiple bounding boxes per object that are refined using Non-Maximum-Suppression (NMS) to suppress all but one bounding box.
Our novel Visibility Guided NMS (vg-NMS) leverages both pixel-based as well as amodal object detection paradigms and improves the detection performance especially for highly occluded objects with little computational overhead.
We evaluate vg-NMS using KITTI, VIPER as well as the Synscapes dataset and show that it outperforms current state-of-the-art NMS.
arXiv Detail & Related papers (2020-06-15T17:03:23Z)
- SESS: Self-Ensembling Semi-Supervised 3D Object Detection [138.80825169240302]
We propose SESS, a self-ensembling semi-supervised 3D object detection framework. Specifically, we design a thorough perturbation scheme to enhance generalization of the network on unlabeled and new unseen data.
Our SESS achieves competitive performance compared to the state-of-the-art fully-supervised method by using only 50% labeled data.
arXiv Detail & Related papers (2019-12-26T08:48:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.