Work-Efficient Parallel Non-Maximum Suppression Kernels
- URL: http://arxiv.org/abs/2502.00535v1
- Date: Sat, 01 Feb 2025 19:23:13 GMT
- Title: Work-Efficient Parallel Non-Maximum Suppression Kernels
- Authors: David Oro, Carles Fernández, Xavier Martorell, Javier Hernando
- Abstract summary: Non-Maximum Suppression (NMS) is the process of selecting a single representative candidate within a cluster of detections.
We present a highly scalable NMS algorithm for embedded GPU architectures that is designed from scratch to handle workloads featuring thousands of simultaneous detections.
Our proposed parallel greedy NMS algorithm yields a 14x-40x speedup compared to state-of-the-art NMS methods.
- Score: 8.872464006522929
- Abstract: In the context of object detection, sliding-window classifiers and single-shot Convolutional Neural Network (CNN) meta-architectures typically yield multiple overlapping candidate windows with similar high scores around the true location of a particular object. Non-Maximum Suppression (NMS) is the process of selecting a single representative candidate within this cluster of detections, so as to obtain a unique detection per object appearing in a given picture. In this paper, we present a highly scalable NMS algorithm for embedded GPU architectures that is designed from scratch to handle workloads featuring thousands of simultaneous detections in a given picture. Our kernels are directly applicable to other sequential NMS algorithms such as FeatureNMS, Soft-NMS or AdaptiveNMS that share the inner workings of the classic greedy NMS method. The obtained performance results show that our parallel NMS algorithm is capable of clustering 1024 simultaneously detected objects per frame in roughly 1 ms on both NVIDIA Tegra X1 and NVIDIA Tegra X2 on-die GPUs, while taking 2 ms on NVIDIA Tegra K1. Furthermore, our proposed parallel greedy NMS algorithm yields a 14x-40x speedup when compared to state-of-the-art NMS methods that require learning a CNN from annotated data.
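For readers unfamiliar with the classic greedy NMS method that the abstract refers to, the following is a minimal pure-Python sketch. It is not the paper's GPU kernels. The names `iou`, `greedy_nms` and `pairwise_suppression_nms`, the `(x1, y1, x2, y2)` box layout and the default threshold of 0.5 are all assumptions made for illustration. The second function shows one generic way to phrase suppression as independent pairwise tests followed by a per-box reduction, which is the kind of structure that maps naturally onto a GPU; it is a simplification and can give different results from the greedy loop.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: Sequence[Box], scores: Sequence[float],
               iou_threshold: float = 0.5) -> List[int]:
    """Sequential greedy NMS: visit boxes by descending score and keep a
    box only if it does not overlap any already-kept box too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


def pairwise_suppression_nms(boxes: Sequence[Box], scores: Sequence[float],
                             iou_threshold: float = 0.5) -> List[int]:
    """Illustrative data-parallel formulation (an assumption, not the
    paper's kernels): every (i, j) test is independent, so the 'map' step
    could run one thread per pair and the 'reduce' step one per box."""
    n = len(boxes)
    # Map: suppressed_by[i][j] is True if higher-scored box j overlaps box i.
    suppressed_by = [
        [scores[j] > scores[i] and iou(boxes[i], boxes[j]) > iou_threshold
         for j in range(n)]
        for i in range(n)
    ]
    # Reduce: a box survives only if no higher-scored box suppresses it.
    return [i for i in range(n) if not any(suppressed_by[i])]


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(greedy_nms(boxes, scores))                # indices of kept boxes
    print(pairwise_suppression_nms(boxes, scores))  # same here, not in general
```

The two functions agree whenever no suppressed box would itself have suppressed another box. In a chain of overlapping detections the greedy loop keeps a box whose only strong overlap is with an already-suppressed neighbour, whereas the pairwise version removes it, so the pairwise sketch should be read as an illustration of the parallel structure rather than a drop-in replacement.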
Related papers
- Accelerating Non-Maximum Suppression: A Graph Theory Perspective [24.34791528442417]
Non-maximum suppression (NMS) is an indispensable post-processing step in object detection.
This paper systematically analyzes NMS from a graph theory perspective for the first time, revealing its intrinsic structure.
We introduce NMS-Bench, the first benchmark designed to comprehensively assess various NMS methods.
arXiv Detail & Related papers (2024-09-30T17:20:49Z)
- Fast, nonlocal and neural: a lightweight high quality solution to image denoising [19.306450225657414]
Model-based denoising algorithms are now outperformed by convolutional neural networks (CNNs).
We propose a solution by combining a nonlocal algorithm with a lightweight residual CNN.
Our solution is between 10 and 20 times faster than CNNs with equivalent performance and attains higher PSNR.
arXiv Detail & Related papers (2024-03-06T06:12:56Z)
- INK: Injecting kNN Knowledge in Nearest Neighbor Machine Translation [57.952478914459164]
kNN-MT has provided an effective paradigm to smooth the prediction based on neighbor representations during inference.
We propose an effective training framework INK to directly smooth the representation space via adjusting representations of kNN neighbors with a small number of new parameters.
Experiments on four benchmark datasets show that the method achieves average gains of 1.99 COMET and 1.0 BLEU, outperforming the state-of-the-art kNN-MT system with 0.02x memory space and 1.9x inference speedup.
arXiv Detail & Related papers (2023-06-10T08:39:16Z)
- Lightweight Salient Object Detection in Optical Remote-Sensing Images via Semantic Matching and Edge Alignment [61.45639694373033]
We propose a novel lightweight network for optical remote sensing images (ORSI-SOD) based on semantic matching and edge alignment, termed SeaNet.
Specifically, SeaNet includes a lightweight MobileNet-V2 for feature extraction, a dynamic semantic matching module (DSMM) for high-level features, and a portable decoder for inference.
arXiv Detail & Related papers (2023-01-07T04:33:51Z)
- EAutoDet: Efficient Architecture Search for Object Detection [110.99532343155073]
The EAutoDet framework can discover practical backbone and FPN architectures for object detection in 1.4 GPU-days.
We propose a kernel reusing technique by sharing the weights of candidate operations on one edge and consolidating them into one convolution.
In particular, the discovered architectures surpass state-of-the-art object detection NAS methods and achieve 40.1 mAP with 120 FPS and 49.2 mAP with 41.3 FPS on COCO test-dev set.
arXiv Detail & Related papers (2022-03-21T05:56:12Z)
- Sub-bit Neural Networks: Learning to Compress and Accelerate Binary Neural Networks [72.81092567651395]
Sub-bit Neural Networks (SNNs) are a new type of binary quantization design tailored to compress and accelerate BNNs.
SNNs are trained with a kernel-aware optimization framework, which exploits binary quantization in the fine-grained convolutional kernel space.
Experiments on visual recognition benchmarks and hardware deployment on FPGA validate the great potential of SNNs.
arXiv Detail & Related papers (2021-10-18T11:30:29Z)
- GrooMeD-NMS: Grouped Mathematically Differentiable NMS for Monocular 3D Object Detection [25.313894069303718]
We present and integrate GrooMeD-NMS -- a novel Grouped Mathematically Differentiable NMS for monocular 3D object detection.
GrooMeD-NMS addresses the mismatch between training and inference pipelines.
It achieves state-of-the-art monocular 3D object detection results on the KITTI benchmark dataset.
arXiv Detail & Related papers (2021-03-31T16:29:50Z)
- Learning Versatile Neural Architectures by Propagating Network Codes [74.2450894473073]
We propose a novel "neural predictor", which is able to predict an architecture's performance on multiple datasets and tasks.
NCP learns from network codes rather than from the original data, enabling it to update the architecture efficiently across datasets.
arXiv Detail & Related papers (2021-03-24T15:20:38Z)
- Object Detection Made Simpler by Eliminating Heuristic NMS [70.93004137521946]
We show a simple NMS-free, end-to-end object detection framework.
We attain on par or even improved detection accuracy compared with the original one-stage detector.
arXiv Detail & Related papers (2021-01-28T02:38:29Z)
- ASAP-NMS: Accelerating Non-Maximum Suppression Using Spatially Aware Priors [26.835571059909007]
Non Maximum Suppression (or Greedy-NMS) is a crucial module for object-detection pipelines.
For the region proposal stage of two/multi-stage detectors, NMS is turning out to be a latency bottleneck due to its sequential nature.
We use ASAP-NMS to reduce the latency of the NMS step from 13.6 ms to 1.2 ms on a CPU without sacrificing the accuracy of a state-of-the-art two-stage detector.
arXiv Detail & Related papers (2020-07-19T21:15:48Z)
- Visibility Guided NMS: Efficient Boosting of Amodal Object Detection in Crowded Traffic Scenes [7.998326245039892]
Modern 2D object detection frameworks predict multiple bounding boxes per object that are refined using Non-Maximum-Suppression (NMS) to suppress all but one bounding box.
Our novel Visibility Guided NMS (vg-NMS) leverages both pixel-based and amodal object detection paradigms and improves detection performance, especially for highly occluded objects, with little computational overhead.
We evaluate vg-NMS using KITTI, VIPER as well as the Synscapes dataset and show that it outperforms current state-of-the-art NMS.
arXiv Detail & Related papers (2020-06-15T17:03:23Z)