PSRR-MaxpoolNMS: Pyramid Shifted MaxpoolNMS with Relationship Recovery
- URL: http://arxiv.org/abs/2105.12990v1
- Date: Thu, 27 May 2021 08:24:21 GMT
- Title: PSRR-MaxpoolNMS: Pyramid Shifted MaxpoolNMS with Relationship Recovery
- Authors: Tianyi Zhang, Jie Lin, Peng Hu, Bin Zhao, Mohamed M. Sabry Aly
- Abstract summary: Non-maximum Suppression (NMS) is an essential postprocessing step in modern convolutional neural networks for object detection.
The de-facto standard for NMS, namely GreedyNMS, cannot be easily parallelized.
MaxpoolNMS is introduced as a parallelizable alternative to GreedyNMS.
- Score: 17.704037442897004
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Non-maximum Suppression (NMS) is an essential postprocessing step in modern
convolutional neural networks for object detection. Unlike convolutions which
are inherently parallel, the de-facto standard for NMS, namely GreedyNMS,
cannot be easily parallelized and thus could be the performance bottleneck in
convolutional object detection pipelines. MaxpoolNMS is introduced as a
parallelizable alternative to GreedyNMS, which in turn enables faster speed
than GreedyNMS at comparable accuracy. However, MaxpoolNMS is only capable of
replacing the GreedyNMS at the first stage of two-stage detectors like
Faster-RCNN. There is a significant drop in accuracy when applying MaxpoolNMS
at the final detection stage, due to the fact that MaxpoolNMS fails to
approximate GreedyNMS precisely in terms of bounding box selection. In this
paper, we propose a general, parallelizable and configurable approach
PSRR-MaxpoolNMS, to completely replace GreedyNMS at all stages in all
detectors. By introducing a simple Relationship Recovery module and a Pyramid
Shifted MaxpoolNMS module, our PSRR-MaxpoolNMS is able to approximate GreedyNMS
more precisely than MaxpoolNMS. Comprehensive experiments show that our
approach outperforms MaxpoolNMS by a large margin, and it is proven faster than
GreedyNMS with comparable accuracy. For the first time, PSRR-MaxpoolNMS
provides a fully parallelizable solution for customized hardware design, which
can be reused for accelerating NMS everywhere.
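Since the abstract hinges on the contrast between sequential GreedyNMS and maxpool-style suppression, a minimal sketch may help. This is illustrative only, not the paper's implementation: `greedy_nms` shows the sequential dependency (each iteration needs the previous keep decision), while `maxpool_nms_mask` keeps local maxima of a score map, a check that every location can run independently and therefore parallelizes.

```python
import numpy as np

def iou(a, b):
    # a, b: [x1, y1, x2, y2]
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def greedy_nms(boxes, scores, iou_thresh=0.5):
    """Sequential GreedyNMS: each round keeps the highest-scoring
    remaining box and discards boxes overlapping it too much."""
    order = np.argsort(scores)[::-1]
    keep = []
    while len(order) > 0:
        i = order[0]
        keep.append(i)
        rest = [j for j in order[1:] if iou(boxes[i], boxes[j]) <= iou_thresh]
        order = np.array(rest, dtype=int)
    return keep

def maxpool_nms_mask(score_map, k=3):
    """MaxpoolNMS-style suppression: keep locations that are the maximum
    of the score map within a k x k window. Every location is tested
    independently, so the whole pass is parallelizable."""
    H, W = score_map.shape
    pad = k // 2
    padded = np.pad(score_map, pad, constant_values=-np.inf)
    keep = np.zeros_like(score_map, dtype=bool)
    for y in range(H):
        for x in range(W):
            keep[y, x] = score_map[y, x] == padded[y:y + k, x:x + k].max()
    return keep
```

The maxpool variant operates on score-map locations rather than explicit box pairs, which is exactly why it can miss GreedyNMS's box selections; the Pyramid Shifted and Relationship Recovery modules exist to close that gap.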
Related papers
- Accelerating Non-Maximum Suppression: A Graph Theory Perspective [24.34791528442417]
Non-maximum suppression (NMS) is an indispensable post-processing step in object detection.
This paper systematically analyzes NMS from a graph theory perspective for the first time, revealing its intrinsic structure.
We introduce NMS-Bench, the first benchmark designed to comprehensively assess various NMS methods.
arXiv Detail & Related papers (2024-09-30T17:20:49Z)
- MCU-MixQ: A HW/SW Co-optimized Mixed-precision Neural Network Design Framework for MCUs [9.719789698194154]
Mixed-precision neural networks (MPNNs), which use just enough data width for each part of the computation, are an effective way to meet stringent resource constraints.
However, MCU-class ISAs still lack sub-byte and mixed-precision SIMD operations.
In this work, we propose packing multiple low-bitwidth arithmetic operations into a single SIMD instruction on typical MCUs.
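The packing idea can be illustrated with SWAR (SIMD within a register) arithmetic, a standard trick in the same spirit; the lane layout and constants below are illustrative, not MCU-MixQ's actual encoding. Four 8-bit lanes share one 32-bit word, and a single masked add updates all lanes with no inter-lane carries.

```python
def pack4(a, b, c, d):
    """Pack four unsigned 8-bit values into one 32-bit word (lane 0 = a)."""
    return (a & 0xFF) | ((b & 0xFF) << 8) | ((c & 0xFF) << 16) | ((d & 0xFF) << 24)

def unpack4(w):
    """Extract the four 8-bit lanes of a packed 32-bit word."""
    return [(w >> (8 * i)) & 0xFF for i in range(4)]

def add4(x, y):
    """Lane-wise 8-bit addition (mod 256) on packed words: one 'instruction'
    updates all four lanes. Masking bit 7 of each lane keeps carries from
    crossing lane boundaries; the xor restores each lane's top bit."""
    low = (x & 0x7F7F7F7F) + (y & 0x7F7F7F7F)
    return (low ^ ((x ^ y) & 0x80808080)) & 0xFFFFFFFF
```

One packed add replaces four scalar adds, which is the kind of throughput gain sub-byte packing targets on ISAs without native SIMD.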
arXiv Detail & Related papers (2024-07-17T14:51:15Z)
- Freya PAGE: First Optimal Time Complexity for Large-Scale Nonconvex Finite-Sum Optimization with Heterogeneous Asynchronous Computations [92.1840862558718]
In practical distributed systems, workers are typically not homogeneous and can have highly varying processing times.
We introduce Freya, a new parallel method that handles arbitrarily slow computations.
We show that Freya offers significantly improved complexity guarantees compared to all previous methods.
arXiv Detail & Related papers (2024-05-24T13:33:30Z)
- Distributed Extra-gradient with Optimal Complexity and Communication Guarantees [60.571030754252824]
We consider monotone variational inequality (VI) problems in multi-GPU settings where multiple processors/workers/clients have access to local dual vectors.
Extra-gradient, the de facto algorithm for monotone VI problems, was not designed to be communication-efficient.
We propose a quantized generalized extra-gradient (Q-GenX), which is an unbiased and adaptive compression method tailored to solve VIs.
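For readers unfamiliar with extra-gradient, here is the plain, uncompressed baseline that Q-GenX extends, on the toy bilinear saddle point min_x max_y x*y (illustrative only; the quantization of exchanged vectors is not shown). The method probes the operator at a half-step, then updates from the original point with the probed value.

```python
import numpy as np

def operator(z):
    # Monotone operator for min_x max_y x*y: F(x, y) = (y, -x).
    x, y = z
    return np.array([y, -x])

def extragradient(z0, step=0.1, iters=2000):
    """Plain extra-gradient: probe at z - step*F(z), then
    update from z using the operator value at the probe point."""
    z = np.array(z0, dtype=float)
    for _ in range(iters):
        z_half = z - step * operator(z)
        z = z - step * operator(z_half)
    return z
```

On this bilinear game, simultaneous gradient descent-ascent spirals away from the equilibrium, while extra-gradient contracts toward the solution (0, 0) at every step.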
arXiv Detail & Related papers (2023-08-17T21:15:04Z)
- ANMS: Asynchronous Non-Maximum Suppression in Event Stream [15.355579943905585]
Non-maximum suppression (NMS) is widely used in frame-based tasks as an essential post-processing algorithm.
This paper proposes a general-purpose asynchronous non-maximum suppression pipeline (ANMS).
The proposed pipeline extracts a fine feature stream from the output of the original detectors and adapts to the speed of motion.
arXiv Detail & Related papers (2023-03-19T05:33:32Z)
- SymNMF-Net for The Symmetric NMF Problem [62.44067422984995]
We propose a neural network called SymNMF-Net for the Symmetric NMF problem.
We show that the inference of each block corresponds to a single iteration of the optimization.
Empirical results on real-world datasets demonstrate the superiority of our SymNMF-Net.
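Symmetric NMF seeks a nonnegative H with A ≈ HHᵀ, and SymNMF-Net unrolls an iterative optimizer into network blocks. As a hedged illustration of the "one block = one iteration" idea (the paper's actual block structure may differ), here is projected gradient descent on ||A − HHᵀ||²_F, where each loop iteration plays the role of one block:

```python
import numpy as np

def symnmf_pgd(A, r, steps=2000, lr=1e-3, seed=0):
    """Projected gradient descent for min_{H >= 0} ||A - H H^T||_F^2.
    Each loop iteration mirrors what one unrolled block would compute."""
    rng = np.random.default_rng(seed)
    H = np.abs(rng.standard_normal((A.shape[0], r)))
    losses = []
    for _ in range(steps):
        losses.append(np.linalg.norm(A - H @ H.T) ** 2)
        grad = 4.0 * (H @ H.T - A) @ H       # gradient for symmetric A
        H = np.maximum(0.0, H - lr * grad)   # nonnegativity projection
    return H, losses
```

The projection step `np.maximum(0, ...)` is what a learned block would replace with a trainable nonlinearity while preserving the iteration's structure.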
arXiv Detail & Related papers (2022-05-26T08:17:39Z)
- ISDA: Position-Aware Instance Segmentation with Deformable Attention [4.188555841288538]
We propose a novel end-to-end instance segmentation method termed ISDA.
It reshapes the task into predicting a set of object masks, which are generated via traditional convolution operations.
Thanks to the introduced set-prediction mechanism, the proposed method is NMS-free.
arXiv Detail & Related papers (2022-02-23T12:30:18Z)
- Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization [89.7882166459412]
Stochastic gradient noise (SGN) acts as implicit regularization for deep learning.
Some works attempted to artificially simulate SGN by injecting random noise to improve deep learning.
For simulating SGN at low computational costs and without changing the learning rate or batch size, we propose the Positive-Negative Momentum (PNM) approach.
arXiv Detail & Related papers (2021-03-31T16:08:06Z)
- End-to-End Object Detection with Fully Convolutional Network [71.56728221604158]
We introduce a Prediction-aware One-To-One (POTO) label assignment for classification to enable end-to-end detection.
A simple 3D Max Filtering (3DMF) is proposed to utilize the multi-scale features and improve the discriminability of convolutions in the local region.
Our end-to-end framework achieves competitive performance against many state-of-the-art detectors with NMS on COCO and CrowdHuman datasets.
arXiv Detail & Related papers (2020-12-07T09:14:55Z)
- ASAP-NMS: Accelerating Non-Maximum Suppression Using Spatially Aware Priors [26.835571059909007]
Non-Maximum Suppression (or Greedy-NMS) is a crucial module in object-detection pipelines.
For the region proposal stage of two-/multi-stage detectors, NMS is becoming a latency bottleneck due to its sequential nature.
We use ASAP-NMS to improve the latency of the NMS step from 13.6ms to 1.2 ms on a CPU without sacrificing the accuracy of a state-of-the-art two-stage detector.
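One plausible reading of "spatially aware priors" (a hedged sketch, not the authors' exact algorithm): boxes that are far apart provably cannot intersect, so most of the O(N²) IoU tests inside GreedyNMS can be skipped with a cheap center-distance check, without changing the output at all.

```python
import numpy as np

def iou(a, b):
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    if inter == 0.0:
        return 0.0
    area = lambda t: (t[2] - t[0]) * (t[3] - t[1])
    return inter / (area(a) + area(b) - inter)

def pruned_greedy_nms(boxes, scores, iou_thresh=0.5):
    """GreedyNMS that skips the IoU test for pairs that provably do not
    intersect: the center gap exceeds the sum of half-extents on an axis."""
    boxes = np.asarray(boxes, dtype=float)
    cx = (boxes[:, 0] + boxes[:, 2]) / 2
    cy = (boxes[:, 1] + boxes[:, 3]) / 2
    hw = (boxes[:, 2] - boxes[:, 0]) / 2
    hh = (boxes[:, 3] - boxes[:, 1]) / 2
    order = list(np.argsort(scores)[::-1])
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        survivors = []
        for j in order:
            # cheap spatial prior: distant boxes cannot overlap, skip IoU
            if abs(cx[i] - cx[j]) >= hw[i] + hw[j] or abs(cy[i] - cy[j]) >= hh[i] + hh[j]:
                survivors.append(j)
            elif iou(boxes[i], boxes[j]) <= iou_thresh:
                survivors.append(j)
        order = survivors
    return keep
```

Because the distance check is exact for axis-aligned boxes, this pruning is lossless; the reported latency gains come from skipping the expensive comparisons, not from approximating the result.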
arXiv Detail & Related papers (2020-07-19T21:15:48Z)
- BLK-REW: A Unified Block-based DNN Pruning Framework using Reweighted Regularization Method [69.49386965992464]
We propose a new block-based pruning framework that comprises a general and flexible structured pruning dimension as well as a powerful and efficient reweighted regularization method.
Our framework is universal: it can be applied to both CNNs and RNNs, providing complete support for the two major kinds of intensive computation layers.
It is the first time that the weight pruning framework achieves universal coverage for both CNNs and RNNs with real-time mobile acceleration and no accuracy compromise.
arXiv Detail & Related papers (2020-01-23T03:30:56Z)
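A hedged sketch of the reweighted block-regularization idea (the block geometry and penalty form are illustrative, not BLK-REW's exact scheme): each block's penalty weight is set inversely to its current norm, so one proximal shrinkage step drives already-small blocks to exactly zero while leaving large blocks nearly untouched.

```python
import numpy as np

def reweighted_block_shrink(W, bh, bw, lam=0.01, eps=1e-3):
    """One proximal step for a reweighted block penalty
    sum_i gamma_i * ||W_i||_F with gamma_i = 1 / (||W_i||_F + eps):
    small blocks get a large reweighting factor and are zeroed,
    large blocks are barely shrunk."""
    H, Wd = W.shape
    out = W.copy()
    for r in range(0, H, bh):
        for c in range(0, Wd, bw):
            blk = out[r:r + bh, c:c + bw]
            n = np.linalg.norm(blk)
            gamma = 1.0 / (n + eps)  # reweighting: penalize small blocks more
            scale = max(0.0, 1.0 - lam * gamma / max(n, eps))
            out[r:r + bh, c:c + bw] = blk * scale
    return out
```

Whole-block sparsity of this kind is what makes the resulting networks friendly to real-time mobile acceleration: zeroed blocks can be skipped wholesale rather than element by element.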
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and accepts no responsibility for any consequences of its use.