PSRR-MaxpoolNMS: Pyramid Shifted MaxpoolNMS with Relationship Recovery
- URL: http://arxiv.org/abs/2105.12990v1
- Date: Thu, 27 May 2021 08:24:21 GMT
- Title: PSRR-MaxpoolNMS: Pyramid Shifted MaxpoolNMS with Relationship Recovery
- Authors: Tianyi Zhang, Jie Lin, Peng Hu, Bin Zhao, Mohamed M. Sabry Aly
- Abstract summary: Non-maximum Suppression (NMS) is an essential postprocessing step in modern convolutional neural networks for object detection.
The de-facto standard for NMS, namely GreedyNMS, cannot be easily parallelized.
MaxpoolNMS is introduced as a parallelizable alternative to GreedyNMS.
- Score: 17.704037442897004
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Non-maximum Suppression (NMS) is an essential postprocessing step in modern
convolutional neural networks for object detection. Unlike convolutions which
are inherently parallel, the de-facto standard for NMS, namely GreedyNMS,
cannot be easily parallelized and thus could be the performance bottleneck in
convolutional object detection pipelines. MaxpoolNMS is introduced as a
parallelizable alternative to GreedyNMS, which in turn enables faster speed
than GreedyNMS at comparable accuracy. However, MaxpoolNMS is only capable of
replacing the GreedyNMS at the first stage of two-stage detectors like
Faster-RCNN. There is a significant drop in accuracy when applying MaxpoolNMS
at the final detection stage, due to the fact that MaxpoolNMS fails to
approximate GreedyNMS precisely in terms of bounding box selection. In this
paper, we propose a general, parallelizable and configurable approach
PSRR-MaxpoolNMS, to completely replace GreedyNMS at all stages in all
detectors. By introducing a simple Relationship Recovery module and a Pyramid
Shifted MaxpoolNMS module, our PSRR-MaxpoolNMS is able to approximate GreedyNMS
more precisely than MaxpoolNMS. Comprehensive experiments show that our
approach outperforms MaxpoolNMS by a large margin, and it is proven faster than
GreedyNMS with comparable accuracy. For the first time, PSRR-MaxpoolNMS
provides a fully parallelizable solution for customized hardware design, which
can be reused for accelerating NMS everywhere.
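Since the abstract hinges on the contrast between sequential GreedyNMS and maxpool-style suppression, a minimal sketch may help. This is illustrative only, not the paper's implementation: `greedy_nms` shows the sequential dependency (each iteration needs the previous keep decision), while `maxpool_nms_mask` keeps local maxima of a score map, a check that every location can run independently and therefore parallelizes.

```python
import numpy as np

def iou(a, b):
    # a, b: [x1, y1, x2, y2]
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def greedy_nms(boxes, scores, iou_thresh=0.5):
    """Sequential GreedyNMS: each round keeps the highest-scoring
    remaining box and discards boxes overlapping it too much."""
    order = np.argsort(scores)[::-1]
    keep = []
    while len(order) > 0:
        i = order[0]
        keep.append(i)
        rest = [j for j in order[1:] if iou(boxes[i], boxes[j]) <= iou_thresh]
        order = np.array(rest, dtype=int)
    return keep

def maxpool_nms_mask(score_map, k=3):
    """MaxpoolNMS-style suppression: keep locations that are the maximum
    of the score map within a k x k window. Every location is tested
    independently, so the whole pass is parallelizable."""
    H, W = score_map.shape
    pad = k // 2
    padded = np.pad(score_map, pad, constant_values=-np.inf)
    keep = np.zeros_like(score_map, dtype=bool)
    for y in range(H):
        for x in range(W):
            keep[y, x] = score_map[y, x] == padded[y:y + k, x:x + k].max()
    return keep
```

The maxpool variant operates on score-map locations rather than explicit box pairs, which is exactly why it can miss GreedyNMS's box selections; the Pyramid Shifted and Relationship Recovery modules exist to close that gap.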
Related papers
- Accelerating Non-Maximum Suppression: A Graph Theory Perspective [24.34791528442417]
Non-maximum suppression (NMS) is an indispensable post-processing step in object detection.
This paper systematically analyzes NMS from a graph theory perspective for the first time, revealing its intrinsic structure.
We introduce NMS-Bench, the first benchmark designed to comprehensively assess various NMS methods.
arXiv Detail & Related papers (2024-09-30T17:20:49Z)
- MCU-MixQ: A HW/SW Co-optimized Mixed-precision Neural Network Design Framework for MCUs [9.719789698194154]
Mixed-precision neural networks (MPNNs), which use just enough data width for each part of the computation, are an effective way to meet stringent resource constraints.
However, MCU-class ISAs still lack sub-byte and mixed-precision SIMD operations.
In this work, we propose packing multiple low-bitwidth arithmetic operations into a single SIMD instruction on typical MCUs.
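The packing idea can be illustrated with SWAR (SIMD within a register) arithmetic, a standard trick in the same spirit; the lane layout and constants below are illustrative, not MCU-MixQ's actual encoding. Four 8-bit lanes share one 32-bit word, and a single masked add updates all lanes with no inter-lane carries.

```python
def pack4(a, b, c, d):
    """Pack four unsigned 8-bit values into one 32-bit word (lane 0 = a)."""
    return (a & 0xFF) | ((b & 0xFF) << 8) | ((c & 0xFF) << 16) | ((d & 0xFF) << 24)

def unpack4(w):
    """Extract the four 8-bit lanes of a packed 32-bit word."""
    return [(w >> (8 * i)) & 0xFF for i in range(4)]

def add4(x, y):
    """Lane-wise 8-bit addition (mod 256) on packed words: one 'instruction'
    updates all four lanes. Masking bit 7 of each lane keeps carries from
    crossing lane boundaries; the xor restores each lane's top bit."""
    low = (x & 0x7F7F7F7F) + (y & 0x7F7F7F7F)
    return (low ^ ((x ^ y) & 0x80808080)) & 0xFFFFFFFF
```

One packed add replaces four scalar adds, which is the kind of throughput gain sub-byte packing targets on ISAs without native SIMD.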
arXiv Detail & Related papers (2024-07-17T14:51:15Z)
- Freya PAGE: First Optimal Time Complexity for Large-Scale Nonconvex Finite-Sum Optimization with Heterogeneous Asynchronous Computations [92.1840862558718]
In practical distributed systems, workers are typically not homogeneous and can have highly varying processing times.
We introduce Freya, a new parallel method that handles arbitrarily slow computations.
We show that Freya offers significantly improved complexity guarantees compared to all previous methods.
arXiv Detail & Related papers (2024-05-24T13:33:30Z)
- Distributed Extra-gradient with Optimal Complexity and Communication Guarantees [60.571030754252824]
We consider monotone variational inequality (VI) problems in multi-GPU settings where multiple processors/workers/clients have access to local dual vectors.
Extra-gradient, the de facto algorithm for monotone VI problems, was not designed to be communication-efficient.
We propose a quantized generalized extra-gradient (Q-GenX), which is an unbiased and adaptive compression method tailored to solve VIs.
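For readers unfamiliar with extra-gradient, here is the plain, uncompressed baseline that Q-GenX extends, on the toy bilinear saddle point min_x max_y x*y (illustrative only; the quantization of exchanged vectors is not shown). The method probes the operator at a half-step, then updates from the original point with the probed value.

```python
import numpy as np

def operator(z):
    # Monotone operator for min_x max_y x*y: F(x, y) = (y, -x).
    x, y = z
    return np.array([y, -x])

def extragradient(z0, step=0.1, iters=2000):
    """Plain extra-gradient: probe at z - step*F(z), then
    update from z using the operator value at the probe point."""
    z = np.array(z0, dtype=float)
    for _ in range(iters):
        z_half = z - step * operator(z)
        z = z - step * operator(z_half)
    return z
```

On this bilinear game, simultaneous gradient descent-ascent spirals away from the equilibrium, while extra-gradient contracts toward the solution (0, 0) at every step.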
arXiv Detail & Related papers (2023-08-17T21:15:04Z)
- ANMS: Asynchronous Non-Maximum Suppression in Event Stream [15.355579943905585]
Non-maximum suppression (NMS) is widely used in frame-based tasks as an essential post-processing algorithm.
This paper proposes a general-purpose asynchronous non-maximum suppression pipeline (ANMS).
The proposed pipeline extracts a fine feature stream from the output of the original detectors and adapts to the speed of motion.
arXiv Detail & Related papers (2023-03-19T05:33:32Z)
- SymNMF-Net for The Symmetric NMF Problem [62.44067422984995]
We propose a neural network called SymNMF-Net for the Symmetric NMF problem.
We show that the inference of each block corresponds to a single iteration of the optimization.
Empirical results on real-world datasets demonstrate the superiority of our SymNMF-Net.
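Symmetric NMF seeks a nonnegative H with A ≈ HHᵀ, and SymNMF-Net unrolls an iterative optimizer into network blocks. As a hedged illustration of the "one block = one iteration" idea (the paper's actual block structure may differ), here is projected gradient descent on ||A − HHᵀ||²_F, where each loop iteration plays the role of one block:

```python
import numpy as np

def symnmf_pgd(A, r, steps=2000, lr=1e-3, seed=0):
    """Projected gradient descent for min_{H >= 0} ||A - H H^T||_F^2.
    Each loop iteration mirrors what one unrolled block would compute."""
    rng = np.random.default_rng(seed)
    H = np.abs(rng.standard_normal((A.shape[0], r)))
    losses = []
    for _ in range(steps):
        losses.append(np.linalg.norm(A - H @ H.T) ** 2)
        grad = 4.0 * (H @ H.T - A) @ H       # gradient for symmetric A
        H = np.maximum(0.0, H - lr * grad)   # nonnegativity projection
    return H, losses
```

The projection step `np.maximum(0, ...)` is what a learned block would replace with a trainable nonlinearity while preserving the iteration's structure.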
arXiv Detail & Related papers (2022-05-26T08:17:39Z)
- ISDA: Position-Aware Instance Segmentation with Deformable Attention [4.188555841288538]
We propose a novel end-to-end instance segmentation method termed ISDA.
It reshapes the task into predicting a set of object masks, which are generated via traditional convolution operations.
Thanks to the introduced set-prediction mechanism, the proposed method is NMS-free.
arXiv Detail & Related papers (2022-02-23T12:30:18Z)
- Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization [89.7882166459412]
Stochastic gradient noise (SGN) acts as implicit regularization for deep learning.
Some works attempted to artificially simulate SGN by injecting random noise to improve deep learning.
For simulating SGN at low computational costs and without changing the learning rate or batch size, we propose the Positive-Negative Momentum (PNM) approach.
arXiv Detail & Related papers (2021-03-31T16:08:06Z)
- End-to-End Object Detection with Fully Convolutional Network [71.56728221604158]
We introduce a Prediction-aware One-To-One (POTO) label assignment for classification to enable end-to-end detection.
A simple 3D Max Filtering (3DMF) is proposed to utilize the multi-scale features and improve the discriminability of convolutions in the local region.
Our end-to-end framework achieves competitive performance against many state-of-the-art detectors with NMS on COCO and CrowdHuman datasets.
arXiv Detail & Related papers (2020-12-07T09:14:55Z)
- ASAP-NMS: Accelerating Non-Maximum Suppression Using Spatially Aware Priors [26.835571059909007]
Non-Maximum Suppression (or Greedy-NMS) is a crucial module in object-detection pipelines.
For the region proposal stage of two-/multi-stage detectors, NMS is becoming a latency bottleneck due to its sequential nature.
We use ASAP-NMS to improve the latency of the NMS step from 13.6ms to 1.2 ms on a CPU without sacrificing the accuracy of a state-of-the-art two-stage detector.
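One plausible reading of "spatially aware priors" (a hedged sketch, not the authors' exact algorithm): boxes that are far apart provably cannot intersect, so most of the O(N²) IoU tests inside GreedyNMS can be skipped with a cheap center-distance check, without changing the output at all.

```python
import numpy as np

def iou(a, b):
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    if inter == 0.0:
        return 0.0
    area = lambda t: (t[2] - t[0]) * (t[3] - t[1])
    return inter / (area(a) + area(b) - inter)

def pruned_greedy_nms(boxes, scores, iou_thresh=0.5):
    """GreedyNMS that skips the IoU test for pairs that provably do not
    intersect: the center gap exceeds the sum of half-extents on an axis."""
    boxes = np.asarray(boxes, dtype=float)
    cx = (boxes[:, 0] + boxes[:, 2]) / 2
    cy = (boxes[:, 1] + boxes[:, 3]) / 2
    hw = (boxes[:, 2] - boxes[:, 0]) / 2
    hh = (boxes[:, 3] - boxes[:, 1]) / 2
    order = list(np.argsort(scores)[::-1])
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        survivors = []
        for j in order:
            # cheap spatial prior: distant boxes cannot overlap, skip IoU
            if abs(cx[i] - cx[j]) >= hw[i] + hw[j] or abs(cy[i] - cy[j]) >= hh[i] + hh[j]:
                survivors.append(j)
            elif iou(boxes[i], boxes[j]) <= iou_thresh:
                survivors.append(j)
        order = survivors
    return keep
```

Because the distance check is exact for axis-aligned boxes, this pruning is lossless; the reported latency gains come from skipping the expensive comparisons, not from approximating the result.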
arXiv Detail & Related papers (2020-07-19T21:15:48Z)
- BLK-REW: A Unified Block-based DNN Pruning Framework using Reweighted Regularization Method [69.49386965992464]
We propose a new block-based pruning framework that comprises a general and flexible structured pruning dimension as well as a powerful and efficient reweighted regularization method.
Our framework is universal: it can be applied to both CNNs and RNNs, providing complete support for the two major kinds of intensive computation layers.
It is the first time that the weight pruning framework achieves universal coverage for both CNNs and RNNs with real-time mobile acceleration and no accuracy compromise.
arXiv Detail & Related papers (2020-01-23T03:30:56Z)
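A hedged sketch of the reweighted block-regularization idea (the block geometry and penalty form are illustrative, not BLK-REW's exact scheme): each block's penalty weight is set inversely to its current norm, so one proximal shrinkage step drives already-small blocks to exactly zero while leaving large blocks nearly untouched.

```python
import numpy as np

def reweighted_block_shrink(W, bh, bw, lam=0.01, eps=1e-3):
    """One proximal step for a reweighted block penalty
    sum_i gamma_i * ||W_i||_F with gamma_i = 1 / (||W_i||_F + eps):
    small blocks get a large reweighting factor and are zeroed,
    large blocks are barely shrunk."""
    H, Wd = W.shape
    out = W.copy()
    for r in range(0, H, bh):
        for c in range(0, Wd, bw):
            blk = out[r:r + bh, c:c + bw]
            n = np.linalg.norm(blk)
            gamma = 1.0 / (n + eps)  # reweighting: penalize small blocks more
            scale = max(0.0, 1.0 - lam * gamma / max(n, eps))
            out[r:r + bh, c:c + bw] = blk * scale
    return out
```

Whole-block sparsity of this kind is what makes the resulting networks friendly to real-time mobile acceleration: zeroed blocks can be skipped wholesale rather than element by element.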
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and accepts no responsibility for any consequences of its use.