Precise Single-stage Detector
- URL: http://arxiv.org/abs/2210.04252v1
- Date: Sun, 9 Oct 2022 12:58:37 GMT
- Title: Precise Single-stage Detector
- Authors: Aisha Chandio, Gong Gui, Teerath Kumar, Irfan Ullah, Ramin
Ranjbarzadeh, Arunabha M Roy, Akhtar Hussain, and Yao Shen
- Abstract summary: To address remaining sources of inaccuracy in the Single Shot Multibox Detector (SSD), we propose a modified architecture named Precise Single Stage Detector (PSSD).
- Score: 2.2719729705587155
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There are still two problems in SSD that cause inaccurate results: (1) In
the process of feature extraction, with the layer-by-layer acquisition of
semantic information, local information is gradually lost, resulting in less
representative feature maps; (2) During the Non-Maximum Suppression (NMS)
algorithm, due to inconsistency between the classification and regression tasks,
the classification confidence and predicted detection position cannot accurately
indicate the position of the prediction boxes. Methods: To address these
issues, we propose a new architecture, a modified version
of Single Shot Multibox Detector (SSD), named Precise Single Stage Detector
(PSSD). Firstly, we improve the features by adding extra layers to SSD.
Secondly, we construct a simple and effective feature enhancement module to
expand the receptive field step by step for each layer and enhance its local
and semantic information. Finally, we design a more efficient loss function to
predict the IoU between the prediction boxes and ground truth boxes; the
threshold IoU guides classification training and attenuates the scores that
are used by the NMS algorithm. Main Results: Benefiting from the above
optimizations, the proposed PSSD model achieves strong real-time performance.
Specifically, on a Titan Xp with an input size of 320 pixels, PSSD achieves
33.8 mAP at 45 FPS on the MS COCO benchmark and 81.28 mAP at 66 FPS on Pascal
VOC 2007, outperforming state-of-the-art object detection models. The proposed
model also performs well with a larger input size: at 512 pixels, PSSD obtains
37.2 mAP at 27 FPS on MS COCO and 82.82 mAP at 40 FPS on Pascal VOC 2007. The
experimental results show that the proposed model offers a better trade-off
between speed and accuracy.
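The abstract describes predicting the IoU of each box and using it to attenuate the scores passed to NMS, but it does not give the exact formulation. The following is only a minimal sketch of that general idea, not the authors' method: it assumes the attenuation is a simple product of classification score and predicted IoU, and the names `iou`, `attenuate`, and `nms` are hypothetical helpers introduced here for illustration.

```python
from typing import List, Sequence

Box = Sequence[float]  # (x1, y1, x2, y2), with x2 > x1 and y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def attenuate(cls_scores: Sequence[float], pred_ious: Sequence[float]) -> List[float]:
    """Assumed attenuation rule: scale each class score by its predicted IoU."""
    return [s * q for s, q in zip(cls_scores, pred_ious)]


def nms(boxes: Sequence[Box], scores: Sequence[float], iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: visit boxes by descending score, drop heavy overlaps."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Two overlapping boxes and one separate box (made-up values).
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
cls_scores = [0.9, 0.8, 0.7]   # classification confidence
pred_ious = [0.5, 0.9, 0.8]    # hypothetical predicted localization quality

plain_keep = nms(boxes, cls_scores)
aware_keep = nms(boxes, attenuate(cls_scores, pred_ious))
print(plain_keep, aware_keep)
```

In this toy input, plain NMS keeps box 0 because it has the highest class score, while the attenuated scores favor box 1, whose predicted IoU is higher. This illustrates the mismatch between classification confidence and localization quality that the abstract identifies as problem (2).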
Related papers
- Dynamic layer selection in decoder-only transformers [21.18795712840146]
We empirically examine two common dynamic inference methods for natural language generation.
We find that a pre-trained decoder-only model is significantly more robust to layer removal via layer skipping.
We also show that dynamic computation allocation on a per-sequence basis holds promise for significant efficiency gains.
arXiv Detail & Related papers (2024-10-26T00:44:11Z)
- ETAD: A Unified Framework for Efficient Temporal Action Detection [70.21104995731085]
Untrimmed video understanding such as temporal action detection (TAD) often suffers from the pain of huge demand for computing resources.
We build a unified framework for efficient end-to-end temporal action detection (ETAD).
ETAD achieves state-of-the-art performance on both THUMOS-14 and ActivityNet-1.3.
arXiv Detail & Related papers (2022-05-14T21:16:21Z)
- SALISA: Saliency-based Input Sampling for Efficient Video Object Detection [58.22508131162269]
We propose SALISA, a novel non-uniform SALiency-based Input SAmpling technique for video object detection.
We show that SALISA significantly improves the detection of small objects.
arXiv Detail & Related papers (2022-04-05T17:59:51Z)
- P-STMO: Pre-Trained Spatial Temporal Many-to-One Model for 3D Human Pose Estimation [78.83305967085413]
This paper introduces a novel Pre-trained Spatial Temporal Many-to-One (P-STMO) model for 2D-to-3D human pose estimation task.
Our method outperforms state-of-the-art methods with fewer parameters and less computational overhead.
arXiv Detail & Related papers (2022-03-15T04:00:59Z)
- SPDY: Accurate Pruning with Speedup Guarantees [29.284147465251685]
SPDY is a new compression method which automatically determines layer-wise sparsity targets achieving a desired inference speedup.
We show that SPDY guarantees speedups while recovering higher accuracy relative to existing strategies, both for one-shot and gradual pruning scenarios.
We also extend our approach to the recently-proposed task of pruning with very little data, where we achieve the best known accuracy recovery when pruning to the GPU-supported 2:4 sparsity pattern.
arXiv Detail & Related papers (2022-01-31T10:14:31Z)
- Detecting Small Objects in Thermal Images Using Single-Shot Detector [12.72157936831052]
SSD (Single Shot Multibox Detector) is one of the most successful object detectors for its high accuracy and fast speed.
In this paper, we proposed an enhanced SSD with a novel feature fusion module which can improve the performance over SSD for small object detection.
arXiv Detail & Related papers (2021-08-25T07:54:36Z)
- Small Object Detection Based on Modified FSSD and Model Compression [7.387639662781843]
This paper proposes a small object detection algorithm based on FSSD.
In order to reduce the computational cost and storage space, pruning is carried out to achieve model compression.
The average accuracy (mAP) of the algorithm can reach 80.4% on PASCAL VOC and the speed is 59.5 FPS on GTX1080ti.
arXiv Detail & Related papers (2021-08-24T03:20:32Z)
- Uncertainty-Aware Camera Pose Estimation from Points and Lines [101.03675842534415]
Perspective-n-Point-and-Line (PnPL) aims at fast, accurate and robust camera localizations with respect to a 3D model from 2D-3D feature coordinates.
arXiv Detail & Related papers (2021-07-08T15:19:36Z)
- SAR-U-Net: squeeze-and-excitation block and atrous spatial pyramid pooling based residual U-Net for automatic liver CT segmentation [3.192503074844775]
A modified U-Net based framework is presented, which leverages techniques from Squeeze-and-Excitation (SE) block, Atrous Spatial Pyramid Pooling (ASPP) and residual learning.
The effectiveness of the proposed method was tested on two public datasets LiTS17 and SLiver07.
arXiv Detail & Related papers (2021-03-11T02:32:59Z)
- SADet: Learning An Efficient and Accurate Pedestrian Detector [68.66857832440897]
This paper proposes a series of systematic optimization strategies for the detection pipeline of one-stage detector.
It forms a single shot anchor-based detector (SADet) for efficient and accurate pedestrian detection.
Though structurally simple, it presents state-of-the-art result and real-time speed of 20 FPS for VGA-resolution images.
arXiv Detail & Related papers (2020-07-26T12:32:38Z)
- 3DSSD: Point-based 3D Single Stage Object Detector [61.67928229961813]
We present a point-based 3D single stage object detector, named 3DSSD, achieving a good balance between accuracy and efficiency.
Our method outperforms all state-of-the-art voxel-based single stage methods by a large margin, and has comparable performance to two stage point-based methods as well.
arXiv Detail & Related papers (2020-02-24T12:01:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.