PIDNet: An Efficient Network for Dynamic Pedestrian Intrusion Detection
- URL: http://arxiv.org/abs/2009.00312v1
- Date: Tue, 1 Sep 2020 09:34:43 GMT
- Title: PIDNet: An Efficient Network for Dynamic Pedestrian Intrusion Detection
- Authors: Jingchen Sun, Jiming Chen, Tao Chen, Jiayuan Fan, Shibo He
- Abstract summary: Vision-based dynamic pedestrian intrusion detection (PID), judging whether pedestrians intrude into an area-of-interest (AoI) viewed by a moving camera, is an important task in mobile surveillance.
We propose a novel and efficient multi-task deep neural network, PIDNet, to solve this problem.
PIDNet is mainly designed by considering two factors: accurately segmenting the dynamically changing AoIs from a video frame captured by the moving camera and quickly detecting pedestrians from the generated AoI-contained areas.
- Score: 22.316826418265666
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision-based dynamic pedestrian intrusion detection (PID), judging whether
pedestrians intrude into an area-of-interest (AoI) viewed by a moving camera, is an
important task in mobile surveillance. The dynamically changing AoIs and the
number of pedestrians in video frames increase the difficulty and computational
complexity of determining whether pedestrians intrude into the AoI, which makes
previous algorithms incapable of this task. In this paper, we propose a novel
and efficient multi-task deep neural network, PIDNet, to solve this problem.
PIDNet is mainly designed by considering two factors: accurately segmenting the
dynamically changing AoIs from a video frame captured by the moving camera and
quickly detecting pedestrians from the generated AoI-contained areas. Three
efficient network designs are proposed and incorporated into PIDNet to reduce
the computational complexity: 1) a special PID task backbone for feature
sharing, 2) a feature cropping module for cropping features to the AoI-contained areas, and 3) a lighter
detection branch network for feature compression. In addition, considering
there are no public datasets and benchmarks in this field, we establish a
benchmark dataset to evaluate the proposed network and give the corresponding
evaluation metrics for the first time. Experimental results show that PIDNet
can achieve 67.1% PID accuracy and 9.6 fps inference speed on the proposed
dataset, which serves as a good baseline for future vision-based dynamic PID research.
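To make the two-branch design described in the abstract concrete, below is a minimal, hypothetical PyTorch-style sketch of the pipeline: a shared backbone, an AoI segmentation head, feature cropping via the predicted AoI mask, a compressed detection head, and a simple intrusion test. None of the module names, layer sizes, or the bottom-center intrusion heuristic come from the PIDNet paper; they are illustrative assumptions that only mirror the data flow the abstract describes.

```python
# Illustrative sketch only -- not the authors' implementation. Assumes a recent
# PyTorch/torchvision; all layer sizes and names are made up for illustration.
import torch
import torch.nn as nn
import torchvision


class PIDSketch(nn.Module):
    def __init__(self, det_channels=64):
        super().__init__()
        # Factor 1: one shared backbone feeds both tasks.
        resnet = torchvision.models.resnet18(weights=None)
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])  # [B, 512, H/32, W/32]
        # AoI segmentation head: predicts a coarse binary AoI mask.
        self.seg_head = nn.Sequential(
            nn.Conv2d(512, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, 1))
        # Factor 3: a lighter detection branch working on compressed features.
        self.compress = nn.Conv2d(512, det_channels, 1)
        self.det_head = nn.Conv2d(det_channels, 5, 1)  # e.g. objectness + 4 box offsets

    def forward(self, frame):
        feats = self.backbone(frame)
        aoi_logits = self.seg_head(feats)
        aoi_mask = (aoi_logits.sigmoid() > 0.5).float()
        # Factor 2: "feature cropping" approximated here by zeroing the shared
        # features outside the predicted AoI before running detection.
        det_out = self.det_head(self.compress(feats * aoi_mask))
        return aoi_mask, det_out


def intrudes(box_xyxy, aoi_mask_fullres):
    """Toy intrusion test: a pedestrian counts as intruding if the bottom-center
    point of its detection box lies inside the (full-resolution) AoI mask."""
    x1, y1, x2, y2 = box_xyxy
    return bool(aoi_mask_fullres[int(y2), int((x1 + x2) / 2)] > 0)
```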
Related papers
- A Point-Based Approach to Efficient LiDAR Multi-Task Perception [49.91741677556553]
PAttFormer is an efficient multi-task architecture for joint semantic segmentation and object detection in point clouds.
Unlike other LiDAR-based multi-task architectures, our proposed PAttFormer does not require separate feature encoders for task-specific point cloud representations.
Our evaluations show substantial gains from multi-task learning, improving LiDAR semantic segmentation by +1.7% in mIoU and 3D object detection by +1.7% in mAP.
arXiv Detail & Related papers (2024-04-19T11:24:34Z)
- Spatial-Temporal Graph Enhanced DETR Towards Multi-Frame 3D Object Detection [54.041049052843604]
We present STEMD, a novel end-to-end framework that enhances the DETR-like paradigm for multi-frame 3D object detection.
First, to model the inter-object spatial interaction and complex temporal dependencies, we introduce the spatial-temporal graph attention network.
Finally, distinguishing the positive query from other highly similar queries that are not the best match poses a challenge for the network.
arXiv Detail & Related papers (2023-07-01T13:53:14Z)
- DroneAttention: Sparse Weighted Temporal Attention for Drone-Camera Based Activity Recognition [2.705905918316948]
Human activity recognition (HAR) using drone-mounted cameras has attracted considerable interest from the computer vision research community in recent years.
We propose a novel Sparse Weighted Temporal Attention (SWTA) module to utilize sparsely sampled video frames for obtaining global weighted temporal attention.
The proposed model achieves accuracies of 72.76%, 92.56%, and 78.86% on the respective datasets (a generic sketch of the sparse temporal-attention idea follows this entry).
arXiv Detail & Related papers (2022-12-07T00:33:40Z)
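The SWTA description above (sparsely sampled frames pooled with weighted temporal attention) can be illustrated with a generic, hypothetical PyTorch snippet. This is not the paper's SWTA module; the sampling scheme, feature dimension, and class count are assumptions.

```python
# Generic weighted temporal attention over sparsely sampled frames.
# Illustrative only; not the SWTA module from the DroneAttention paper.
import torch
import torch.nn as nn


def sparse_indices(num_frames: int, num_samples: int = 8) -> torch.Tensor:
    """Pick a small, uniformly spaced subset of frame indices from a clip."""
    return torch.linspace(0, num_frames - 1, num_samples).long()


class TemporalAttentionPool(nn.Module):
    def __init__(self, feat_dim: int = 512, num_classes: int = 51):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)          # per-frame attention score
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
        # frame_feats: [B, T, D] features of the T sparsely sampled frames.
        weights = torch.softmax(self.score(frame_feats), dim=1)   # [B, T, 1]
        clip_feat = (weights * frame_feats).sum(dim=1)            # [B, D]
        return self.classifier(clip_feat)
```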
- Using Detection, Tracking and Prediction in Visual SLAM to Achieve Real-time Semantic Mapping of Dynamic Scenarios [70.70421502784598]
RDS-SLAM builds object-level semantic maps for dynamic scenarios in real time using only a commonly used Intel Core i7 CPU.
We evaluate RDS-SLAM on the TUM RGB-D dataset, and experimental results show that it runs at 30.3 ms per frame in dynamic scenarios.
arXiv Detail & Related papers (2022-10-10T11:03:32Z)
- PiFeNet: Pillar-Feature Network for Real-Time 3D Pedestrian Detection from Point Cloud [64.12626752721766]
We present PiFeNet, an efficient real-time 3D detector for pedestrian detection from point clouds.
We address two challenges that 3D object detection frameworks encounter when detecting pedestrians: the low expressiveness of pillar features and the small occupation areas of pedestrians in point clouds.
Our approach is ranked 1st on the KITTI pedestrian BEV and 3D leaderboards while running at 26 frames per second (FPS), and achieves state-of-the-art performance on the nuScenes detection benchmark (a generic pillar-feature sketch follows this entry).
arXiv Detail & Related papers (2021-12-31T13:41:37Z)
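As background for the pillar features mentioned in the PiFeNet entry above, here is a minimal, generic sketch of pillar-based point-cloud encoding: group points into a BEV grid, encode each pillar with a tiny point-wise network, and scatter the results into a pseudo-image for a 2D detection head. It is not PiFeNet's code; the grid size, cell size, and feature dimensions are assumptions, and it assumes point coordinates have already been shifted to be non-negative.

```python
# Generic pillar-feature encoding sketch (not PiFeNet's implementation).
# Assumes point coordinates were already shifted into the non-negative range.
import torch
import torch.nn as nn


class PillarEncoder(nn.Module):
    def __init__(self, in_dim=4, out_dim=64, grid=(200, 200), cell=0.5):
        super().__init__()
        self.grid, self.cell = grid, cell
        self.pointnet = nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU())

    def forward(self, points):
        # points: [N, 4] -> (x, y, z, intensity) for one LiDAR sweep.
        gx = (points[:, 0] / self.cell).long().clamp(0, self.grid[0] - 1)
        gy = (points[:, 1] / self.cell).long().clamp(0, self.grid[1] - 1)
        pillar_id = gx * self.grid[1] + gy               # flat BEV cell index
        feats = self.pointnet(points)                    # [N, out_dim], >= 0 after ReLU
        # Max-pool point features within each pillar (empty pillars stay zero),
        # then view the result as a dense BEV pseudo-image.
        bev = feats.new_zeros(self.grid[0] * self.grid[1], feats.shape[1])
        bev = bev.scatter_reduce(0, pillar_id.unsqueeze(1).expand_as(feats),
                                 feats, reduce="amax", include_self=True)
        return bev.view(*self.grid, -1).permute(2, 0, 1)  # [out_dim, H, W]
```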
- Efficient Person Search: An Anchor-Free Approach [86.45858994806471]
Person search aims to simultaneously localize and identify a query person from realistic, uncropped images.
To achieve this goal, state-of-the-art models typically add a re-id branch upon two-stage detectors like Faster R-CNN.
In this work, we present an anchor-free approach that efficiently tackles this challenging task by introducing several dedicated designs.
arXiv Detail & Related papers (2021-09-01T07:01:33Z)
- Sequential End-to-end Network for Efficient Person Search [7.3658840620058115]
Person search aims at jointly solving person detection and person re-identification (re-ID).
Existing works have designed end-to-end networks based on Faster R-CNN.
We propose a Sequential End-to-end Network (SeqNet) to extract superior features.
arXiv Detail & Related papers (2021-03-18T10:28:24Z)
- FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking [92.48078680697311]
Multi-object tracking (MOT) is an important problem in computer vision.
We present a simple yet effective approach termed as FairMOT based on the anchor-free object detection architecture CenterNet.
The approach achieves high accuracy for both detection and tracking.
arXiv Detail & Related papers (2020-04-04T08:18:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.