A Computer Vision Based Approach for Stalking Detection Using a
CNN-LSTM-MLP Hybrid Fusion Model
- URL: http://arxiv.org/abs/2402.03417v1
- Date: Mon, 5 Feb 2024 18:53:54 GMT
- Authors: Murad Hasan, Shahriar Iqbal, Md. Billal Hossain Faisal, Md. Musnad
Hossin Neloy, Md. Tonmoy Kabir, Md. Tanzim Reza, Md. Golam Rabiul Alam, Md
Zia Uddin
- Abstract summary: Stalking in public places has become a common occurrence with women being the most affected.
It has become a necessity to detect stalking as all of these criminal activities can be stopped through stalking detection.
In this research, we propose a novel deep learning-based hybrid fusion model to detect potential stalkers from a single video.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Criminal and suspicious activity detection has become a popular research
topic in recent years. The rapid growth of computer vision technologies has had
a crucial impact on solving this issue. However, physical stalking detection is
still a less explored area despite the evolution of modern technology.
Nowadays, stalking in public places has become a common occurrence, with women
being the most affected. Stalking is a visible action that usually precedes
other criminal activity: the stalker follows, loiters near, and stares at the
victim before committing a crime such as assault, kidnapping, or rape.
Therefore, it has become a necessity to detect stalking, as these crimes can
be stopped in the first place through stalking detection. In this research,
we propose a novel deep
learning-based hybrid fusion model to detect potential stalkers from a single
video with a minimal number of frames. We extract multiple relevant features,
such as facial landmarks, head pose estimation, and relative distance, as
numerical values from video frames. This data is fed into a multilayer
perceptron (MLP) to perform a classification task between a stalking and a
non-stalking scenario. Simultaneously, the video frames are fed into a
combination of convolutional and LSTM models to extract the spatio-temporal
features. We use a fusion of these numerical and spatio-temporal features to
build a classifier to detect stalking incidents. Additionally, we introduce a
dataset consisting of stalking and non-stalking videos gathered from various
feature films and television series, which is also used to train the model. The
experimental results show the efficiency and dynamism of our proposed stalker
detection system, achieving 89.58% testing accuracy, a significant
improvement over state-of-the-art approaches.
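
The two-branch design described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the layer sizes, the choice of the final LSTM hidden state as the video feature, fusion by simple concatenation, and every name below (`conv_embed`, `lstm_last_state`, `build_model`, `predict`) are assumptions made for illustration. The weights are random and untrained, so the output only demonstrates the data flow, with a tiny convolution plus LSTM for the frames, an MLP layer for the numerical features, and a fused classifier on top.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def init_dense(rng, n_out, n_in):
    """Random weights (list of rows) and zero biases for one dense layer."""
    w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out

def dense(x, layer):
    """Affine map W x + b."""
    w, b = layer
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def conv_embed(frame, kernels):
    """Tiny CNN stand-in: valid 2D cross-correlation, ReLU, global average pool."""
    h, w = len(frame), len(frame[0])
    out = []
    for k in kernels:
        kh, kw = len(k), len(k[0])
        total, count = 0.0, 0
        for r in range(h - kh + 1):
            for c in range(w - kw + 1):
                s = sum(k[i][j] * frame[r + i][c + j]
                        for i in range(kh) for j in range(kw))
                total += max(0.0, s)  # ReLU before pooling
                count += 1
        out.append(total / count)  # one pooled value per kernel
    return out

def lstm_last_state(seq, gates, hidden):
    """Run a standard LSTM over seq and return the final hidden state."""
    h, c = [0.0] * hidden, [0.0] * hidden
    for x in seq:
        z = x + h  # concatenate input and previous hidden state
        i = [sigmoid(v) for v in dense(z, gates["i"])]
        f = [sigmoid(v) for v in dense(z, gates["f"])]
        o = [sigmoid(v) for v in dense(z, gates["o"])]
        g = [math.tanh(v) for v in dense(z, gates["g"])]
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h

def build_model(rng, n_kernels=4, hidden=8, n_numeric=6, mlp_out=4):
    """All sizes here are hypothetical; the abstract does not specify them."""
    kernels = [[[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
               for _ in range(n_kernels)]
    gates = {name: init_dense(rng, hidden, n_kernels + hidden) for name in "ifog"}
    return {
        "kernels": kernels, "gates": gates, "hidden": hidden,
        "mlp": init_dense(rng, mlp_out, n_numeric),
        "head": init_dense(rng, 1, hidden + mlp_out),
    }

def predict(model, frames, numeric):
    """Fuse the video branch and the numerical branch, then classify."""
    seq = [conv_embed(f, model["kernels"]) for f in frames]
    video_feat = lstm_last_state(seq, model["gates"], model["hidden"])
    numeric_feat = [max(0.0, v) for v in dense(numeric, model["mlp"])]
    fused = video_feat + numeric_feat  # assumed fusion: concatenation
    prob = sigmoid(dense(fused, model["head"])[0])
    return prob, fused

rng = random.Random(0)
model = build_model(rng)
# Five synthetic 8x8 grayscale frames and six synthetic numerical features
# (placeholders for landmark, head-pose, and distance values).
frames = [[[rng.random() for _ in range(8)] for _ in range(8)] for _ in range(5)]
numeric = [rng.random() for _ in range(6)]
prob, fused = predict(model, frames, numeric)
print(f"stalking probability (untrained weights): {prob:.3f}")
```

In a real system the convolutional branch would be a trained network and the numerical vector would come from a landmark and pose estimator; the sketch only shows how the two feature vectors could meet in a single classifier.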
Related papers
- Model Inversion Attacks: A Survey of Approaches and Countermeasures [59.986922963781]
Model inversion attacks (MIAs), a recent type of privacy attack, aim to extract sensitive features of private training data.
Despite their significance, there is a lack of systematic studies that provide a comprehensive overview of and deeper insights into MIAs.
This survey aims to summarize up-to-date MIA methods in both attacks and defenses.
arXiv Detail & Related papers (2024-11-15T08:09:28Z)
- JOSENet: A Joint Stream Embedding Network for Violence Detection in Surveillance Videos [4.94659999696881]
Violence detection in surveillance videos presents additional issues, such as the wide variety of real fight scenes.
We introduce JOSENet, a self-supervised framework that provides outstanding performance for violence detection in surveillance videos.
arXiv Detail & Related papers (2024-05-05T15:01:00Z)
- Tracking by Associating Clips [110.08925274049409]
In this paper, we investigate an alternative by treating object association as clip-wise matching.
Our new perspective views a single long video sequence as multiple short clips, and then the tracking is performed both within and between the clips.
The benefits of this new approach are twofold. First, our method is robust to tracking error accumulation or propagation, as the video chunking allows bypassing the interrupted frames.
Second, the multiple frame information is aggregated during the clip-wise matching, resulting in a more accurate long-range track association than the current frame-wise matching.
arXiv Detail & Related papers (2022-12-20T10:33:17Z)
- Video Action Detection: Analysing Limitations and Challenges [70.01260415234127]
We analyze existing datasets on video action detection and discuss their limitations.
We perform a bias study which analyzes a key property differentiating videos from static images: the temporal aspect.
Such extreme experiments show the existence of biases which have managed to creep into existing methods in spite of careful modeling.
arXiv Detail & Related papers (2022-04-17T00:42:14Z)
- Real Time Action Recognition from Video Footage [0.5219568203653523]
Video surveillance cameras have added a new dimension to crime detection.
This research focuses on integrating state-of-the-art Deep Learning methods to ensure a robust pipeline for autonomous surveillance for detecting violent activities.
arXiv Detail & Related papers (2021-12-13T07:27:41Z)
- DEFT: Detection Embeddings for Tracking [3.326320568999945]
We propose an efficient joint detection and tracking model named DEFT.
Our approach relies on an appearance-based object matching network jointly-learned with an underlying object detection network.
DEFT has comparable accuracy and speed to the top methods on 2D online tracking leaderboards.
arXiv Detail & Related papers (2021-02-03T20:00:44Z)
- Detecting Invisible People [58.49425715635312]
We re-purpose tracking benchmarks and propose new metrics for the task of detecting invisible objects.
We demonstrate that current detection and tracking systems perform dramatically worse on this task.
We also build dynamic models that explicitly reason in 3D, making use of observations produced by state-of-the-art monocular depth estimation networks.
arXiv Detail & Related papers (2020-12-15T16:54:45Z)
- TubeTK: Adopting Tubes to Track Multi-Object in a One-Step Training Model [51.14840210957289]
Multi-object tracking is a fundamental vision problem that has been studied for a long time.
Despite the success of Tracking by Detection (TBD), this two-step method is too complicated to train in an end-to-end manner.
We propose a concise end-to-end model, TubeTK, which needs only one-step training by introducing the "bounding-tube" to indicate temporal-spatial locations of objects in a short video clip.
arXiv Detail & Related papers (2020-06-10T06:45:05Z)
- Gabriella: An Online System for Real-Time Activity Detection in Untrimmed Security Videos [72.50607929306058]
We propose a real-time online system to perform activity detection on untrimmed security videos.
The proposed method consists of three stages: tubelet extraction, activity classification and online tubelet merging.
We demonstrate the effectiveness of the proposed approach in terms of speed (100 fps) and performance with state-of-the-art results.
arXiv Detail & Related papers (2020-04-23T22:20:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.