DIFEM: Key-points Interaction based Feature Extraction Module for Violence Recognition in Videos
- URL: http://arxiv.org/abs/2412.05386v1
- Date: Fri, 06 Dec 2024 19:25:47 GMT
- Title: DIFEM: Key-points Interaction based Feature Extraction Module for Violence Recognition in Videos
- Authors: Himanshu Mittal, Suvramalya Basak, Anjali Gautam
- Abstract summary: We propose an effective method which leverages human skeleton key-points to capture inherent properties of violence.
At the heart of our method is our novel Dynamic Interaction Feature Extraction Module (DIFEM), which captures features such as velocity and joint intersections.
With the features extracted by our DIFEM, we use various classification algorithms such as Random Forest, Decision Tree, AdaBoost and k-Nearest Neighbors.
- Score: 6.375350222633163
- License:
- Abstract: Violence detection in surveillance videos is a critical task for ensuring public safety. As a result, there is an increasing need for efficient and lightweight systems for the automatic detection of violent behaviours. In this work, we propose an effective method which leverages human skeleton key-points to capture inherent properties of violence, such as rapid movement of specific joints and their close proximity. At the heart of our method is our novel Dynamic Interaction Feature Extraction Module (DIFEM), which captures features such as velocity and joint intersections, effectively capturing the dynamics of violent behavior. With the features extracted by our DIFEM, we use various classification algorithms such as Random Forest, Decision Tree, AdaBoost and k-Nearest Neighbors. Our approach incurs a substantially lower parameter cost than existing state-of-the-art (SOTA) methods employing deep learning techniques. We perform extensive experiments on three standard violence recognition datasets, showing promising performance on all three. Our proposed method surpasses several SOTA violence recognition methods.
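The two properties the abstract names (rapid joint movement and close proximity between people) can be illustrated with a minimal sketch. The function names, keypoint layout, and feature definitions below are assumptions for illustration only, not the authors' exact DIFEM formulation:

```python
# Hedged sketch of DIFEM-style features (function names and definitions are
# illustrative assumptions, not the paper's exact module): per-frame joint
# velocity magnitudes as a proxy for "rapid movement", and the minimum
# inter-person joint distance as a proxy for "close proximity".
import math

def velocity_features(frames):
    """frames: list of per-frame keypoint lists [(x, y), ...].
    Returns the mean per-joint displacement magnitude between
    consecutive frames (one value per frame transition)."""
    feats = []
    for prev, curr in zip(frames, frames[1:]):
        step = [math.dist(p, c) for p, c in zip(prev, curr)]
        feats.append(sum(step) / len(step))
    return feats

def proximity_feature(person_a, person_b):
    """Minimum Euclidean distance between any joint of person A
    and any joint of person B (small values suggest close contact)."""
    return min(math.dist(p, q) for p in person_a for q in person_b)
```

Feature vectors built this way could then be fed to any off-the-shelf classifier of the kind the abstract lists (e.g. a Random Forest), which is what keeps the parameter cost low relative to end-to-end deep models.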
Related papers
- Streamlining Video Analysis for Efficient Violence Detection [1.444946491007292]
This paper addresses the challenge of automated violence detection in video frames captured by surveillance cameras.
We propose an approach using a 3D Convolutional Neural Network (3D CNN)-based model named X3D to tackle this problem.
arXiv Detail & Related papers (2024-11-29T06:32:36Z)
- ACE: Off-Policy Actor-Critic with Causality-Aware Entropy Regularization [52.5587113539404]
We introduce a causality-aware entropy term that effectively identifies and prioritizes actions with high potential impacts for efficient exploration.
Our proposed algorithm, ACE: Off-policy Actor-critic with Causality-aware Entropy regularization, demonstrates a substantial performance advantage across 29 diverse continuous control tasks.
arXiv Detail & Related papers (2024-02-22T13:22:06Z)
- Improving Video Violence Recognition with Human Interaction Learning on 3D Skeleton Point Clouds [88.87985219999764]
We develop a method for video violence recognition from a new perspective of skeleton points.
We first formulate 3D skeleton point clouds from human sequences extracted from videos.
We then perform interaction learning on these 3D skeleton point clouds.
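The two-step formulation above (sequences to point clouds, then interaction learning on the clouds) can be made concrete with a small sketch. The representation here, appending the frame index as a third coordinate, is an illustrative assumption, not necessarily the paper's exact construction:

```python
# Illustrative sketch (not the paper's code): flatten a per-frame 2D keypoint
# sequence into one unordered 3D point cloud by using the frame index t as
# the third coordinate, so downstream models can treat the whole video's
# skeletons as a single point set.
def skeleton_point_cloud(frames):
    """frames: list of per-frame keypoint lists [(x, y), ...]
    Returns a flat list of (x, y, t) points."""
    return [(x, y, t) for t, joints in enumerate(frames) for x, y in joints]
```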
arXiv Detail & Related papers (2023-08-26T12:55:18Z)
- Towards Efficient and Domain-Agnostic Evasion Attack with High-dimensional Categorical Inputs [33.36532022853583]
Our work targets searching for feasible adversarial perturbations to attack classifiers with high-dimensional categorical inputs in a domain-agnostic setting.
Our proposed method, namely FEAT, treats modifying each categorical feature as pulling an arm in a multi-armed bandit problem.
Our work further suggests the applicability of FEAT for assessing the adversarial vulnerability of classification systems with high-dimensional categorical inputs.
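The bandit framing in the summary above can be sketched with a toy arm-selection rule. This is a generic epsilon-greedy policy for illustration, not FEAT's actual algorithm:

```python
# Toy sketch of the bandit view (FEAT's real method differs): each categorical
# feature is an "arm"; pulling it means perturbing that feature, and observed
# rewards (e.g. loss increase on the target model) steer which feature to
# modify next via a standard epsilon-greedy rule.
import random

def pick_arm(counts, rewards, eps=0.1, rng=random):
    """counts: per-feature pull counts; rewards: per-feature cumulative
    rewards. Returns the index of the feature to perturb next."""
    if rng.random() < eps:
        return rng.randrange(len(counts))  # explore: random feature
    # Exploit: highest empirical mean; untried arms get priority (inf).
    means = [r / c if c else float("inf") for r, c in zip(rewards, counts)]
    return max(range(len(means)), key=means.__getitem__)
```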
arXiv Detail & Related papers (2022-12-13T18:45:00Z)
- Detecting Violence in Video Based on Deep Features Fusion Technique [0.30458514384586394]
This work proposed a novel method to detect violence using a fusion technique of two convolutional neural networks (CNNs).
The performance of the proposed method is evaluated using three standard benchmark datasets in terms of detection accuracy.
arXiv Detail & Related papers (2022-04-15T12:51:20Z)
- Learning Multi-Granular Spatio-Temporal Graph Network for Skeleton-based Action Recognition [49.163326827954656]
We propose a novel multi-granular spatio-temporal graph network for skeleton-based action classification.
We develop a dual-head graph network consisting of two interleaved branches, which enables us to extract features at two spatio-temporal resolutions.
We conduct extensive experiments on three large-scale datasets.
arXiv Detail & Related papers (2021-08-10T09:25:07Z)
- Group-Skeleton-Based Human Action Recognition in Complex Events [15.649778891665468]
We propose a novel group-skeleton-based human action recognition method in complex events.
This method first utilizes multi-scale spatial-temporal graph convolutional networks (MS-G3Ds) to extract skeleton features from multiple persons.
Results on the HiEve dataset show that our method can give superior performance compared to other state-of-the-art methods.
arXiv Detail & Related papers (2020-11-26T13:19:14Z)
- Towards Understanding the Adversarial Vulnerability of Skeleton-based Action Recognition [133.35968094967626]
Skeleton-based action recognition has attracted increasing attention due to its strong adaptability to dynamic circumstances.
With the help of deep learning techniques, it has also witnessed substantial progress and currently achieves around 90% accuracy in benign environments.
Research on the vulnerability of skeleton-based action recognition under different adversarial settings remains scant.
arXiv Detail & Related papers (2020-05-14T17:12:52Z)
- Gabriella: An Online System for Real-Time Activity Detection in Untrimmed Security Videos [72.50607929306058]
We propose a real-time online system to perform activity detection on untrimmed security videos.
The proposed method consists of three stages: tubelet extraction, activity classification and online tubelet merging.
We demonstrate the effectiveness of the proposed approach in terms of speed (100 fps) and performance with state-of-the-art results.
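The three stages named in the summary above compose a simple pipeline. The sketch below only shows the data flow between stages; the stage implementations and their interfaces are placeholders, not the system's actual API:

```python
# Minimal pipeline sketch of the three stages (tubelet extraction, activity
# classification, online tubelet merging). The callables are hypothetical
# stand-ins: the real system's interfaces are not specified here.
def detect_activities(video_frames, extract, classify, merge):
    tubelets = extract(video_frames)                 # 1. tubelet extraction
    labeled = [(t, classify(t)) for t in tubelets]   # 2. activity classification
    return merge(labeled)                            # 3. online tubelet merging
```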
arXiv Detail & Related papers (2020-04-23T22:20:10Z)
- Towards High Performance Human Keypoint Detection [87.1034745775229]
We find that context information plays an important role in reasoning human body configuration and invisible keypoints.
Inspired by this, we propose a cascaded context mixer (CCM) which efficiently integrates spatial and channel context information.
To maximize CCM's representation capability, we develop a hard-negative person detection mining strategy and a joint-training strategy.
We present several sub-pixel refinement techniques for postprocessing keypoint predictions to improve detection accuracy.
arXiv Detail & Related papers (2020-02-03T02:24:51Z)
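Sub-pixel refinement of heatmap-based keypoint predictions, as mentioned in the entry above, is commonly done by nudging the integer argmax toward the stronger neighboring response. The quarter-pixel rule below is one widespread variant, shown for illustration; it is not necessarily the paper's exact technique:

```python
# A common sub-pixel refinement trick (an illustration of the idea, not
# necessarily the paper's method): shift the integer argmax location a
# quarter pixel toward whichever neighbor has the higher heatmap score.
def refine_peak(heatmap, y, x):
    """heatmap: 2D list of scores; (y, x): integer argmax location.
    Returns the refined (y, x) as floats."""
    h, w = len(heatmap), len(heatmap[0])
    fy, fx = float(y), float(x)
    if 0 < x < w - 1:  # horizontal neighbors exist
        fx += 0.25 if heatmap[y][x + 1] > heatmap[y][x - 1] else -0.25
    if 0 < y < h - 1:  # vertical neighbors exist
        fy += 0.25 if heatmap[y + 1][x] > heatmap[y - 1][x] else -0.25
    return fy, fx
```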
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.