Event and Activity Recognition in Video Surveillance for Cyber-Physical
Systems
- URL: http://arxiv.org/abs/2111.02064v1
- Date: Wed, 3 Nov 2021 08:30:38 GMT
- Title: Event and Activity Recognition in Video Surveillance for Cyber-Physical
Systems
- Authors: Swarnabja Bhaumik, Prithwish Jana and Partha Pratim Mohanta
- Abstract summary: We show that long-term motion patterns alone play a pivotal role in the task of recognizing an event.
Only the temporal features are exploited using a hybrid Convolutional Neural Network (CNN) + Recurrent Neural Network (RNN) architecture.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This chapter aims to aid the development of Cyber-Physical Systems (CPS) in
automated understanding of events and activities in various applications of
video-surveillance. These events are mostly captured by drones, CCTVs or novice
and unskilled individuals on low-end devices. Being unconstrained, these videos
are immensely challenging due to a number of quality factors. We present an
extensive account of the various approaches taken to solve the problem over the
years. This ranges from methods as early as Structure from Motion (SFM) based
approaches to recent solution frameworks involving deep neural networks. We
show that the long-term motion patterns alone play a pivotal role in the task
of recognizing an event. Consequently, each video is compactly represented
by a fixed number of key-frames selected with a graph-based approach. Only the
temporal features are exploited, using a hybrid Convolutional Neural Network
(CNN) + Recurrent Neural Network (RNN) architecture. The results we obtain are
encouraging: they outperform standard temporal CNNs and are on par with those
using spatial information along with motion cues. Further exploring multistream
models, we conceive a multi-tier fusion strategy for the spatial and temporal
wings of a network. A consolidated representation of the respective individual
prediction vectors on video and frame levels is obtained using a biased
conflation technique. The fusion strategy yields a greater rise in precision
at each stage than state-of-the-art methods, and thus achieves a strong
consensus in classification. Results are recorded on four
benchmark datasets widely used in the domain of action recognition, namely CCV,
HMDB, UCF-101 and KCV. We infer that better classification of video sequences
leads to more robust actuation of systems designed for event surveillance and
for object and activity tracking.
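
As a rough illustration of the temporal wing described above (a fixed number of key-frames per video, a per-frame CNN, and a recurrent layer that summarizes long-term motion), consider the following minimal PyTorch sketch. It is not the authors' code: the stand-in backbone, the GRU, and names such as TemporalWing are illustrative assumptions, and the graph-based key-frame selection itself is omitted.

```python
import torch
import torch.nn as nn

class TemporalWing(nn.Module):
    """Illustrative hybrid CNN + RNN over a fixed set of key-frames."""

    def __init__(self, num_classes: int, feat_dim: int = 512, hidden: int = 256):
        super().__init__()
        # Stand-in per-frame encoder; a pretrained backbone would be used in practice.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(32 * 4 * 4, feat_dim), nn.ReLU(),
        )
        # The GRU models the long-term motion pattern across the key-frame sequence.
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, key_frames: torch.Tensor) -> torch.Tensor:
        # key_frames: (batch, num_key_frames, 3, H, W)
        b, t = key_frames.shape[:2]
        feats = self.cnn(key_frames.flatten(0, 1)).view(b, t, -1)
        _, h = self.rnn(feats)                    # final hidden state summarizes motion
        return self.head(h[-1]).softmax(dim=-1)   # per-video class probabilities

# Usage: two videos, each reduced to 8 (hypothetically graph-selected) key-frames.
p_temporal = TemporalWing(num_classes=10)(torch.randn(2, 8, 3, 64, 64))
```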
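The abstract does not spell out its biased conflation of the spatial and temporal prediction vectors. One standard way to bias the conflation of two probability vectors p and q is the weighted geometric form r_i ∝ p_i^w · q_i^(1-w). The sketch below, including the name biased_conflation and the choice w = 0.6, is an assumption in that spirit rather than the paper's exact formula.

```python
import torch

def biased_conflation(p: torch.Tensor, q: torch.Tensor, w: float = 0.6) -> torch.Tensor:
    """Fuse two class-probability vectors; w > 0.5 biases the result toward p."""
    fused = p.clamp_min(1e-12) ** w * q.clamp_min(1e-12) ** (1.0 - w)
    return fused / fused.sum(dim=-1, keepdim=True)  # renormalize to a distribution

# Example: fuse hypothetical temporal- and spatial-wing predictions for 2 videos.
p_t = torch.softmax(torch.randn(2, 10), dim=-1)
p_s = torch.softmax(torch.randn(2, 10), dim=-1)
print(biased_conflation(p_t, p_s).sum(dim=-1))      # each row sums to 1.0
```

With w = 0.5 this reduces to a normalized geometric mean of the two wings; applying it once at frame level and again at video level gives the multi-tier flavor the abstract describes.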
Related papers
- Hypergraph-based Multi-View Action Recognition using Event Cameras [20.965606424362726]
We introduce HyperMV, a multi-view event-based action recognition framework.
We present the largest multi-view event-based action dataset, $\text{THU}^{\text{MV-EACT}}\text{-50}$, comprising 50 actions from 6 viewpoints.
Experimental results show that HyperMV significantly outperforms baselines in both cross-subject and cross-view scenarios.
arXiv Detail & Related papers (2024-03-28T11:17:00Z)
- Co-attention Propagation Network for Zero-Shot Video Object Segmentation [91.71692262860323]
Zero-shot video object segmentation (ZS-VOS) aims to segment objects in a video sequence without prior knowledge of these objects.
Existing ZS-VOS methods often struggle to distinguish between foreground and background or to keep track of the foreground in complex scenarios.
We propose an encoder-decoder-based hierarchical co-attention propagation network (HCPN) capable of tracking and segmenting objects.
arXiv Detail & Related papers (2023-04-08T04:45:48Z)
- Analysis of Real-Time Hostile Activity Detection from Spatiotemporal Features Using Time Distributed Deep CNNs, RNNs and Attention-Based Mechanisms [0.0]
Real-time video surveillance through CCTV camera systems has become essential for ensuring public safety.
Deep learning video classification techniques can help us automate surveillance systems to detect violence as it happens.
arXiv Detail & Related papers (2023-02-21T22:02:39Z)
- Differentiable Frequency-based Disentanglement for Aerial Video Action Recognition [56.91538445510214]
We present a learning algorithm for human activity recognition in videos.
Our approach is designed for UAV videos, which are mainly acquired from obliquely placed dynamic cameras.
We conduct extensive experiments on the UAV Human dataset and the NEC Drone dataset.
arXiv Detail & Related papers (2022-09-15T22:16:52Z)
- Efficient Modelling Across Time of Human Actions and Interactions [92.39082696657874]
We argue that current fixed-sized temporal kernels in 3D convolutional neural networks (3D CNNs) can be improved to better deal with temporal variations in the input.
We study how to better handle variations between classes of actions by enhancing their feature differences over different layers of the architecture.
The proposed approaches are evaluated on several benchmark action recognition datasets and show competitive results.
arXiv Detail & Related papers (2021-10-05T15:39:11Z)
- EAN: Event Adaptive Network for Enhanced Action Recognition [66.81780707955852]
We propose a unified action recognition framework to investigate the dynamic nature of video content.
First, when extracting local cues, we generate spatio-temporal kernels of dynamic scale to adaptively fit diverse events.
Second, to accurately aggregate these cues into a global video representation, we propose to mine the interactions only among a few selected foreground objects by a Transformer.
arXiv Detail & Related papers (2021-07-22T15:57:18Z)
- Unsupervised Video Summarization with a Convolutional Attentive Adversarial Network [32.90753137435032]
We propose a convolutional attentive adversarial network (CAAN) to build a deep summarizer in an unsupervised way.
Specifically, the generator employs a fully convolutional sequence network to extract a global representation of a video, and an attention-based network to output normalized importance scores.
The results show the superiority of our proposed method against other state-of-the-art unsupervised approaches.
arXiv Detail & Related papers (2021-05-24T07:24:39Z)
- Dense Interaction Learning for Video-based Person Re-identification [75.03200492219003]
We propose a hybrid framework, Dense Interaction Learning (DenseIL), to tackle video-based person re-ID difficulties.
DenseIL contains a CNN encoder and a Dense Interaction (DI) decoder.
Our method consistently and significantly outperforms all state-of-the-art methods on multiple standard video-based re-ID datasets.
arXiv Detail & Related papers (2021-03-16T12:22:08Z)
- Complex Human Action Recognition in Live Videos Using Hybrid FR-DL Method [1.027974860479791]
We address the challenges of the preprocessing phase through automated selection of representative frames from the input sequences.
We propose a hybrid technique using background subtraction and HOG, followed by application of a deep neural network and skeletal modelling method.
We name our model the Feature Reduction & Deep Learning based action recognition method, or FR-DL for short.
arXiv Detail & Related papers (2020-07-06T15:12:50Z)
- Unsupervised Learning of Video Representations via Dense Trajectory Clustering [86.45054867170795]
This paper addresses the task of unsupervised learning of representations for action recognition in videos.
We first propose to adapt two top-performing objectives in this class: instance recognition and local aggregation.
We observe promising performance, but qualitative analysis shows that the learned representations fail to capture motion patterns.
arXiv Detail & Related papers (2020-06-28T22:23:03Z)
- Hierarchical Attention Network for Action Segmentation [45.19890687786009]
The temporal segmentation of events is an essential task and a precursor for the automatic recognition of human actions in videos.
We propose a complete end-to-end supervised learning approach that can better learn relationships between actions over time.
We evaluate our system on challenging public benchmark datasets, including the MERL Shopping, 50 Salads, and Georgia Tech Egocentric datasets.
arXiv Detail & Related papers (2020-05-07T02:39:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.