Complex Human Action Recognition in Live Videos Using Hybrid FR-DL Method
- URL: http://arxiv.org/abs/2007.02811v1
- Date: Mon, 6 Jul 2020 15:12:50 GMT
- Title: Complex Human Action Recognition in Live Videos Using Hybrid FR-DL Method
- Authors: Fatemeh Serpush, Mahdi Rezaei
- Abstract summary: We address challenges of the preprocessing phase, by an automated selection of representative frames among the input sequences.
We propose a hybrid technique using background subtraction and HOG, followed by application of a deep neural network and skeletal modelling method.
We name our model the Feature Reduction & Deep Learning based action recognition method, or FR-DL for short.
- Score: 1.027974860479791
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automated human action recognition is one of the most attractive and
practical research fields in computer vision, in spite of its high
computational costs. In such systems, human action labelling is based on the appearance and motion patterns in the video sequences; however, conventional methodologies and classic neural networks cannot exploit temporal information to predict actions in the upcoming frames of a video sequence. Moreover, the computational cost of the preprocessing stage is high. In this paper, we address the challenges of the preprocessing phase by an automated selection of representative frames from the input sequences. Furthermore, we extract only the key features of the representative frames rather than the full feature set. We propose a hybrid technique using background subtraction and HOG, followed by the application of a deep neural network and a skeletal modelling method. The combination of a CNN and an LSTM recurrent network is used for feature selection and for retaining the previous information, and finally, a Softmax-KNN classifier is used for labelling human activities. We name our model the Feature Reduction & Deep Learning based action recognition method, or FR-DL for short. To evaluate the proposed method, we use the UCF dataset for benchmarking, which is widely used among researchers in action recognition research. The dataset includes 101 complicated activities in the wild. Experimental results show a significant improvement in accuracy and speed in comparison with six state-of-the-art articles.
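As a rough sketch of the preprocessing stage described above, the following Python snippet scores frames by the amount of foreground found by OpenCV's MOG2 background subtractor, keeps the highest-scoring frames as representatives, and computes HOG descriptors on those frames only. The motion-count selection rule, the number of representative frames, and the default 64x128 HOG window are illustrative assumptions, not the paper's exact FR-DL procedure.

```python
import cv2
import numpy as np

def fr_dl_preprocess(video_path, top_k=16):
    """Select representative frames by foreground motion, then compute
    HOG descriptors on those frames only (reduced features)."""
    cap = cv2.VideoCapture(video_path)
    subtractor = cv2.createBackgroundSubtractorMOG2()  # background subtraction
    hog = cv2.HOGDescriptor()                          # default 64x128 window
    scored = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        fg_mask = subtractor.apply(frame)              # foreground mask
        motion = int(np.count_nonzero(fg_mask))        # crude motion score
        scored.append((motion, frame))
    cap.release()
    # Keep the top_k frames with the most foreground motion as the
    # "representative" frames (an assumed selection criterion).
    scored.sort(key=lambda item: item[0], reverse=True)
    descriptors = []
    for _, frame in scored[:top_k]:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        gray = cv2.resize(gray, (64, 128))             # match the HOG window
        descriptors.append(hog.compute(gray).ravel())
    return np.stack(descriptors)                       # (top_k, hog_dim)
```

The resulting per-frame descriptors would then feed the CNN-LSTM stage for temporal modelling; the Softmax-KNN labelling step is omitted here.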
Related papers
- Pre-training for Action Recognition with Automatically Generated Fractal Datasets [23.686476742398973]
We present methods to automatically produce large-scale datasets of short synthetic video clips.
The generated video clips are characterized by notable variety, stemming from the innate ability of fractals to generate complex multi-scale structures.
Compared to standard Kinetics pre-training, our reported results come close and are even superior on a portion of downstream datasets.
arXiv Detail & Related papers (2024-11-26T16:51:11Z)
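A minimal sketch of how such synthetic clips might be produced: a random iterated function system (IFS) is rendered with the chaos game, and a small per-frame parameter drift creates motion. The number of maps, the drift schedule, and the rendering resolution are illustrative assumptions, not the paper's generation pipeline.

```python
import numpy as np

def fractal_clip(n_frames=16, n_points=20000, size=64, seed=0):
    """Render a short clip of an animated IFS fractal via the chaos game."""
    rng = np.random.default_rng(seed)
    # Random affine maps; scaling by 0.6 keeps them roughly contractive.
    maps = [(0.6 * rng.uniform(-1, 1, (2, 2)), rng.uniform(-1, 1, 2))
            for _ in range(3)]
    clip = np.zeros((n_frames, size, size), dtype=np.uint8)
    for f in range(n_frames):
        drift = 0.03 * f                      # per-frame drift = motion
        p = np.zeros(2)
        pts = np.empty((n_points, 2))
        for i in range(n_points):
            A, b = maps[rng.integers(3)]
            p = A @ p + b + drift             # chaos-game iteration
            pts[i] = p
        xy = np.clip((pts + 2.0) / 4.0 * (size - 1), 0, size - 1).astype(int)
        clip[f, xy[:, 1], xy[:, 0]] = 255     # plot visited points
    return clip  # shape (T, H, W), values {0, 255}
```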
- Human activity recognition using deep learning approaches and single frame cnn and convolutional lstm [0.0]
We explore two deep learning-based approaches, namely single frame Convolutional Neural Networks (CNNs) and convolutional Long Short-Term Memory to recognise human actions from videos.
The two models were trained and evaluated on a benchmark action recognition dataset, UCF50, and another dataset that was created for the experimentation.
Though both models exhibit good accuracy, the single frame CNN model outperforms the Convolutional LSTM model, achieving 99.8% accuracy on the UCF50 dataset.
arXiv Detail & Related papers (2023-04-18T01:33:29Z)
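A minimal Keras sketch of the two model families compared in that paper: a single frame CNN that classifies frames independently, and a ConvLSTM that models motion across a clip. Layer widths, input resolution, clip length, and the 50-class output (matching UCF50) are illustrative assumptions, not the authors' exact configurations.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 50  # e.g. UCF50; an assumption for illustration

def single_frame_cnn(input_shape=(64, 64, 3)):
    """Classifies each frame independently; no temporal information."""
    return models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.GlobalAveragePooling2D(),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])

def conv_lstm(seq_len=20, frame_shape=(64, 64, 3)):
    """Models motion across a clip with convolutional LSTM cells."""
    return models.Sequential([
        layers.Input(shape=(seq_len, *frame_shape)),
        layers.ConvLSTM2D(32, 3, return_sequences=False),
        layers.GlobalAveragePooling2D(),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
```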
- Differentiable Frequency-based Disentanglement for Aerial Video Action Recognition [56.91538445510214]
We present a learning algorithm for human activity recognition in videos.
Our approach is designed for UAV videos, which are mainly acquired from obliquely placed dynamic cameras.
We conduct extensive experiments on the UAV Human dataset and the NEC Drone dataset.
arXiv Detail & Related papers (2022-09-15T22:16:52Z)
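One plausible reading of frequency-based disentanglement, sketched under stated assumptions: split a clip into low- and high-temporal-frequency components with an FFT along the time axis, so slowly varying background and fast salient motion can be treated separately. The cutoff and the hard binary mask are illustrative choices, not the paper's differentiable formulation.

```python
import numpy as np

def split_temporal_frequencies(clip, cutoff=3):
    """clip: (T, H, W) array. Split into low- and high-frequency parts
    along time via an FFT; `cutoff` (in frequency bins) is an assumption."""
    spec = np.fft.fft(clip, axis=0)
    freqs = np.fft.fftfreq(clip.shape[0])
    low_mask = (np.abs(freqs) * clip.shape[0] <= cutoff)[:, None, None]
    low = np.fft.ifft(spec * low_mask, axis=0).real    # slow variations
    high = np.fft.ifft(spec * ~low_mask, axis=0).real  # fast variations
    return low, high
```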
- Concurrent Neural Tree and Data Preprocessing AutoML for Image Classification [0.5735035463793008]
Current state-of-the-art (SOTA) methods do not include traditional methods for manipulating input data as part of the algorithmic search space.
We adapt the Evolutionary Multi-objective Algorithm Design Engine (EMADE), a multi-objective evolutionary search framework for traditional machine learning methods, to perform neural architecture search.
We show that including these methods in the search space has the potential to improve performance on the CIFAR-10 image classification benchmark dataset.
arXiv Detail & Related papers (2022-05-25T20:03:09Z)
- Joint-bone Fusion Graph Convolutional Network for Semi-supervised Skeleton Action Recognition [65.78703941973183]
We propose a novel correlation-driven joint-bone fusion graph convolutional network (CD-JBF-GCN) as an encoder and use a pose prediction head as a decoder.
Specifically, the CD-JBF-GCN can explore the motion transmission between the joint stream and the bone stream.
The pose prediction based auto-encoder in the self-supervised training stage allows the network to learn motion representation from unlabeled data.
arXiv Detail & Related papers (2022-02-08T16:03:15Z)
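The joint-bone two-stream idea can be made concrete with a small NumPy sketch: bone features are parent-to-child offsets between connected joints, and one spatial graph convolution aggregates each joint's neighbours via a normalized skeleton adjacency. The five-joint chain, the weight shapes, and the fusion by summation are illustrative assumptions, not the CD-JBF-GCN architecture itself.

```python
import numpy as np

# Toy 5-joint chain as (child, parent) pairs; real skeletons (e.g. 25
# joints in NTU RGB+D) are larger -- an illustrative assumption.
EDGES = [(1, 0), (2, 1), (3, 2), (4, 3)]
V = 5  # number of joints

def bone_stream(joints):
    """joints: (T, V, C) joint coordinates over T frames.
    Bone features are offsets from each parent joint to its child."""
    bones = np.zeros_like(joints)
    for child, parent in EDGES:
        bones[:, child] = joints[:, child] - joints[:, parent]
    return bones

def spatial_graph_conv(x, w):
    """One spatial graph convolution: average over skeleton neighbours
    (row-normalized adjacency with self-loops), then mix channels."""
    A = np.eye(V)
    for i, j in EDGES:
        A[i, j] = A[j, i] = 1.0
    A /= A.sum(axis=1, keepdims=True)
    return np.einsum("vu,tuc->tvc", A, x) @ w  # (T, V, C_out)

# Two-stream fusion by summation (an assumed, simplified fusion rule).
rng = np.random.default_rng(0)
joints = rng.normal(size=(30, V, 3))   # 30 frames of 3-D coordinates
w = rng.normal(size=(3, 16))           # shared channel-mixing weights
fused = spatial_graph_conv(joints, w) + spatial_graph_conv(bone_stream(joints), w)
```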
- Event and Activity Recognition in Video Surveillance for Cyber-Physical Systems [0.0]
We show that long-term motion patterns alone play a pivotal role in the task of recognizing an event.
Only the temporal features are exploited using a hybrid Convolutional Neural Network (CNN) + Recurrent Neural Network (RNN) architecture.
arXiv Detail & Related papers (2021-11-03T08:30:38Z)
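A hybrid CNN + RNN of the kind described there might look like the following Keras sketch, where a per-frame CNN produces features and an LSTM models only their temporal dynamics. The encoder depth, clip length, and number of event classes are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def cnn_rnn(seq_len=30, frame_shape=(64, 64, 3), num_events=10):
    """Per-frame CNN features fed to an LSTM over time."""
    frame_encoder = models.Sequential([
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.GlobalAveragePooling2D(),
    ])
    return models.Sequential([
        layers.Input(shape=(seq_len, *frame_shape)),
        layers.TimeDistributed(frame_encoder),  # CNN applied to every frame
        layers.LSTM(128),                       # RNN aggregates motion in time
        layers.Dense(num_events, activation="softmax"),
    ])
```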
- Efficient Modelling Across Time of Human Actions and Interactions [92.39082696657874]
We argue that current fixed-sized temporal kernels in 3D convolutional neural networks (CNNs) can be improved to better deal with temporal variations in the input.
We study how we can better handle variations between classes of actions, by enhancing their feature differences over different layers of the architecture.
The proposed approaches are evaluated on several benchmark action recognition datasets and show competitive results.
arXiv Detail & Related papers (2021-10-05T15:39:11Z)
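One way to relax a single fixed-size temporal kernel in a 3D CNN is to run several temporal extents in parallel and concatenate the results, as sketched below; this is an illustrative construction, not the operator proposed in the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers

def multi_temporal_block(x, filters=16, t_sizes=(1, 3, 5)):
    """Parallel 3D convolutions with different temporal extents,
    concatenated along channels; t_sizes is an illustrative choice."""
    branches = [
        layers.Conv3D(filters, (t, 3, 3), padding="same", activation="relu")(x)
        for t in t_sizes
    ]
    return layers.Concatenate()(branches)

# Usage on a clip tensor of shape (T, H, W, C) = (16, 64, 64, 3):
clip = layers.Input(shape=(16, 64, 64, 3))
features = multi_temporal_block(clip)  # (16, 64, 64, 48)
```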
- HighlightMe: Detecting Highlights from Human-Centric Videos [52.84233165201391]
We present a domain- and user-preference-agnostic approach to detect highlightable excerpts from human-centric videos.
We use an autoencoder network equipped with spatial-temporal graph convolutions to detect human activities and interactions.
We observe a 4-12% improvement in the mean average precision of matching the human-annotated highlights over state-of-the-art methods.
arXiv Detail & Related papers (2021-10-05T01:18:15Z)
- Relational Graph Learning on Visual and Kinematics Embeddings for Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose a novel online approach of multi-modal graph network (i.e., MRG-Net) to dynamically integrate visual and kinematics information.
The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset.
arXiv Detail & Related papers (2020-11-03T11:00:10Z)
- Video-based Facial Expression Recognition using Graph Convolutional Networks [57.980827038988735]
We introduce a Graph Convolutional Network (GCN) layer into a common CNN-RNN based model for video-based facial expression recognition.
We evaluate our method on three widely-used datasets, CK+, Oulu-CASIA and MMI, and also one challenging wild dataset AFEW8.0.
arXiv Detail & Related papers (2020-10-26T07:31:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.