ViT-ReT: Vision and Recurrent Transformer Neural Networks for Human
Activity Recognition in Videos
- URL: http://arxiv.org/abs/2208.07929v1
- Date: Tue, 16 Aug 2022 20:03:53 GMT
- Title: ViT-ReT: Vision and Recurrent Transformer Neural Networks for Human
Activity Recognition in Videos
- Authors: James Wensel, Hayat Ullah, Arslan Munir, Erik Blasch
- Abstract summary: This paper proposes and designs two transformer neural networks for human activity recognition.
A recurrent transformer (ReT) is a specialized neural network used to make predictions on sequences of data, and a vision transformer (ViT) is a transformer optimized for extracting salient features from images.
We provide an extensive comparison of the proposed transformer neural networks with contemporary CNN- and RNN-based human activity recognition models in terms of speed and accuracy.
- Score: 6.117917355232902
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Human activity recognition is an emerging and important area in computer
vision which seeks to determine the activity an individual or group of
individuals is performing. The applications of this field range from
generating highlight videos in sports to intelligent surveillance and gesture
recognition. Most activity recognition systems rely on a combination of
convolutional neural networks (CNNs) to perform feature extraction from the
data and recurrent neural networks (RNNs) to determine the time dependent
nature of the data. This paper proposes and designs two transformer neural
networks for human activity recognition: a recurrent transformer (ReT), a
specialized neural network used to make predictions on sequences of data, as
well as a vision transformer (ViT), a transformer optimized for extracting
salient features from images, to improve speed and scalability of activity
recognition. We have provided an extensive comparison of the proposed
transformer neural networks with contemporary CNN- and RNN-based human
activity recognition models in terms of speed and accuracy.
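As a concrete illustration of the two-stage design the abstract describes (a vision transformer for per-frame features, a recurrent/temporal transformer over the frame sequence), here is a minimal PyTorch sketch. The class name, layer sizes, and pooling choices are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class ViTReTSketch(nn.Module):
    """Illustrative two-stage model: a vision transformer extracts
    per-frame features, and a temporal transformer models the sequence
    of frame features to predict an activity class."""

    def __init__(self, num_classes: int, dim: int = 256,
                 patch: int = 16):
        super().__init__()
        # Patch embedding: split each frame into 16x16 patches.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.spatial = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True),
            num_layers=4)
        self.temporal = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True),
            num_layers=2)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (batch, frames, 3, H, W)
        b, t, c, h, w = clip.shape
        x = self.patch_embed(clip.reshape(b * t, c, h, w))  # (b*t, dim, h/p, w/p)
        x = x.flatten(2).transpose(1, 2)                    # (b*t, patches, dim)
        x = self.spatial(x).mean(dim=1)                     # one feature per frame
        x = self.temporal(x.reshape(b, t, -1)).mean(dim=1)  # pool over time
        return self.head(x)                                 # (batch, num_classes)

model = ViTReTSketch(num_classes=101)                       # e.g. UCF101
logits = model(torch.randn(2, 8, 3, 224, 224))              # 2 clips, 8 frames
print(logits.shape)                                         # torch.Size([2, 101])
```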
Related papers
- Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
arXiv Detail & Related papers (2024-03-18T18:01:01Z)
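The entry above hinges on viewing a network's parameters as a computational graph. Below is a hedged sketch of one plausible encoding, with nodes for neurons and directed edges carrying the weights; the paper's actual graph construction and features may differ.

```python
import numpy as np

def mlp_to_graph(weights: list, biases: list):
    """Encode an MLP as a graph: one node per neuron (node feature =
    its bias, zero for inputs), one directed edge per weight.
    A plausible encoding in the spirit of the paper, not its exact scheme."""
    layer_sizes = [weights[0].shape[1]] + [w.shape[0] for w in weights]
    offsets = np.cumsum([0] + layer_sizes)       # first node id of each layer
    node_feats = np.concatenate([np.zeros(layer_sizes[0])] + list(biases))
    edges, edge_feats = [], []
    for l, w in enumerate(weights):              # w has shape (out, in)
        for j in range(w.shape[0]):
            for i in range(w.shape[1]):
                edges.append((offsets[l] + i, offsets[l + 1] + j))
                edge_feats.append(w[j, i])
    return node_feats, np.array(edges), np.array(edge_feats)

# A tiny 2-3-1 MLP becomes a graph with 6 nodes and 9 edges.
w = [np.random.randn(3, 2), np.random.randn(1, 3)]
b = [np.random.randn(3), np.random.randn(1)]
nodes, edges, ew = mlp_to_graph(w, b)
print(nodes.shape, edges.shape, ew.shape)        # (6,) (9, 2) (9,)
```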
- Design and development of opto-neural processors for simulation of neural networks trained in image detection for potential implementation in hybrid robotics [0.0]
Living neural networks offer advantages of lower power consumption, faster processing, and biological realism.
This work proposes a simulated living neural network trained indirectly with backpropagated, STDP-based algorithms that use precise optogenetic activation.
arXiv Detail & Related papers (2024-01-17T04:42:49Z)
- ConViViT -- A Deep Neural Network Combining Convolutions and Factorized Self-Attention for Human Activity Recognition [3.6321891270689055]
We propose a novel approach that leverages the strengths of both CNNs and Transformers in a hybrid architecture for performing activity recognition using RGB videos.
Our architecture achieves new state-of-the-art results of 90.05%, 99.6%, and 95.09% on HMDB51, UCF101, and ETRI-Activity3D, respectively.
arXiv Detail & Related papers (2023-10-22T21:13:43Z)
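To make the hybrid CNN-plus-transformer idea in the ConViViT entry concrete, the following sketch embeds each frame with a small CNN and applies self-attention along the time axis only, which is one way to "factorize" attention. All layer sizes and the exact factorization are assumptions, not the ConViViT architecture itself.

```python
import torch
import torch.nn as nn

class HybridCNNTransformer(nn.Module):
    """Sketch of a CNN + factorized-attention hybrid: a small CNN embeds
    each frame, then self-attention runs over the time axis only.
    Layer sizes are illustrative guesses."""

    def __init__(self, num_classes: int, dim: int = 128):
        super().__init__()
        self.cnn = nn.Sequential(                  # per-frame spatial features
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))
        self.temporal = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
            num_layers=2)                          # attention over frames only
        self.head = nn.Linear(dim, num_classes)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        b, t, c, h, w = clip.shape
        feats = self.cnn(clip.reshape(b * t, c, h, w)).reshape(b, t, -1)
        return self.head(self.temporal(feats).mean(dim=1))

logits = HybridCNNTransformer(num_classes=51)(torch.randn(2, 16, 3, 112, 112))
print(logits.shape)                                # torch.Size([2, 51])
```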
- Training Robust Spiking Neural Networks with ViewPoint Transform and SpatioTemporal Stretching [4.736525128377909]
We propose a novel data augmentation method, ViewPoint Transform and SpatioTemporal Stretching (VPT-STS).
It improves the robustness of spiking neural networks by transforming the rotation centers and angles in the spatiotemporal domain to generate samples from different viewpoints.
Experiments on prevailing neuromorphic datasets demonstrate that VPT-STS is broadly effective on multi-event representations and significantly outperforms pure spatial geometric transformations.
arXiv Detail & Related papers (2023-03-14T03:09:56Z)
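The viewpoint transform in VPT-STS amounts to rotating event coordinates about chosen centers. Below is a simplified NumPy reading of that step; the authors' full augmentation (including the stretching component) involves more than this.

```python
import numpy as np

def viewpoint_transform(events: np.ndarray, center: tuple,
                        angle_deg: float) -> np.ndarray:
    """Rotate event (x, y) coordinates about a center point, keeping
    timestamps and polarities unchanged; a simplified reading of the
    viewpoint transform, not the authors' exact augmentation."""
    x, y, t, p = events.T
    theta = np.deg2rad(angle_deg)
    cx, cy = center
    xr = cx + (x - cx) * np.cos(theta) - (y - cy) * np.sin(theta)
    yr = cy + (x - cx) * np.sin(theta) + (y - cy) * np.cos(theta)
    return np.stack([xr, yr, t, p], axis=1)

# Events as rows of (x, y, timestamp, polarity).
ev = np.array([[10.0, 20.0, 0.001, 1], [64.0, 64.0, 0.002, -1]])
print(viewpoint_transform(ev, center=(64, 64), angle_deg=15))
```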
- Video Action Recognition Collaborative Learning with Dynamics via PSO-ConvNet Transformer [1.876462046907555]
We propose a novel PSO-ConvNet model for learning actions in videos.
Our experimental results on the UCF-101 dataset demonstrate substantial improvements of up to 9% in accuracy.
Overall, our dynamic PSO-ConvNet model provides a promising direction for improving human action recognition.
arXiv Detail & Related papers (2023-02-17T23:39:34Z)
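For readers unfamiliar with the PSO component, a textbook particle swarm update over flat weight vectors is sketched below; the paper's dynamic, collaborative variant adds its own information-sharing scheme on top of this basic mechanism, and the toy fitness here is purely illustrative.

```python
import numpy as np

def pso_step(pos, vel, pbest, gbest, w=0.7, c1=1.5, c2=1.5, rng=None):
    """One textbook PSO update for a swarm of weight vectors:
    v <- w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x);  x <- x + v.
    Standard PSO, shown only to illustrate the mechanism."""
    rng = rng or np.random.default_rng()
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    return pos + vel, vel

# Swarm of 8 particles, each a 1000-parameter "network".
rng = np.random.default_rng(0)
pos = rng.normal(size=(8, 1000))
vel = np.zeros_like(pos)
loss = lambda x: (x ** 2).sum(axis=-1)        # stand-in for validation loss
pbest, gbest = pos.copy(), pos[loss(pos).argmin()]
for _ in range(50):
    pos, vel = pso_step(pos, vel, pbest, gbest, rng=rng)
    better = loss(pos) < loss(pbest)
    pbest[better] = pos[better]
    gbest = pbest[loss(pbest).argmin()]
print(loss(gbest[None]))                      # best loss shrinks over iterations
```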
- Differentiable Frequency-based Disentanglement for Aerial Video Action Recognition [56.91538445510214]
We present a learning algorithm for human activity recognition in videos.
Our approach is designed for UAV videos, which are mainly acquired from obliquely placed dynamic cameras.
We conduct extensive experiments on the UAV Human dataset and the NEC Drone dataset.
arXiv Detail & Related papers (2022-09-15T22:16:52Z)
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- Event-based Video Reconstruction via Potential-assisted Spiking Neural Network [48.88510552931186]
Bio-inspired neural networks can potentially lead to greater computational efficiency on event-driven hardware.
We propose a novel event-based video reconstruction framework based on a fully spiking neural network (EVSNN).
We find that the spiking neurons have the potential to store useful temporal information (memory) to complete such time-dependent tasks.
arXiv Detail & Related papers (2022-01-25T02:05:20Z)
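The "potential" that stores temporal information in the EVSNN entry is the neuron's membrane potential. The generic leaky integrate-and-fire update below makes that memory mechanism explicit; it is a standard LIF model, not the EVSNN architecture itself.

```python
import numpy as np

def lif_step(v, inp, decay=0.9, threshold=1.0):
    """One leaky integrate-and-fire update: the membrane potential v
    leaks and accumulates input over time, which is the temporal
    'memory'; a spike resets the potential to zero."""
    v = decay * v + inp                     # leak, then integrate new input
    spikes = (v >= threshold).astype(float)
    v = v * (1.0 - spikes)                  # hard reset where a spike fired
    return v, spikes

v = np.zeros(4)                             # four neurons
for step, drive in enumerate(np.random.default_rng(1).random((10, 4))):
    v, s = lif_step(v, 0.4 * drive)
    print(step, s)                          # spike trains emerge over time
```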
- Hybrid SNN-ANN: Energy-Efficient Classification and Object Detection for Event-Based Vision [64.71260357476602]
Event-based vision sensors encode local pixel-wise brightness changes in streams of events rather than image frames.
Recent progress in object recognition from event-based sensors has come from conversions of deep neural networks.
We propose a hybrid architecture for end-to-end training of deep neural networks for event-based pattern recognition and object detection.
arXiv Detail & Related papers (2021-12-06T23:45:58Z)
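Event sensors emit sparse (x, y, t, polarity) tuples rather than frames, so pipelines commonly accumulate events into a dense tensor before a network sees them. The voxel-grid encoding below is one standard choice, assumed here for illustration rather than taken from the paper.

```python
import numpy as np

def events_to_voxel_grid(events, bins, height, width):
    """Accumulate a stream of (x, y, t, polarity) events into a
    (bins, H, W) tensor, a common way to feed event data to a network;
    one standard encoding, not necessarily the paper's hybrid input."""
    x, y, t, p = events.T
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9)
    b = np.clip((t_norm * bins).astype(int), 0, bins - 1)  # temporal bin index
    grid = np.zeros((bins, height, width))
    np.add.at(grid, (b, y.astype(int), x.astype(int)), p)  # signed accumulation
    return grid

ev = np.array([[3, 5, 0.00, 1], [3, 5, 0.01, -1], [7, 2, 0.02, 1]], float)
print(events_to_voxel_grid(ev, bins=2, height=8, width=8).shape)  # (2, 8, 8)
```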
- A Study On the Effects of Pre-processing On Spatio-temporal Action Recognition Using Spiking Neural Networks Trained with STDP [0.0]
It is important to study the behavior of SNNs trained with unsupervised learning methods on video classification tasks.
This paper presents methods of transposing temporal information into a static format, and then transforming the visual information into spikes using latency coding.
We show the effect of the similarity in the shape and speed of certain actions on action recognition with spiking neural networks.
arXiv Detail & Related papers (2021-05-31T07:07:48Z)
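Latency coding, mentioned in the entry above, maps pixel intensity to spike timing: brighter pixels fire earlier. A minimal sketch, with the linear mapping and time scale assumed:

```python
import numpy as np

def latency_encode(image, t_max=100.0):
    """Latency coding: brighter pixels fire earlier. Spike time is a
    linear function of inverted, normalized intensity; a common scheme
    consistent with the paper's description, with details assumed."""
    norm = (image - image.min()) / max(image.max() - image.min(), 1e-9)
    return t_max * (1.0 - norm)        # intensity 1.0 -> time 0 (earliest)

img = np.array([[0, 128], [200, 255]], dtype=float)
print(latency_encode(img))             # the brightest pixel spikes at t = 0
```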
- Neuroevolution of a Recurrent Neural Network for Spatial and Working Memory in a Simulated Robotic Environment [57.91534223695695]
We evolved weights in a biologically plausible recurrent neural network (RNN) using an evolutionary algorithm to replicate the behavior and neural activity observed in rats.
Our method demonstrates how the dynamic activity in evolved RNNs can capture interesting and complex cognitive behavior.
arXiv Detail & Related papers (2021-02-25T02:13:52Z)
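Evolving RNN weights, as in the last entry, replaces gradient descent with mutation and selection over flat weight vectors. The generic evolutionary loop below illustrates that mechanism; the paper's fitness function (replicating rat behavior and neural activity) and algorithmic details differ, and the toy fitness here is purely illustrative.

```python
import numpy as np

def evolve(fitness, dim, pop=32, elite=8, sigma=0.1, generations=100, seed=0):
    """Minimal elite-selection evolutionary loop over a flat weight
    vector, as one might use to evolve an RNN's weights; a generic EA,
    not the paper's specific neuroevolution setup."""
    rng = np.random.default_rng(seed)
    parents = rng.normal(size=(elite, dim))
    for _ in range(generations):
        # Mutate randomly chosen parents with Gaussian noise.
        idx = rng.integers(elite, size=pop)
        children = parents[idx] + sigma * rng.normal(size=(pop, dim))
        scores = np.array([fitness(c) for c in children])
        parents = children[np.argsort(scores)[-elite:]]   # keep the fittest
    return parents[-1]

# Toy stand-in fitness: how closely the weights match a target vector.
target = np.linspace(-1, 1, 20)
best = evolve(lambda w: -np.abs(w - target).sum(), dim=20)
print(np.round(best - target, 1))      # residuals shrink toward zero
```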