Real-time Action Recognition for Fine-Grained Actions and The Hand Wash
Dataset
- URL: http://arxiv.org/abs/2210.07400v1
- Date: Thu, 13 Oct 2022 22:38:11 GMT
- Title: Real-time Action Recognition for Fine-Grained Actions and The Hand Wash
Dataset
- Authors: Akash Nagaraj, Mukund Sood, Chetna Sureka, Gowri Srinivasa
- Abstract summary: A three-stream fusion algorithm is proposed that runs both accurately and efficiently in real time, even on low-powered systems such as a Raspberry Pi.
The results are benchmarked on the UCF-101 and HMDB-51 datasets, achieving accuracies of 92.7% and 64.9%, respectively.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we present a three-stream algorithm for real-time
action recognition and a new dataset of handwash videos, with the intent of
aligning action recognition with real-world constraints to yield effective
conclusions. The proposed three-stream fusion algorithm runs both accurately
and efficiently in real time, even on low-powered systems such as a Raspberry
Pi. The cornerstone of the algorithm is the incorporation of spatial and
temporal information, as well as information about the objects in a video,
while using an efficient architecture and Optical Flow computation to achieve
commendable results in real time. The results achieved by this algorithm are
benchmarked on the UCF-101 and HMDB-51 datasets, with accuracies of 92.7% and
64.9%, respectively. Notably, the algorithm is also able to learn the
intricate differences between extremely similar actions, which would be
difficult to distinguish even for the human eye. Additionally, noticing a
dearth of datasets for the recognition of very similar or fine-grained
actions, this paper also introduces a new, publicly available dataset, the
Hand Wash Dataset, intended as a benchmark for future fine-grained action
recognition tasks.
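As an illustration of the kind of pipeline the abstract describes (not the authors' code), the sketch below shows a minimal three-stream late-fusion step: a spatial stream over the current RGB frame, a temporal stream over dense Optical Flow between consecutive frames, and an object stream, combined by a weighted average of per-class scores. The per-stream classifiers, stream weights, and number of classes are hypothetical placeholders.

```python
# Minimal sketch of a three-stream late-fusion step, assuming placeholder
# per-stream classifiers; this is NOT the authors' implementation.
import cv2
import numpy as np

NUM_CLASSES = 12  # hypothetical number of action classes


def spatial_scores(rgb_frame: np.ndarray) -> np.ndarray:
    """Placeholder spatial stream: in practice, a CNN over the RGB frame."""
    return np.full(NUM_CLASSES, 1.0 / NUM_CLASSES)


def temporal_scores(flow: np.ndarray) -> np.ndarray:
    """Placeholder temporal stream: in practice, a CNN over Optical Flow."""
    return np.full(NUM_CLASSES, 1.0 / NUM_CLASSES)


def object_scores(rgb_frame: np.ndarray) -> np.ndarray:
    """Placeholder object stream: in practice, driven by an object detector."""
    return np.full(NUM_CLASSES, 1.0 / NUM_CLASSES)


def fused_prediction(prev_frame, curr_frame, weights=(0.4, 0.4, 0.2)):
    """Fuse the three streams by a weighted average of their class scores."""
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)
    # Dense Farneback optical flow between consecutive frames.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    scores = (weights[0] * spatial_scores(curr_frame)
              + weights[1] * temporal_scores(flow)
              + weights[2] * object_scores(curr_frame))
    return int(np.argmax(scores))


if __name__ == "__main__":
    prev = np.zeros((240, 320, 3), dtype=np.uint8)
    curr = np.zeros((240, 320, 3), dtype=np.uint8)
    print("Predicted class index:", fused_prediction(prev, curr))
```

On a Raspberry Pi-class device, the cost would sit in the per-stream models and the flow computation; the fusion itself is a cheap weighted average, which is what makes a late-fusion design attractive for real-time use.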
Related papers
- On the Importance of Spatial Relations for Few-shot Action Recognition [109.2312001355221]
In this paper, we investigate the importance of spatial relations and propose a more accurate few-shot action recognition method.
A novel Spatial Alignment Cross Transformer (SA-CT) learns to re-adjust spatial relations and incorporates temporal information.
Experiments reveal that, even without using any temporal information, the performance of SA-CT is comparable to that of temporal-based methods on 3 of the 4 benchmarks.
arXiv Detail & Related papers (2023-08-14T12:58:02Z)
- High Speed Human Action Recognition using a Photonic Reservoir Computer [1.7403133838762443]
We introduce a new training method for the reservoir computer, based on "Timesteps Of Interest".
We solve the task with high accuracy and speed, allowing multiple video streams to be processed in real time.
arXiv Detail & Related papers (2023-05-24T16:04:42Z)
- TempNet: Temporal Attention Towards the Detection of Animal Behaviour in Videos [63.85815474157357]
We propose an efficient computer vision- and deep learning-based method for the detection of biological behaviours in videos.
TempNet uses an encoder bridge and residual blocks to maintain model performance within its two-stage encoder, which processes spatial information first and temporal information second.
We demonstrate its application to the detection of sablefish (Anoplopoma fimbria) startle events.
arXiv Detail & Related papers (2022-11-17T23:55:12Z)
- SHREC 2022 Track on Online Detection of Heterogeneous Gestures [11.447098172408111]
This paper presents the outcomes of a contest organized to evaluate methods for the online recognition of heterogeneous gestures from sequences of 3D hand poses.
The dataset features continuous sequences of hand tracking data where the gestures are interleaved with non-significant motions.
The evaluation is based not only on detection performance but also on latency and false positives, making it possible to assess the feasibility of practical interaction tools.
arXiv Detail & Related papers (2022-07-14T07:24:02Z)
- Class-Incremental Learning for Action Recognition in Videos [44.923719189467164]
We tackle the catastrophic forgetting problem in the context of class-incremental learning for video recognition.
Our framework addresses this challenging task by introducing time-channel importance maps and exploiting the importance maps for learning the representations of incoming examples.
We evaluate the proposed approach on brand-new splits of class-incremental action recognition benchmarks constructed upon the UCF101, HMDB51, and Something-Something V2 datasets.
arXiv Detail & Related papers (2022-03-25T12:15:49Z)
- 6D Pose Estimation with Combined Deep Learning and 3D Vision Techniques for a Fast and Accurate Object Grasping [0.19686770963118383]
Real-time robotic grasping is a priority target for highly advanced autonomous systems.
This paper proposes a novel two-stage method that combines fast 2D object recognition using a deep neural network with 3D vision techniques.
The proposed solution has the potential to perform robustly in real-time applications, which require both efficiency and accuracy.
arXiv Detail & Related papers (2021-11-11T15:36:55Z)
- Skeleton-Based Mutually Assisted Interacted Object Localization and Human Action Recognition [111.87412719773889]
We propose a joint learning framework for "interacted object localization" and "human action recognition" based on skeleton data.
Our method achieves the best or competitive performance compared with state-of-the-art methods for human action recognition.
arXiv Detail & Related papers (2021-10-28T10:09:34Z)
- Can Active Learning Preemptively Mitigate Fairness Issues? [66.84854430781097]
Dataset bias is one of the prevailing causes of unfairness in machine learning.
We study whether models trained with uncertainty-based active learning (AL) methods are fairer in their decisions with respect to a protected class.
We also explore the interaction of algorithmic fairness methods such as gradient reversal (GRAD) and BALD.
arXiv Detail & Related papers (2021-04-14T14:20:22Z)
- AdaFuse: Adaptive Temporal Fusion Network for Efficient Action Recognition [68.70214388982545]
Temporal modelling is key to efficient video action recognition.
We introduce an adaptive temporal fusion network, called AdaFuse, that fuses channels from current and past feature maps.
Our approach can achieve about 40% computation savings with accuracy comparable to state-of-the-art methods; a simplified, illustrative sketch of channel-level fusion follows this entry.
arXiv Detail & Related papers (2021-02-10T23:31:02Z)
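As a hypothetical aside, the channel-level fusion of current and past feature maps mentioned in the entry above can be illustrated in a few lines; the sketch below uses a fixed per-channel gate as a stand-in, whereas AdaFuse itself learns this decision adaptively, which is not reproduced here.

```python
# Illustrative channel-level temporal fusion (a stand-in, not AdaFuse itself):
# a per-channel gate blends the current and previous feature maps.
import numpy as np


def fuse_channels(curr, past, gate):
    """curr, past: arrays of shape (channels, height, width).
    gate: per-channel weights in [0, 1]; gate[c] = 1 keeps the current
    channel, 0 reuses the past channel (saving its recomputation)."""
    g = np.asarray(gate)[:, None, None]  # broadcast over spatial dimensions
    return g * curr + (1.0 - g) * past


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    curr = rng.standard_normal((8, 4, 4))
    past = rng.standard_normal((8, 4, 4))
    gate = (rng.random(8) > 0.5).astype(float)  # hard keep/reuse decisions
    print(fuse_channels(curr, past, gate).shape)  # (8, 4, 4)
```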
- A novel shape matching descriptor for real-time hand gesture recognition [11.798555201744596]
We present a novel shape matching methodology for real-time hand gesture recognition.
Our method outperforms the other methods and provides a good combination of accuracy and computational efficiency for real-time applications.
arXiv Detail & Related papers (2021-01-11T14:41:57Z)
- Unsupervised Feature Learning for Event Data: Direct vs Inverse Problem Formulation [53.850686395708905]
Event-based cameras record an asynchronous stream of per-pixel brightness changes.
In this paper, we focus on single-layer architectures for representation learning from event data.
We show improvements of up to 9% in recognition accuracy compared to state-of-the-art methods.
arXiv Detail & Related papers (2020-09-23T10:40:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.