3D attention mechanism for fine-grained classification of table tennis
strokes using a Twin Spatio-Temporal Convolutional Neural Networks
- URL: http://arxiv.org/abs/2012.05342v1
- Date: Fri, 20 Nov 2020 09:55:12 GMT
- Title: 3D attention mechanism for fine-grained classification of table tennis
strokes using a Twin Spatio-Temporal Convolutional Neural Networks
- Authors: Pierre-Etienne Martin (LaBRI, UB), Jenny Benois-Pineau (LaBRI), Renaud Péteri, Julien Morlier
- Abstract summary: The paper addresses the problem of recognition of actions in video with low inter-class variability such as Table Tennis strokes.
Two stream, "twin" convolutional neural networks are used with 3D convolutions both on RGB data and optical flow.
We introduce 3D attention modules and examine their impact on classification efficiency.
- Score: 1.181206257787103
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The paper addresses the problem of recognition of actions in video with low
inter-class variability such as Table Tennis strokes. Two stream, "twin"
convolutional neural networks are used with 3D convolutions both on RGB data
and optical flow. Actions are recognized by classification of temporal windows.
We introduce 3D attention modules and examine their impact on classification
efficiency. In the context of studying athletes' performance, a corpus of the
particular actions of table tennis strokes is considered. The use of attention
blocks in the network speeds up the training step and improves classification
scores by up to 5% with our twin model. We visualize their impact on the
obtained features and observe a correlation between attention and the player's
movements and position. Classification scores of a state-of-the-art action
classification method and of the proposed approach with attention blocks are
compared on the corpus. The proposed model with attention blocks outperforms
both the previous model without them and our baseline.
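The twin architecture in the abstract can be illustrated with a minimal NumPy sketch: each stream's spatio-temporal feature volume is gated by a 3D attention mask, and the two streams' scores are fused late. The tensor shapes, the single-weight 1x1x1 projection, the sigmoid gating, and the averaging fusion are simplifying assumptions for illustration, not the paper's exact design.

```python
import numpy as np

def attention_block_3d(features, w):
    """Gate a spatio-temporal feature volume with a simple 3D attention mask.

    features: (C, T, H, W) features from one stream (RGB or optical flow).
    w: (C,) weights of a hypothetical 1x1x1 projection producing the mask.
    """
    # Collapse channels into a single attention logit per (t, h, w) location
    logits = np.tensordot(w, features, axes=([0], [0]))  # (T, H, W)
    mask = 1.0 / (1.0 + np.exp(-logits))                 # sigmoid, in (0, 1)
    return features * mask                               # broadcast over C

def twin_fusion(scores_rgb, scores_flow):
    """Late fusion of the two stream scores by averaging (one common choice)."""
    return 0.5 * (scores_rgb + scores_flow)

rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 8, 16, 16))  # C=4 channels, T=8 frames, 16x16
gated = attention_block_3d(feats, rng.standard_normal(4))
print(gated.shape)  # (4, 8, 16, 16)
```

Because the mask stays in (0, 1), gating can only attenuate features, which is consistent with the reported correlation between attention and the player's location: regions away from the player are suppressed.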
Related papers
- DCNN: Dual Cross-current Neural Networks Realized Using An Interactive Deep Learning Discriminator for Fine-grained Objects [48.65846477275723]
This study proposes novel dual-current neural networks (DCNN) to improve the accuracy of fine-grained image classification.
The main novel design features for constructing a weakly supervised learning backbone model DCNN include (a) extracting heterogeneous data, (b) keeping the feature map resolution unchanged, (c) expanding the receptive field, and (d) fusing global representations and local features.
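Point (d), fusing global representations with local features, can be sketched as concatenating a globally pooled vector with pooled vectors of local regions. The band-wise split and average pooling are hypothetical simplifications; the DCNN paper's actual fusion may differ.

```python
import numpy as np

def fuse_global_local(feature_map, num_bands=4):
    """Concatenate a global pooled vector with pooled local-band vectors.

    feature_map: (C, H, W) convolutional features.
    Returns a vector of length C * (1 + num_bands).
    """
    global_vec = feature_map.mean(axis=(1, 2))               # (C,) global pooling
    bands = np.array_split(feature_map, num_bands, axis=1)   # horizontal bands
    local_vecs = [b.mean(axis=(1, 2)) for b in bands]        # num_bands x (C,)
    return np.concatenate([global_vec] + local_vecs)

feats = np.random.default_rng(1).standard_normal((8, 16, 16))
fused = fuse_global_local(feats)
print(fused.shape)  # (40,)
```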
arXiv Detail & Related papers (2024-05-07T07:51:28Z)
- 3D Convolutional Networks for Action Recognition: Application to Sport Gesture Recognition [0.0]
We are interested in the classification of continuous video takes with repeatable actions, such as strokes of table tennis.
The 3D convnets are an efficient tool for solving these problems with window-based approaches.
arXiv Detail & Related papers (2022-04-13T13:21:07Z)
- Self-Regulated Learning for Egocentric Video Activity Anticipation [147.9783215348252]
Self-Regulated Learning (SRL) aims to regulate the intermediate representations consecutively, producing representations that emphasize the novel information in the frame at the current time-stamp.
SRL sharply outperforms existing state-of-the-art in most cases on two egocentric video datasets and two third-person video datasets.
arXiv Detail & Related papers (2021-11-23T03:29:18Z)
- Efficient Modelling Across Time of Human Actions and Interactions [92.39082696657874]
We argue that current fixed-sized temporal kernels in 3D convolutional neural networks (CNNs) can be improved to better deal with temporal variations in the input.
We study how we can better handle variations between classes of actions, by enhancing their feature differences over different layers of the architecture.
The proposed approaches are evaluated on several benchmark action recognition datasets and show competitive results.
arXiv Detail & Related papers (2021-10-05T15:39:11Z)
- Three-Stream 3D/1D CNN for Fine-Grained Action Classification and Segmentation in Table Tennis [0.0]
It is applied to the TTStroke-21 dataset, which consists of untrimmed videos of table tennis games.
The goal is to detect and classify table tennis strokes in the videos, as the first step of a broader scheme.
The pose is also investigated in order to offer richer feedback to the athletes.
arXiv Detail & Related papers (2021-09-29T09:43:21Z)
- Spot What Matters: Learning Context Using Graph Convolutional Networks for Weakly-Supervised Action Detection [0.0]
We introduce an architecture based on self-attention and Convolutional Networks to improve human action detection in video.
Our model aids explainability by visualizing the learned context as an attention map, even for actions and objects unseen during training.
Experimental results show that our contextualized approach outperforms a baseline action detection approach by more than 2 points in Video-mAP.
arXiv Detail & Related papers (2021-07-28T21:37:18Z)
- Multi-level Motion Attention for Human Motion Prediction [132.29963836262394]
We study the use of different types of attention, computed at joint, body part, and full pose levels.
Our experiments on Human3.6M, AMASS and 3DPW validate the benefits of our approach for both periodical and non-periodical actions.
arXiv Detail & Related papers (2021-06-17T08:08:11Z)
- Score-informed Networks for Music Performance Assessment [64.12728872707446]
Deep neural network-based methods incorporating score information into MPA models have not yet been investigated.
We introduce three different models capable of score-informed performance assessment.
arXiv Detail & Related papers (2020-08-01T07:46:24Z)
- Fine-Grained Visual Classification with Efficient End-to-end Localization [49.9887676289364]
We present an efficient localization module that can be fused with a classification network in an end-to-end setup.
We evaluate the new model on the three benchmark datasets CUB200-2011, Stanford Cars and FGVC-Aircraft.
arXiv Detail & Related papers (2020-05-11T14:07:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.