3D Convolutional Networks for Action Recognition: Application to Sport
Gesture Recognition
- URL: http://arxiv.org/abs/2204.08460v1
- Date: Wed, 13 Apr 2022 13:21:07 GMT
- Title: 3D Convolutional Networks for Action Recognition: Application to Sport
Gesture Recognition
- Authors: Pierre-Etienne Martin (LaBRI, MPI-EVA, UB), J Benois-Pineau, R
Péteri, A Zemmari, J Morlier
- Abstract summary: We are interested in the classification of continuous video takes with repeatable actions, such as strokes of table tennis.
The 3D convnets are an efficient tool for solving these problems with window-based approaches.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D convolutional networks are a good means to perform tasks such as video
segmentation into coherent spatio-temporal chunks and their classification
with regard to a target taxonomy. In this chapter we are interested in the
classification of continuous video takes with repeatable actions, such as
strokes of table tennis. Filmed in a free, markerless, ecological environment,
these videos represent a challenge from both the segmentation and classification
points of view. 3D convnets are an efficient tool for solving these problems
with window-based approaches.
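As a rough illustration of what a window-based approach can look like, the sketch below slides a fixed-length temporal window over a frame sequence, labels each window, and merges consecutive windows that share a label. It is not the authors' pipeline: `sliding_windows`, `segment_video`, the window and stride values, and the `classify_window` callable (a stand-in for a trained 3D convnet) are all assumptions made for the example.

```python
from itertools import groupby
from typing import Callable, List, Sequence, Tuple


def sliding_windows(num_frames: int, window: int, stride: int) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices, end exclusive, of every full window."""
    return [(s, s + window) for s in range(0, num_frames - window + 1, stride)]


def segment_video(
    frames: Sequence,
    classify_window: Callable[[Sequence], str],  # hypothetical stand-in for a 3D convnet
    window: int = 4,
    stride: int = 2,
) -> List[Tuple[int, int, str]]:
    """Label each window, then merge consecutive windows that share a label."""
    spans = sliding_windows(len(frames), window, stride)
    labelled = [(s, e, classify_window(frames[s:e])) for s, e in spans]
    segments = []
    for label, group in groupby(labelled, key=lambda item: item[2]):
        run = list(group)
        # A merged segment runs from the first window's start to the last window's end.
        segments.append((run[0][0], run[-1][1], label))
    return segments


def toy_classifier(clip: Sequence[int]) -> str:
    """Toy rule: a clip is a 'stroke' if more than half of its frames are active."""
    return "stroke" if sum(clip) / len(clip) > 0.5 else "background"


# Toy "video": 1 marks a frame that belongs to a stroke, 0 marks background.
toy_frames = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
print(segment_video(toy_frames, toy_classifier))
```

Because the windows overlap, neighbouring segments in this sketch can share frames; a real system would need a rule for resolving those boundaries.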
Related papers
- PointResNet: Residual Network for 3D Point Cloud Segmentation and
Classification [18.466814193413487]
Point cloud segmentation and classification are some of the primary tasks in 3D computer vision.
In this paper, we propose PointResNet, a residual block-based approach.
Our model directly processes the 3D points, using a deep neural network for the segmentation and classification tasks.
arXiv Detail & Related papers (2022-11-20T17:39:48Z)
- Action Keypoint Network for Efficient Video Recognition [63.48422805355741]
This paper proposes to integrate temporal and spatial selection into an Action Keypoint Network (AK-Net).
AK-Net selects some informative points scattered in arbitrary-shaped regions as a set of action keypoints and then transforms the video recognition into point cloud classification.
Experimental results show that AK-Net can consistently improve the efficiency and performance of baseline methods on several video recognition benchmarks.
arXiv Detail & Related papers (2022-01-17T09:35:34Z)
- Efficient Modelling Across Time of Human Actions and Interactions [92.39082696657874]
We argue that current fixed-sized temporal kernels in 3D convolutional neural networks (CNNs) can be improved to better deal with temporal variations in the input.
We study how we can better handle variations between classes of actions, by enhancing their feature differences over different layers of the architecture.
The proposed approaches are evaluated on several benchmark action recognition datasets and show competitive results.
arXiv Detail & Related papers (2021-10-05T15:39:11Z)
- Three-Stream 3D/1D CNN for Fine-Grained Action Classification and
Segmentation in Table Tennis [0.0]
The method is applied to the TT-21 dataset, which consists of untrimmed videos of table tennis games.
The goal is to detect and classify table tennis strokes in the videos, the first step of a bigger scheme.
The pose is also investigated in order to offer richer feedback to the athletes.
arXiv Detail & Related papers (2021-09-29T09:43:21Z)
- 2D or not 2D? Adaptive 3D Convolution Selection for Efficient Video
Recognition [84.697097472401]
We introduce Ada3D, a conditional computation framework that learns instance-specific 3D usage policies to determine frames and convolution layers to be used in a 3D network.
We demonstrate that our method achieves similar accuracies to state-of-the-art 3D models while requiring 20%-50% less computation across different datasets.
arXiv Detail & Related papers (2020-12-29T21:40:38Z)
- Weakly-Supervised Action Localization and Action Recognition using
Global-Local Attention of 3D CNN [4.924442315857227]
3D Convolutional Neural Network (3D CNN) captures spatial and temporal information on 3D data such as video sequences.
We propose two approaches to improve the visual explanations and classification in 3D CNN.
arXiv Detail & Related papers (2020-12-17T12:29:16Z)
- 3D attention mechanism for fine-grained classification of table tennis
strokes using a Twin Spatio-Temporal Convolutional Neural Networks [1.181206257787103]
The paper addresses the problem of recognition of actions in video with low inter-class variability such as Table Tennis strokes.
Two-stream "twin" convolutional neural networks are used, with 3D convolutions applied to both RGB data and optical flow.
We introduce 3D attention modules and examine their impact on classification efficiency.
arXiv Detail & Related papers (2020-11-20T09:55:12Z)
- Improving Point Cloud Semantic Segmentation by Learning 3D Object
Detection [102.62963605429508]
Point cloud semantic segmentation plays an essential role in autonomous driving.
Current 3D semantic segmentation networks focus on convolutional architectures that perform well for well-represented classes.
We propose a novel Detection Aware 3D Semantic Segmentation (DASS) framework that explicitly leverages localization features from an auxiliary 3D object detection task.
arXiv Detail & Related papers (2020-09-22T14:17:40Z)
- Self-supervised Video Representation Learning by Uncovering
Spatio-temporal Statistics [74.6968179473212]
This paper proposes a novel pretext task to address the self-supervised learning problem.
We compute a series of spatio-temporal statistical summaries, such as the spatial location and dominant direction of the largest motion.
A neural network is built and trained to yield the statistical summaries given the video frames as inputs.
arXiv Detail & Related papers (2020-08-31T08:31:56Z)
- Making a Case for 3D Convolutions for Object Segmentation in Videos [16.167397418720483]
We show that 3D convolutional networks can be effectively applied to dense video prediction tasks such as salient object segmentation.
We propose a 3D decoder architecture that comprises novel 3D Global Convolution layers and 3D Refinement modules.
Our approach outperforms existing state-of-the-art methods by a large margin on the DAVIS'16 Unsupervised, FBMS and ViSal benchmarks.
arXiv Detail & Related papers (2020-08-26T12:24:23Z)
- Unsupervised Learning of Video Representations via Dense Trajectory
Clustering [86.45054867170795]
This paper addresses the task of unsupervised learning of representations for action recognition in videos.
We first propose to adapt two top-performing objectives in this class: instance recognition and local aggregation.
We observe promising performance, but qualitative analysis shows that the learned representations fail to capture motion patterns.
arXiv Detail & Related papers (2020-06-28T22:23:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.