An X3D Neural Network Analysis for Runner's Performance Assessment in a
Wild Sporting Environment
- URL: http://arxiv.org/abs/2307.12183v1
- Date: Sat, 22 Jul 2023 23:15:47 GMT
- Title: An X3D Neural Network Analysis for Runner's Performance Assessment in a
Wild Sporting Environment
- Authors: David Freire-Obregón, Javier Lorenzo-Navarro, Oliverio J. Santana,
Daniel Hernández-Sosa, Modesto Castrillón-Santana
- Abstract summary: We present a transfer learning analysis on a sporting environment of the expanded 3D (X3D) neural networks.
Inspired by action quality assessment methods in the literature, our method uses an action recognition network to estimate athletes' cumulative race time.
X3D achieves state-of-the-art performance while requiring almost seven times less memory than previous work, with better precision.
- Score: 1.4859458229776121
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a transfer learning analysis of the expanded 3D (X3D)
neural networks in a sporting environment. Inspired by action quality
assessment methods in the literature, our method uses an action recognition
network to estimate athletes' cumulative race time (CRT) during an
ultra-distance competition. We evaluate the performance of X3D, a family of
action recognition networks that expand a small 2D image classification
architecture along multiple network axes, including space, time, width, and
depth. We demonstrate that the resulting neural network can provide remarkable
performance for short input footage, with a mean absolute error of twelve and
a half minutes when estimating the CRT for runners who have been active for 8
to 20 hours. Our most significant finding is that X3D achieves
state-of-the-art performance while requiring almost seven times less memory
than previous work and achieving better precision.
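The approach described above treats CRT estimation as a regression problem on top of features from an action recognition backbone. The following is a minimal illustrative sketch of that framing, not the authors' implementation: it fits a closed-form least-squares head on hypothetical clip embeddings (standing in for frozen X3D features) and reports the mean absolute error in minutes. All shapes, data, and the regression head are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for clip embeddings produced by a frozen
# action-recognition backbone: one 256-dim feature vector per clip.
n_clips, feat_dim = 64, 256
clip_features = rng.normal(size=(n_clips, feat_dim))

# Synthetic ground-truth cumulative race time (CRT) in minutes, for
# runners active between 8 and 20 hours (480 to 1200 minutes).
crt_true = rng.uniform(480.0, 1200.0, size=n_clips)

# A simple linear regression head on top of the features, fit in
# closed form with least squares; the paper's actual head may differ.
X = np.hstack([clip_features, np.ones((n_clips, 1))])  # add bias column
weights, *_ = np.linalg.lstsq(X, crt_true, rcond=None)

crt_pred = X @ weights
mae_minutes = np.mean(np.abs(crt_pred - crt_true))
print(f"MAE: {mae_minutes:.1f} minutes")
```

Reported as MAE in minutes so the number is directly comparable to the twelve-and-a-half-minute figure the abstract quotes for real footage.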
Related papers
- DELTA: Dense Efficient Long-range 3D Tracking for any video [82.26753323263009]
We introduce DELTA, a novel method that efficiently tracks every pixel in 3D space, enabling accurate motion estimation across entire videos.
Our approach leverages a joint global-local attention mechanism for reduced-resolution tracking, followed by a transformer-based upsampler to achieve high-resolution predictions.
Our method provides a robust solution for applications requiring fine-grained, long-term motion tracking in 3D space.
arXiv Detail & Related papers (2024-10-31T17:59:01Z)
- FastOcc: Accelerating 3D Occupancy Prediction by Fusing the 2D Bird's-Eye View and Perspective View [46.81548000021799]
In autonomous driving, 3D occupancy prediction outputs voxel-wise status and semantic labels for a more comprehensive understanding of 3D scenes.
Recent researchers have extensively explored various aspects of this task, including view transformation techniques, ground-truth label generation, and elaborate network design.
A new method, dubbed FastOcc, is proposed to accelerate the model while keeping its accuracy.
Experiments on the Occ3D-nuScenes benchmark demonstrate that our FastOcc achieves a fast inference speed.
arXiv Detail & Related papers (2024-03-05T07:01:53Z)
- EfficientNeRF: Efficient Neural Radiance Fields [63.76830521051605]
We present EfficientNeRF, an efficient NeRF-based method to represent 3D scenes and synthesize novel-view images.
Our method reduces training time by over 88% and reaches a rendering speed of over 200 FPS while still achieving competitive accuracy.
arXiv Detail & Related papers (2022-06-02T05:36:44Z)
- Achieving Real-Time LiDAR 3D Object Detection on a Mobile Device [53.323878851563414]
We propose a compiler-aware unified framework incorporating network enhancement and pruning search with reinforcement learning techniques.
Specifically, a generator Recurrent Neural Network (RNN) is employed to provide the unified scheme for both network enhancement and pruning search automatically.
The proposed framework achieves real-time 3D object detection on mobile devices with competitive detection performance.
arXiv Detail & Related papers (2020-12-26T19:41:15Z)
- Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion Forecasting with a Single Convolutional Net [93.51773847125014]
We propose a novel deep neural network that is able to jointly reason about 3D detection, tracking and motion forecasting given data captured by a 3D sensor.
Our approach performs 3D convolutions across space and time over a bird's eye view representation of the 3D world.
arXiv Detail & Related papers (2020-12-22T22:43:35Z)
- Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks [87.50632573601283]
We present a novel method for multi-view depth estimation from a single video.
Our method achieves temporally coherent depth estimation results by using a novel Epipolar Spatio-Temporal (EST) transformer.
To reduce the computational cost, inspired by recent Mixture-of-Experts models, we design a compact hybrid network.
arXiv Detail & Related papers (2020-11-26T04:04:21Z)
- A Real-time Action Representation with Temporal Encoding and Deep Compression [115.3739774920845]
We propose a new real-time convolutional architecture, called Temporal Convolutional 3D Network (T-C3D), for action representation.
T-C3D learns video action representations in a hierarchical multi-granularity manner while obtaining a high process speed.
Our method improves accuracy on the UCF101 action recognition benchmark by 5.4% over state-of-the-art real-time methods and runs 2 times faster at inference, with a model requiring less than 5 MB of storage.
arXiv Detail & Related papers (2020-06-17T06:30:43Z)
- 3D Human Pose Estimation using Spatio-Temporal Networks with Explicit Occlusion Training [40.933783830017035]
Estimating 3D poses from monocular video is still a challenging task, despite the significant progress that has been made in recent years.
We introduce a spatio-temporal video network for robust 3D human pose estimation.
We apply multi-scale spatial features for 2D joint or keypoint prediction in each individual frame, and multi-stride temporal convolutional networks (TCNs) to estimate 3D joints or keypoints.
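The key mechanism in temporal convolutional networks is covering a long temporal window with a small kernel by striding or dilating it over the frame sequence. A minimal numpy sketch of one such layer, with all shapes and weights illustrative rather than taken from the paper:

```python
import numpy as np

def temporal_conv1d(seq, kernel, dilation=1):
    """Valid 1D convolution along time with a given dilation.

    seq: (T, C) per-frame feature vectors (e.g. flattened 2D joints).
    kernel: (K, C) temporal filter, multiplied elementwise and summed.
    """
    T, C = seq.shape
    K = kernel.shape[0]
    span = (K - 1) * dilation + 1  # temporal receptive field of this layer
    return np.array([
        np.sum(seq[t:t + span:dilation] * kernel)
        for t in range(T - span + 1)
    ])

rng = np.random.default_rng(0)
frames = rng.normal(size=(81, 34))  # 81 frames x 17 joints x (x, y)
kernel = rng.normal(size=(3, 34))

# Stacking layers with growing dilation (1, 3, 9, ...) lets a size-3
# kernel cover a long temporal window; this is the idea behind
# multi-stride TCNs for lifting 2D keypoints to 3D poses.
out = temporal_conv1d(frames, kernel, dilation=9)
print(out.shape)  # time axis shrinks by (K - 1) * dilation = 18 frames
```

With a kernel of size 3 and dilation 9, each output step sees frames 18 apart, so a shallow stack already spans dozens of frames.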
arXiv Detail & Related papers (2020-04-07T09:12:12Z)
- Lightweight 3D Human Pose Estimation Network Training Using Teacher-Student Learning [15.321557614896268]
MoVNect is a lightweight deep neural network to capture 3D human pose using a single RGB camera.
We apply knowledge distillation based on teacher-student learning to 3D human pose estimation.
We implement a 3D avatar application running on mobile in real-time to demonstrate that our network achieves both high accuracy and fast inference time.
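Teacher-student knowledge distillation typically trains the lightweight student on a weighted sum of a ground-truth term and a term that mimics the larger teacher's predictions. A hedged illustrative sketch of such a loss for 3D joints; the weighting scheme and joint count are assumptions, not MoVNect's actual formulation:

```python
import numpy as np

def distillation_loss(student_joints, teacher_joints, gt_joints, alpha=0.5):
    """Combined teacher-student loss for 3D pose training (illustrative).

    Each array is (J, 3): J joints with (x, y, z) coordinates.
    alpha balances matching the ground truth against mimicking the
    larger, more accurate teacher network's predictions.
    """
    gt_term = np.mean((student_joints - gt_joints) ** 2)
    teacher_term = np.mean((student_joints - teacher_joints) ** 2)
    return alpha * gt_term + (1.0 - alpha) * teacher_term

rng = np.random.default_rng(0)
gt = rng.normal(size=(17, 3))                    # 17-joint skeleton
teacher = gt + 0.01 * rng.normal(size=(17, 3))   # teacher is close to GT
student = gt + 0.20 * rng.normal(size=(17, 3))   # lightweight student

loss = distillation_loss(student, teacher, gt, alpha=0.5)
print(f"distillation loss: {loss:.4f}")
```

The teacher term gives the student a smoother training signal than ground truth alone, which is what lets a much smaller network approach the teacher's accuracy.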
arXiv Detail & Related papers (2020-01-15T01:31:01Z)
- Short-Term Temporal Convolutional Networks for Dynamic Hand Gesture Recognition [23.054444026402738]
We present a multimodal gesture recognition method based on 3D densely connected convolutional networks (3D-DenseNets) and improved temporal convolutional networks (TCNs).
In spatial analysis, we adopt 3D-DenseNets to learn short-term spatio-temporal features effectively.
In temporal analysis, we use TCNs to extract temporal features and employ improved Squeeze-and-Excitation Networks (SENets) to strengthen the representational power of the temporal features from each TCN layer.
arXiv Detail & Related papers (2019-12-31T23:30:27Z)
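The Squeeze-and-Excitation mechanism mentioned in the last entry recalibrates feature channels: it "squeezes" the temporal axis into one summary per channel, passes that through a small bottleneck, and uses the resulting sigmoid gates to rescale the original features. A minimal numpy sketch with illustrative (randomly initialized) weights, not the paper's trained SENet:

```python
import numpy as np

def squeeze_excite(features, w1, w2):
    """Squeeze-and-Excitation over the channels of temporal features.

    features: (T, C). Squeeze: global average over time. Excite: a
    two-layer bottleneck producing per-channel gates in (0, 1) that
    rescale the original features.
    """
    squeezed = features.mean(axis=0)              # (C,) channel summary
    hidden = np.maximum(squeezed @ w1, 0.0)       # ReLU bottleneck
    gates = 1.0 / (1.0 + np.exp(-(hidden @ w2)))  # sigmoid gates, (C,)
    return features * gates                       # broadcast over time

rng = np.random.default_rng(0)
T, C, r = 16, 32, 4                               # r is the reduction ratio
feats = rng.normal(size=(T, C))
w1 = rng.normal(size=(C, C // r)) * 0.1
w2 = rng.normal(size=(C // r, C)) * 0.1

out = squeeze_excite(feats, w1, w2)
print(out.shape)  # same shape as the input: (16, 32)
```

Because the gates depend on the whole sequence, a channel that is uninformative for the current gesture can be suppressed across every time step at once.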
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.