ViSTec: Video Modeling for Sports Technique Recognition and Tactical
Analysis
- URL: http://arxiv.org/abs/2402.15952v1
- Date: Sun, 25 Feb 2024 02:04:56 GMT
- Title: ViSTec: Video Modeling for Sports Technique Recognition and Tactical
Analysis
- Authors: Yuchen He, Zeqing Yuan, Yihong Wu, Liqi Cheng, Dazhen Deng, Yingcai Wu
- Abstract summary: ViSTec is a Video-based Sports Technique recognition model inspired by human cognition.
Our approach integrates a graph to explicitly model strategic knowledge in stroke sequences and enhance technique recognition with contextual inductive bias.
Case studies with experts from the Chinese national table tennis team validate our model's capacity to automate analysis.
- Score: 19.945083591851517
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The immense popularity of racket sports has fueled substantial demand in
tactical analysis with broadcast videos. However, existing manual methods
require laborious annotation, and recent attempts leveraging video perception
models are limited to low-level annotations like ball trajectories, overlooking
tactics that necessitate an understanding of stroke techniques.
State-of-the-art action segmentation models also struggle with technique
recognition due to frequent occlusions and motion-induced blurring in racket
sports videos. To address these challenges, we propose ViSTec, a Video-based
Sports Technique recognition model inspired by human cognition that synergizes
sparse visual data with rich contextual insights. Our approach integrates a
graph to explicitly model strategic knowledge in stroke sequences and enhance
technique recognition with contextual inductive bias. A two-stage action
perception model is jointly trained to align with the contextual knowledge in
the graph. Experiments demonstrate that our method outperforms existing models
by a significant margin. Case studies with experts from the Chinese national
table tennis team validate our model's capacity to automate analysis for
technical actions and tactical strategies. More details are available at:
https://ViSTec2024.github.io/.
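As a rough, hypothetical sketch of how a graph over stroke sequences can supply contextual inductive bias, the snippet below fuses per-stroke technique probabilities from a visual model with an assumed technique-transition prior via standard Viterbi decoding. It is a generic decoding scheme for illustration only, not the paper's actual architecture or joint training procedure.

```python
# Illustrative sketch only: combine visual technique probabilities with a
# stroke-transition graph prior via Viterbi decoding. The transition matrix
# and label set are assumptions, not taken from the ViSTec paper.
import numpy as np

def decode_stroke_sequence(visual_probs: np.ndarray, transition: np.ndarray) -> list:
    """visual_probs: (T, K) per-stroke technique probabilities from the video model.
    transition: (K, K) row-stochastic prior P(next technique | current technique).
    Returns the most likely technique index for each of the T strokes."""
    T, K = visual_probs.shape
    log_v = np.log(visual_probs + 1e-9)
    log_a = np.log(transition + 1e-9)
    score = np.zeros((T, K))
    back = np.zeros((T, K), dtype=int)
    score[0] = log_v[0]
    for t in range(1, T):
        cand = score[t - 1][:, None] + log_a      # (prev, next) accumulated scores
        back[t] = cand.argmax(axis=0)             # best predecessor per technique
        score[t] = cand.max(axis=0) + log_v[t]    # add visual evidence at stroke t
    path = [int(score[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    probs = rng.dirichlet(np.ones(6), size=10)    # 10 strokes, 6 candidate techniques
    trans = rng.dirichlet(np.ones(6), size=6)     # assumed stroke-graph prior
    print(decode_stroke_sequence(probs, trans))
```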
Related papers
- FACTS: Fine-Grained Action Classification for Tactical Sports [4.810476621219244]
Classifying fine-grained actions in fast-paced, close-combat sports such as fencing and boxing presents unique challenges.
We introduce FACTS, a novel approach for fine-grained action recognition that processes raw video data directly.
Our findings enhance training, performance analysis, and spectator engagement, setting a new benchmark for action classification in tactical sports.
arXiv Detail & Related papers (2024-12-21T03:00:25Z)
- ExpertAF: Expert Actionable Feedback from Video [81.46431188306397]
We introduce a novel method to generate actionable feedback from video of a person doing a physical activity.
Our method takes a video demonstration and its accompanying 3D body pose and generates expert commentary.
Our method is able to reason across multi-modal input combinations to output full-spectrum, actionable coaching.
arXiv Detail & Related papers (2024-08-01T16:13:07Z)
- A Comprehensive Review of Few-shot Action Recognition [64.47305887411275]
Few-shot action recognition aims to address the high cost and impracticality of manually labeling complex and variable video data.
It requires accurately classifying human actions in videos using only a few labeled examples per class.
Numerous approaches have driven significant advancements in few-shot action recognition.
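For orientation, a minimal prototypical-network-style baseline for this setting is sketched below: average the embeddings of the few labeled clips per class and assign a query clip to the nearest prototype. The embedding function is assumed to come from any pretrained video backbone; none of this is specific to the surveyed methods.

```python
# Minimal few-shot classification sketch: nearest class prototype in embedding
# space. Embeddings are assumed to come from a pretrained video backbone.
import numpy as np

def classify_query(support: dict, query: np.ndarray) -> int:
    """support: {class_id: (n_shot, d) embeddings of labeled clips}.
    query: (d,) embedding of the unlabeled clip. Returns the predicted class id."""
    prototypes = {c: embs.mean(axis=0) for c, embs in support.items()}
    return min(prototypes, key=lambda c: np.linalg.norm(query - prototypes[c]))
```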
arXiv Detail & Related papers (2024-07-20T03:53:32Z)
- Information Theoretic Text-to-Image Alignment [49.396917351264655]
We present a novel method that relies on an information-theoretic alignment measure to steer image generation.
Our method is on-par or superior to the state-of-the-art, yet requires nothing but a pre-trained denoising network to estimate MI.
arXiv Detail & Related papers (2024-05-31T12:20:02Z)
- Diffusion Priors for Dynamic View Synthesis from Monocular Videos [59.42406064983643]
Dynamic novel view synthesis aims to capture the temporal evolution of visual content within videos.
We first finetune a pretrained RGB-D diffusion model on the video frames using a customization technique.
We distill the knowledge from the finetuned model to a 4D representation encompassing both dynamic and static Neural Radiance Fields.
arXiv Detail & Related papers (2024-01-10T23:26:41Z)
- Any-point Trajectory Modeling for Policy Learning [64.23861308947852]
We introduce Any-point Trajectory Modeling (ATM) to predict future trajectories of arbitrary points within a video frame.
ATM outperforms strong video pre-training baselines by 80% on average.
We show effective transfer learning of manipulation skills from human videos and videos from a different robot morphology.
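As a minimal illustration of the any-point trajectory prediction interface only, the sketch below stands in a trivial constant-velocity baseline for ATM's learned model; the tensor shapes are assumptions.

```python
# Toy stand-in for any-point trajectory prediction: constant-velocity
# extrapolation of tracked points. Tensor shapes are assumptions.
import numpy as np

def predict_point_tracks(past: np.ndarray, future_steps: int) -> np.ndarray:
    """past: (P, H, 2) xy positions of P tracked points over H past frames.
    Returns (P, future_steps, 2) extrapolated future positions."""
    velocity = past[:, -1] - past[:, -2]                    # last-step motion per point
    steps = np.arange(1, future_steps + 1)[None, :, None]   # (1, F, 1)
    return past[:, -1:, :] + velocity[:, None, :] * steps
```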
arXiv Detail & Related papers (2023-12-28T23:34:43Z)
- Towards Active Learning for Action Spotting in Association Football Videos [59.84375958757395]
Analyzing football videos is challenging and requires identifying subtle and diverse spatio-temporal patterns.
Current algorithms face significant challenges when learning from limited annotated data.
We propose an active learning framework that selects the most informative video samples to be annotated next.
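One common way to operationalize "most informative" is uncertainty sampling; the hedged sketch below ranks unlabeled clips by predictive entropy. The paper's actual acquisition criterion may differ.

```python
# Uncertainty-sampling sketch: pick the clips whose current predictions have
# the highest entropy. One plausible criterion, not necessarily the paper's.
import numpy as np

def select_clips_to_annotate(pred_probs: np.ndarray, budget: int) -> np.ndarray:
    """pred_probs: (N, K) class probabilities for N unlabeled clips.
    Returns indices of the `budget` most uncertain clips."""
    entropy = -(pred_probs * np.log(pred_probs + 1e-9)).sum(axis=1)
    return np.argsort(-entropy)[:budget]
```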
arXiv Detail & Related papers (2023-04-09T11:50:41Z)
- Where Will Players Move Next? Dynamic Graphs and Hierarchical Fusion for Movement Forecasting in Badminton [6.2405734957622245]
We focus on predicting what types of returning strokes will be made, and where players will move to based on previous strokes.
Existing sequence-based models neglect the effects of interactions between players, and graph-based models still struggle to capture multifaceted perspectives.
We propose a novel Dynamic Graphs and Hierarchical Fusion for Movement Forecasting model (DyMF) with interaction style extractors.
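As a generic illustration of how graph-based models couple the players' states (not the DyMF architecture itself), the sketch below runs one message-passing step over a hypothetical player-interaction graph.

```python
# Generic graph message-passing step over a player-interaction graph; the
# feature layout and learned projection are hypothetical, not DyMF's design.
import numpy as np

def interaction_step(node_feats: np.ndarray, adj: np.ndarray, w: np.ndarray) -> np.ndarray:
    """node_feats: (N, d) per-player features (e.g., recent strokes, positions).
    adj: (N, N) interaction graph. w: (d, d) learned projection.
    Returns updated (N, d) features after one round of neighbor aggregation."""
    messages = adj @ node_feats                  # aggregate neighbors' features
    return np.tanh((node_feats + messages) @ w)
```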
arXiv Detail & Related papers (2022-11-22T12:21:24Z)
- Sports Video Analysis on Large-Scale Data [10.24207108909385]
This paper investigates automated machine description of sports videos.
We propose a novel large-scale NBA dataset for Sports Video Analysis (NSVA) with a focus on captioning.
arXiv Detail & Related papers (2022-08-09T16:59:24Z)
- Hybrid Dynamic-static Context-aware Attention Network for Action Assessment in Long Videos [96.45804577283563]
We present a novel hybrid dynAmic-static Context-aware attenTION NETwork (ACTION-NET) for action assessment in long videos.
We learn not only the dynamic information of the video but also focus on the static postures of the detected athletes in specific frames.
We combine the features of the two streams to regress the final video score, supervised by ground-truth scores given by experts.
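A bare-bones sketch of that two-stream fusion step follows: pooled dynamic and static features are concatenated and regressed to a scalar score. The layer sizes and concatenation-based fusion are assumptions for illustration, not the paper's exact design.

```python
# Assumed two-stream fusion: concatenate pooled dynamic and static features,
# apply one hidden layer, and regress a scalar quality score.
import numpy as np

def fuse_and_score(dynamic_feat, static_feat, w_fuse, w_out):
    """dynamic_feat: (d1,), static_feat: (d2,) pooled stream features.
    w_fuse: (d1 + d2, h) and w_out: (h,) are learned weights.
    Returns the regressed video score (supervised by expert ground truth)."""
    fused = np.concatenate([dynamic_feat, static_feat])
    hidden = np.maximum(fused @ w_fuse, 0.0)    # ReLU fusion layer
    return float(hidden @ w_out)
```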
arXiv Detail & Related papers (2020-08-13T15:51:42Z)
- Learning Spatiotemporal Features via Video and Text Pair Discrimination [30.64670449131973]
The cross-modal pair discrimination (CPD) framework captures the correlation between a video and its associated text.
We train our CPD models on both a standard video dataset (Kinetics-210k) and an uncurated web video dataset (about 300k videos) to demonstrate its effectiveness.
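Cross-modal pair discrimination is often realized with an InfoNCE-style contrastive objective over matched video/text embeddings; the sketch below assumes that formulation (batch layout, normalization, and temperature are illustrative, not necessarily the paper's).

```python
# InfoNCE-style contrastive loss over matched video/text embedding pairs;
# normalization and temperature choices are assumptions for illustration.
import numpy as np

def cross_modal_nce(video_emb: np.ndarray, text_emb: np.ndarray, tau: float = 0.07) -> float:
    """video_emb, text_emb: (B, d) embeddings where row i of each forms a matched pair."""
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = v @ t.T / tau                                    # (B, B) similarities
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))                 # matched pairs on the diagonal
```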
arXiv Detail & Related papers (2020-01-16T08:28:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.