Sports Video Analysis on Large-Scale Data
- URL: http://arxiv.org/abs/2208.04897v1
- Date: Tue, 9 Aug 2022 16:59:24 GMT
- Title: Sports Video Analysis on Large-Scale Data
- Authors: Dekun Wu and He Zhao and Xingce Bao and Richard P. Wildes
- Abstract summary: This paper investigates the modeling of automated machine description on sports video.
We propose a novel large-scale NBA dataset for Sports Video Analysis (NSVA) with a focus on captioning.
- Score: 10.24207108909385
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper investigates the modeling of automated machine description on
sports video, which has seen much progress recently. Nevertheless,
state-of-the-art approaches fall quite short of capturing how human experts
analyze sports scenes. There are several major reasons: (1) the datasets used are
collected from non-official providers, which naturally creates a gap between
models trained on those datasets and real-world applications; (2) previously
proposed methods require extensive annotation efforts (i.e., player and ball
segmentation at pixel level) on localizing useful visual features to yield
acceptable results; (3) very few public datasets are available. In this paper,
we propose a novel large-scale NBA dataset for Sports Video Analysis (NSVA)
with a focus on captioning, to address the above challenges. We also design a
unified approach to process raw videos into a stack of meaningful features with
minimum labelling efforts, showing that cross modeling on such features using a
transformer architecture leads to strong performance. In addition, we
demonstrate the broad application of NSVA by addressing two additional tasks,
namely fine-grained sports action recognition and salient player
identification. Code and dataset are available at
https://github.com/jackwu502/NSVA.
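The abstract does not spell out the architecture at this level of detail; the following PyTorch sketch only illustrates the general pattern it describes: project each per-modality feature stream to a shared width, stack the tokens, cross-model them with a transformer, and decode a caption. All module names, dimensions, and the choice of vanilla encoder/decoder layers are illustrative assumptions, not the authors' exact design (the linked repository has that).

```python
# Hedged sketch of cross-modeling stacked video features with a transformer
# for captioning. Feature stream names/dimensions are assumptions; positional
# encodings are omitted for brevity.
import torch
import torch.nn as nn

class CrossModalCaptioner(nn.Module):
    def __init__(self, feat_dims=(2048, 1024, 512), d_model=512, nhead=8,
                 num_layers=4, vocab_size=10000):
        super().__init__()
        # One linear projection per feature stream (e.g. clip-level features,
        # object cues, court-text features).
        self.proj = nn.ModuleList([nn.Linear(d, d_model) for d in feat_dims])
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True),
            num_layers)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead, batch_first=True),
            num_layers)
        self.embed = nn.Embedding(vocab_size, d_model)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, streams, caption_tokens):
        # streams: list of (B, T_i, feat_dims[i]) tensors, one per modality.
        tokens = torch.cat([p(s) for p, s in zip(self.proj, streams)], dim=1)
        memory = self.encoder(tokens)         # attention across all modalities
        L = caption_tokens.size(1)
        causal = torch.triu(torch.full((L, L), float("-inf")), diagonal=1)
        out = self.decoder(self.embed(caption_tokens), memory, tgt_mask=causal)
        return self.lm_head(out)              # (B, L, vocab_size)

feats = [torch.randn(2, 16, 2048), torch.randn(2, 16, 1024),
         torch.randn(2, 16, 512)]
logits = CrossModalCaptioner()(feats, torch.randint(0, 10000, (2, 12)))
print(logits.shape)  # torch.Size([2, 12, 10000])
```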
Related papers
- Benchmarking Badminton Action Recognition with a New Fine-Grained Dataset [16.407837909069073]
We introduce the VideoBadminton dataset, derived from high-quality badminton footage.
VideoBadminton can serve not only for badminton action recognition but also as a dataset for recognizing fine-grained actions.
arXiv Detail & Related papers (2024-03-19T02:52:06Z)
- Helping Hands: An Object-Aware Ego-Centric Video Recognition Model [60.350851196619296]
We introduce an object-aware decoder for improving the performance of ego-centric representations on ego-centric videos.
We show that the model can act as a drop-in replacement for an ego-centric video model, improving performance through visual-text grounding.
arXiv Detail & Related papers (2023-08-15T17:58:11Z)
- Dense Video Object Captioning from Disjoint Supervision [77.47084982558101]
We propose a new task and model for dense video object captioning.
This task unifies spatial and temporal localization in video.
We show how our model improves upon a number of strong baselines for this new task.
arXiv Detail & Related papers (2023-06-20T17:57:23Z)
- Weakly Supervised Two-Stage Training Scheme for Deep Video Fight Detection Model [0.0]
Fight detection in videos is an emerging deep learning application with today's prevalence of surveillance systems and streaming media.
Previous work has largely relied on action recognition techniques to tackle this problem.
We design the fight detection model as a composition of an action-aware feature extractor and an anomaly score generator.
arXiv Detail & Related papers (2022-09-23T08:29:16Z)
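The two-stage composition described in the entry above (an action-aware feature extractor feeding an anomaly score generator) can be sketched as follows; the concrete backbone and scorer below are assumptions for illustration, not the paper's exact modules.

```python
# Hedged sketch of a fight detector as feature extractor + anomaly scorer.
import torch
import torch.nn as nn

class FightDetector(nn.Module):
    def __init__(self, feat_dim=512):
        super().__init__()
        # Stage 1: action-aware features; stands in for a pretrained
        # action-recognition backbone (e.g. a 3D CNN) in the real model.
        self.extractor = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
            nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        # Stage 2: anomaly score generator, trained with weak video-level
        # labels to output a fight-likelihood score in [0, 1].
        self.scorer = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(),
            nn.Linear(128, 1), nn.Sigmoid(),
        )

    def forward(self, clip):                 # clip: (B, 3, T, H, W)
        return self.scorer(self.extractor(clip)).squeeze(-1)

score = FightDetector()(torch.randn(2, 3, 16, 64, 64))
print(score.shape)  # torch.Size([2])
```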
- P2ANet: A Dataset and Benchmark for Dense Action Detection from Table Tennis Match Broadcasting Videos [64.57435509822416]
This dataset consists of 2,721 video clips collected from broadcast videos of professional table tennis matches at the World Table Tennis Championships and Olympiads.
We formulate two sets of action detection problems: action localization and action recognition.
The results confirm that P2ANet remains challenging and can be used as a special benchmark for dense action detection from videos.
arXiv Detail & Related papers (2022-07-26T08:34:17Z)
- Exploring Motion and Appearance Information for Temporal Sentence Grounding [52.01687915910648]
We propose a Motion-Appearance Reasoning Network (MARN) to solve temporal sentence grounding.
We develop separate motion and appearance branches to learn motion-guided and appearance-guided object relations.
Our proposed MARN outperforms previous state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2022-01-03T02:44:18Z)
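As a rough illustration of the two-branch idea in the MARN entry above, here is a minimal sketch with separate motion and appearance encoders fused against a query encoding. The layer choices, dimensions, and the simple multiplicative fusion are assumptions, not the published architecture, which reasons over detected object relations.

```python
# Hedged sketch of separate motion/appearance branches for sentence grounding.
import torch
import torch.nn as nn

class TwoBranchGrounder(nn.Module):
    def __init__(self, app_dim=1024, mot_dim=1024, txt_dim=300, d=256):
        super().__init__()
        self.app_branch = nn.GRU(app_dim, d, batch_first=True)  # appearance-guided
        self.mot_branch = nn.GRU(mot_dim, d, batch_first=True)  # motion-guided
        self.txt_enc = nn.GRU(txt_dim, d, batch_first=True)     # query encoder
        self.score = nn.Linear(2 * d, 2)  # per-frame start/end logits

    def forward(self, app_feats, mot_feats, query):
        a, _ = self.app_branch(app_feats)     # (B, T, d)
        m, _ = self.mot_branch(mot_feats)     # (B, T, d)
        _, q = self.txt_enc(query)            # (1, B, d) final hidden state
        # Condition both branches on the query, then score each frame.
        fused = torch.cat([a, m], -1) * torch.cat([q, q], -1).transpose(0, 1)
        return self.score(fused)              # (B, T, 2) boundary logits

logits = TwoBranchGrounder()(torch.randn(2, 32, 1024),
                             torch.randn(2, 32, 1024),
                             torch.randn(2, 10, 300))
print(logits.shape)  # torch.Size([2, 32, 2])
```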
- EventAnchor: Reducing Human Interactions in Event Annotation of Racket Sports Videos [26.516909452362455]
EventAnchor is a data analysis framework to facilitate interactive annotation of racket sports video.
Our approach uses machine learning models in computer vision to help users acquire essential events from videos.
arXiv Detail & Related papers (2021-01-13T09:32:05Z)
- Hybrid Dynamic-static Context-aware Attention Network for Action Assessment in Long Videos [96.45804577283563]
We present a novel hybrid dynAmic-static Context-aware attenTION NETwork (ACTION-NET) for action assessment in long videos.
We learn not only the dynamic video information but also attend to the static postures of the detected athletes in specific frames.
We combine the features of the two streams to regress the final video score, supervised by ground-truth scores given by experts.
arXiv Detail & Related papers (2020-08-13T15:51:42Z)
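The ACTION-NET entry above describes a two-stream design whose features are concatenated and regressed to an expert-supervised score; a minimal sketch of that pattern follows. All layer choices and dimensions are illustrative assumptions.

```python
# Hedged two-stream sketch: dynamic clip features plus static posture
# features, fused and regressed to a quality score.
import torch
import torch.nn as nn

class TwoStreamAssessor(nn.Module):
    def __init__(self, dyn_dim=1024, static_dim=512, d=256):
        super().__init__()
        self.dynamic = nn.GRU(dyn_dim, d, batch_first=True)  # clip dynamics
        self.static = nn.Sequential(nn.Linear(static_dim, d), nn.ReLU())
        self.regressor = nn.Linear(2 * d, 1)  # supervised by expert scores

    def forward(self, dyn_feats, pose_feats):
        # dyn_feats: (B, T, dyn_dim); pose_feats: (B, K, static_dim)
        _, h = self.dynamic(dyn_feats)            # (1, B, d)
        s = self.static(pose_feats).mean(dim=1)   # average over K key frames
        return self.regressor(torch.cat([h.squeeze(0), s], -1)).squeeze(-1)

score = TwoStreamAssessor()(torch.randn(2, 64, 1024), torch.randn(2, 8, 512))
print(score.shape)  # torch.Size([2])
```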
- Event detection in coarsely annotated sports videos via parallel multi receptive field 1D convolutions [14.30009544149561]
In problems such as sports video analytics, it is difficult to obtain accurate frame-level annotations and exact event durations.
We propose the task of event detection in coarsely annotated videos.
We introduce a multi-tower temporal convolutional network architecture for the proposed task.
arXiv Detail & Related papers (2020-04-13T19:51:25Z)
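The multi-tower idea in the entry above lends itself to a compact sketch: parallel 1D convolutions with different receptive fields over a per-frame feature sequence, fused per time step. Kernel sizes, tower count, and the classification head are assumptions, not the paper's exact configuration.

```python
# Hedged sketch of parallel multi-receptive-field 1D convolutions for
# event detection over frame-level features.
import torch
import torch.nn as nn

class MultiTowerEventDetector(nn.Module):
    def __init__(self, feat_dim=512, channels=128, kernel_sizes=(3, 7, 15),
                 num_events=5):
        super().__init__()
        # One temporal-conv tower per receptive field; 'same' padding keeps
        # the frame count so towers can be fused per time step.
        self.towers = nn.ModuleList([
            nn.Conv1d(feat_dim, channels, k, padding=k // 2)
            for k in kernel_sizes])
        self.head = nn.Conv1d(channels * len(kernel_sizes), num_events + 1, 1)

    def forward(self, x):                  # x: (B, T, feat_dim)
        x = x.transpose(1, 2)              # (B, feat_dim, T)
        fused = torch.cat([torch.relu(t(x)) for t in self.towers], dim=1)
        return self.head(fused).transpose(1, 2)  # (B, T, events + background)

logits = MultiTowerEventDetector()(torch.randn(2, 100, 512))
print(logits.shape)  # torch.Size([2, 100, 6])
```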
- Fine-Grained Instance-Level Sketch-Based Video Retrieval [159.12935292432743]
We propose a novel cross-modal retrieval problem of fine-grained instance-level sketch-based video retrieval (FG-SBVR).
Compared with sketch-based still-image retrieval and coarse-grained category-level video retrieval, this is more challenging, as both visual appearance and motion need to be matched simultaneously at a fine-grained level.
We show that this model significantly outperforms a number of existing state-of-the-art models designed for video analysis.
arXiv Detail & Related papers (2020-02-21T18:28:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.