Unsupervised Temporal Feature Aggregation for Event Detection in
Unstructured Sports Videos
- URL: http://arxiv.org/abs/2002.08097v1
- Date: Wed, 19 Feb 2020 10:24:22 GMT
- Title: Unsupervised Temporal Feature Aggregation for Event Detection in
Unstructured Sports Videos
- Authors: Subhajit Chaudhury, Daiki Kimura, Phongtharin Vinayavekhin, Asim
Munawar, Ryuki Tachibana, Koji Ito, Yuki Inaba, Minoru Matsumoto, Shuji
Kidokoro and Hiroki Ozaki
- Abstract summary: We study the case of event detection in sports videos for unstructured environments with arbitrary camera angles.
We identify and solve two major problems: unsupervised identification of players in an unstructured setting and generalization of the trained models to pose variations due to arbitrary shooting angles.
- Score: 10.230408415438966
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image-based sports analytics enable automatic retrieval of key events in a
game to speed up the analytics process for human experts. However, most
existing methods focus on structured television broadcast video datasets captured
with a straight, fixed camera and minimal variability in shooting pose. In
this paper, we study the case of event detection in sports videos for
unstructured environments with arbitrary camera angles. The transition from
structured to unstructured video analysis produces multiple challenges that we
address in our paper. Specifically, we identify and solve two major problems:
unsupervised identification of players in an unstructured setting and
generalization of the trained models to pose variations due to arbitrary
shooting angles. For the first problem, we propose a temporal feature
aggregation algorithm using person re-identification features to obtain high
player retrieval precision by boosting a weak heuristic scoring method.
Additionally, we propose a data augmentation technique, based on a multi-modal
image translation model, to reduce bias in the appearance of training samples.
Experimental evaluations show that our proposed method improves precision for
player retrieval from 0.78 to 0.86 for obliquely angled videos. Additionally,
we obtain an improvement in F1 score for rally detection in table tennis videos
from 0.79 in case of global frame-level features to 0.89 using our proposed
player-level features. Please see the supplementary video submission at
https://ibm.biz/BdzeZA.
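
As a rough illustration of the first contribution, here is a minimal sketch of temporal feature aggregation for player retrieval, assuming per-frame person re-identification embeddings and a weak per-frame heuristic score are already computed. The function name, window size, and similarity weighting below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def aggregate_player_scores(reid_features, heuristic_scores, window=5):
    """Refine a weak per-frame heuristic score by aggregating person
    re-identification (re-ID) features over a temporal window.

    reid_features:    (T, D) array of per-frame re-ID embeddings for one
                      candidate player track (assumed input shape).
    heuristic_scores: (T,) array of weak heuristic scores, e.g. from
                      bounding-box size/position priors.
    Returns a refined (T,) score sequence.
    """
    T = len(heuristic_scores)
    # L2-normalise embeddings so dot products become cosine similarities.
    norms = np.linalg.norm(reid_features, axis=1, keepdims=True)
    feats = reid_features / np.maximum(norms, 1e-8)
    refined = np.empty(T)
    for t in range(T):
        lo, hi = max(0, t - window), min(T, t + window + 1)
        # Appearance similarity of frame t to its temporal neighbourhood.
        sims = np.clip(feats[lo:hi] @ feats[t], 0.0, None)
        # Similarity-weighted average of neighbouring heuristic scores:
        # temporally consistent identities reinforce the weak signal.
        refined[t] = (sims @ heuristic_scores[lo:hi]) / (sims.sum() + 1e-8)
    return refined
```

For the second contribution, the sketch below shows only the surrounding augmentation loop; `translate` is a hypothetical stand-in for a pretrained multi-modal image translation network (the paper's abstract does not expose an API), and the 8-dimensional style code is an assumption:

```python
import numpy as np

def augment_player_crops(crops, translate, styles_per_crop=4, seed=0):
    """Expand a set of player crops with appearance-translated variants to
    reduce appearance bias in the training set. translate(crop, style) is
    assumed to return a re-rendered crop of the same shape."""
    rng = np.random.default_rng(seed)
    augmented = list(crops)
    for crop in crops:
        for _ in range(styles_per_crop):
            style = rng.standard_normal(8)  # random style code (assumed 8-D)
            augmented.append(translate(crop, style))
    return augmented
```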
Related papers
- SportsSloMo: A New Benchmark and Baselines for Human-centric Video Frame
Interpolation [11.198172694893927]
SportsSloMo is a benchmark consisting of more than 130K video clips and 1M video frames of high-resolution (≥720p) slow-motion sports videos crawled from YouTube.
We re-train several state-of-the-art methods on our benchmark, and the results show a decrease in their accuracy compared to other datasets.
We introduce two loss terms considering the human-aware priors, where we add auxiliary supervision to panoptic segmentation and human keypoints detection.
arXiv Detail & Related papers (2023-08-31T17:23:50Z) - Towards Active Learning for Action Spotting in Association Football
Videos [59.84375958757395]
Analyzing football videos is challenging and requires identifying subtle and diverse spatio-temporal patterns.
Current algorithms face significant challenges when learning from limited annotated data.
We propose an active learning framework that selects the most informative video samples to be annotated next.
arXiv Detail & Related papers (2023-04-09T11:50:41Z) - Sports Video Analysis on Large-Scale Data [10.24207108909385]
This paper investigates the modeling of automated machine description of sports videos.
We propose a novel large-scale NBA dataset for Sports Video Analysis (NSVA) with a focus on captioning.
arXiv Detail & Related papers (2022-08-09T16:59:24Z) - P2ANet: A Dataset and Benchmark for Dense Action Detection from Table Tennis Match Broadcasting Videos [64.57435509822416]
The dataset consists of 2,721 video clips collected from broadcast videos of professional table tennis matches at the World Table Tennis Championships and Olympiads.
We formulate two sets of action detection problems -- action localization and action recognition.
The results confirm that P2ANet is still a challenging task and can be used as a special benchmark for dense action detection from videos.
arXiv Detail & Related papers (2022-07-26T08:34:17Z) - Video Salient Object Detection via Contrastive Features and Attention
Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is utilized to combine the low-level and high-level features.
We show that the proposed method requires less computation, and performs favorably against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-03T17:40:32Z) - ASCNet: Self-supervised Video Representation Learning with
Appearance-Speed Consistency [62.38914747727636]
We study self-supervised video representation learning, which is a challenging task due to 1) a lack of labels for explicit supervision and 2) unstructured and noisy visual information.
Existing methods mainly use contrastive loss with video clips as the instances and learn visual representation by discriminating instances from each other.
In this paper, we observe that the consistency between positive samples is the key to learning robust video representations.
arXiv Detail & Related papers (2021-06-04T08:44:50Z) - RSPNet: Relative Speed Perception for Unsupervised Video Representation
Learning [100.76672109782815]
We study unsupervised video representation learning that seeks to learn both motion and appearance features from unlabeled video only.
It is difficult to construct a suitable self-supervised task that models both motion and appearance features well.
We propose a new way to perceive the playback speed and exploit the relative speed between two video clips as labels.
arXiv Detail & Related papers (2020-10-27T16:42:50Z) - Hybrid Dynamic-static Context-aware Attention Network for Action
Assessment in Long Videos [96.45804577283563]
We present a novel hybrid dynAmic-static Context-aware attenTION NETwork (ACTION-NET) for action assessment in long videos.
We not only learn the video's dynamic information but also attend to the static postures of the detected athletes in specific frames.
We combine the features of the two streams to regress the final video score, supervised by ground-truth scores given by experts.
arXiv Detail & Related papers (2020-08-13T15:51:42Z) - Event detection in coarsely annotated sports videos via parallel multi
receptive field 1D convolutions [14.30009544149561]
In problems such as sports video analytics, it is difficult to obtain accurate frame-level annotations and exact event durations.
We propose the task of event detection in coarsely annotated videos.
We introduce a multi-tower temporal convolutional network architecture for the proposed task.
arXiv Detail & Related papers (2020-04-13T19:51:25Z) - A Hybrid Approach for Tracking Individual Players in Broadcast Match
Videos [1.160208922584163]
This paper introduces a player tracking solution which is both fast and accurate.
The approach combines several models that are executed concurrently on relatively modest hardware.
As for performance, our proposal can process high-definition video (1920x1080) at 80 fps.
arXiv Detail & Related papers (2020-03-06T15:16:23Z)