Event detection in coarsely annotated sports videos via parallel multi receptive field 1D convolutions
- URL: http://arxiv.org/abs/2004.06172v1
- Date: Mon, 13 Apr 2020 19:51:25 GMT
- Title: Event detection in coarsely annotated sports videos via parallel multi receptive field 1D convolutions
- Authors: Kanav Vats, Mehrnaz Fani, Pascale Walters, David A. Clausi, John Zelek
- Abstract summary: In problems such as sports video analytics, it is difficult to obtain accurate frame level annotations and exact event duration.
We propose the task of event detection in coarsely annotated videos.
We introduce a multi-tower temporal convolutional network architecture for the proposed task.
- Score: 14.30009544149561
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In problems such as sports video analytics, it is difficult to obtain
accurate frame level annotations and exact event duration because of the
lengthy videos and sheer volume of video data. This issue is even more
pronounced in fast-paced sports such as ice hockey. Obtaining annotations on a
coarse scale can be much more practical and time efficient. We propose the task
of event detection in coarsely annotated videos. We introduce a multi-tower
temporal convolutional network architecture for the proposed task. The network,
with the help of multiple receptive fields, processes information at various
temporal scales to account for the uncertainty with regard to the exact event
location and duration. We demonstrate the effectiveness of the multi-receptive
field architecture through appropriate ablation studies. The method is
evaluated on two tasks: event detection in coarsely annotated hockey videos in
the NHL dataset and event spotting in soccer on the SoccerNet dataset. The two
datasets lack frame-level annotations and have very distinct event frequencies.
Experimental results demonstrate the effectiveness of the network by obtaining
a 55% average F1 score on the NHL dataset and by achieving competitive
performance compared to the state of the art on the SoccerNet dataset. We
believe our approach will help develop more practical pipelines for event
detection in sports video.
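The multi-tower idea in the abstract (parallel temporal convolutions with different receptive fields applied to the same frame sequence, merged before classification) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The kernel sizes, the single convolution layer per tower, ReLU, concatenation, temporal max-pooling, and the softmax window-level classifier are all hypothetical choices. The helper names `conv1d_same`, `init_tower`, and `multi_tower_forward` are introduced here only for illustration.

```python
import numpy as np


def conv1d_same(x, w, b):
    """'Same'-padded, stride-1 temporal convolution (cross-correlation).

    x: (T, C_in) per-frame feature sequence
    w: (k, C_in, C_out) kernel with odd k, so the output keeps length T
    b: (C_out,) bias
    """
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))  # zero-pad along time only
    # Sliding windows of k frames: shape (T, k, C_in)
    windows = np.stack([xp[t:t + k] for t in range(x.shape[0])])
    # Sum over the kernel taps (k) and input channels (c)
    return np.einsum("tkc,kco->to", windows, w) + b


def init_tower(rng, k, c_in, c_out):
    """Random weights for one tower (hypothetical init, for illustration)."""
    return rng.normal(0.0, 0.1, size=(k, c_in, c_out)), np.zeros(c_out)


def multi_tower_forward(x, towers, w_out, b_out):
    """Run parallel towers with different receptive fields, then classify.

    Assumptions (not taken from the paper): one conv layer per tower,
    ReLU, concatenation of tower outputs, temporal max-pooling to a single
    window-level vector, and a softmax over event classes.
    """
    # Each tower sees the same frames; its kernel size k sets its receptive field
    feats = [np.maximum(conv1d_same(x, w, b), 0.0) for w, b in towers]
    h = np.concatenate(feats, axis=1)          # (T, n_towers * C_out)
    pooled = h.max(axis=0)                     # collapse the time axis
    logits = pooled @ w_out + b_out            # (n_classes,)
    e = np.exp(logits - logits.max())          # numerically stable softmax
    return e / e.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, c_in, c_hid, n_classes = 16, 8, 4, 3
    kernel_sizes = (3, 5, 9)                   # hypothetical receptive fields
    towers = [init_tower(rng, k, c_in, c_hid) for k in kernel_sizes]
    w_out = rng.normal(0.0, 0.1, size=(len(kernel_sizes) * c_hid, n_classes))
    b_out = np.zeros(n_classes)
    x = rng.normal(size=(T, c_in))             # stand-in for frame features
    probs = multi_tower_forward(x, towers, w_out, b_out)
    print(probs.shape, round(float(probs.sum()), 6))
```

A stride-1 layer with kernel size k covers k consecutive frames. Running towers with different k in parallel therefore lets one prediction draw on both short and long temporal context. This matches the motivation the abstract gives for handling uncertainty in event location and duration.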
Related papers
- EA-VTR: Event-Aware Video-Text Retrieval [97.30850809266725]
The Event-Aware Video-Text Retrieval (EA-VTR) model achieves strong video-text retrieval through superior awareness of video events.
EA-VTR can efficiently encode frame-level and video-level visual representations simultaneously, enabling cross-modal alignment of both detailed event content and complex event temporal structure.
(2024-07-10T09:09:58Z)
- Towards Active Learning for Action Spotting in Association Football Videos [59.84375958757395]
Analyzing football videos is challenging and requires identifying subtle and diverse spatio-temporal patterns.
Current algorithms face significant challenges when learning from limited annotated data.
We propose an active learning framework that selects the most informative video samples to be annotated next.
(2023-04-09T11:50:41Z)
- Sports Video Analysis on Large-Scale Data [10.24207108909385]
This paper investigates the modeling of automated machine description on sports video.
We propose a novel large-scale NBA dataset for Sports Video Analysis (NSVA) with a focus on captioning.
(2022-08-09T16:59:24Z)
- P2ANet: A Dataset and Benchmark for Dense Action Detection from Table Tennis Match Broadcasting Videos [64.57435509822416]
The dataset consists of 2,721 video clips collected from the broadcasting videos of professional table tennis matches in World Table Tennis Championships and Olympiads.
We formulate two sets of action detection problems: action localization and action recognition.
The results confirm that P2ANet is still a challenging task and can be used as a special benchmark for dense action detection from videos.
(2022-07-26T08:34:17Z)
- SoccerNet-Tracking: Multiple Object Tracking Dataset and Benchmark in Soccer Videos [62.686484228479095]
We propose a novel dataset for multiple object tracking composed of 200 sequences of 30s each.
The dataset is fully annotated with bounding boxes and tracklet IDs.
Our analysis shows that multiple player, referee and ball tracking in soccer videos is far from being solved.
(2022-04-14T12:22:12Z)
- ASCNet: Self-supervised Video Representation Learning with Appearance-Speed Consistency [62.38914747727636]
We study self-supervised video representation learning, which is a challenging task due to 1) a lack of labels for explicit supervision and 2) unstructured and noisy visual information.
Existing methods mainly use contrastive loss with video clips as the instances and learn visual representation by discriminating instances from each other.
In this paper, we observe that the consistency between positive samples is the key to learning robust video representations.
(2021-06-04T08:44:50Z)
- RMS-Net: Regression and Masking for Soccer Event Spotting [52.742046866220484]
We devise a lightweight and modular network for action spotting, which can simultaneously predict the event label and its temporal offset.
When tested on the SoccerNet dataset and using standard features, our full proposal exceeds the current state of the art by 3 Average-mAP points.
(2021-02-15T16:04:18Z)
- TTNet: Real-time temporal and spatial video analysis of table tennis [5.156484100374058]
We present a neural network aimed at real-time processing of high-resolution table tennis videos.
This approach provides the core information an auto-referee system needs to infer score updates.
We publish a multi-task dataset, OpenTTGames, with videos of table tennis games at 120 fps labeled with events.
(2020-04-21T11:57:51Z)
- Unsupervised Temporal Feature Aggregation for Event Detection in Unstructured Sports Videos [10.230408415438966]
We study the case of event detection in sports videos for unstructured environments with arbitrary camera angles.
We identify and solve two major problems: unsupervised identification of players in an unstructured setting and generalization of the trained models to pose variations due to arbitrary shooting angles.
(2020-02-19T10:24:22Z)