Fusing Motion Patterns and Key Visual Information for Semantic Event
Recognition in Basketball Videos
- URL: http://arxiv.org/abs/2007.06288v1
- Date: Mon, 13 Jul 2020 10:15:44 GMT
- Title: Fusing Motion Patterns and Key Visual Information for Semantic Event
Recognition in Basketball Videos
- Authors: Lifang Wu, Zhou Yang, Qi Wang, Meng Jian, Boxuan Zhao, Junchi Yan,
Chang Wen Chen
- Abstract summary: We propose a scheme to fuse global and local motion patterns (MPs) and key visual information (KVI) for semantic event recognition in basketball videos.
An algorithm is proposed to estimate the global motions from the mixed motions based on the intrinsic property of camera adjustments.
A two-stream 3D CNN framework is utilized for group activity recognition over the separated global and local motion patterns.
- Score: 87.29451470527353
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many semantic events in team sports such as basketball involve both a
group activity and an outcome (score or not). Motion patterns are an effective
means of identifying different activities. Global and local motions place their
respective emphasis on different activities, but they are difficult to capture
from optical flow because the two are mixed together. Hence, a more effective
way to separate global and local motions is needed. In the specific case of
basketball game analysis, a successful score in each round can be reliably
detected from the appearance variation around the basket. Based on these
observations, we propose a scheme that fuses global and local motion patterns
(MPs) and key visual information (KVI) for semantic event recognition in
basketball videos. Firstly, an algorithm is proposed to estimate the global
motions from the mixed motions based on the intrinsic property of camera
adjustments; the local motions can then be obtained from the mixed and global
motions. Secondly, a two-stream 3D CNN framework is applied for group activity
recognition over the separated global and local motion patterns. Thirdly, the
basket is detected and its appearance features are extracted through a CNN;
these features are used to predict success or failure. Finally, the group
activity recognition and success/failure prediction results are integrated
using the Kronecker product for event recognition. Experiments on the NCAA
dataset demonstrate that the proposed method achieves state-of-the-art
performance.
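The pipeline described in the abstract can be illustrated with a minimal sketch. Note the assumptions: the paper's actual global-motion estimator exploits camera-adjustment properties, which is replaced here by a simple median-flow approximation, and the probability vectors are made-up examples; only the final Kronecker-product fusion step follows the abstract directly.

```python
import numpy as np

def separate_motions(mixed_flow):
    """Toy global/local motion separation (an assumption, NOT the paper's
    camera-adjustment algorithm): approximate each frame's global (camera)
    motion as the median flow vector, and take the residual as local motion."""
    # mixed_flow: (T, H, W, 2) dense optical flow for T frames
    global_flow = np.median(mixed_flow, axis=(1, 2), keepdims=True)  # (T, 1, 1, 2)
    local_flow = mixed_flow - global_flow
    return np.broadcast_to(global_flow, mixed_flow.shape), local_flow

def fuse_by_kronecker(activity_probs, outcome_probs):
    """Kronecker-product fusion: every (group activity, outcome) pair
    gets the product of its two marginal probabilities."""
    return np.kron(activity_probs, outcome_probs)

# Hypothetical classifier outputs: 3 group activities x 2 outcomes -> 6 events.
activity = np.array([0.7, 0.2, 0.1])   # e.g. 3-point / 2-point / free throw
outcome = np.array([0.6, 0.4])         # success / failure
joint = fuse_by_kronecker(activity, outcome)
```

The Kronecker product keeps the joint scores a valid distribution (they still sum to 1) while expanding two small classifiers into a full event taxonomy, which is the role it plays in the fusion step above.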
Related papers
- Unifying Global and Local Scene Entities Modelling for Precise Action Spotting [5.474440128682843]
We introduce a novel approach that analyzes and models scene entities using an adaptive attention mechanism.
Our model has demonstrated outstanding performance, securing 1st place in the SoccerNet-v2 Action Spotting, FineDiving, and FineGym challenges.
arXiv Detail & Related papers (2024-04-15T17:24:57Z)
- Tracking Everything Everywhere All at Once [111.00807055441028]
We present a new test-time optimization method for estimating dense, long-range motion from a video sequence.
We propose a complete and globally consistent motion representation, dubbed OmniMotion.
Our approach outperforms prior state-of-the-art methods by a large margin, both quantitatively and qualitatively.
arXiv Detail & Related papers (2023-06-08T17:59:29Z)
- Towards Active Learning for Action Spotting in Association Football Videos [59.84375958757395]
Analyzing football videos is challenging and requires identifying subtle and diverse temporal patterns.
Current algorithms face significant challenges when learning from limited annotated data.
We propose an active learning framework that selects the most informative video samples to annotate next.
arXiv Detail & Related papers (2023-04-09T11:50:41Z)
- EAN: Event Adaptive Network for Enhanced Action Recognition [66.81780707955852]
We propose a unified action recognition framework to investigate the dynamic nature of video content.
First, when extracting local cues, we generate dynamic-scale spatial-temporal kernels to adaptively fit the diverse events.
Second, to accurately aggregate these cues into a global video representation, we propose to mine the interactions among only a few selected foreground objects with a Transformer.
arXiv Detail & Related papers (2021-07-22T15:57:18Z)
- SportsCap: Monocular 3D Human Motion Capture and Fine-grained Understanding in Challenging Sports Videos [40.19723456533343]
We propose SportsCap, the first approach for simultaneously capturing 3D human motions and understanding fine-grained actions from monocular input of challenging sports videos.
Our approach utilizes a semantic and temporally structured sub-motion prior in the embedding space for motion capture and understanding.
Based on such hybrid motion information, we introduce a multi-stream spatial-temporal Graph Convolutional Network (ST-GCN) to predict the fine-grained semantic action attributes.
arXiv Detail & Related papers (2021-04-23T07:52:03Z)
- Hybrid Dynamic-static Context-aware Attention Network for Action Assessment in Long Videos [96.45804577283563]
We present a novel hybrid dynAmic-static Context-aware attenTION NETwork (ACTION-NET) for action assessment in long videos.
We not only learn the dynamic video information but also focus on the static postures of the detected athletes in specific frames.
We combine the features of the two streams to regress the final video score, supervised by ground-truth scores given by experts.
arXiv Detail & Related papers (2020-08-13T15:51:42Z)
- Group Activity Detection from Trajectory and Video Data in Soccer [16.134402513773463]
Group activity detection in soccer can be done using either video data or player and ball trajectory data.
In current soccer datasets, activities are labelled as atomic events without a duration.
Our results show that most events can be detected using either vision- or trajectory-based approaches with a temporal resolution of less than 0.5 seconds.
arXiv Detail & Related papers (2020-04-21T21:11:30Z)
- Decoupling Video and Human Motion: Towards Practical Event Detection in Athlete Recordings [33.770877823910176]
We propose to use 2D human pose sequences as an intermediate representation that decouples human motion from the raw video information.
We describe two approaches to event detection on pose sequences and evaluate them in complementary domains: swimming and athletics.
Our approach is not limited to these domains and shows the flexibility of pose-based motion event detection.
arXiv Detail & Related papers (2020-04-21T07:06:12Z)
- FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding [118.32912239230272]
FineGym is a new action recognition dataset built on top of gymnastics videos.
It provides temporal annotations at both the action and sub-action levels, with a three-level semantic hierarchy.
This new level of granularity presents significant challenges for action recognition.
arXiv Detail & Related papers (2020-04-14T17:55:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.