Predicting Penalty Kick Direction Using Multi-Modal Deep Learning with Pose-Guided Attention
- URL: http://arxiv.org/abs/2509.26088v1
- Date: Tue, 30 Sep 2025 11:02:59 GMT
- Title: Predicting Penalty Kick Direction Using Multi-Modal Deep Learning with Pose-Guided Attention
- Authors: Pasindu Ranasinghe, Pamudu Ranasinghe
- Abstract summary: This study introduces a real-time, multi-modal deep learning framework to predict the direction of a penalty kick before ball contact. A custom dataset of 755 penalty kick events was created from real match videos, with frame-level annotations for object detection, shooter keypoints, and final ball placement. The model achieved 89% accuracy on a held-out test set, outperforming visual-only and pose-only baselines by 14-22%.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Penalty kicks often decide championships, yet goalkeepers must anticipate the kicker's intent from subtle biomechanical cues within a very short time window. This study introduces a real-time, multi-modal deep learning framework to predict the direction of a penalty kick (left, middle, or right) before ball contact. The model uses a dual-branch architecture: a MobileNetV2-based CNN extracts spatial features from RGB frames, while 2D keypoints are processed by an LSTM network with attention mechanisms. Pose-derived keypoints further guide visual focus toward task-relevant regions. A distance-based thresholding method segments input sequences immediately before ball contact, ensuring consistent input across diverse footage. A custom dataset of 755 penalty kick events was created from real match videos, with frame-level annotations for object detection, shooter keypoints, and final ball placement. The model achieved 89% accuracy on a held-out test set, outperforming visual-only and pose-only baselines by 14-22%. With an inference time of 22 milliseconds, the lightweight and interpretable design makes it suitable for goalkeeper training, tactical analysis, and real-time game analytics.
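The abstract's distance-based thresholding step can be illustrated with a minimal sketch. The abstract does not say which distance is thresholded, so this example assumes it is the per-frame Euclidean distance between a kicking-foot keypoint and the ball centre. The function names, the threshold value, and the window length are all hypothetical and are not taken from the paper.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def contact_frame(
    foot_xy: Sequence[Point], ball_xy: Sequence[Point], threshold: float
) -> Optional[int]:
    """Return the first frame index where the foot-to-ball distance is at or
    below `threshold`, or None if that never happens.

    Assumption: "ball contact" is approximated by this distance test.
    """
    for i, (foot, ball) in enumerate(zip(foot_xy, ball_xy)):
        if math.dist(foot, ball) <= threshold:
            return i
    return None


def pre_contact_window(
    foot_xy: Sequence[Point],
    ball_xy: Sequence[Point],
    threshold: float,
    window: int,
) -> List[int]:
    """Return the indices of up to `window` frames ending just before contact."""
    end = contact_frame(foot_xy, ball_xy, threshold)
    if end is None:
        return []  # no contact detected, so there is nothing to segment
    start = max(0, end - window)
    return list(range(start, end))


# Toy trajectory: the foot approaches a stationary ball along the x-axis.
foot = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0), (6.0, 0.0), (7.5, 0.0)]
ball = [(8.0, 0.0)] * 5
frames = pre_contact_window(foot, ball, threshold=1.0, window=3)
print(frames)  # [1, 2, 3]
```

Cutting every clip relative to the detected contact frame, rather than to a fixed timestamp, is one way to get input sequences of consistent length and alignment across footage with different run-up durations.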
Related papers
- Wide Open Gazes: Quantifying Visual Exploratory Behavior in Soccer with Pose Enhanced Positional Data [0.0]
Traditional approaches to measuring visual exploratory behavior in soccer rely on counting visual exploratory actions (VEAs) based on rapid movements exceeding 125°/s. This research introduces a formulaic continuous vision layer to quantify players' visual perception from pose-enhanced tracking. We demonstrate that aggregated visual metrics are predictive of controlled pitch value gained at the end of dribbling actions using 32 games of synchronized pose-enhanced tracking data and on-ball event data from the 2024 Copa America.
arXiv Detail & Related papers (2026-02-19T20:17:23Z)
- CourtMotion: Learning Event-Driven Motion Representations from Skeletal Data for Basketball [45.88028371034407]
CourtMotion is a temporal modeling framework for analyzing and predicting game events and plays in professional basketball. Our two-stage approach first processes skeletal tracking data through Graph Neural Networks to capture nuanced motion patterns. We introduce event projection heads that explicitly connect player movements to basketball events like passes, shots, and steals, training the model to associate physical motion patterns with their purposes.
arXiv Detail & Related papers (2025-12-01T09:58:24Z)
- Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence [70.2803680525165]
We introduce Open-o3 Video, a non-agent framework that integrates explicit evidence into video reasoning. The model highlights key objects and bounding boxes alongside its answers, allowing reasoning to be grounded in concrete visual observations. On the V-STAR benchmark, Open-o3 Video achieves state-of-the-art performance, raising mAM by 14.4% and mL timestamp by 24.2%.
arXiv Detail & Related papers (2025-10-23T14:05:56Z)
- Action Anticipation from SoccerNet Football Video Broadcasts [84.87912817065506]
We introduce the task of action anticipation for football broadcast videos. We predict future actions in unobserved future frames within a five- or ten-second anticipation window. Our work will enable applications in automated broadcasting, tactical analysis, and player decision-making.
arXiv Detail & Related papers (2025-04-16T12:24:33Z)
- Domain-Guided Masked Autoencoders for Unique Player Identification [62.87054782745536]
Masked autoencoders (MAEs) have emerged as a superior alternative to conventional feature extractors.
Motivated by human vision, we devise a novel domain-guided masking policy for MAEs termed d-MAE.
We conduct experiments on three large-scale sports datasets.
arXiv Detail & Related papers (2024-03-17T20:14:57Z)
- Classifying Soccer Ball-on-Goal Position Through Kicker Shooting Action [1.3887779684720984]
This research addresses whether the ball's direction after a soccer free-kick can be accurately predicted solely by observing the shooter's kicking technique.
Our approach involves utilizing neural networks to develop a model that integrates Human Action Recognition (HAR) embeddings with contextual information.
Our results reveal 69.1% accuracy when considering two primary BoGP classes: right and left.
arXiv Detail & Related papers (2023-12-23T12:11:38Z)
- Towards Active Learning for Action Spotting in Association Football Videos [59.84375958757395]
Analyzing football videos is challenging and requires identifying subtle and diverse spatio-temporal patterns.
Current algorithms face significant challenges when learning from limited annotated data.
We propose an active learning framework that selects the most informative video samples to be annotated next.
arXiv Detail & Related papers (2023-04-09T11:50:41Z)
- A Graph-Based Method for Soccer Action Spotting Using Unsupervised Player Classification [75.93186954061943]
Action spotting involves understanding the dynamics of the game, the complexity of events, and the variation of video sequences.
In this work, we focus on the first of these, the dynamics of the game, by (a) identifying and representing the players, referees, and goalkeepers as nodes in a graph, and by (b) modeling their temporal interactions as sequences of graphs.
For the player identification task, our method obtains an overall performance of 57.83% average-mAP by combining it with other modalities.
arXiv Detail & Related papers (2022-11-22T15:23:53Z)
- SoccerNet-Tracking: Multiple Object Tracking Dataset and Benchmark in Soccer Videos [62.686484228479095]
We propose a novel dataset for multiple object tracking composed of 200 sequences of 30s each.
The dataset is fully annotated with bounding boxes and tracklet IDs.
Our analysis shows that multiple player, referee and ball tracking in soccer videos is far from being solved.
arXiv Detail & Related papers (2022-04-14T12:22:12Z)
- Predicting the outcome of team movements -- Player time series analysis using fuzzy and deep methods for representation learning [0.0]
We provide a framework for the useful encoding of short tactics and space occupations in a more extended sequence of movements or tactical plans.
We discuss the effectiveness of the proposed approach for prediction and recognition tasks on the professional basketball SportVU dataset for the 2015-16 half-season.
arXiv Detail & Related papers (2021-09-13T18:42:37Z)
- Unsupervised Visual Representation Learning by Tracking Patches in Video [88.56860674483752]
We propose to use tracking as a proxy task for a computer vision system to learn visual representations.
Modelled on the Catch game played by children, we design a Catch-the-Patch (CtP) game for a 3D-CNN model to learn visual representations.
arXiv Detail & Related papers (2021-05-06T09:46:42Z)
- TTNet: Real-time temporal and spatial video analysis of table tennis [5.156484100374058]
We present a neural network aimed at real-time processing of high-resolution table tennis videos.
This approach provides the core information an auto-referee system needs to reason about score updates.
We publish a multi-task dataset, OpenTTGames, with videos of table tennis games at 120 fps labeled with events.
arXiv Detail & Related papers (2020-04-21T11:57:51Z)
- Unsupervised Temporal Feature Aggregation for Event Detection in Unstructured Sports Videos [10.230408415438966]
We study the case of event detection in sports videos for unstructured environments with arbitrary camera angles.
We identify and solve two major problems: unsupervised identification of players in an unstructured setting and generalization of the trained models to pose variations due to arbitrary shooting angles.
arXiv Detail & Related papers (2020-02-19T10:24:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.