BoxingVI: A Multi-Modal Benchmark for Boxing Action Recognition and Localization
- URL: http://arxiv.org/abs/2511.16524v1
- Date: Thu, 20 Nov 2025 16:37:07 GMT
- Title: BoxingVI: A Multi-Modal Benchmark for Boxing Action Recognition and Localization
- Authors: Rahul Kumar, Vipul Baghel, Sudhanshu Singh, Bikash Kumar Badatya, Shivam Yadav, Babji Srinivasan, Ravi Hegde
- Abstract summary: We present a comprehensive, well-annotated video dataset tailored for punch detection and classification in boxing. The dataset comprises 6,915 high-quality punch clips categorized into six distinct punch types. This contribution aims to accelerate progress in movement analysis, automated coaching, and performance assessment within boxing and related domains.
- Score: 1.623267727687624
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurate analysis of combat sports using computer vision has gained traction in recent years, yet the development of robust datasets remains a major bottleneck due to the dynamic, unstructured nature of actions and variations in recording environments. In this work, we present a comprehensive, well-annotated video dataset tailored for punch detection and classification in boxing. The dataset comprises 6,915 high-quality punch clips categorized into six distinct punch types, extracted from 20 publicly available YouTube sparring sessions and involving 18 different athletes. Each clip is manually segmented and labeled to ensure precise temporal boundaries and class consistency, capturing a wide range of motion styles, camera angles, and athlete physiques. This dataset is specifically curated to support research in real-time vision-based action recognition, especially in low-resource and unconstrained environments. By providing a rich benchmark with diverse punch examples, this contribution aims to accelerate progress in movement analysis, automated coaching, and performance assessment within boxing and related domains.
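As a rough illustration of how a manually segmented, labeled clip in such a dataset might be represented, the sketch below defines a minimal annotation record. The field names and the six punch-class names are assumptions for illustration, not the dataset's actual schema.

```python
from dataclasses import dataclass

# Hypothetical punch taxonomy: the paper specifies six classes,
# but these exact names are an assumption.
PUNCH_TYPES = ["jab", "cross", "lead_hook", "rear_hook",
               "lead_uppercut", "rear_uppercut"]

@dataclass
class PunchClip:
    """One manually segmented clip: precise temporal boundaries plus a class label."""
    video_id: str      # source sparring session (one of 20 YouTube videos)
    athlete_id: int    # one of the 18 athletes
    start_frame: int   # inclusive temporal boundary
    end_frame: int     # exclusive temporal boundary
    label: str         # one of the six punch types

    def __post_init__(self):
        # Enforce class consistency and valid temporal boundaries.
        if self.label not in PUNCH_TYPES:
            raise ValueError(f"unknown punch type: {self.label}")
        if self.end_frame <= self.start_frame:
            raise ValueError("clip must span at least one frame")

clip = PunchClip("session_01", athlete_id=3, start_frame=120,
                 end_frame=138, label="jab")
print(clip.end_frame - clip.start_frame)  # clip length in frames -> 18
```

Validating boundaries and labels at construction time mirrors the manual segmentation and class-consistency checks the abstract describes.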
Related papers
- BoxMAC -- A Boxing Dataset for Multi-label Action Classification [0.0]
BoxMAC is a real-world boxing dataset featuring 15 professional boxers and 13 distinct action labels. We propose a novel architecture for jointly recognizing multiple actions in both individual images and videos. BoxMAC can serve as a valuable resource for the advancement of boxing as a sport.
arXiv Detail & Related papers (2024-12-24T06:20:01Z) - FACTS: Fine-Grained Action Classification for Tactical Sports [4.810476621219244]
Classifying fine-grained actions in fast-paced, close-combat sports such as fencing and boxing presents unique challenges. We introduce FACTS, a novel approach for fine-grained action recognition that processes raw video data directly. Our findings enhance training, performance analysis, and spectator engagement, setting a new benchmark for action classification in tactical sports.
arXiv Detail & Related papers (2024-12-21T03:00:25Z) - ActionAtlas: A VideoQA Benchmark for Domain-specialized Action Recognition [111.32822459456793]
ActionAtlas is a video question answering benchmark featuring short videos across various sports.
The dataset includes 934 videos showcasing 580 unique actions across 56 sports, with a total of 1896 actions within choices.
We evaluate open and proprietary foundation models on this benchmark, finding that the best model, GPT-4o, achieves a maximum accuracy of 45.52%.
arXiv Detail & Related papers (2024-10-08T07:55:09Z) - Deep learning for action spotting in association football videos [64.10841325879996]
The SoccerNet initiative organizes yearly challenges, during which participants from all around the world compete to achieve state-of-the-art performances.
This paper traces the history of action spotting in sports, from the creation of the task back in 2018, to the role it plays today in research and the sports industry.
arXiv Detail & Related papers (2024-10-02T07:56:15Z) - Benchmarking Badminton Action Recognition with a New Fine-Grained Dataset [16.407837909069073]
We introduce the VideoBadminton dataset derived from high-quality badminton footage.
The introduction of VideoBadminton could not only serve for badminton action recognition but also provide a dataset for recognizing fine-grained actions.
arXiv Detail & Related papers (2024-03-19T02:52:06Z) - P2ANet: A Dataset and Benchmark for Dense Action Detection from Table Tennis Match Broadcasting Videos [64.57435509822416]
This work consists of 2,721 video clips collected from the broadcasting videos of professional table tennis matches in World Table Tennis Championships and Olympiads.
We formulate two sets of action detection problems -- action localization and action recognition.
The results confirm that P2ANet is still a challenging task and can be used as a special benchmark for dense action detection from videos.
arXiv Detail & Related papers (2022-07-26T08:34:17Z) - A Survey on Video Action Recognition in Sports: Datasets, Methods and Applications [60.3327085463545]
We present a survey on video action recognition for sports analytics.
We introduce more than ten types of sports, including team sports such as football, basketball, volleyball, and hockey, and individual sports such as figure skating, gymnastics, table tennis, diving, and badminton.
We develop a toolbox using PaddlePaddle, which supports football, basketball, table tennis and figure skating action recognition.
arXiv Detail & Related papers (2022-06-02T13:19:36Z) - Sports Video: Fine-Grained Action Detection and Classification of Table Tennis Strokes from Videos for MediaEval 2021 [0.0]
This task tackles fine-grained action detection and classification from videos.
The focus is on recordings of table tennis games.
This work aims at creating tools for sports coaches and players in order to analyze sports performance.
arXiv Detail & Related papers (2021-12-16T10:17:59Z) - Hybrid Dynamic-static Context-aware Attention Network for Action Assessment in Long Videos [96.45804577283563]
We present a novel hybrid dynAmic-static Context-aware attenTION NETwork (ACTION-NET) for action assessment in long videos.
We not only learn the video dynamic information but also focus on the static postures of the detected athletes in specific frames.
We combine the features of the two streams to regress the final video score, supervised by ground-truth scores given by experts.
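The two-stream fusion and score regression described above can be sketched roughly as follows. The feature dimensions and the linear regression head are illustrative assumptions for a minimal example, not ACTION-NET's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two streams with made-up dimensions: a dynamic feature from the video
# stream and a static feature from athlete postures in specific frames.
dynamic_feat = rng.standard_normal(256)
static_feat = rng.standard_normal(128)

# Fuse the two streams by concatenation.
fused = np.concatenate([dynamic_feat, static_feat])  # shape (384,)

# A regression head maps the fused feature to a scalar quality score;
# the weights here are random placeholders standing in for learned ones.
W = rng.standard_normal(fused.shape[0]) / np.sqrt(fused.shape[0])
predicted_score = float(fused @ W)

# Training would minimize a loss (e.g. MSE) against expert ground-truth scores.
ground_truth_score = 7.5
mse = (predicted_score - ground_truth_score) ** 2

print(fused.shape)  # -> (384,)
```

Concatenation is only one possible fusion strategy; attention-based weighting (as the network's name suggests) would replace the plain concatenation here.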
arXiv Detail & Related papers (2020-08-13T15:51:42Z) - FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding [118.32912239230272]
FineGym is a new action recognition dataset built on top of gymnastic videos.
It provides temporal annotations at both action and sub-action levels with a three-level semantic hierarchy.
This new level of granularity presents significant challenges for action recognition.
arXiv Detail & Related papers (2020-04-14T17:55:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the generated list (including all information) and is not responsible for any consequences.