Asia Cup 2025: A Structured T20 Match-Level Dataset and Exploratory Analysis for Cricket Analytics
- URL: http://arxiv.org/abs/2512.19740v1
- Date: Wed, 17 Dec 2025 20:02:50 GMT
- Title: Asia Cup 2025: A Structured T20 Match-Level Dataset and Exploratory Analysis for Cricket Analytics
- Authors: Kousar Raza, Faizan Ali
- Abstract summary: This paper presents a structured and comprehensive dataset corresponding to the 2025 Asia Cup T20 cricket tournament. The dataset comprises records from all 19 matches of the tournament and includes 61 variables covering team scores, wickets, powerplay statistics, boundary counts, toss decisions, venues, and player-specific highlights.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents a structured and comprehensive dataset corresponding to the 2025 Asia Cup T20 cricket tournament, designed to facilitate data-driven research in sports analytics. The dataset comprises records from all 19 matches of the tournament and includes 61 variables covering team scores, wickets, powerplay statistics, boundary counts, toss decisions, venues, and player-specific highlights. To demonstrate its analytical value, we conduct an exploratory data analysis focusing on team performance indicators, boundary distributions, and scoring patterns. The dataset is publicly released through Zenodo under a CC-BY 4.0 license to support reproducibility and further research in cricket analytics, predictive modeling, and strategic decision-making. This work contributes an open, machine-readable benchmark dataset for advancing cricket analytics research.
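The exploratory analysis mentioned in the abstract (team performance indicators, boundary distributions, scoring patterns) comes down to simple aggregations over the match table. Below is a minimal sketch of that kind of analysis. It assumes a hypothetical wide layout with one row per match and `team1_*`/`team2_*` columns. The column names, the helper functions `innings_rows` and `team_summary`, and the toy rows are illustrative placeholders. They are not the released dataset's actual 61-variable schema and not real tournament figures, so map them to the real headers from the Zenodo release before use.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical wide layout: one row per match with team1_*/team2_* columns.
# These names are placeholders, NOT the released dataset's actual headers.
STAT_FIELDS = ("runs", "fours", "sixes", "powerplay_runs")


def innings_rows(match_rows):
    """Split each match row into two per-team innings records."""
    for match in match_rows:
        for side in ("team1", "team2"):
            record = {"team": match[side]}
            for field in STAT_FIELDS:
                record[field] = int(match[f"{side}_{field}"])
            yield record


def team_summary(match_rows):
    """Per-team innings count, mean score, and boundary/powerplay run shares."""
    by_team = defaultdict(list)
    for inn in innings_rows(match_rows):
        by_team[inn["team"]].append(inn)

    summary = {}
    for team, innings in by_team.items():
        total_runs = sum(i["runs"] for i in innings)
        # Runs from boundaries: 4 per four plus 6 per six.
        boundary_runs = sum(4 * i["fours"] + 6 * i["sixes"] for i in innings)
        powerplay_runs = sum(i["powerplay_runs"] for i in innings)
        summary[team] = {
            "innings": len(innings),
            "mean_runs": mean(i["runs"] for i in innings),
            # Guard against a zero-run total to avoid division by zero.
            "boundary_share": boundary_runs / total_runs if total_runs else 0.0,
            "powerplay_share": powerplay_runs / total_runs if total_runs else 0.0,
        }
    return summary


# Synthetic toy rows for illustration only; not real Asia Cup 2025 figures.
TOY_CSV = """match_id,team1,team2,team1_runs,team1_fours,team1_sixes,team1_powerplay_runs,team2_runs,team2_fours,team2_sixes,team2_powerplay_runs
1,A,B,160,12,6,50,150,10,5,40
2,A,C,180,15,8,60,140,9,4,45
"""

if __name__ == "__main__":
    rows = list(csv.DictReader(io.StringIO(TOY_CSV)))
    for team, stats in sorted(team_summary(rows).items()):
        print(team, stats)
```

Reshaping to one record per team innings first keeps the aggregation independent of whether a team appears as `team1` or `team2` in a given match.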
Related papers
- OpenDataArena: A Fair and Open Arena for Benchmarking Post-Training Dataset Value [74.80873109856563]
OpenDataArena (ODA) is a holistic and open platform designed to benchmark the intrinsic value of post-training data. ODA establishes a comprehensive ecosystem comprising four key pillars: (i) a unified training-evaluation pipeline that ensures fair, open comparisons across diverse models; (ii) a multi-dimensional scoring framework that profiles data quality along tens of distinct axes; and (iii) an interactive data lineage explorer to visualize dataset genealogy and dissect component sources.
(2025-12-16T03:33:24Z)
- SciER: An Entity and Relation Extraction Dataset for Datasets, Methods, and Tasks in Scientific Documents [49.54155332262579]
We release a new entity and relation extraction dataset for entities related to datasets, methods, and tasks in scientific articles.
Our dataset contains 106 manually annotated full-text scientific publications with over 24k entities and 12k relations.
(2024-10-28T15:56:49Z)
- GIFT-Eval: A Benchmark For General Time Series Forecasting Model Evaluation [90.53485251837235]
Time series foundation models excel in zero-shot forecasting, handling diverse tasks without explicit training.
GIFT-Eval is a pioneering benchmark aimed at promoting evaluation across diverse datasets.
GIFT-Eval encompasses 23 datasets with over 144,000 time series and 177 million data points.
(2024-10-14T11:29:38Z)
- MMSci: A Dataset for Graduate-Level Multi-Discipline Multimodal Scientific Understanding [59.41495657570397]
We present a comprehensive dataset compiled from Nature Communications articles covering 72 scientific fields. We evaluated 19 proprietary and open-source models on two benchmark tasks, figure captioning and multiple-choice, and conducted human expert annotation. Fine-tuning Qwen2-VL-7B with our task-specific data achieved better performance than GPT-4o and even human experts in multiple-choice evaluations.
(2024-07-06T00:40:53Z)
- TTSWING: a Dataset for Table Tennis Swing Analysis [1.539942973115038]
This dataset comprises comprehensive swing information obtained through 9-axis sensors integrated into custom-made racket grips.
We detail the data collection and annotation procedures.
We conduct pilot studies utilizing diverse machine learning models for swing analysis.
(2023-06-30T11:06:46Z)
- ShuttleSet: A Human-Annotated Stroke-Level Singles Dataset for Badminton Tactical Analysis [5.609957071296952]
We present ShuttleSet, the largest publicly-available badminton singles dataset with annotated stroke-level records.
It contains 104 sets, 3,685 rallies, and 36,492 strokes in 44 matches between 2018 and 2021 with 27 top-ranking men's singles and women's singles players.
ShuttleSet is manually annotated with a computer-aided labeling tool to increase the labeling efficiency and effectiveness of selecting the shot type.
(2023-06-08T05:41:42Z)
- Hang-Time HAR: A Benchmark Dataset for Basketball Activity Recognition using Wrist-Worn Inertial Sensors [47.33629411771497]
We present a benchmark dataset for evaluating physical human activity recognition methods from wrist-worn sensors.
The dataset was recorded for two teams from separate countries (USA and Germany) with a total of 24 players who wore an inertial sensor on their wrist.
(2023-05-22T15:25:29Z)
- ICDAR 2023 Competition on Hierarchical Text Detection and Recognition [60.68100769639923]
The competition aims to promote research into deep learning models and systems that can jointly perform text detection and recognition.
We present details of the proposed competition organization, including tasks, datasets, evaluations, and schedule.
During the competition period (from January 2nd 2023 to April 1st 2023), at least 50 submissions from more than 20 teams were made in the 2 proposed tasks.
(2023-05-16T18:56:12Z)
- The ProfessionAl Go annotation datasEt (PAGE) [3.1723119892509573]
We present the ProfessionAl Go annotation datasEt (PAGE), containing 98,525 games played by 2,007 professional players and spanning over 70 years.
The dataset includes rich AI analysis results for each move. Moreover, PAGE provides detailed metadata for every player and game after manual cleaning and labeling.
(2022-11-03T02:41:41Z)
- PGD: A Large-scale Professional Go Dataset for Data-driven Analytics [3.747666374070152]
This paper introduces the Professional Go Dataset (PGD), containing 98,043 games played by 2,148 professional players from 1950 to 2021.
The dataset includes analysis results for each move in the match evaluated by advanced AlphaZero-based AI.
With the help of complete meta-information and constructed in-game features, our game-result prediction system achieves an accuracy of 75.30%.
(2022-04-30T12:53:04Z)
- SoccerNet-Tracking: Multiple Object Tracking Dataset and Benchmark in Soccer Videos [62.686484228479095]
We propose a novel dataset for multiple object tracking composed of 200 sequences of 30 seconds each.
The dataset is fully annotated with bounding boxes and tracklet IDs.
Our analysis shows that multiple player, referee and ball tracking in soccer videos is far from being solved.
(2022-04-14T12:22:12Z)
- Efficient Feature Representations for Cricket Data Analysis using Deep Learning based Multi-Modal Fusion Model [0.0]
This study investigates the use of adaptive (learnable) embeddings to represent inter-related features.
The data used for this study is collected from the Indian Premier League (IPL), a classical T20 tournament.
(2021-08-16T15:14:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.