Machine Learning for Soccer Match Result Prediction
- URL: http://arxiv.org/abs/2403.07669v1
- Date: Tue, 12 Mar 2024 14:00:50 GMT
- Title: Machine Learning for Soccer Match Result Prediction
- Authors: Rory Bunker, Calvin Yeung, Keisuke Fujii
- Abstract summary: This chapter discusses available datasets, the types of models and features, and ways of evaluating model performance.
The aim of this chapter is to give a broad overview of the current state and potential future developments in machine learning for soccer match results prediction.
- Score: 0.9002260638342727
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning has become a common approach to predicting the outcomes of
soccer matches, and the body of literature in this domain has grown
substantially in the past decade and a half. This chapter discusses available
datasets, the types of models and features, and ways of evaluating model
performance in this application domain. The aim of this chapter is to give a
broad overview of the current state and potential future developments in
machine learning for soccer match results prediction, as a resource for those
interested in conducting future studies in the area. Our main findings are that
while gradient-boosted tree models such as CatBoost, applied to soccer-specific
ratings such as pi-ratings, are currently the best-performing models on
datasets containing only goals as the match features, there needs to be a more
thorough comparison of the performance of deep learning models and Random
Forest on a range of datasets with different types of features. Furthermore,
new rating systems using both player- and team-level information and
incorporating additional information from, e.g., spatiotemporal tracking and
event data, could be investigated further. Finally, the interpretability of
match result prediction models needs to be enhanced for them to be more useful
for team management.
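The abstract's main finding pairs gradient-boosted trees with pi-ratings. As a rough illustration, the following is a simplified, pure-Python sketch of a pi-rating-style update in the spirit of Constantinou and Fenton's formulation: each team keeps separate home and away ratings, the prediction error is damped logarithmically, and part of each update carries over to the team's other venue rating. The constant `C` follows the commonly cited formulation, but `LAM` and `GAMMA` are illustrative values, and the function as a whole is a simplification rather than the exact published procedure.

```python
import math

C = 3.0      # scaling constant from the pi-rating formulation
LAM = 0.06   # learning rate (illustrative value)
GAMMA = 0.6  # fraction of an update carried to the team's other venue rating

def expected_goal_diff(rating: float) -> float:
    """Map a (home or away) rating to an expected goal superiority."""
    return math.copysign(10 ** (abs(rating) / C) - 1, rating)

def psi(error: float) -> float:
    """Logarithmically damped weight, so blowout results do not dominate."""
    return C * math.log10(1 + error)

def update(home: dict, away: dict, observed_gd: int):
    """Update ratings after one match.

    home/away are dicts with 'home' and 'away' ratings;
    observed_gd is the goal difference from the home side's perspective.
    Returns updated copies of both dicts.
    """
    expected_gd = expected_goal_diff(home["home"]) - expected_goal_diff(away["away"])
    error = abs(observed_gd - expected_gd)
    # Home side gains rating if it outperformed expectation, loses otherwise.
    sign = 1.0 if observed_gd > expected_gd else -1.0
    h_delta = sign * psi(error) * LAM
    a_delta = -h_delta
    new_home = {"home": home["home"] + h_delta,
                "away": home["away"] + h_delta * GAMMA}
    new_away = {"away": away["away"] + a_delta,
                "home": away["home"] + a_delta * GAMMA}
    return new_home, new_away
```

For example, starting two teams at zero and feeding in a 2-0 home win raises the home side's home rating and lowers the away side's away rating by the same amount.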
Related papers
- Pushing the Limits of Pre-training for Time Series Forecasting in the
CloudOps Domain [54.67888148566323]
We introduce three large-scale time series forecasting datasets from the cloud operations domain.
We show that the resulting pre-trained model is a strong zero-shot baseline and benefits from further scaling, in both model and dataset size.

Accompanying these datasets and results is a suite of comprehensive benchmark results comparing classical and deep learning baselines to our pre-trained method.
arXiv Detail & Related papers (2023-10-08T08:09:51Z)
- Evaluating Soccer Match Prediction Models: A Deep Learning Approach and Feature Optimization for Gradient-Boosted Trees [0.8009842832476994]
The 2023 Soccer Prediction Challenge required the prediction of match results first in terms of the exact goals scored by each team, and second, in terms of the probabilities for a win, draw, and loss.
A CatBoost model was employed using pi-ratings as the features, which were initially identified as the optimal choice for calculating the win/draw/loss probabilities.
In this study, we aimed to assess the performance of a deep learning model and determine the optimal feature set for a gradient-boosted tree model.
arXiv Detail & Related papers (2023-09-26T10:05:46Z)
- The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute [66.84421705029624]
We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours.
We pre-process an existing large, diverse, and high-quality dataset of books that surpasses existing academic benchmarks in quality, diversity, and document length.
This work also provides two baseline models: a feed-forward model derived from the GPT-2 architecture and a recurrent model in the form of a novel LSTM with ten-fold throughput.
arXiv Detail & Related papers (2023-09-20T10:31:17Z)
- Multi-granulariy Time-based Transformer for Knowledge Tracing [9.788039182463768]
We leverage students' historical data, including their past test scores, to create a personalized model for each student.
We then use these models to predict their future performance on a given test.
arXiv Detail & Related papers (2023-04-11T14:46:38Z)
- Supervised Learning for Table Tennis Match Prediction [2.7835697868135902]
This paper proposes the use of machine learning to predict the outcome of table tennis single matches.
We use player and match statistics as features and evaluate their relative importance in an ablation study.
The results can serve as a baseline for future table tennis prediction models, and can feed back to prediction research in similar ball sports.
arXiv Detail & Related papers (2023-03-28T17:42:13Z)
- Explainable expected goal models for performance analysis in football analytics [5.802346990263708]
This paper proposes an accurate expected goal model trained on 315,430 shots from the seven seasons between 2014-15 and 2020-21 of the top five European football leagues.
To the best of our knowledge, this is the first paper to demonstrate a practical application of an explainable artificial intelligence tool with aggregated profiles.
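An expected goal (xG) model is, at its core, a probability-of-scoring classifier over shot features. The paper above trains on hundreds of thousands of real shots with a rich feature set; the sketch below is a deliberately minimal stand-in using only two classic xG features (shot distance and visible goal angle) on synthetic data, so every number in it is illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

# Synthetic shots: distance to goal (metres) and visible goal angle (radians)
# are the two classic xG features; real models use many more.
n = 5000
distance = rng.uniform(5, 35, n)
angle = rng.uniform(0.1, 1.2, n)

# Synthetic outcomes: closer, wider-angle shots score more often.
true_logit = 1.5 - 0.15 * distance + 1.0 * angle
goal = rng.random(n) < 1 / (1 + np.exp(-true_logit))

# A logistic-regression xG model: predicted probability = expected goals.
xg_model = LogisticRegression().fit(np.column_stack([distance, angle]), goal)

# xG for a close, central shot versus a long-range, tight-angle effort.
close_xg = xg_model.predict_proba([[8.0, 1.0]])[0, 1]
far_xg = xg_model.predict_proba([[30.0, 0.3]])[0, 1]
```

Summing these per-shot probabilities over a match or season gives the aggregate xG figures used in performance analysis.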
arXiv Detail & Related papers (2022-06-14T23:56:03Z)
- An Empirical Investigation of Commonsense Self-Supervision with Knowledge Graphs [67.23285413610243]
Self-supervision based on the information extracted from large knowledge graphs has been shown to improve the generalization of language models.
We study the effect of knowledge sampling strategies and sizes that can be used to generate synthetic data for adapting language models.
arXiv Detail & Related papers (2022-05-21T19:49:04Z)
- Towards Open-World Feature Extrapolation: An Inductive Graph Learning Approach [80.8446673089281]
We propose a new learning paradigm with graph representation and learning.
Our framework contains two modules: 1) a backbone network (e.g., feedforward neural nets) as a lower model takes features as input and outputs predicted labels; 2) a graph neural network as an upper model learns to extrapolate embeddings for new features via message passing over a feature-data graph built from observed data.
arXiv Detail & Related papers (2021-10-09T09:02:45Z)
- Comparing Test Sets with Item Response Theory [53.755064720563]
We evaluate 29 datasets using predictions from 18 pretrained Transformer models on individual test examples.
We find that Quoref, HellaSwag, and MC-TACO are best suited for distinguishing among state-of-the-art models.
We also observe that the span selection task format, used for QA datasets such as QAMR and SQuAD 2.0, is effective in differentiating between strong and weak models.
arXiv Detail & Related papers (2021-06-01T22:33:53Z)
- Machine-Generated Hierarchical Structure of Human Activities to Reveal How Machines Think [0.0]
We argue the importance and feasibility of constructing a hierarchical labeling system for human activity recognition.
We utilize the predictions of a black box HAR model to identify similarities between different activities.
In this system, the activity labels on the same level will have a designed magnitude of accuracy and reflect a specific amount of activity details.
arXiv Detail & Related papers (2021-01-19T20:40:22Z)
- Models, Pixels, and Rewards: Evaluating Design Trade-offs in Visual Model-Based Reinforcement Learning [109.74041512359476]
We study a number of design decisions for the predictive model in visual MBRL algorithms.
We find that a range of design decisions that are often considered crucial, such as the use of latent spaces, have little effect on task performance.
We show how this phenomenon is related to exploration and how some of the lower-scoring models on standard benchmarks will perform the same as the best-performing models when trained on the same training data.
arXiv Detail & Related papers (2020-12-08T18:03:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.