Evaluating Soccer Match Prediction Models: A Deep Learning Approach and
Feature Optimization for Gradient-Boosted Trees
- URL: http://arxiv.org/abs/2309.14807v1
- Date: Tue, 26 Sep 2023 10:05:46 GMT
- Title: Evaluating Soccer Match Prediction Models: A Deep Learning Approach and
Feature Optimization for Gradient-Boosted Trees
- Authors: Calvin Yeung, Rory Bunker, Rikuhei Umemoto, Keisuke Fujii
- Abstract summary: The 2023 Soccer Prediction Challenge required the prediction of match results first in terms of the exact goals scored by each team, and second, in terms of the probabilities for a win, draw, and loss.
A CatBoost model was employed using pi-ratings as the features, which were initially identified as the optimal choice for calculating the win/draw/loss probabilities.
In this study, we aimed to assess the performance of a deep learning model and determine the optimal feature set for a gradient-boosted tree model.
- Score: 0.8009842832476994
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Machine learning models have become increasingly popular for predicting the
results of soccer matches; however, the lack of publicly-available benchmark
datasets has made model evaluation challenging. The 2023 Soccer Prediction
Challenge required the prediction of match results first in terms of the exact
goals scored by each team, and second, in terms of the probabilities for a win,
draw, and loss. The original training set of matches and features, which was
provided for the competition, was augmented with additional matches played
between 4 April and 13 April 2023, the period after the end of the training
set but prior to the first matches that were to be
predicted (upon which the performance was evaluated). A CatBoost model was
employed using pi-ratings as the features, which were initially identified as
the optimal choice for calculating the win/draw/loss probabilities. Notably,
deep learning models have frequently been disregarded in this particular task.
Therefore, in this study, we aimed to assess the performance of a deep learning
model and determine the optimal feature set for a gradient-boosted tree model.
The model was trained using the most recent five years of data, and three
training and validation sets were used in a hyperparameter grid search. The
results from the validation sets show that our model had strong performance and
stability compared to previously published models from the 2017 Soccer
Prediction Challenge for win/draw/loss prediction.
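The pi-ratings used as features above maintain a home and an away rating per team, updated after each match from the error between the observed and the rating-implied goal difference. The following is a minimal, self-contained sketch in the spirit of Constantinou and Fenton's pi-ratings; the learning rates, the error damping, and the rating-to-goals transform are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch of a pi-rating-style update. Constants and the exact
# transforms are assumptions for illustration only.
import math
from collections import defaultdict

LAMBDA = 0.06   # assumed learning rate for the venue actually played
GAMMA = 0.5     # assumed cross-venue transfer factor

# ratings[team] = {"home": float, "away": float}, initialized to zero
ratings = defaultdict(lambda: {"home": 0.0, "away": 0.0})

def expected_goal_diff(home_team, away_team):
    """Expected home-minus-away goal difference implied by current ratings."""
    def g(r):
        # Map a rating back to an expected goal superiority; the base-10
        # exponential form follows the pi-rating construction.
        return math.copysign(10 ** (abs(r) / 3) - 1, r)
    return g(ratings[home_team]["home"]) - g(ratings[away_team]["away"])

def update(home_team, away_team, home_goals, away_goals):
    """Update both teams' ratings after an observed result."""
    err = (home_goals - away_goals) - expected_goal_diff(home_team, away_team)
    # Logarithmic damping so blowout wins do not dominate the ratings.
    psi = math.copysign(3 * math.log10(1 + abs(err)), err)
    ratings[home_team]["home"] += LAMBDA * psi
    ratings[home_team]["away"] += GAMMA * LAMBDA * psi
    ratings[away_team]["away"] -= LAMBDA * psi
    ratings[away_team]["home"] -= GAMMA * LAMBDA * psi
```

After iterating `update` over the training matches in date order, the two teams' rating components for an upcoming fixture would serve as the feature vector for a gradient-boosted tree classifier over the win/draw/loss classes.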
Related papers
- Valeo4Cast: A Modular Approach to End-to-End Forecasting [93.86257326005726]
Our solution ranks first in the Argoverse 2 End-to-end Forecasting Challenge, with 63.82 mAPf.
We depart from the current trend of tackling this task via end-to-end training from perception to forecasting, and instead use a modular approach.
We surpass forecasting results by +17.1 points over last year's winner and by +13.3 points over this year's runner-up.
arXiv Detail & Related papers (2024-06-12T11:50:51Z)
- Machine Learning for Soccer Match Result Prediction [0.9002260638342727]
This chapter discusses available datasets, the types of models and features, and ways of evaluating model performance.
The aim of this chapter is to give a broad overview of the current state and potential future developments in machine learning for soccer match results prediction.
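One common way of evaluating win/draw/loss probability forecasts in this literature is the ranked probability score (RPS), which respects the ordering of the outcomes. A minimal sketch follows; this is a generic illustration, not necessarily the chapter's exact formulation.

```python
# Ranked probability score for ordered outcomes (lower is better).
def ranked_probability_score(probs, outcome):
    """probs: [p_win, p_draw, p_loss]; outcome: index of the observed result."""
    cum_p, cum_o, score = 0.0, 0.0, 0.0
    for i, p in enumerate(probs):
        cum_p += p                               # cumulative forecast
        cum_o += 1.0 if i == outcome else 0.0    # cumulative observation
        score += (cum_p - cum_o) ** 2
    return score / (len(probs) - 1)
```

A perfect forecast scores 0; a uniform forecast over three outcomes scores 5/18 when the first outcome occurs, so RPS rewards probability mass placed near the observed result.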
arXiv Detail & Related papers (2024-03-12T14:00:50Z)
- Pushing the Limits of Pre-training for Time Series Forecasting in the CloudOps Domain [54.67888148566323]
We introduce three large-scale time series forecasting datasets from the cloud operations domain.
We show it is a strong zero-shot baseline and benefits from further scaling, both in model and dataset size.
Accompanying these datasets and results is a suite of comprehensive benchmark results comparing classical and deep learning baselines to our pre-trained method.
arXiv Detail & Related papers (2023-10-08T08:09:51Z)
- Learning Sample Difficulty from Pre-trained Models for Reliable Prediction [55.77136037458667]
We propose to utilize large-scale pre-trained models to guide downstream model training with sample difficulty-aware entropy regularization.
We simultaneously improve accuracy and uncertainty calibration across challenging benchmarks.
arXiv Detail & Related papers (2023-04-20T07:29:23Z)
- Supervised Learning for Table Tennis Match Prediction [2.7835697868135902]
This paper proposes the use of machine learning to predict the outcome of table tennis single matches.
We use player and match statistics as features and evaluate their relative importance in an ablation study.
The results can serve as a baseline for future table tennis prediction models, and can feed back to prediction research in similar ball sports.
arXiv Detail & Related papers (2023-03-28T17:42:13Z)
- Boosted Dynamic Neural Networks [53.559833501288146]
A typical EDNN has multiple prediction heads at different layers of the network backbone.
To optimize the model, these prediction heads together with the network backbone are trained on every batch of training data.
Treating training and testing inputs differently at the two phases will cause the mismatch between training and testing data distributions.
We formulate an EDNN as an additive model inspired by gradient boosting, and propose multiple training techniques to optimize the model effectively.
arXiv Detail & Related papers (2022-11-30T04:23:12Z)
- Explainable expected goal models for performance analysis in football analytics [5.802346990263708]
This paper proposes an accurate expected goal model trained on 315,430 shots from seven seasons, 2014-15 to 2020-21, of the top-five European football leagues.
To the best of our knowledge, this is the first paper that demonstrates a practical application of an explainable artificial intelligence tool based on aggregated profiles.
arXiv Detail & Related papers (2022-06-14T23:56:03Z)
- Model Selection, Adaptation, and Combination for Deep Transfer Learning through Neural Networks in Renewable Energies [5.953831950062808]
We conduct the first thorough experiment on model selection and adaptation for transfer learning in renewable power forecasting.
We adopt models based on data from different seasons and limit the amount of training data.
We show how combining multiple models through ensembles can significantly improve the model selection and adaptation approach.
arXiv Detail & Related papers (2022-04-28T05:34:50Z)
- Improved Fine-tuning by Leveraging Pre-training Data: Theory and Practice [52.11183787786718]
Fine-tuning a pre-trained model on the target data is widely used in many deep learning applications.
Recent studies have empirically shown that training from scratch can achieve final performance no worse than this pre-training strategy.
We propose a novel selection strategy to select a subset from pre-training data to help improve the generalization on the target task.
arXiv Detail & Related papers (2021-11-24T06:18:32Z)
- Enhancing Trajectory Prediction using Sparse Outputs: Application to Team Sports [6.26476800426345]
It can be surprisingly challenging to train a deep learning model for player trajectory prediction.
We propose and test a novel method for improving training by predicting a sparse trajectory and interpolating using constant acceleration.
We find that the accuracy of predicted trajectories for a subset of players can be improved by conditioning on the full trajectories of the other players.
arXiv Detail & Related papers (2021-06-01T01:43:19Z)
- Models, Pixels, and Rewards: Evaluating Design Trade-offs in Visual Model-Based Reinforcement Learning [109.74041512359476]
We study a number of design decisions for the predictive model in visual MBRL algorithms.
We find that a range of design decisions that are often considered crucial, such as the use of latent spaces, have little effect on task performance.
We show how this phenomenon is related to exploration and how some of the lower-scoring models on standard benchmarks will perform the same as the best-performing models when trained on the same training data.
arXiv Detail & Related papers (2020-12-08T18:03:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences arising from its use.