Evaluating Soccer Match Prediction Models: A Deep Learning Approach and
Feature Optimization for Gradient-Boosted Trees
- URL: http://arxiv.org/abs/2309.14807v1
- Date: Tue, 26 Sep 2023 10:05:46 GMT
- Title: Evaluating Soccer Match Prediction Models: A Deep Learning Approach and
Feature Optimization for Gradient-Boosted Trees
- Authors: Calvin Yeung, Rory Bunker, Rikuhei Umemoto, Keisuke Fujii
- Abstract summary: The 2023 Soccer Prediction Challenge required the prediction of match results first in terms of the exact goals scored by each team, and second, in terms of the probabilities for a win, draw, and loss.
A CatBoost model was employed using pi-ratings as the features, which were initially identified as the optimal choice for calculating the win/draw/loss probabilities.
In this study, we aimed to assess the performance of a deep learning model and determine the optimal feature set for a gradient-boosted tree model.
- Score: 0.8009842832476994
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Machine learning models have become increasingly popular for predicting the
results of soccer matches; however, the lack of publicly-available benchmark
datasets has made model evaluation challenging. The 2023 Soccer Prediction
Challenge required the prediction of match results first in terms of the exact
goals scored by each team, and second, in terms of the probabilities for a win,
draw, and loss. The original training set of matches and features, which was
provided for the competition, was augmented with additional matches played
between 4 April and 13 April 2023, the period after the end of the training
set but prior to the first matches that were to be
predicted (upon which the performance was evaluated). A CatBoost model was
employed using pi-ratings as the features, which were initially identified as
the optimal choice for calculating the win/draw/loss probabilities. Notably,
deep learning models have frequently been disregarded in this particular task.
Therefore, in this study, we aimed to assess the performance of a deep learning
model and determine the optimal feature set for a gradient-boosted tree model.
The model was trained using the most recent five years of data, and three
training and validation sets were used in a hyperparameter grid search. The
results from the validation sets show that our model had strong performance and
stability compared to previously published models from the 2017 Soccer
Prediction Challenge for win/draw/loss prediction.
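The pi-ratings used as features above maintain a home and an away rating per team, updated after each match from the error between the observed and the rating-implied goal difference. The following is a minimal, self-contained sketch in the spirit of Constantinou and Fenton's pi-ratings; the learning rates, the error damping, and the rating-to-goals transform are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch of a pi-rating-style update. Constants and the exact
# transforms are assumptions for illustration only.
import math
from collections import defaultdict

LAMBDA = 0.06   # assumed learning rate for the venue actually played
GAMMA = 0.5     # assumed cross-venue transfer factor

# ratings[team] = {"home": float, "away": float}, initialized to zero
ratings = defaultdict(lambda: {"home": 0.0, "away": 0.0})

def expected_goal_diff(home_team, away_team):
    """Expected home-minus-away goal difference implied by current ratings."""
    def g(r):
        # Map a rating back to an expected goal superiority; the base-10
        # exponential form follows the pi-rating construction.
        return math.copysign(10 ** (abs(r) / 3) - 1, r)
    return g(ratings[home_team]["home"]) - g(ratings[away_team]["away"])

def update(home_team, away_team, home_goals, away_goals):
    """Update both teams' ratings after an observed result."""
    err = (home_goals - away_goals) - expected_goal_diff(home_team, away_team)
    # Logarithmic damping so blowout wins do not dominate the ratings.
    psi = math.copysign(3 * math.log10(1 + abs(err)), err)
    ratings[home_team]["home"] += LAMBDA * psi
    ratings[home_team]["away"] += GAMMA * LAMBDA * psi
    ratings[away_team]["away"] -= LAMBDA * psi
    ratings[away_team]["home"] -= GAMMA * LAMBDA * psi
```

After iterating `update` over the training matches in date order, the two teams' rating components for an upcoming fixture would serve as the feature vector for a gradient-boosted tree classifier over the win/draw/loss classes.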
Related papers
- Valeo4Cast: A Modular Approach to End-to-End Forecasting [93.86257326005726]
Our solution ranks first in the Argoverse 2 End-to-end Forecasting Challenge, with 63.82 mAPf.
We depart from the current trend of tackling this task via end-to-end training from perception to forecasting, and instead use a modular approach.
We surpass forecasting results by +17.1 points over last year's winner and by +13.3 points over this year's runner-up.
arXiv Detail & Related papers (2024-06-12T11:50:51Z)
- Machine Learning for Soccer Match Result Prediction [0.9002260638342727]
This chapter discusses available datasets, the types of models and features, and ways of evaluating model performance.
The aim of this chapter is to give a broad overview of the current state and potential future developments in machine learning for soccer match results prediction.
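One common way of evaluating win/draw/loss probability forecasts in this literature is the ranked probability score (RPS), which respects the ordering of the outcomes. A minimal sketch follows; this is a generic illustration, not necessarily the chapter's exact formulation.

```python
# Ranked probability score for ordered outcomes (lower is better).
def ranked_probability_score(probs, outcome):
    """probs: [p_win, p_draw, p_loss]; outcome: index of the observed result."""
    cum_p, cum_o, score = 0.0, 0.0, 0.0
    for i, p in enumerate(probs):
        cum_p += p                               # cumulative forecast
        cum_o += 1.0 if i == outcome else 0.0    # cumulative observation
        score += (cum_p - cum_o) ** 2
    return score / (len(probs) - 1)
```

A perfect forecast scores 0; a uniform forecast over three outcomes scores 5/18 when the first outcome occurs, so RPS rewards probability mass placed near the observed result.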
arXiv Detail & Related papers (2024-03-12T14:00:50Z)
- Pushing the Limits of Pre-training for Time Series Forecasting in the CloudOps Domain [54.67888148566323]
We introduce three large-scale time series forecasting datasets from the cloud operations domain.
We show it is a strong zero-shot baseline and benefits from further scaling, both in model and dataset size.
Accompanying these datasets and results is a suite of comprehensive benchmark results comparing classical and deep learning baselines to our pre-trained method.
arXiv Detail & Related papers (2023-10-08T08:09:51Z)
- Learning Sample Difficulty from Pre-trained Models for Reliable Prediction [55.77136037458667]
We propose to utilize large-scale pre-trained models to guide downstream model training with sample difficulty-aware entropy regularization.
We simultaneously improve accuracy and uncertainty calibration across challenging benchmarks.
arXiv Detail & Related papers (2023-04-20T07:29:23Z)
- Supervised Learning for Table Tennis Match Prediction [2.7835697868135902]
This paper proposes the use of machine learning to predict the outcome of table tennis single matches.
We use player and match statistics as features and evaluate their relative importance in an ablation study.
The results can serve as a baseline for future table tennis prediction models, and can feed back to prediction research in similar ball sports.
arXiv Detail & Related papers (2023-03-28T17:42:13Z)
- Boosted Dynamic Neural Networks [53.559833501288146]
A typical EDNN has multiple prediction heads at different layers of the network backbone.
To optimize the model, these prediction heads together with the network backbone are trained on every batch of training data.
Treating training and testing inputs differently at the two phases will cause the mismatch between training and testing data distributions.
We formulate an EDNN as an additive model inspired by gradient boosting, and propose multiple training techniques to optimize the model effectively.
arXiv Detail & Related papers (2022-11-30T04:23:12Z)
- Explainable expected goal models for performance analysis in football analytics [5.802346990263708]
This paper proposes an accurate expected goal model trained on 315,430 shots from seven seasons, 2014-15 to 2020-21, of the top-five European football leagues.
To the best of our knowledge, this is the first paper that demonstrates a practical application of an explainable artificial intelligence tool based on aggregated profiles.
arXiv Detail & Related papers (2022-06-14T23:56:03Z)
- Model Selection, Adaptation, and Combination for Deep Transfer Learning through Neural Networks in Renewable Energies [5.953831950062808]
We conduct the first thorough experiment on model selection and adaptation for transfer learning in renewable power forecasting.
We adopt models based on data from different seasons and limit the amount of training data.
We show how combining multiple models through ensembles can significantly improve the model selection and adaptation approach.
arXiv Detail & Related papers (2022-04-28T05:34:50Z)
- Improved Fine-tuning by Leveraging Pre-training Data: Theory and Practice [52.11183787786718]
Fine-tuning a pre-trained model on the target data is widely used in many deep learning applications.
Recent studies have empirically shown that training from scratch can achieve final performance no worse than this pre-training strategy.
We propose a novel selection strategy to select a subset from pre-training data to help improve the generalization on the target task.
arXiv Detail & Related papers (2021-11-24T06:18:32Z)
- Enhancing Trajectory Prediction using Sparse Outputs: Application to Team Sports [6.26476800426345]
It can be surprisingly challenging to train a deep learning model for player trajectory prediction.
We propose and test a novel method for improving training by predicting a sparse trajectory and interpolating using constant acceleration.
We find that the accuracy of predicted trajectories for a subset of players can be improved by conditioning on the full trajectories of the other players.
arXiv Detail & Related papers (2021-06-01T01:43:19Z)
- Models, Pixels, and Rewards: Evaluating Design Trade-offs in Visual Model-Based Reinforcement Learning [109.74041512359476]
We study a number of design decisions for the predictive model in visual MBRL algorithms.
We find that a range of design decisions that are often considered crucial, such as the use of latent spaces, have little effect on task performance.
We show how this phenomenon is related to exploration and how some of the lower-scoring models on standard benchmarks will perform the same as the best-performing models when trained on the same training data.
arXiv Detail & Related papers (2020-12-08T18:03:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences arising from its use.