Machine Learning for Soccer Match Result Prediction
- URL: http://arxiv.org/abs/2403.07669v1
- Date: Tue, 12 Mar 2024 14:00:50 GMT
- Title: Machine Learning for Soccer Match Result Prediction
- Authors: Rory Bunker, Calvin Yeung, Keisuke Fujii
- Abstract summary: This chapter discusses available datasets, the types of models and features, and ways of evaluating model performance.
The aim of this chapter is to give a broad overview of the current state and potential future developments in machine learning for soccer match results prediction.
- Score: 0.9002260638342727
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning has become a common approach to predicting the outcomes of
soccer matches, and the body of literature in this domain has grown
substantially in the past decade and a half. This chapter discusses available
datasets, the types of models and features, and ways of evaluating model
performance in this application domain. The aim of this chapter is to give a
broad overview of the current state and potential future developments in
machine learning for soccer match results prediction, as a resource for those
interested in conducting future studies in the area. Our main findings are that
while gradient-boosted tree models such as CatBoost, applied to soccer-specific
ratings such as pi-ratings, are currently the best-performing models on
datasets containing only goals as the match features, there needs to be a more
thorough comparison of the performance of deep learning models and Random
Forest on a range of datasets with different types of features. Furthermore,
new rating systems using both player- and team-level information and
incorporating additional information from, e.g., spatiotemporal tracking and
event data, could be investigated further. Finally, the interpretability of
match result prediction models needs to be enhanced for them to be more useful
for team management.
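The abstract's main finding pairs gradient-boosted trees with pi-ratings. As a rough illustration, the following is a simplified, pure-Python sketch of a pi-rating-style update in the spirit of Constantinou and Fenton's formulation: each team keeps separate home and away ratings, the prediction error is damped logarithmically, and part of each update carries over to the team's other venue rating. The constant `C` follows the commonly cited formulation, but `LAM` and `GAMMA` are illustrative values, and the function as a whole is a simplification rather than the exact published procedure.

```python
import math

C = 3.0      # scaling constant from the pi-rating formulation
LAM = 0.06   # learning rate (illustrative value)
GAMMA = 0.6  # fraction of an update carried to the team's other venue rating

def expected_goal_diff(rating: float) -> float:
    """Map a (home or away) rating to an expected goal superiority."""
    return math.copysign(10 ** (abs(rating) / C) - 1, rating)

def psi(error: float) -> float:
    """Logarithmically damped weight, so blowout results do not dominate."""
    return C * math.log10(1 + error)

def update(home: dict, away: dict, observed_gd: int):
    """Update ratings after one match.

    home/away are dicts with 'home' and 'away' ratings;
    observed_gd is the goal difference from the home side's perspective.
    Returns updated copies of both dicts.
    """
    expected_gd = expected_goal_diff(home["home"]) - expected_goal_diff(away["away"])
    error = abs(observed_gd - expected_gd)
    # Home side gains rating if it outperformed expectation, loses otherwise.
    sign = 1.0 if observed_gd > expected_gd else -1.0
    h_delta = sign * psi(error) * LAM
    a_delta = -h_delta
    new_home = {"home": home["home"] + h_delta,
                "away": home["away"] + h_delta * GAMMA}
    new_away = {"away": away["away"] + a_delta,
                "home": away["home"] + a_delta * GAMMA}
    return new_home, new_away
```

For example, starting two teams at zero and feeding in a 2-0 home win raises the home side's home rating and lowers the away side's away rating by the same amount.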
Related papers
- Pushing the Limits of Pre-training for Time Series Forecasting in the
CloudOps Domain [54.67888148566323]
We introduce three large-scale time series forecasting datasets from the cloud operations domain.
We show that the resulting pre-trained model is a strong zero-shot baseline and benefits from further scaling, in both model and dataset size.

Accompanying these datasets and results is a suite of comprehensive benchmark results comparing classical and deep learning baselines to our pre-trained method.
arXiv Detail & Related papers (2023-10-08T08:09:51Z)
- Evaluating Soccer Match Prediction Models: A Deep Learning Approach and Feature Optimization for Gradient-Boosted Trees [0.8009842832476994]
The 2023 Soccer Prediction Challenge required the prediction of match results first in terms of the exact goals scored by each team, and second, in terms of the probabilities for a win, draw, and loss.
A CatBoost model was employed using pi-ratings as the features, which were initially identified as the optimal choice for calculating the win/draw/loss probabilities.
In this study, we aimed to assess the performance of a deep learning model and determine the optimal feature set for a gradient-boosted tree model.
arXiv Detail & Related papers (2023-09-26T10:05:46Z)
- The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute [66.84421705029624]
We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours.
We pre-process an existing large, diverse, and high-quality dataset of books that surpasses existing academic benchmarks in quality, diversity, and document length.
This work also provides two baseline models: a feed-forward model derived from the GPT-2 architecture and a recurrent model in the form of a novel LSTM with ten-fold throughput.
arXiv Detail & Related papers (2023-09-20T10:31:17Z)
- Multi-granulariy Time-based Transformer for Knowledge Tracing [9.788039182463768]
We leverage students' historical data, including their past test scores, to create a personalized model for each student.
We then use these models to predict their future performance on a given test.
arXiv Detail & Related papers (2023-04-11T14:46:38Z)
- Supervised Learning for Table Tennis Match Prediction [2.7835697868135902]
This paper proposes the use of machine learning to predict the outcome of table tennis single matches.
We use player and match statistics as features and evaluate their relative importance in an ablation study.
The results can serve as a baseline for future table tennis prediction models, and can feed back to prediction research in similar ball sports.
arXiv Detail & Related papers (2023-03-28T17:42:13Z)
- Explainable expected goal models for performance analysis in football analytics [5.802346990263708]
This paper proposes an accurate expected goal model trained on 315,430 shots from the seven seasons between 2014-15 and 2020-21 of the top five European football leagues.
To the best of our knowledge, this is the first paper to demonstrate a practical application of an explainable artificial intelligence tool with aggregated profiles.
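An expected goal (xG) model is, at its core, a probability-of-scoring classifier over shot features. The paper above trains on hundreds of thousands of real shots with a rich feature set; the sketch below is a deliberately minimal stand-in using only two classic xG features (shot distance and visible goal angle) on synthetic data, so every number in it is illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

# Synthetic shots: distance to goal (metres) and visible goal angle (radians)
# are the two classic xG features; real models use many more.
n = 5000
distance = rng.uniform(5, 35, n)
angle = rng.uniform(0.1, 1.2, n)

# Synthetic outcomes: closer, wider-angle shots score more often.
true_logit = 1.5 - 0.15 * distance + 1.0 * angle
goal = rng.random(n) < 1 / (1 + np.exp(-true_logit))

# A logistic-regression xG model: predicted probability = expected goals.
xg_model = LogisticRegression().fit(np.column_stack([distance, angle]), goal)

# xG for a close, central shot versus a long-range, tight-angle effort.
close_xg = xg_model.predict_proba([[8.0, 1.0]])[0, 1]
far_xg = xg_model.predict_proba([[30.0, 0.3]])[0, 1]
```

Summing these per-shot probabilities over a match or season gives the aggregate xG figures used in performance analysis.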
arXiv Detail & Related papers (2022-06-14T23:56:03Z)
- An Empirical Investigation of Commonsense Self-Supervision with Knowledge Graphs [67.23285413610243]
Self-supervision based on the information extracted from large knowledge graphs has been shown to improve the generalization of language models.
We study the effect of knowledge sampling strategies and sizes that can be used to generate synthetic data for adapting language models.
arXiv Detail & Related papers (2022-05-21T19:49:04Z)
- Towards Open-World Feature Extrapolation: An Inductive Graph Learning Approach [80.8446673089281]
We propose a new learning paradigm with graph representation and learning.
Our framework contains two modules: 1) a backbone network (e.g., feedforward neural nets) as a lower model takes features as input and outputs predicted labels; 2) a graph neural network as an upper model learns to extrapolate embeddings for new features via message passing over a feature-data graph built from observed data.
arXiv Detail & Related papers (2021-10-09T09:02:45Z)
- Comparing Test Sets with Item Response Theory [53.755064720563]
We evaluate 29 datasets using predictions from 18 pretrained Transformer models on individual test examples.
We find that Quoref, HellaSwag, and MC-TACO are best suited for distinguishing among state-of-the-art models.
We also observe that the span selection task format, used for QA datasets such as QAMR and SQuAD 2.0, is effective in differentiating between strong and weak models.
arXiv Detail & Related papers (2021-06-01T22:33:53Z)
- Machine-Generated Hierarchical Structure of Human Activities to Reveal How Machines Think [0.0]
We argue the importance and feasibility of constructing a hierarchical labeling system for human activity recognition.
We utilize the predictions of a black box HAR model to identify similarities between different activities.
In this system, the activity labels on the same level will have a designed magnitude of accuracy and reflect a specific amount of activity details.
arXiv Detail & Related papers (2021-01-19T20:40:22Z)
- Models, Pixels, and Rewards: Evaluating Design Trade-offs in Visual Model-Based Reinforcement Learning [109.74041512359476]
We study a number of design decisions for the predictive model in visual MBRL algorithms.
We find that a range of design decisions that are often considered crucial, such as the use of latent spaces, have little effect on task performance.
We show how this phenomenon is related to exploration and how some of the lower-scoring models on standard benchmarks will perform the same as the best-performing models when trained on the same training data.
arXiv Detail & Related papers (2020-12-08T18:03:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.