Markov Cricket: Using Forward and Inverse Reinforcement Learning to
Model, Predict And Optimize Batting Performance in One-Day International
Cricket
- URL: http://arxiv.org/abs/2103.04349v1
- Date: Sun, 7 Mar 2021 13:11:16 GMT
- Title: Markov Cricket: Using Forward and Inverse Reinforcement Learning to
Model, Predict And Optimize Batting Performance in One-Day International
Cricket
- Authors: Manohar Vohra and George S. D. Gordon
- Abstract summary: We model one-day international cricket games as Markov processes, applying forward and inverse Reinforcement Learning (RL) to develop three novel tools for the game.
We show that, when used as a proxy for remaining scoring resources, this approach outperforms the state-of-the-art Duckworth-Lewis-Stern method by 3- to 10-fold.
We envisage our prediction and simulation techniques may provide a fairer alternative for estimating final scores in interrupted games, while the inferred reward model may provide useful insights for the professional game to optimize playing strategy.
- Score: 0.8122270502556374
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we model one-day international cricket games as Markov
processes, applying forward and inverse Reinforcement Learning (RL) to develop
three novel tools for the game. First, we apply Monte-Carlo learning to fit a
nonlinear approximation of the value function for each state of the game using
a score-based reward model. We show that, when used as a proxy for remaining
scoring resources, this approach outperforms the state-of-the-art
Duckworth-Lewis-Stern method used in professional matches by 3- to 10-fold.
Next, we use inverse reinforcement learning, specifically a variant of
guided-cost learning, to infer a linear model of rewards based on expert
performances, assumed here to be play sequences of winning teams. From this
model we explicitly determine the optimal policy for each state and find this
agrees with common intuitions about the game. Finally, we use the inferred
reward models to construct a game simulator that models the posterior
distribution of final scores under different policies. We envisage our
prediction and simulation techniques may provide a fairer alternative for
estimating final scores in interrupted games, while the inferred reward model
may provide useful insights for the professional game to optimize playing
strategy. Further, we anticipate our method of applying RL to this game may
have broader application to other sports with discrete states of play where
teams take turns, such as baseball and rounders.
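The paper does not include an implementation. As a minimal sketch of the forward-RL idea only, the Python below estimates a tabular value function V(balls bowled, wickets lost) = expected runs still to be scored, by first-visit Monte-Carlo over innings; normalising by the value at the start of the innings yields a DLS-style "resources remaining" table. The toy ball-by-ball generator, the two-variable state reduction, and all probabilities are assumptions for illustration; the authors fit a nonlinear approximation on real match data.

```python
"""Minimal sketch (not the authors' code): first-visit Monte-Carlo estimation
of V(state) = expected runs still to be scored, where a state is the pair
(balls bowled so far, wickets lost so far) in a 50-over ODI innings.
The ball-by-ball generator below is a stand-in for real historical data."""

import random
from collections import defaultdict

BALLS = 300   # 50 overs * 6 balls
WICKETS = 10

def simulate_innings(rng):
    """Toy ball-by-ball generator standing in for historical ODI data.
    Yields (balls_bowled, wickets_lost, runs_off_this_ball, wicket_fell)."""
    wickets = 0
    for ball in range(BALLS):
        if wickets >= WICKETS:
            break
        runs = rng.choices([0, 1, 2, 3, 4, 6], weights=[45, 30, 8, 1, 12, 4])[0]
        wicket = rng.random() < 0.035
        yield ball, wickets, runs, wicket
        if wicket:
            wickets += 1

def mc_value_estimates(n_innings=10000, seed=0):
    """Tabular first-visit Monte-Carlo estimate of expected remaining runs."""
    rng = random.Random(seed)
    returns_sum = defaultdict(float)
    returns_cnt = defaultdict(int)
    for _ in range(n_innings):
        trajectory = list(simulate_innings(rng))
        total = sum(r for _, _, r, _ in trajectory)
        scored_so_far = 0
        seen = set()
        for ball, wkts, runs, _ in trajectory:
            state = (ball, wkts)
            if state not in seen:           # first-visit update
                seen.add(state)
                returns_sum[state] += total - scored_so_far
                returns_cnt[state] += 1
            scored_so_far += runs
    return {s: returns_sum[s] / returns_cnt[s] for s in returns_sum}

if __name__ == "__main__":
    V = mc_value_estimates()
    # Normalising by the value at the start of the innings gives a
    # DLS-style "percentage of scoring resources remaining" table.
    v0 = V[(0, 0)]
    for overs_left, wkts in [(50, 0), (30, 2), (20, 5), (10, 8)]:
        state = (BALLS - overs_left * 6, wkts)
        if state in V:
            print(f"{overs_left} overs left, {wkts} down: "
                  f"{100 * V[state] / v0:.1f}% resources remaining")
```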
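For the inverse-RL step, the sketch below fits a linear reward r(s, a) = w·φ(s, a) from "expert" trajectories (winning-team play sequences) against sampled trajectories, using the standard maximum-entropy IRL gradient with importance-weighted samples as a simplified stand-in for the paper's guided-cost-learning variant. The feature map, the three-way action set (defend / rotate strike / attack), and the optimisation settings are illustrative assumptions, not the authors' choices.

```python
"""Minimal sketch (not the authors' code): fit a linear reward model
r(s, a) = w . phi(s, a) from expert trajectories versus sampled ones,
using the max-ent IRL gradient E_expert[phi] - E_sampled[phi] with
reward-based importance weights on the samples (the guided-cost-learning
flavour). Trajectories are lists of ((balls_left, wickets_in_hand), action)."""

import numpy as np

def features(state, action):
    """Hypothetical features: normalised balls remaining, wickets in hand,
    and a one-hot over three batting intents (defend / rotate / attack)."""
    balls_left, wickets_in_hand = state
    intent = np.zeros(3)
    intent[action] = 1.0
    return np.concatenate(([balls_left / 300.0, wickets_in_hand / 10.0], intent))

def trajectory_features(traj):
    """Sum of per-step features over one (state, action) trajectory."""
    return sum(features(s, a) for s, a in traj)

def fit_linear_reward(expert_trajs, sampled_trajs, lr=0.1, iters=200):
    """Gradient ascent on an approximate max-ent IRL objective."""
    dim = len(trajectory_features(expert_trajs[0]))
    w = np.zeros(dim)
    expert_mean = np.mean([trajectory_features(t) for t in expert_trajs], axis=0)
    for _ in range(iters):
        # Re-weight sampled trajectories by their current reward
        # (self-normalised importance weights, as in guided cost learning).
        feats = np.array([trajectory_features(t) for t in sampled_trajs])
        scores = feats @ w
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()
        sampled_mean = probs @ feats
        w += lr * (expert_mean - sampled_mean)
    return w
```

Given a fitted w, a greedy policy can be read off per state as argmax over actions of w·φ(s, a), which is one way to recover the kind of state-dependent batting recommendations the abstract describes; the paper's actual feature set and policy extraction may differ.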
Related papers
- Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning [55.65738319966385]
We propose a novel online algorithm, iterative Nash policy optimization (INPO).
Unlike previous methods, INPO bypasses the need for estimating the expected win rate for individual responses.
With an LLaMA-3-8B-based SFT model, INPO achieves a 42.6% length-controlled win rate on AlpacaEval 2.0 and a 37.8% win rate on Arena-Hard.
arXiv Detail & Related papers (2024-06-30T08:00:34Z)
- Dense Reward for Free in Reinforcement Learning from Human Feedback [64.92448888346125]
We leverage the fact that the reward model contains more information than just its scalar output.
We use these attention weights to redistribute the reward along the whole completion.
Empirically, we show that it stabilises training, accelerates the rate of learning, and, in practical cases, may lead to better local optima.
arXiv Detail & Related papers (2024-02-01T17:10:35Z)
- A Minimaximalist Approach to Reinforcement Learning from Human Feedback [49.45285664482369]
We present Self-Play Preference Optimization (SPO), an algorithm for reinforcement learning from human feedback.
Our approach is minimalist in that it does not require training a reward model nor unstable adversarial training.
We demonstrate that on a suite of continuous control tasks, we are able to learn significantly more efficiently than reward-model based approaches.
arXiv Detail & Related papers (2024-01-08T17:55:02Z)
- ShuttleSHAP: A Turn-Based Feature Attribution Approach for Analyzing Forecasting Models in Badminton [52.21869064818728]
Deep learning approaches for player tactic forecasting in badminton show promising performance partially attributed to effective reasoning about rally-player interactions.
We propose a turn-based feature attribution approach, ShuttleSHAP, for analyzing forecasting models in badminton based on variants of Shapley values.
arXiv Detail & Related papers (2023-12-18T05:37:51Z)
- Optimizing Offensive Gameplan in the National Basketball Association with Machine Learning [0.0]
ORTG (Offensive Rating) was developed by Dean Oliver.
In this paper, the statistic ORTG was found to correlate with different NBA play types.
Using the accuracy of the models as justification, the next step was to optimize the model's output.
arXiv Detail & Related papers (2023-08-13T22:03:35Z)
- Finding mixed-strategy equilibria of continuous-action games without gradients using randomized policy networks [83.28949556413717]
We study the problem of computing an approximate Nash equilibrium of continuous-action games without access to gradients.
We model players' strategies using artificial neural networks.
This paper is the first to solve general continuous-action games with unrestricted mixed strategies and without any gradient information.
arXiv Detail & Related papers (2022-11-29T05:16:41Z)
- A Ranking Game for Imitation Learning [22.028680861819215]
We treat imitation as a two-player ranking-based Stackelberg game between a $\textit{policy}$ and a $\textit{reward}$ function.
This game encompasses a large subset of both inverse reinforcement learning (IRL) methods and methods which learn from offline preferences.
We theoretically analyze the requirements of the loss function used for ranking policy performances to facilitate near-optimal imitation learning at equilibrium.
arXiv Detail & Related papers (2022-02-07T19:38:22Z)
- Enhancing Trajectory Prediction using Sparse Outputs: Application to Team Sports [6.26476800426345]
It can be surprisingly challenging to train a deep learning model for player prediction.
We propose and test a novel method for improving training by predicting a sparse trajectory and interpolating using constant acceleration.
We find that the accuracy of predicted trajectories for a subset of players can be improved by conditioning on the full trajectories of the other players.
arXiv Detail & Related papers (2021-06-01T01:43:19Z)
- Learning to Play Sequential Games versus Unknown Opponents [93.8672371143881]
We consider a repeated sequential game between a learner, who plays first, and an opponent who responds to the chosen action.
We propose a novel algorithm for the learner when playing against an adversarial sequence of opponents.
Our results include regret guarantees for the algorithm that depend on the regularity of the opponent's responses.
arXiv Detail & Related papers (2020-07-10T09:33:05Z)
- Optimising Game Tactics for Football [18.135001427294032]
We present a novel approach to optimise tactical and strategic decision making in football (soccer).
We model the game of football as a multi-stage game made up of a Bayesian game to model the pre-match decisions and a stochastic game to model the in-match state transitions and decisions.
Building upon this, we develop algorithms to optimise team formation and in-game tactics with different objectives.
arXiv Detail & Related papers (2020-03-23T14:24:45Z)