Markov Cricket: Using Forward and Inverse Reinforcement Learning to
Model, Predict And Optimize Batting Performance in One-Day International
Cricket
- URL: http://arxiv.org/abs/2103.04349v1
- Date: Sun, 7 Mar 2021 13:11:16 GMT
- Title: Markov Cricket: Using Forward and Inverse Reinforcement Learning to
Model, Predict And Optimize Batting Performance in One-Day International
Cricket
- Authors: Manohar Vohra and George S. D. Gordon
- Abstract summary: We model one-day international cricket games as Markov processes, applying forward and inverse Reinforcement Learning (RL) to develop three novel tools for the game.
We show that, when used as a proxy for remaining scoring resources, this approach outperforms the state-of-the-art Duckworth-Lewis-Stern method by 3- to 10-fold.
We envisage our prediction and simulation techniques may provide a fairer alternative for estimating final scores in interrupted games, while the inferred reward model may provide useful insights for the professional game to optimize playing strategy.
- Score: 0.8122270502556374
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we model one-day international cricket games as Markov
processes, applying forward and inverse Reinforcement Learning (RL) to develop
three novel tools for the game. First, we apply Monte-Carlo learning to fit a
nonlinear approximation of the value function for each state of the game using
a score-based reward model. We show that, when used as a proxy for remaining
scoring resources, this approach outperforms the state-of-the-art
Duckworth-Lewis-Stern method used in professional matches by 3- to 10-fold.
Next, we use inverse reinforcement learning, specifically a variant of
guided-cost learning, to infer a linear model of rewards based on expert
performances, assumed here to be play sequences of winning teams. From this
model we explicitly determine the optimal policy for each state and find this
agrees with common intuitions about the game. Finally, we use the inferred
reward models to construct a game simulator that models the posterior
distribution of final scores under different policies. We envisage our
prediction and simulation techniques may provide a fairer alternative for
estimating final scores in interrupted games, while the inferred reward model
may provide useful insights for the professional game to optimize playing
strategy. Further, we anticipate our method of applying RL to this game may
have broader application to other sports with discrete states of play where
teams take turns, such as baseball and rounders.
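The paper does not include an implementation. As a minimal sketch of the forward-RL idea only, the Python below estimates a tabular value function V(balls bowled, wickets lost) = expected runs still to be scored, by first-visit Monte-Carlo over innings; normalising by the value at the start of the innings yields a DLS-style "resources remaining" table. The toy ball-by-ball generator, the two-variable state reduction, and all probabilities are assumptions for illustration; the authors fit a nonlinear approximation on real match data.

```python
"""Minimal sketch (not the authors' code): first-visit Monte-Carlo estimation
of V(state) = expected runs still to be scored, where a state is the pair
(balls bowled so far, wickets lost so far) in a 50-over ODI innings.
The ball-by-ball generator below is a stand-in for real historical data."""

import random
from collections import defaultdict

BALLS = 300   # 50 overs * 6 balls
WICKETS = 10

def simulate_innings(rng):
    """Toy ball-by-ball generator standing in for historical ODI data.
    Yields (balls_bowled, wickets_lost, runs_off_this_ball, wicket_fell)."""
    wickets = 0
    for ball in range(BALLS):
        if wickets >= WICKETS:
            break
        runs = rng.choices([0, 1, 2, 3, 4, 6], weights=[45, 30, 8, 1, 12, 4])[0]
        wicket = rng.random() < 0.035
        yield ball, wickets, runs, wicket
        if wicket:
            wickets += 1

def mc_value_estimates(n_innings=10000, seed=0):
    """Tabular first-visit Monte-Carlo estimate of expected remaining runs."""
    rng = random.Random(seed)
    returns_sum = defaultdict(float)
    returns_cnt = defaultdict(int)
    for _ in range(n_innings):
        trajectory = list(simulate_innings(rng))
        total = sum(r for _, _, r, _ in trajectory)
        scored_so_far = 0
        seen = set()
        for ball, wkts, runs, _ in trajectory:
            state = (ball, wkts)
            if state not in seen:           # first-visit update
                seen.add(state)
                returns_sum[state] += total - scored_so_far
                returns_cnt[state] += 1
            scored_so_far += runs
    return {s: returns_sum[s] / returns_cnt[s] for s in returns_sum}

if __name__ == "__main__":
    V = mc_value_estimates()
    # Normalising by the value at the start of the innings gives a
    # DLS-style "percentage of scoring resources remaining" table.
    v0 = V[(0, 0)]
    for overs_left, wkts in [(50, 0), (30, 2), (20, 5), (10, 8)]:
        state = (BALLS - overs_left * 6, wkts)
        if state in V:
            print(f"{overs_left} overs left, {wkts} down: "
                  f"{100 * V[state] / v0:.1f}% resources remaining")
```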
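For the inverse-RL step, the sketch below fits a linear reward r(s, a) = w·φ(s, a) from "expert" trajectories (winning-team play sequences) against sampled trajectories, using the standard maximum-entropy IRL gradient with importance-weighted samples as a simplified stand-in for the paper's guided-cost-learning variant. The feature map, the three-way action set (defend / rotate strike / attack), and the optimisation settings are illustrative assumptions, not the authors' choices.

```python
"""Minimal sketch (not the authors' code): fit a linear reward model
r(s, a) = w . phi(s, a) from expert trajectories versus sampled ones,
using the max-ent IRL gradient E_expert[phi] - E_sampled[phi] with
reward-based importance weights on the samples (the guided-cost-learning
flavour). Trajectories are lists of ((balls_left, wickets_in_hand), action)."""

import numpy as np

def features(state, action):
    """Hypothetical features: normalised balls remaining, wickets in hand,
    and a one-hot over three batting intents (defend / rotate / attack)."""
    balls_left, wickets_in_hand = state
    intent = np.zeros(3)
    intent[action] = 1.0
    return np.concatenate(([balls_left / 300.0, wickets_in_hand / 10.0], intent))

def trajectory_features(traj):
    """Sum of per-step features over one (state, action) trajectory."""
    return sum(features(s, a) for s, a in traj)

def fit_linear_reward(expert_trajs, sampled_trajs, lr=0.1, iters=200):
    """Gradient ascent on an approximate max-ent IRL objective."""
    dim = len(trajectory_features(expert_trajs[0]))
    w = np.zeros(dim)
    expert_mean = np.mean([trajectory_features(t) for t in expert_trajs], axis=0)
    for _ in range(iters):
        # Re-weight sampled trajectories by their current reward
        # (self-normalised importance weights, as in guided cost learning).
        feats = np.array([trajectory_features(t) for t in sampled_trajs])
        scores = feats @ w
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()
        sampled_mean = probs @ feats
        w += lr * (expert_mean - sampled_mean)
    return w
```

Given a fitted w, a greedy policy can be read off per state as argmax over actions of w·φ(s, a), which is one way to recover the kind of state-dependent batting recommendations the abstract describes; the paper's actual feature set and policy extraction may differ.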
Related papers
- Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning [55.65738319966385]
We propose a novel online algorithm, iterative Nash policy optimization (INPO).
Unlike previous methods, INPO bypasses the need for estimating the expected win rate for individual responses.
With an LLaMA-3-8B-based SFT model, INPO achieves a 42.6% length-controlled win rate on AlpacaEval 2.0 and a 37.8% win rate on Arena-Hard.
arXiv Detail & Related papers (2024-06-30T08:00:34Z)
- Dense Reward for Free in Reinforcement Learning from Human Feedback [64.92448888346125]
We leverage the fact that the reward model contains more information than just its scalar output.
We use these attention weights to redistribute the reward along the whole completion.
Empirically, we show that it stabilises training, accelerates the rate of learning, and, in practical cases, may lead to better local optima.
arXiv Detail & Related papers (2024-02-01T17:10:35Z)
- A Minimaximalist Approach to Reinforcement Learning from Human Feedback [49.45285664482369]
We present Self-Play Preference Optimization (SPO), an algorithm for reinforcement learning from human feedback.
Our approach is minimalist in that it does not require training a reward model nor unstable adversarial training.
We demonstrate that on a suite of continuous control tasks, we are able to learn significantly more efficiently than reward-model based approaches.
arXiv Detail & Related papers (2024-01-08T17:55:02Z)
- ShuttleSHAP: A Turn-Based Feature Attribution Approach for Analyzing Forecasting Models in Badminton [52.21869064818728]
Deep learning approaches for player tactic forecasting in badminton show promising performance partially attributed to effective reasoning about rally-player interactions.
We propose a turn-based feature attribution approach, ShuttleSHAP, for analyzing forecasting models in badminton based on variants of Shapley values.
arXiv Detail & Related papers (2023-12-18T05:37:51Z)
- Optimizing Offensive Gameplan in the National Basketball Association with Machine Learning [0.0]
ORTG (Offensive Rating) was developed by Dean Oliver.
In this paper, the statistic ORTG was found to correlate with different NBA play types.
Using the accuracy of the models as justification, the next step was to optimize the model's output.
arXiv Detail & Related papers (2023-08-13T22:03:35Z)
- Finding mixed-strategy equilibria of continuous-action games without gradients using randomized policy networks [83.28949556413717]
We study the problem of computing an approximate Nash equilibrium of continuous-action games without access to gradients.
We model players' strategies using artificial neural networks.
This paper is the first to solve general continuous-action games with unrestricted mixed strategies and without any gradient information.
arXiv Detail & Related papers (2022-11-29T05:16:41Z)
- A Ranking Game for Imitation Learning [22.028680861819215]
We treat imitation as a two-player ranking-based Stackelberg game between a $\textit{policy}$ and a $\textit{reward}$ function.
This game encompasses a large subset of both inverse reinforcement learning (IRL) methods and methods which learn from offline preferences.
We theoretically analyze the requirements of the loss function used for ranking policy performances to facilitate near-optimal imitation learning at equilibrium.
arXiv Detail & Related papers (2022-02-07T19:38:22Z)
- Enhancing Trajectory Prediction using Sparse Outputs: Application to Team Sports [6.26476800426345]
It can be surprisingly challenging to train a deep learning model for player prediction.
We propose and test a novel method for improving training by predicting a sparse trajectory and interpolating using constant acceleration.
We find that the accuracy of predicted trajectories for a subset of players can be improved by conditioning on the full trajectories of the other players.
arXiv Detail & Related papers (2021-06-01T01:43:19Z)
- Learning to Play Sequential Games versus Unknown Opponents [93.8672371143881]
We consider a repeated sequential game between a learner, who plays first, and an opponent who responds to the chosen action.
We propose a novel algorithm for the learner when playing against an adversarial sequence of opponents.
Our results include regret guarantees for the algorithm that depend on the regularity of the opponent's responses.
arXiv Detail & Related papers (2020-07-10T09:33:05Z)
- Optimising Game Tactics for Football [18.135001427294032]
We present a novel approach to optimise tactical and strategic decision making in football (soccer).
We model the game of football as a multi-stage game made up of a Bayesian game to model the pre-match decisions and a stochastic game to model the in-match state transitions and decisions.
Building upon this, we develop algorithms to optimise team formation and in-game tactics with different objectives.
arXiv Detail & Related papers (2020-03-23T14:24:45Z)