Neural Sabermetrics with World Model: Play-by-play Predictive Modeling with Large Language Model
- URL: http://arxiv.org/abs/2602.07030v1
- Date: Mon, 02 Feb 2026 23:31:28 GMT
- Title: Neural Sabermetrics with World Model: Play-by-play Predictive Modeling with Large Language Model
- Authors: Young Jin Ahn, Yiyang Du, Zheyuan Zhang, Haisen Kang,
- Abstract summary: We present Neural Sabermetrics with World Model, a play-by-play world model for baseball.<n>We cast baseball games as long auto-regressive sequences of events and continuously pretrain a single Large Language Model (LLM) on more than ten years of Major League Baseball (MLB) tracking data.<n>The resulting model is capable of predicting multiple aspects of game evolution within a unified framework.
- Score: 12.123745125923731
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Classical sabermetrics has profoundly shaped baseball analytics by summarizing long histories of play into compact statistics. While these metrics are invaluable for valuation and retrospective analysis, they do not define a generative model of how baseball games unfold pitch by pitch, leaving most existing approaches limited to single-step prediction or post-hoc analysis. In this work, we present Neural Sabermetrics with World Model, a Large Language Model (LLM) based play-by-play world model for baseball. We cast baseball games as long auto-regressive sequences of events and continuously pretrain a single LLM on more than ten years of Major League Baseball (MLB) tracking data, comprising over seven million pitch sequences and approximately three billion tokens. The resulting model is capable of predicting multiple aspects of game evolution within a unified framework. We evaluate our model on both in-distribution regular-season data and out-of-distribution postseason games and compare against strong neural baselines from prior work. Despite using a single backbone model, our approach outperforms the performance of existing baselines, (1) correctly predicting approximately 64% of next pitches within a plate appearance and (2) 78% of batter swing decisions, suggesting that LLMs can serve as effective world models for sports.
Related papers
- Long-Sequence LSTM Modeling for NBA Game Outcome Prediction Using a Novel Multi-Season Dataset [0.5039813366558307]
We introduce a newly constructed longitudinal NBA dataset covering the 2004-05 to 2024-25 seasons.<n>We present a deep learning framework designed to model long-term performance trends.<n>The LSTM achieves the best performance across all metrics, with 72.35 accuracy, 73.15 precision and 76.13 AUC-ROC.
arXiv Detail & Related papers (2025-12-09T13:32:41Z) - Pretraining Language Models to Ponder in Continuous Space [50.52734567589996]
We introduce this pondering process into language models by repeatedly invoking the forward process within a single token generation step.<n>We show that the model can learn to ponder in this way through self-supervised learning, without any human annotations.
arXiv Detail & Related papers (2025-05-27T03:47:33Z) - Output Scaling: YingLong-Delayed Chain of Thought in a Large Pretrained Time Series Forecasting Model [55.25659103706409]
This framework achieves state-of-the-art performance for our designed foundation model, YingLong.<n>YingLong is a non-causal, bidirectional attention encoder-only transformer trained through masked token recovery.<n>We release four foundation models ranging from 6M to 300M parameters, demonstrating superior results in zero-shot tasks.
arXiv Detail & Related papers (2025-05-20T14:31:06Z) - Scaling Offline Model-Based RL via Jointly-Optimized World-Action Model Pretraining [49.730897226510095]
We introduce JOWA: Jointly-Reinforced World-Action model, an offline model-based RL agent pretrained on Atari games with 6 billion tokens data.<n>Our largest agent, with 150 million parameters, 78.9% human-level performance on pretrained games using only 10% subsampled offline data, outperforming existing state-of-the-art large-scale offline RL baselines by 31.6% on averange.
arXiv Detail & Related papers (2024-10-01T10:25:03Z) - Predictive Churn with the Set of Good Models [61.00058053669447]
This paper explores connections between two seemingly unrelated concepts of predictive inconsistency.<n>The first, known as predictive multiplicity, occurs when models that perform similarly produce conflicting predictions for individual samples.<n>The second concept, predictive churn, examines the differences in individual predictions before and after model updates.
arXiv Detail & Related papers (2024-02-12T16:15:25Z) - ShuttleSHAP: A Turn-Based Feature Attribution Approach for Analyzing
Forecasting Models in Badminton [52.21869064818728]
Deep learning approaches for player tactic forecasting in badminton show promising performance partially attributed to effective reasoning about rally-player interactions.
We propose a turn-based feature attribution approach, ShuttleSHAP, for analyzing forecasting models in badminton based on variants of Shapley values.
arXiv Detail & Related papers (2023-12-18T05:37:51Z) - Lag-Llama: Towards Foundation Models for Probabilistic Time Series
Forecasting [54.04430089029033]
We present Lag-Llama, a general-purpose foundation model for time series forecasting based on a decoder-only transformer architecture.
Lag-Llama is pretrained on a large corpus of diverse time series data from several domains, and demonstrates strong zero-shot generalization capabilities.
When fine-tuned on relatively small fractions of such previously unseen datasets, Lag-Llama achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-10-12T12:29:32Z) - Pushing the Limits of Pre-training for Time Series Forecasting in the
CloudOps Domain [54.67888148566323]
We introduce three large-scale time series forecasting datasets from the cloud operations domain.
We show it is a strong zero-shot baseline and benefits from further scaling, both in model and dataset size.
Accompanying these datasets and results is a suite of comprehensive benchmark results comparing classical and deep learning baselines to our pre-trained method.
arXiv Detail & Related papers (2023-10-08T08:09:51Z) - Performance Prediction in Major League Baseball by Long Short-Term
Memory Networks [0.35092739016434554]
We use the sequential model Long Short-Term Memory as our main method to solve the home run prediction problem in Major League Baseball.
Our results show that Long Short-Term Memory has better performance than others and has the ability to make more exact predictions.
arXiv Detail & Related papers (2022-06-20T09:01:44Z) - Computing an Optimal Pitching Strategy in a Baseball At-Bat [19.933511825856126]
A baseball at-bat is a matchup between a pitcher and a batter.
We propose a novel model of this encounter as a zero-sum game.
In principle, this game can be solved using classical approaches.
arXiv Detail & Related papers (2021-10-08T18:09:08Z) - Machine learning models for DOTA 2 outcomes prediction [8.388178167818635]
This research paper predominantly focuses on building predictive machine and deep learning models to identify the outcome of the Dota 2 MOBA game.
Three models were investigated and compared: Linear Regression (LR), Neural Networks (NN), and a type of recurrent neural network Long Short-Term Memory (LSTM)
arXiv Detail & Related papers (2021-06-03T12:10:26Z) - Markov Cricket: Using Forward and Inverse Reinforcement Learning to
Model, Predict And Optimize Batting Performance in One-Day International
Cricket [0.8122270502556374]
We model one-day international cricket games as Markov processes, applying forward and inverse Reinforcement Learning (RL) to develop three novel tools for the game.
We show that, when used as a proxy for remaining scoring resources, this approach outperforms the state-of-the-art Duckworth-Lewis-Stern method by 3 to 10 fold.
We envisage our prediction and simulation techniques may provide a fairer alternative for estimating final scores in interrupted games, while the inferred reward model may provide useful insights for the professional game to optimize playing strategy.
arXiv Detail & Related papers (2021-03-07T13:11:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.