DiffLOB: Diffusion Models for Counterfactual Generation in Limit Order Books
- URL: http://arxiv.org/abs/2602.03776v1
- Date: Tue, 03 Feb 2026 17:34:56 GMT
- Title: DiffLOB: Diffusion Models for Counterfactual Generation in Limit Order Books
- Authors: Zhuohan Wang, Carmine Ventre,
- Abstract summary: generative models for limit order books (LOBs) can reproduce realistic market dynamics, but remain fundamentally passive.<n>We propose textbfDiffLOB, a regime-conditioned textbfDiffusion model for controllable and counterfactual generation of textbfLOB trajectories.
- Score: 3.4472051115463613
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern generative models for limit order books (LOBs) can reproduce realistic market dynamics, but remain fundamentally passive: they either model what typically happens without accounting for hypothetical future market conditions, or they require interaction with another agent to explore alternative outcomes. This limits their usefulness for stress testing, scenario analysis, and decision-making. We propose \textbf{DiffLOB}, a regime-conditioned \textbf{Diff}usion model for controllable and counterfactual generation of \textbf{LOB} trajectories. DiffLOB explicitly conditions the generative process on future market regimes--including trend, volatility, liquidity, and order-flow imbalance, which enables the model to answer counterfactual queries of the form: ``If the future market regime were X instead of Y, how would the limit order book evolve?'' Our systematic evaluation framework for counterfactual LOB generation consists of three criteria: (1) \textit{Controllable Realism}, measuring how well generated trajectories can reproduce marginal distributions, temporal dependence structure and regime variables; (2) \textit{Counterfactual validity}, testing whether interventions on future regimes induce consistent changes in the generated LOB dynamics; (3) \textit{Counterfactual usefulness}, assessing whether synthetic counterfactual trajectories improve downstream prediction of future market regimes.
Related papers
- D-Models and E-Models: Diversity-Stability Trade-offs in the Sampling Behavior of Large Language Models [91.21455683212224]
In large language models (LLMs), the probability of relevance for the next piece of information is linked to the probability of relevance for the next product.<n>But whether fine-grained sampling probabilities faithfully align with task requirements remains an open question.<n>We identify two model types: D-models, whose P_token exhibits large step-to-step variability and poor alignment with P_task; and E-models, whose P_token is more stable and better aligned with P_task.
arXiv Detail & Related papers (2026-01-25T14:59:09Z) - Drift No More? Context Equilibria in Multi-Turn LLM Interactions [58.69551510148673]
contexts drift is the gradual divergence of a model's outputs from goal-consistent behavior across turns.<n>Unlike single-turn errors, drift unfolds temporally and is poorly captured by static evaluation metrics.<n>We show that multi-turn drift can be understood as a controllable equilibrium phenomenon rather than as inevitable decay.
arXiv Detail & Related papers (2025-10-09T04:48:49Z) - How to model Human Actions distribution with Event Sequence Data [22.25731364559209]
We study the forecasting of the future distribution of events in human action sequences.<n>We find that a simple explicit distribution forecasting objective consistently surpasses complex implicit baselines.<n>This work provides a principled framework for selecting modeling strategies and offers practical guidance for building more accurate and robust forecasting systems.
arXiv Detail & Related papers (2025-10-07T12:24:54Z) - DiffVolume: Diffusion Models for Volume Generation in Limit Order Books [1.5193212081459284]
We propose a conditional textbfDiffusion model for the generation of future LOB textbfVolume snapshots (textbfDiffVolume)<n>We show that DiffVolume, conditioned on past volume history and time of day, better reproduces statistical properties such as marginal distribution, spatial correlation, and autocorrelation decay.
arXiv Detail & Related papers (2025-08-12T07:42:00Z) - Elucidated Rolling Diffusion Models for Probabilistic Weather Forecasting [52.6508222408558]
We introduce Elucidated Rolling Diffusion Models (ERDM)<n>ERDM is the first framework to unify a rolling forecast structure with the principled, performant design of Elucidated Diffusion Models (EDM)<n>On 2D Navier-Stokes simulations and ERA5 global weather forecasting at 1.5circ resolution, ERDM consistently outperforms key diffusion-based baselines.
arXiv Detail & Related papers (2025-06-24T21:44:31Z) - Deriving Strategic Market Insights with Large Language Models: A Benchmark for Forward Counterfactual Generation [55.2788567621326]
We introduce a novel benchmark, FIN-FORCE-FINancial FORward Counterfactual Evaluation.<n>By curating financial news headlines, FIN-FORCE supports LLM based forward counterfactual generation.<n>This paves the way for scalable and automated solutions for exploring and anticipating future market developments.
arXiv Detail & Related papers (2025-05-26T02:41:50Z) - LOB-Bench: Benchmarking Generative AI for Finance -- an Application to Limit Order Book Data [7.317765812144531]
We present a benchmark designed to evaluate the quality and realism of generative message-by-order data for limit order books (LOB)<n>Our framework measures distributional differences in conditional and unconditional statistics between generated and real LOB data.<n>The benchmark also includes features commonly used LOB statistics such as spread, order book volumes, order imbalance, and message inter-arrival times.
arXiv Detail & Related papers (2025-02-13T10:56:58Z) - Generative AI for End-to-End Limit Order Book Modelling: A Token-Level
Autoregressive Generative Model of Message Flow Using a Deep State Space
Network [7.54290390842336]
We propose an end-to-end autoregressive generative model that generates tokenized limit order book (LOB) messages.
Using NASDAQ equity LOBs, we develop a custom tokenizer for message data, converting groups of successive digits to tokens.
Results show promising performance in approximating the data distribution, as evidenced by low model perplexity.
arXiv Detail & Related papers (2023-08-23T09:37:22Z) - Neural Stochastic Agent-Based Limit Order Book Simulation: A Hybrid
Methodology [6.09170287691728]
Modern financial exchanges use an electronic limit order book (LOB) to store bid and ask orders for a specific financial asset.
We propose a novel hybrid LOB simulation paradigm characterised by: (1) representing the aggregation of market events' logic by a neural background trader that is pre-trained on historical LOB data through a neural point model; and (2) embedding the background trader in a multi-agent simulation with other trading agents.
We show that the stylised facts remain and we demonstrate order flow impact and financial herding behaviours that are in accordance with empirical observations of real markets.
arXiv Detail & Related papers (2023-02-28T20:53:39Z) - Arbitrage-free neural-SDE market models [6.145654286950278]
We develop a nonparametric model for the European options book respecting underlying financial constraints.
We study the inference problem where a model is learnt from discrete time series data of stock and option prices.
We use neural networks as function approximators for the drift and diffusion of the modelled SDE system.
arXiv Detail & Related papers (2021-05-24T00:53:10Z) - Autoregressive Dynamics Models for Offline Policy Evaluation and
Optimization [60.73540999409032]
We show that expressive autoregressive dynamics models generate different dimensions of the next state and reward sequentially conditioned on previous dimensions.
We also show that autoregressive dynamics models are useful for offline policy optimization by serving as a way to enrich the replay buffer.
arXiv Detail & Related papers (2021-04-28T16:48:44Z) - Generative Temporal Difference Learning for Infinite-Horizon Prediction [101.59882753763888]
We introduce the $gamma$-model, a predictive model of environment dynamics with an infinite probabilistic horizon.
We discuss how its training reflects an inescapable tradeoff between training-time and testing-time compounding errors.
arXiv Detail & Related papers (2020-10-27T17:54:12Z) - Haar Wavelet based Block Autoregressive Flows for Trajectories [129.37479472754083]
Prediction of trajectories such as that of pedestrians is crucial to the performance of autonomous agents.
We introduce a novel Haar wavelet based block autoregressive model leveraging split couplings.
We illustrate the advantages of our approach for generating diverse and accurate trajectories on two real-world datasets.
arXiv Detail & Related papers (2020-09-21T13:57:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.