LOB-Bench: Benchmarking Generative AI for Finance - an Application to Limit Order Book Data
- URL: http://arxiv.org/abs/2502.09172v1
- Date: Thu, 13 Feb 2025 10:56:58 GMT
- Authors: Peer Nagy, Sascha Frey, Kang Li, Bidipta Sarkar, Svitlana Vyetrenko, Stefan Zohren, Ani Calinescu, Jakob Foerster
- Abstract summary: We present a benchmark designed to evaluate the quality and realism of generative message-by-order data for limit order books (LOB).
Our framework measures distributional differences in conditional and unconditional statistics between generated and real LOB data.
The benchmark also includes commonly used LOB statistics such as spread, order book volumes, order imbalance, and message inter-arrival times.
- Abstract: While financial data presents one of the most challenging and interesting sequence modelling tasks due to high noise, heavy tails, and strategic interactions, progress in this area has been hindered by the lack of consensus on quantitative evaluation paradigms. To address this, we present LOB-Bench, a benchmark, implemented in Python, designed to evaluate the quality and realism of generative message-by-order data for limit order books (LOB) in the LOBSTER format. Our framework measures distributional differences in conditional and unconditional statistics between generated and real LOB data, supporting flexible multivariate statistical evaluation. The benchmark also includes commonly used LOB statistics such as spread, order book volumes, order imbalance, and message inter-arrival times, along with scores from a trained discriminator network. Lastly, LOB-Bench contains "market impact metrics", i.e., the cross-correlations and price response functions for specific events in the data. We benchmark generative autoregressive state-space models, a (C)GAN, and a parametric LOB model, and find that the autoregressive GenAI approach beats traditional model classes.
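The core recipe described in the abstract, scoring a generator by the distributional distance between statistics computed on generated and real LOB data, can be illustrated with a small self-contained sketch. The snippet below is a minimal illustration, not the LOB-Bench API: the column names (bid_price_1, ask_price_1), the synthetic data, and the choice of the Wasserstein-1 distance on the spread are assumptions made for the example.

```python
# Minimal sketch of an unconditional distributional check in the spirit of
# LOB-Bench: compare the spread distribution of real vs. generated LOB data.
# Column names, synthetic data, and the metric are illustrative assumptions,
# not the benchmark's actual API.
import numpy as np
import pandas as pd
from scipy.stats import wasserstein_distance


def spread(book: pd.DataFrame) -> np.ndarray:
    """Best ask minus best bid for every book snapshot."""
    return (book["ask_price_1"] - book["bid_price_1"]).to_numpy()


def spread_score(real_book: pd.DataFrame, generated_book: pd.DataFrame) -> float:
    """Wasserstein-1 distance between real and generated spread distributions.

    Lower is better; 0 means the two empirical distributions coincide.
    """
    return wasserstein_distance(spread(real_book), spread(generated_book))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 10_000
    # Toy stand-ins for real and model-generated book snapshots.
    real = pd.DataFrame({"bid_price_1": 100 + rng.normal(0.0, 0.05, n)})
    real["ask_price_1"] = real["bid_price_1"] + rng.exponential(0.02, n)
    fake = pd.DataFrame({"bid_price_1": 100 + rng.normal(0.0, 0.05, n)})
    fake["ask_price_1"] = fake["bid_price_1"] + rng.exponential(0.03, n)
    print(f"spread W1 distance: {spread_score(real, fake):.5f}")
```

The same pattern extends to the other statistics named in the abstract (order book volumes, order imbalance, inter-arrival times) and to conditional variants, by first bucketing the data on a conditioning variable and then comparing distributions within each bucket.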
Related papers
- A Simple Baseline for Predicting Events with Auto-Regressive Tabular Transformers [70.20477771578824]
Existing approaches to event prediction include time-aware positional embeddings, learned row and field encodings, and oversampling methods for addressing class imbalance.
We propose a simple but flexible baseline using standard autoregressive LLM-style transformers with elementary positional embeddings and a causal language modeling objective.
Our baseline outperforms existing approaches across popular datasets and can be employed for various use cases.
arXiv Detail & Related papers (2024-10-14T15:59:16Z)
- EBES: Easy Benchmarking for Event Sequences [17.277513178760348]
Event sequences are common data structures in various real-world domains such as healthcare, finance, and user interaction logs.
Despite advances in temporal data modeling techniques, there are no standardized benchmarks for evaluating their performance on event sequences.
We introduce EBES, a comprehensive benchmarking tool with standardized evaluation scenarios and protocols.
arXiv Detail & Related papers (2024-10-04T13:03:43Z)
- Evaluating Generative Language Models in Information Extraction as Subjective Question Correction [49.729908337372436]
Inspired by the principles in subjective question correction, we propose a new evaluation method, SQC-Score.
Results on three information extraction tasks show that SQC-Score is more preferred by human annotators than the baseline metrics.
arXiv Detail & Related papers (2024-04-04T15:36:53Z)
- Generative AI for End-to-End Limit Order Book Modelling: A Token-Level Autoregressive Generative Model of Message Flow Using a Deep State Space Network [7.54290390842336]
We propose an end-to-end autoregressive generative model that generates tokenized limit order book (LOB) messages.
Using NASDAQ equity LOBs, we develop a custom tokenizer for message data, converting groups of successive digits to tokens (a sketch of this style of tokenization appears after this list).
Results show promising performance in approximating the data distribution, as evidenced by low model perplexity.
arXiv Detail & Related papers (2023-08-23T09:37:22Z)
- Bring Your Own Data! Self-Supervised Evaluation for Large Language Models [52.15056231665816]
We propose a framework for self-supervised evaluation of Large Language Models (LLMs).
We demonstrate self-supervised evaluation strategies for measuring closed-book knowledge, toxicity, and long-range context dependence.
We find strong correlations between self-supervised and human-supervised evaluations.
arXiv Detail & Related papers (2023-06-23T17:59:09Z)
- GREAT Score: Global Robustness Evaluation of Adversarial Perturbation using Generative Models [60.48306899271866]
We present a new framework, called GREAT Score, for global robustness evaluation of adversarial perturbation using generative models.
We show high correlation and significantly reduced cost of GREAT Score when compared to the attack-based model ranking on RobustBench.
GREAT Score can be used for remote auditing of privacy-sensitive black-box models.
arXiv Detail & Related papers (2023-04-19T14:58:27Z)
- Neural Stochastic Agent-Based Limit Order Book Simulation: A Hybrid Methodology [6.09170287691728]
Modern financial exchanges use an electronic limit order book (LOB) to store bid and ask orders for a specific financial asset.
We propose a novel hybrid LOB simulation paradigm characterised by: (1) representing the aggregation of market events' logic by a neural background trader that is pre-trained on historical LOB data through a neural point model; and (2) embedding the background trader in a multi-agent simulation with other trading agents.
We show that the stylised facts of real markets are preserved, and we demonstrate order flow impact and financial herding behaviours that are in accordance with empirical observations.
arXiv Detail & Related papers (2023-02-28T20:53:39Z)
- DSLOB: A Synthetic Limit Order Book Dataset for Benchmarking Forecasting Algorithms under Distributional Shift [16.326002979578686]
In electronic trading markets, limit order books (LOBs) provide information about pending buy/sell orders at various price levels for a given security.
Recently, there has been growing interest in using LOB data for downstream machine learning tasks.
arXiv Detail & Related papers (2022-11-17T06:33:27Z)
- Leveraging Instance Features for Label Aggregation in Programmatic Weak Supervision [75.1860418333995]
Programmatic Weak Supervision (PWS) has emerged as a widespread paradigm to synthesize training labels efficiently.
The core component of PWS is the label model, which infers true labels by aggregating the outputs of multiple noisy supervision sources, abstracted as labeling functions (LFs).
Existing statistical label models typically rely only on the outputs of the LFs, ignoring instance features when modeling the underlying generative process.
arXiv Detail & Related papers (2022-10-06T07:28:53Z)
- The Limit Order Book Recreation Model (LOBRM): An Extended Analysis [2.0305676256390934]
The limit order book (LOB) depicts the fine-grained demand and supply relationship for financial assets.
LOBRM was recently proposed to synthesize the LOB from trades and quotes (TAQ) data.
We extend the research on LOBRM and further validate its use in real-world application scenarios.
arXiv Detail & Related papers (2021-07-01T15:25:21Z)
- Feature Quantization Improves GAN Training [126.02828112121874]
Feature Quantization (FQ) for the discriminator embeds both true and fake data samples into a shared discrete space.
Our method can be easily plugged into existing GAN models, with little computational overhead in training.
arXiv Detail & Related papers (2020-04-05T04:06:50Z)
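As flagged in the entry on the token-level autoregressive LOB model above, the idea of converting groups of successive digits into tokens can be made concrete with a short sketch. The field layout, zero-padding widths, and three-digit grouping below are illustrative assumptions, not the paper's actual tokenizer.

```python
# Minimal sketch of digit-group tokenization for LOB messages, in the spirit
# of the token-level autoregressive model above. Field widths, special tokens,
# and the 3-digit grouping are illustrative assumptions.
from typing import List

GROUP = 3  # digits per token


def tokenize_field(value: int, width: int) -> List[str]:
    """Zero-pad an integer field to a fixed width and split it into
    fixed-size digit groups, each of which becomes one token."""
    digits = f"{value:0{width}d}"
    return [digits[i:i + GROUP] for i in range(0, width, GROUP)]


def tokenize_message(price: int, size: int, direction: int) -> List[str]:
    """Turn one (price, size, direction) LOB message into a token sequence."""
    tokens = ["<msg>"]
    tokens += tokenize_field(price, width=9)  # e.g. price in ticks
    tokens += tokenize_field(size, width=6)   # order size in shares
    tokens += ["BUY" if direction > 0 else "SELL", "</msg>"]
    return tokens


if __name__ == "__main__":
    print(tokenize_message(price=1234500, size=300, direction=1))
    # ['<msg>', '001', '234', '500', '000', '300', 'BUY', '</msg>']
```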