Generative AI for End-to-End Limit Order Book Modelling: A Token-Level
Autoregressive Generative Model of Message Flow Using a Deep State Space
Network
- URL: http://arxiv.org/abs/2309.00638v1
- Date: Wed, 23 Aug 2023 09:37:22 GMT
- Title: Generative AI for End-to-End Limit Order Book Modelling: A Token-Level
Autoregressive Generative Model of Message Flow Using a Deep State Space
Network
- Authors: Peer Nagy, Sascha Frey, Silvia Sapora, Kang Li, Anisoara Calinescu,
Stefan Zohren, Jakob Foerster
- Abstract summary: We propose an end-to-end autoregressive generative model that generates tokenized limit order book (LOB) messages.
Using NASDAQ equity LOBs, we develop a custom tokenizer for message data, converting groups of successive digits to tokens.
Results show promising performance in approximating the data distribution, as evidenced by low model perplexity.
- Score: 7.54290390842336
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Developing a generative model of realistic order flow in financial markets is
a challenging open problem, with numerous applications for market participants.
Addressing this, we propose the first end-to-end autoregressive generative
model that generates tokenized limit order book (LOB) messages. These messages
are interpreted by a Jax-LOB simulator, which updates the LOB state. To handle
long sequences efficiently, the model employs simplified structured state-space
layers to process sequences of order book states and tokenized messages. Using
LOBSTER data of NASDAQ equity LOBs, we develop a custom tokenizer for message
data, converting groups of successive digits to tokens, similar to tokenization
in large language models. Out-of-sample results show promising performance in
approximating the data distribution, as evidenced by low model perplexity.
Furthermore, the mid-price returns calculated from the generated order flow
exhibit a significant correlation with the data, indicating impressive
conditional forecast performance. Due to the granularity of generated data, and
the accuracy of the model, it offers new application areas for future work
beyond forecasting, e.g. acting as a world model in high-frequency financial
reinforcement learning applications. Overall, our results invite the use and
extension of the model in the direction of autoregressive large financial
models for the generation of high-frequency financial data and we commit to
open-sourcing our code to facilitate future research.
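As an illustration of two ideas in the abstract, here is a minimal Python sketch of digit-group tokenization and of perplexity. It is not the authors' implementation. The field width of nine digits, the group size of three, and the function names `tokenize_int`, `detokenize_int` and `perplexity` are assumptions made for this example; the paper's actual tokenizer and vocabulary may differ.

```python
import math

GROUP = 3  # assumed number of digits per token (hypothetical choice)


def tokenize_int(value: int, width: int = 9, group: int = GROUP) -> list[int]:
    """Split a non-negative integer field into fixed-size digit-group tokens."""
    if width % group != 0:
        raise ValueError("width must be a multiple of group")
    if value < 0 or value >= 10 ** width:
        raise ValueError("value does not fit in the assumed field width")
    digits = str(value).zfill(width)  # left-pad with zeros to a fixed width
    return [int(digits[i:i + group]) for i in range(0, width, group)]


def detokenize_int(tokens: list[int], group: int = GROUP) -> int:
    """Invert tokenize_int by concatenating zero-padded digit groups."""
    return int("".join(str(t).zfill(group) for t in tokens))


def perplexity(token_probs: list[float]) -> float:
    """Exponential of the mean negative log-probability of the true tokens."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)
```

With these assumptions, a value such as 123456789 maps to the tokens 123, 456 and 789, so each token comes from a vocabulary of at most 1000 entries per field position. A model that assigns probability 0.25 to every true token has a perplexity of 4; lower values mean the model concentrates more probability on the observed tokens.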
Related papers
- COrAL: Order-Agnostic Language Modeling for Efficient Iterative Refinement [80.18490952057125]
Iterative refinement has emerged as an effective paradigm for enhancing the capabilities of large language models (LLMs) on complex tasks.
We propose Context-Wise Order-Agnostic Language Modeling (COrAL) to overcome these challenges.
Our approach models multiple token dependencies within manageable context windows, enabling the model to perform iterative refinement internally.
arXiv Detail & Related papers (2024-10-12T23:56:19Z)
- A Financial Time Series Denoiser Based on Diffusion Model [1.5193212081459284]
This paper introduces a novel approach utilizing the diffusion model as a denoiser for financial time series.
Trading signals derived from the denoised data yield more profitable trades with fewer transactions.
arXiv Detail & Related papers (2024-09-02T15:55:36Z)
- Data-Juicer Sandbox: A Comprehensive Suite for Multimodal Data-Model Co-development [67.55944651679864]
We present a novel sandbox suite tailored for integrated data-model co-development.
This sandbox provides a comprehensive experimental platform, enabling rapid iteration and insight-driven refinement of both data and models.
We also uncover fruitful insights gleaned from exhaustive benchmarks, shedding light on the critical interplay between data quality, diversity, and model behavior.
arXiv Detail & Related papers (2024-07-16T14:40:07Z)
- F-FOMAML: GNN-Enhanced Meta-Learning for Peak Period Demand Forecasting with Proxy Data [65.6499834212641]
We formulate the demand prediction as a meta-learning problem and develop the Feature-based First-Order Model-Agnostic Meta-Learning (F-FOMAML) algorithm.
By considering domain similarities through task-specific metadata, our model improved generalization, where the excess risk decreases as the number of training tasks increases.
Compared to existing state-of-the-art models, our method demonstrates a notable improvement in demand prediction accuracy, reducing the Mean Absolute Error by 26.24% on an internal vending machine dataset and by 1.04% on the publicly accessible JD.com dataset.
arXiv Detail & Related papers (2024-06-23T21:28:50Z)
- Advancing Anomaly Detection: Non-Semantic Financial Data Encoding with LLMs [49.57641083688934]
We introduce a novel approach to anomaly detection in financial data using Large Language Models (LLMs) embeddings.
Our experiments demonstrate that LLMs contribute valuable information to anomaly detection as our models outperform the baselines.
arXiv Detail & Related papers (2024-06-05T20:19:09Z)
- Data-efficient Large Vision Models through Sequential Autoregression [58.26179273091461]
We develop an efficient, autoregression-based vision model on a limited dataset.
We demonstrate how this model achieves proficiency in a spectrum of visual tasks spanning both high-level and low-level semantic understanding.
Our empirical evaluations underscore the model's agility in adapting to various tasks, heralding a significant reduction in the parameter footprint.
arXiv Detail & Related papers (2024-02-07T13:41:53Z)
- Towards a Foundation Purchasing Model: Pretrained Generative Autoregression on Transaction Sequences [0.0]
We present a generative pretraining method that can be used to obtain contextualised embeddings of financial transactions.
We additionally perform large-scale pretraining of an embedding model using a corpus of data from 180 issuing banks containing 5.1 billion transactions.
arXiv Detail & Related papers (2024-01-03T09:32:48Z)
- DSLOB: A Synthetic Limit Order Book Dataset for Benchmarking Forecasting Algorithms under Distributional Shift [16.326002979578686]
In electronic trading markets, limit order books (LOBs) provide information about pending buy/sell orders at various price levels for a given security.
Recently, there has been a growing interest in using LOB data for resolving downstream machine learning tasks.
arXiv Detail & Related papers (2022-11-17T06:33:27Z)
- The LOB Recreation Model: Predicting the Limit Order Book from TAQ History Using an Ordinary Differential Equation Recurrent Neural Network [9.686252465354274]
We present the LOB recreation model, a first attempt from a deep learning perspective to recreate the top five price levels of the public limit order book (LOB) for small-tick stocks.
By the paradigm of transfer learning, the source model trained on one stock can be fine-tuned to enable application to other financial assets of the same class.
arXiv Detail & Related papers (2021-03-02T12:07:43Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- Generating Realistic Stock Market Order Streams [18.86755130031027]
We propose an approach to generate realistic, high-fidelity stock market data based on generative adversarial networks (GANs).
Our Stock-GAN model employs a conditional Wasserstein GAN to capture history dependence of orders.
arXiv Detail & Related papers (2020-06-07T17:32:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.