DSLOB: A Synthetic Limit Order Book Dataset for Benchmarking Forecasting
Algorithms under Distributional Shift
- URL: http://arxiv.org/abs/2211.11513v1
- Date: Thu, 17 Nov 2022 06:33:27 GMT
- Authors: Defu Cao, Yousef El-Laham, Loc Trinh, Svitlana Vyetrenko, Yan Liu
- Abstract summary: In electronic trading markets, limit order books (LOBs) provide information about pending buy/sell orders at various price levels for a given security.
Recently, there has been a growing interest in using LOB data for resolving downstream machine learning tasks.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In electronic trading markets, limit order books (LOBs) provide information
about pending buy/sell orders at various price levels for a given security.
Recently, there has been a growing interest in using LOB data for resolving
downstream machine learning tasks (e.g., forecasting). However, dealing with
out-of-distribution (OOD) LOB data is challenging since distributional shifts
are unlabeled in current publicly available LOB datasets. Therefore, it is
critical to build a synthetic LOB dataset with labeled OOD samples serving as a
testbed for developing models that generalize well to unseen scenarios. In this
work, we utilize a multi-agent market simulator to build a synthetic LOB
dataset, named DSLOB, with and without market stress scenarios, which allows
for the design of controlled distributional shift benchmarking. Using the
proposed synthetic dataset, we provide a holistic analysis of the forecasting
performance of three different state-of-the-art forecasting methods. Our
results reflect the need for increased researcher efforts to develop algorithms
with robustness to distributional shifts in high-frequency time series data.
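The controlled-shift evaluation described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `simulate_mid_prices` is a toy mean-reverting generator standing in for the paper's multi-agent simulator, the stress scenario (a level drop plus higher noise) is hypothetical, and the AR(1) forecaster is not one of the three methods the paper evaluates. The point is only the protocol: fit on unstressed data, then score separately on an unstressed test set and on a test set labeled as stressed.

```python
import random
from statistics import mean


def simulate_mid_prices(n, seed, stress=False):
    """Toy mean-reverting mid-price path (hypothetical stand-in, not DSLOB)."""
    rng = random.Random(seed)
    level, noise = 100.0, 0.05
    dev, prices = 0.0, []
    for t in range(n):
        if stress and t == n // 2:
            # Hypothetical stress event: the price level drops and noise rises.
            level -= 2.0
            noise = 0.25
        dev = 0.9 * dev + rng.gauss(0.0, noise)
        prices.append(level + dev)
    return prices


def fit_ar1(prices):
    """Least-squares AR(1) fit on deviations from the training mean."""
    mu = mean(prices)
    x = [p - mu for p in prices]
    num = sum(a * b for a, b in zip(x[:-1], x[1:]))
    den = sum(a * a for a in x[:-1])
    return mu, num / den


def one_step_mae(prices, mu, phi):
    """Mean absolute error of one-step-ahead AR(1) forecasts."""
    errors = [
        abs(nxt - (mu + phi * (cur - mu)))
        for cur, nxt in zip(prices[:-1], prices[1:])
    ]
    return mean(errors)


# Train on unstressed data only, then evaluate on both labeled regimes.
train = simulate_mid_prices(2000, seed=0)
test_id = simulate_mid_prices(2000, seed=1)
test_ood = simulate_mid_prices(2000, seed=2, stress=True)

mu, phi = fit_ar1(train)
mae_id = one_step_mae(test_id, mu, phi)
mae_ood = one_step_mae(test_ood, mu, phi)
print(f"in-distribution MAE: {mae_id:.4f}")
print(f"stress (OOD) MAE:    {mae_ood:.4f}")
```

Because the stressed test set is labeled, the gap between the two error figures can be attributed to the shift itself rather than to ordinary test-set variation, which is the property unlabeled real-world LOB data lacks.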
Related papers
- F-FOMAML: GNN-Enhanced Meta-Learning for Peak Period Demand Forecasting with Proxy Data [65.6499834212641]
We formulate the demand prediction as a meta-learning problem and develop the Feature-based First-Order Model-Agnostic Meta-Learning (F-FOMAML) algorithm.
By considering domain similarities through task-specific metadata, our model improved generalization, where the excess risk decreases as the number of training tasks increases.
Compared to existing state-of-the-art models, our method demonstrates a notable improvement in demand prediction accuracy, reducing the Mean Absolute Error by 26.24% on an internal vending machine dataset and by 1.04% on the publicly accessible JD.com dataset.
arXiv Detail & Related papers (2024-06-23T21:28:50Z)
- How Much Data are Enough? Investigating Dataset Requirements for Patch-Based Brain MRI Segmentation Tasks [74.21484375019334]
Training deep neural networks reliably requires access to large-scale datasets.
To mitigate both the time and financial costs associated with model development, a clear understanding of the amount of data required to train a satisfactory model is crucial.
This paper proposes a strategic framework for estimating the amount of annotated data required to train patch-based segmentation networks.
arXiv Detail & Related papers (2024-04-04T13:55:06Z)
- DACO: Towards Application-Driven and Comprehensive Data Analysis via Code Generation [83.30006900263744]
Data analysis is a crucial analytical process to generate in-depth studies and conclusive insights.
We propose to automatically generate high-quality answer annotations leveraging the code-generation capabilities of LLMs.
Our DACO-RL algorithm is evaluated by human annotators to produce more helpful answers than the SFT model in 57.72% of cases.
arXiv Detail & Related papers (2024-03-04T22:47:58Z)
- Generative AI for End-to-End Limit Order Book Modelling: A Token-Level
Autoregressive Generative Model of Message Flow Using a Deep State Space
Network [7.54290390842336]
We propose an end-to-end autoregressive generative model that generates tokenized limit order book (LOB) messages.
Using NASDAQ equity LOBs, we develop a custom tokenizer for message data, converting groups of successive digits to tokens.
Results show promising performance in approximating the data distribution, as evidenced by low model perplexity.
arXiv Detail & Related papers (2023-08-23T09:37:22Z)
- Approaching sales forecasting using recurrent neural networks and
transformers [57.43518732385863]
We develop three alternatives to tackle the problem of forecasting the customer sales at day/store/item level using deep learning techniques.
Our empirical results show how good performance can be achieved by using a simple sequence to sequence architecture with minimal data preprocessing effort.
The proposed solution achieves an RMSLE of around 0.54, which is competitive with other more specific solutions to the problem proposed in the Kaggle competition.
arXiv Detail & Related papers (2022-04-16T12:03:52Z)
- The Limit Order Book Recreation Model (LOBRM): An Extended Analysis [2.0305676256390934]
The limit order book (LOB) depicts the fine-grained demand and supply relationship for financial assets.
LOBRM was recently proposed to bridge this gap by synthesizing the LOB from trades and quotes (TAQ) data.
We extend the research on LOBRM and further validate its use in real-world application scenarios.
arXiv Detail & Related papers (2021-07-01T15:25:21Z)
- The LOB Recreation Model: Predicting the Limit Order Book from TAQ
History Using an Ordinary Differential Equation Recurrent Neural Network [9.686252465354274]
We present the LOB recreation model, a first attempt from a deep learning perspective to recreate the top five price levels of the public limit order book (LOB) for small-tick stocks.
By the paradigm of transfer learning, the source model trained on one stock can be fine-tuned to enable application to other financial assets of the same class.
arXiv Detail & Related papers (2021-03-02T12:07:43Z)
- Deep Learning for Market by Order Data [7.274325784456261]
Market by order (MBO) data is a detailed feed of individual trade instructions for a given stock on an exchange.
MBO data is largely neglected by the current academic literature, which focuses primarily on limit order books (LOBs).
We provide the first predictive analysis on MBO data by carefully introducing the data structure and presenting a specific normalisation scheme.
We show that while MBO-driven and LOB-driven models individually provide similar performance, ensembles of the two can lead to improvements in forecasting accuracy.
arXiv Detail & Related papers (2021-02-17T15:16:26Z)
- BREEDS: Benchmarks for Subpopulation Shift [98.90314444545204]
We develop a methodology for assessing the robustness of models to subpopulation shift.
We leverage the class structure underlying existing datasets to control the data subpopulations that comprise the training and test distributions.
Applying this methodology to the ImageNet dataset, we create a suite of subpopulation shift benchmarks of varying granularity.
arXiv Detail & Related papers (2020-08-11T17:04:47Z)
- Generating Realistic Stock Market Order Streams [18.86755130031027]
We propose an approach to generate realistic and high-fidelity stock market data based on generative adversarial networks (GANs).
Our Stock-GAN model employs a conditional Wasserstein GAN to capture history dependence of orders.
arXiv Detail & Related papers (2020-06-07T17:32:42Z)
- Low-Budget Label Query through Domain Alignment Enforcement [48.06803561387064]
We tackle a new problem named low-budget label query.
We first improve an Unsupervised Domain Adaptation (UDA) method to better align source and target domains.
We then propose a simple yet effective selection method based on uniform sampling of the prediction consistency distribution.
arXiv Detail & Related papers (2020-01-01T16:52:44Z)