Deep Learning for Market by Order Data
- URL: http://arxiv.org/abs/2102.08811v1
- Date: Wed, 17 Feb 2021 15:16:26 GMT
- Title: Deep Learning for Market by Order Data
- Authors: Zihao Zhang, Bryan Lim and Stefan Zohren
- Abstract summary: Market by order (MBO) data is a detailed feed of individual trade instructions for a given stock on an exchange.
MBO data is largely neglected by the current academic literature, which focuses primarily on limit order books (LOBs).
We provide the first predictive analysis on MBO data by carefully introducing the data structure and presenting a specific normalisation scheme.
We show that while MBO-driven and LOB-driven models individually provide similar performance, ensembles of the two can lead to improvements in forecasting accuracy.
- Score: 7.274325784456261
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Market by order (MBO) data - a detailed feed of individual trade instructions
for a given stock on an exchange - is arguably one of the most granular sources
of microstructure information. While limit order books (LOBs) are implicitly
derived from it, MBO data is largely neglected by current academic literature
which focuses primarily on LOB modelling. In this paper, we demonstrate the
utility of MBO data for forecasting high-frequency price movements, providing
an orthogonal source of information to LOB snapshots. We provide the first
predictive analysis on MBO data by carefully introducing the data structure and
presenting a specific normalisation scheme to consider level information in
order books and to allow model training with multiple instruments. Through
forecasting experiments using deep neural networks, we show that while
MBO-driven and LOB-driven models individually provide similar performance,
ensembles of the two can lead to improvements in forecasting accuracy --
indicating that MBO data is additive to LOB-based features.
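The abstract mentions a normalisation scheme that accounts for level information and allows one model to be trained across multiple instruments. The paper's exact scheme is not reproduced here, but the general idea of scaling each instrument's features with trailing statistics (so different tick sizes and volumes become comparable) can be sketched as follows; the function name, window length, and z-score choice are illustrative assumptions, not details from the paper:

```python
import numpy as np

def rolling_zscore(x: np.ndarray, lookback: int) -> np.ndarray:
    """Normalise each column of a (time, feature) array using the mean and
    standard deviation of the preceding `lookback` rows only, so no future
    information leaks into the normalised features."""
    out = np.zeros_like(x, dtype=float)
    for t in range(lookback, x.shape[0]):
        window = x[t - lookback:t]
        mu = window.mean(axis=0)
        sigma = window.std(axis=0) + 1e-8  # guard against constant columns
        out[t] = (x[t] - mu) / sigma
    return out[lookback:]  # drop the warm-up rows with no full window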
Related papers
- F-FOMAML: GNN-Enhanced Meta-Learning for Peak Period Demand Forecasting with Proxy Data [65.6499834212641]
We formulate the demand prediction as a meta-learning problem and develop the Feature-based First-Order Model-Agnostic Meta-Learning (F-FOMAML) algorithm.
By considering domain similarities through task-specific metadata, our model improved generalization, where the excess risk decreases as the number of training tasks increases.
Compared to existing state-of-the-art models, our method demonstrates a notable improvement in demand prediction accuracy, reducing the Mean Absolute Error by 26.24% on an internal vending machine dataset and by 1.04% on the publicly accessible JD.com dataset.
arXiv Detail & Related papers (2024-06-23T21:28:50Z)
- Extracting Training Data from Unconditional Diffusion Models [76.85077961718875]
Diffusion probabilistic models (DPMs) are being employed as mainstream models for generative artificial intelligence (AI).
We aim to establish a theoretical understanding of memorization in DPMs with 1) a memorization metric for theoretical analysis, 2) an analysis of conditional memorization with informative and random labels, and 3) two better evaluation metrics for measuring memorization.
Based on the theoretical analysis, we propose a novel data extraction method called Surrogate condItional Data Extraction (SIDE) that leverages a model trained on generated data as a surrogate condition to extract training data directly from unconditional diffusion models.
arXiv Detail & Related papers (2024-06-18T16:20:12Z)
- How Much Data are Enough? Investigating Dataset Requirements for Patch-Based Brain MRI Segmentation Tasks [74.21484375019334]
Training deep neural networks reliably requires access to large-scale datasets.
To mitigate both the time and financial costs associated with model development, a clear understanding of the amount of data required to train a satisfactory model is crucial.
This paper proposes a strategic framework for estimating the amount of annotated data required to train patch-based segmentation networks.
arXiv Detail & Related papers (2024-04-04T13:55:06Z)
- MC-DBN: A Deep Belief Network-Based Model for Modality Completion [3.7020486533725605]
We propose a Modality Completion Deep Belief Network-Based Model (MC-DBN).
This approach utilizes implicit features of complete data to compensate for gaps between itself and additional incomplete data.
It ensures that the enhanced multi-modal data closely aligns with the dynamic nature of the real world to enhance the effectiveness of the model.
arXiv Detail & Related papers (2024-02-15T08:21:50Z)
- Prediction-Oriented Bayesian Active Learning [51.426960808684655]
Expected predictive information gain (EPIG) is an acquisition function that measures information gain in the space of predictions rather than parameters.
EPIG leads to stronger predictive performance compared with BALD across a range of datasets and models.
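As a rough illustration of what "information gain in the space of predictions" means, a common Monte Carlo estimator for classification computes the mutual information between the candidate's prediction and predictions at sampled target inputs, using an ensemble of posterior samples. This is a minimal sketch, not the paper's implementation; the array shapes and function name are assumptions:

```python
import numpy as np

def epig(probs_pool: np.ndarray, probs_target: np.ndarray) -> float:
    """Monte Carlo EPIG estimate for one candidate input (classification).

    probs_pool:   (K, C) predictive class probabilities for the candidate,
                  one row per posterior sample.
    probs_target: (K, M, C) predictive probabilities for M sampled target
                  inputs under the same K posterior samples.

    Returns the mutual information between the candidate's prediction and
    a target prediction, averaged over the M targets.
    """
    K = probs_pool.shape[0]
    eps = 1e-12
    # Joint predictive over (y, y*): average the per-sample outer products.
    joint = np.einsum('kc,kmd->mcd', probs_pool, probs_target) / K  # (M, C, C)
    marg_pool = probs_pool.mean(axis=0)                             # (C,)
    marg_target = probs_target.mean(axis=0)                         # (M, C)
    indep = marg_pool[None, :, None] * marg_target[:, None, :]      # (M, C, C)
    mi = (joint * (np.log(joint + eps) - np.log(indep + eps))).sum(axis=(1, 2))
    return float(mi.mean())
```

When all posterior samples agree, the joint factorises and the estimate is zero; disagreement that is correlated between candidate and target predictions yields a positive score.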
arXiv Detail & Related papers (2023-04-17T10:59:57Z)
- DSLOB: A Synthetic Limit Order Book Dataset for Benchmarking Forecasting Algorithms under Distributional Shift [16.326002979578686]
In electronic trading markets, limit order books (LOBs) provide information about pending buy/sell orders at various price levels for a given security.
Recently, there has been a growing interest in using LOB data for resolving downstream machine learning tasks.
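For context, an LOB snapshot is essentially a pair of price-ordered lists of (price, size) levels, one side for bids and one for asks. A minimal sketch of such a structure, with illustrative names and fields not taken from any of the papers above:

```python
from dataclasses import dataclass

@dataclass
class LOBSnapshot:
    """Order book state at one timestamp. Level 0 is the best quote:
    highest bid and lowest ask."""
    bids: list[tuple[float, float]]  # (price, size), sorted descending by price
    asks: list[tuple[float, float]]  # (price, size), sorted ascending by price

    def mid_price(self) -> float:
        """Midpoint between the best bid and best ask."""
        return 0.5 * (self.bids[0][0] + self.asks[0][0])

    def spread(self) -> float:
        """Best ask minus best bid."""
        return self.asks[0][0] - self.bids[0][0]
```

Features such as the mid-price and spread derived from snapshots like this are typical inputs to the LOB-driven forecasting models discussed here.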
arXiv Detail & Related papers (2022-11-17T06:33:27Z)
- Batch-Ensemble Stochastic Neural Networks for Out-of-Distribution Detection [55.028065567756066]
Out-of-distribution (OOD) detection has recently received much attention from the machine learning community due to its importance in deploying machine learning models in real-world applications.
In this paper we propose an uncertainty quantification approach by modelling the distribution of features.
We incorporate an efficient ensemble mechanism, namely batch-ensemble, to construct the batch-ensemble neural networks (BE-SNNs) and overcome the feature collapse problem.
We show that BE-SNNs yield superior performance on several OOD benchmarks, such as the Two-Moons dataset and the FashionMNIST vs MNIST dataset.
arXiv Detail & Related papers (2022-06-26T16:00:22Z)
- Calibrating Agent-based Models to Microdata with Graph Neural Networks [1.4911092205861822]
Calibrating agent-based models (ABMs) to data is among the most fundamental requirements to ensure the model fulfils its desired purpose.
We propose to learn parameter posteriors associated with granular microdata directly using temporal graph neural networks.
arXiv Detail & Related papers (2022-06-15T14:41:43Z)
- The Limit Order Book Recreation Model (LOBRM): An Extended Analysis [2.0305676256390934]
The limit order book (LOB) depicts the fine-grained demand and supply relationship for financial assets.
LOBRM was recently proposed to bridge this gap by synthesizing the LOB from trades and quotes (TAQ) data.
We extend the research on LOBRM and further validate its use in real-world application scenarios.
arXiv Detail & Related papers (2021-07-01T15:25:21Z)
- The LOB Recreation Model: Predicting the Limit Order Book from TAQ History Using an Ordinary Differential Equation Recurrent Neural Network [9.686252465354274]
We present the LOB recreation model, a first attempt from a deep learning perspective to recreate the top five price levels of the public limit order book (LOB) for small-tick stocks.
By the paradigm of transfer learning, the source model trained on one stock can be fine-tuned to enable application to other financial assets of the same class.
arXiv Detail & Related papers (2021-03-02T12:07:43Z)
- Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.