A Comparative Analysis of Statistical and Machine Learning Models for Outlier Detection in Bitcoin Limit Order Books
- URL: http://arxiv.org/abs/2507.14960v1
- Date: Sun, 20 Jul 2025 13:42:36 GMT
- Title: A Comparative Analysis of Statistical and Machine Learning Models for Outlier Detection in Bitcoin Limit Order Books
- Authors: Ivan Letteri,
- Abstract summary: This study conducts a comparative analysis of robust statistical methods and advanced machine learning techniques for real-time anomaly identification in cryptocurrency limit order books (LOBs)<n>We evaluate the efficacy of thirteen diverse models to identify which approaches are most suitable for detecting potentially manipulative trading behaviours.<n>An empirical evaluation, conducted via backtesting on a dataset of 26,204 records from a major exchange, demonstrates that the top-performing model, Empirical Covariance (EC), achieves a 6.70% gain, significantly outperforming a standard Buy-and-Hold benchmark.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The detection of outliers within cryptocurrency limit order books (LOBs) is of paramount importance for comprehending market dynamics, particularly in highly volatile and nascent regulatory environments. This study conducts a comprehensive comparative analysis of robust statistical methods and advanced machine learning techniques for real-time anomaly identification in cryptocurrency LOBs. Within a unified testing environment, named AITA Order Book Signal (AITA-OBS), we evaluate the efficacy of thirteen diverse models to identify which approaches are most suitable for detecting potentially manipulative trading behaviours. An empirical evaluation, conducted via backtesting on a dataset of 26,204 records from a major exchange, demonstrates that the top-performing model, Empirical Covariance (EC), achieves a 6.70% gain, significantly outperforming a standard Buy-and-Hold benchmark. These findings underscore the effectiveness of outlier-driven strategies and provide insights into the trade-offs between model complexity, trade frequency, and performance. This study contributes to the growing corpus of research on cryptocurrency market microstructure by furnishing a rigorous benchmark of anomaly detection models and highlighting their potential for augmenting algorithmic trading and risk management.
Related papers
- Evaluating Supervised Learning Models for Fraud Detection: A Comparative Study of Classical and Deep Architectures on Imbalanced Transaction Data [2.5670390559986442]
Fraud detection remains a critical task in high-stakes domains such as finance and e-commerce.<n>We systematically compare the performance of four supervised learning models on a large-scale, highly imbalanced online transaction dataset.
arXiv Detail & Related papers (2025-05-28T16:08:04Z) - The role of data partitioning on the performance of EEG-based deep learning models in supervised cross-subject analysis: a preliminary study [37.69303106863453]
Deep learning is advancing the analysis of electroencephalography (EEG) data by effectively discovering highly nonlinear patterns.<n>No comprehensive guidelines for proper data partitioning and cross-validation exist in the domain.<n>This paper thoroughly investigates the role of data partitioning and cross-validation in evaluating EEG deep learning models.
arXiv Detail & Related papers (2025-05-19T12:05:28Z) - Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability [53.51560766150442]
Critical tokens are elements within reasoning trajectories that significantly influence incorrect outcomes.<n>We present a novel framework for identifying these tokens through rollout sampling.<n>We show that identifying and replacing critical tokens significantly improves model accuracy.
arXiv Detail & Related papers (2024-11-29T18:58:22Z) - A Comprehensive Analysis of Machine Learning Models for Algorithmic Trading of Bitcoin [0.3069335774032178]
This study evaluates the performance of 41 machine learning models, including 21 classifiers and 20 regressors, in predicting Bitcoin prices for algorithmic trading.
Our comprehensive analysis reveals the strengths and limitations of each model, providing critical insights for developing effective trading strategies.
arXiv Detail & Related papers (2024-07-09T13:07:43Z) - Statistical arbitrage in multi-pair trading strategy based on graph clustering algorithms in US equities market [0.0]
The study seeks to develop an effective strategy based on the novel framework of statistical arbitrage based on graph clustering algorithms.
The study seeks to provide an integrated approach to optimal signal detection and risk management.
arXiv Detail & Related papers (2024-06-15T17:25:32Z) - Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning [55.96599486604344]
We introduce an approach aimed at enhancing the reasoning capabilities of Large Language Models (LLMs) through an iterative preference learning process.
We use Monte Carlo Tree Search (MCTS) to iteratively collect preference data, utilizing its look-ahead ability to break down instance-level rewards into more granular step-level signals.
The proposed algorithm employs Direct Preference Optimization (DPO) to update the LLM policy using this newly generated step-level preference data.
arXiv Detail & Related papers (2024-05-01T11:10:24Z) - Long Short-Term Memory Pattern Recognition in Currency Trading [0.0]
Wyckoff Phases is a framework devised by Richard D. Wyckoff in the early 20th century.
The research explores the phases of trading range and secondary test, elucidating their significance in understanding market dynamics.
By dissecting the intricacies of these phases, the study sheds light on the creation of liquidity through market structure.
The study highlights the transformative potential of AI-driven approaches in financial analysis and trading strategies.
arXiv Detail & Related papers (2024-02-23T12:59:49Z) - Model Stealing Attack against Graph Classification with Authenticity, Uncertainty and Diversity [80.16488817177182]
GNNs are vulnerable to the model stealing attack, a nefarious endeavor geared towards duplicating the target model via query permissions.
We introduce three model stealing attacks to adapt to different actual scenarios.
arXiv Detail & Related papers (2023-12-18T05:42:31Z) - DeepVol: Volatility Forecasting from High-Frequency Data with Dilated Causal Convolutions [53.37679435230207]
We propose DeepVol, a model based on Dilated Causal Convolutions that uses high-frequency data to forecast day-ahead volatility.
Our empirical results suggest that the proposed deep learning-based approach effectively learns global features from high-frequency data.
arXiv Detail & Related papers (2022-09-23T16:13:47Z) - Bayesian Bilinear Neural Network for Predicting the Mid-price Dynamics
in Limit-Order Book Markets [84.90242084523565]
Traditional time-series econometric methods often appear incapable of capturing the true complexity of the multi-level interactions driving the price dynamics.
By adopting a state-of-the-art second-order optimization algorithm, we train a Bayesian bilinear neural network with temporal attention.
By addressing the use of predictive distributions to analyze errors and uncertainties associated with the estimated parameters and model forecasts, we thoroughly compare our Bayesian model with traditional ML alternatives.
arXiv Detail & Related papers (2022-03-07T18:59:54Z) - Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples.
arXiv Detail & Related papers (2022-01-11T23:01:12Z) - Accurate and Robust Feature Importance Estimation under Distribution
Shifts [49.58991359544005]
PRoFILE is a novel feature importance estimation method.
We show significant improvements over state-of-the-art approaches, both in terms of fidelity and robustness.
arXiv Detail & Related papers (2020-09-30T05:29:01Z) - Empirical Study of Market Impact Conditional on Order-Flow Imbalance [0.0]
We show that for small signed order-flows, the price impact grows linearly with increase in the order-flow imbalance.
We have, further, implemented a machine learning algorithm to forecast market impact given a signed order-flow.
Our findings suggest that machine learning models can be used in estimation of financial variables.
arXiv Detail & Related papers (2020-04-17T14:58:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.