Nowcasting the Financial Time Series with Streaming Data Analytics under
Apache Spark
- URL: http://arxiv.org/abs/2202.11820v1
- Date: Wed, 23 Feb 2022 23:17:01 GMT
- Title: Nowcasting the Financial Time Series with Streaming Data Analytics under
Apache Spark
- Authors: Mohammad Arafat Ali Khan, Chandra Bhushan, Vadlamani Ravi, Vangala
Sarveswara Rao and Shiva Shankar Orsu
- Abstract summary: This paper proposes nowcasting of high-frequency financial datasets in real-time with a 5-minute interval using the streaming analytics feature of Apache Spark.
The proposed two-stage method consists of modelling chaos in the first stage and then using a sliding window approach for training with machine learning algorithms.
- Score: 3.219821135628767
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper proposes nowcasting of high-frequency financial datasets in
real-time with a 5-minute interval using the streaming analytics feature of
Apache Spark. The proposed two-stage method consists of modelling chaos in the
first stage and then using a sliding window approach for training with machine
learning algorithms, namely Lasso Regression, Ridge Regression, Generalised
Linear Model, Gradient Boosting Tree and Random Forest, available in the MLlib
of Apache Spark, in the second stage. To test the effectiveness of the
proposed methodology, we used three datasets: two stock market datasets, namely
National Stock Exchange and Bombay Stock Exchange, and one Bitcoin-INR
conversion dataset. For evaluating the proposed methodology, we
used metrics such as Symmetric Mean Absolute Percentage Error, Directional
Symmetry, and Theil U Coefficient. We tested the significance of each pair of
models using the Diebold Mariano (DM) test.
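As a rough illustration of the pipeline the abstract describes, the sketch below builds sliding-window training rows from a price series and computes the three named error metrics. It is plain Python rather than Spark code, and every function name here (`delay_embed`, `smape`, `directional_symmetry`, `theil_u`) is hypothetical. The delay embedding is one common way to turn a chaotic series into lagged features, and the metric formulas are widely used textbook variants; the paper may define its chaos-modelling stage and its metrics differently.

```python
import math
from typing import List, Sequence, Tuple


def delay_embed(series: Sequence[float], dim: int, delay: int = 1
                ) -> Tuple[List[List[float]], List[float]]:
    """Build (features, targets) for one-step-ahead prediction.

    Each feature row holds `dim` past values spaced `delay` steps apart;
    the target is the value immediately after the row's newest entry.
    `dim` and `delay` are assumed to be chosen beforehand (for example by
    a chaos analysis of the series); this sketch does not estimate them.
    """
    span = (dim - 1) * delay  # distance from oldest to newest lag in a row
    rows, targets = [], []
    for t in range(span, len(series) - 1):
        rows.append([series[t - span + k * delay] for k in range(dim)])
        targets.append(series[t + 1])
    return rows, targets


def smape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Symmetric MAPE in percent (assumed variant: mean of |F-A| / ((|A|+|F|)/2))."""
    terms = []
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2.0
        terms.append(0.0 if denom == 0 else abs(f - a) / denom)
    return 100.0 * sum(terms) / len(terms)


def directional_symmetry(actual: Sequence[float],
                         forecast: Sequence[float]) -> float:
    """Percent of steps where actual and forecast move in the same direction.

    Assumed convention: a zero product (no change in either series) counts as a hit.
    """
    hits = 0
    for t in range(1, len(actual)):
        if (actual[t] - actual[t - 1]) * (forecast[t] - forecast[t - 1]) >= 0:
            hits += 1
    return 100.0 * hits / (len(actual) - 1)


def theil_u(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Theil's U in its U1 form (assumed): RMSE scaled into [0, 1]; 0 is a perfect fit."""
    n = len(actual)
    rmse = math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)
    scale = (math.sqrt(sum(a * a for a in actual) / n)
             + math.sqrt(sum(f * f for f in forecast) / n))
    return rmse / scale


if __name__ == "__main__":
    prices = [1.0, 2.0, 3.0, 4.0, 5.0]  # toy data, not from the paper
    X, y = delay_embed(prices, dim=2, delay=1)
    print(X, y)
    print(smape([100.0, 200.0], [110.0, 180.0]))
```

In a streaming setting, the rows returned by `delay_embed` for the most recent window would be refit at each 5-minute tick and the metrics computed on the held-out next values; how the paper schedules retraining is not stated in the abstract.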
Related papers
- Asset price movement prediction using empirical mode decomposition and Gaussian mixture models [0.0]
We used five-, two-, and one-year samples of hourly candle data for the GameStop, Tesla, and Ripple markets.
We collected several features based on a linear model and other classical features to predict the next hour's movement.
We evaluated the performance of various machine learning models, including Random Forests (RF) and XGBoost, in classifying market movements.
arXiv Detail & Related papers (2025-03-26T16:12:11Z)
- An Iterative Bayesian Approach for System Identification based on Linear Gaussian Models [86.05414211113627]
We tackle the problem of system identification, where we select inputs, observe the corresponding outputs from the true system, and optimize the parameters of our model to best fit the data.
We propose a flexible and computationally tractable methodology that is compatible with any system and parametric family of models.
arXiv Detail & Related papers (2025-01-28T01:57:51Z)
- LC-SVD-DLinear: A low-cost physics-based hybrid machine learning model for data forecasting using sparse measurements [2.519319150166215]
This article introduces a novel methodology that integrates singular value decomposition (SVD) with a shallow linear neural network for forecasting high resolution fluid mechanics data.
We present a variant of the method, LC-HOSVD-DLinear, which combines a low-cost version of the high-order singular value decomposition algorithm with the DLinear network, designed for high-order data.
arXiv Detail & Related papers (2024-11-26T13:43:50Z)
- Bayesian Circular Regression with von Mises Quasi-Processes [57.88921637944379]
In this work we explore a family of expressive and interpretable distributions over circle-valued random functions.
For posterior inference, we introduce a new Stratonovich-like augmentation that lends itself to fast Gibbs sampling.
We present experiments applying this model to the prediction of wind directions and the percentage of the running gait cycle as a function of joint angles.
arXiv Detail & Related papers (2024-06-19T01:57:21Z)
- Iterative Methods for Full-Scale Gaussian Process Approximations for Large Spatial Data [9.913418444556486]
We show how iterative methods can be used to reduce the computational costs for calculating likelihoods, gradients, and predictive distributions with FSAs.
We also present a novel, accurate, and fast way to calculate predictive variances relying on estimations and iterative methods.
All methods are implemented in a free C++ software library with high-level Python and R packages.
arXiv Detail & Related papers (2024-05-23T12:25:22Z)
- An Efficient Data Analysis Method for Big Data using Multiple-Model Linear Regression [4.085654010023149]
This paper introduces a new data analysis method for big data using a newly defined regression model named multiple model linear regression (MMLR).
The proposed data analysis method is shown to be more efficient and flexible than other regression based methods.
arXiv Detail & Related papers (2023-08-24T10:20:15Z)
- DF2: Distribution-Free Decision-Focused Learning [53.2476224456902]
Decision-focused learning (DFL) has recently emerged as a powerful approach for predict-then-optimize problems.
Existing end-to-end DFL methods are hindered by three significant bottlenecks: model error, sample average approximation error, and distribution-based parameterization of the expected objective.
We present DF2 -- the first distribution-free decision-focused learning method explicitly designed to address these three bottlenecks.
arXiv Detail & Related papers (2023-08-11T00:44:46Z)
- Probabilistic Solar Proxy Forecasting with Neural Network Ensembles [0.0]
Space Environment Technologies (SET) uses a linear algorithm to forecast $F_{10.7}$ cm.
We introduce methods using neural network ensembles with multi-layer perceptrons (MLPs) and long-short term memory (LSTMs) to improve on SET predictions.
arXiv Detail & Related papers (2023-06-03T18:22:01Z)
- MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training [58.07391711548269]
Masked Voxel Jigsaw and Reconstruction (MV-JAR) method for LiDAR-based self-supervised pre-training.
arXiv Detail & Related papers (2023-03-23T17:59:02Z)
- Overlap-guided Gaussian Mixture Models for Point Cloud Registration [61.250516170418784]
Probabilistic 3D point cloud registration methods have shown competitive performance in overcoming noise, outliers, and density variations.
This paper proposes a novel overlap-guided probabilistic registration approach that computes the optimal transformation from matched Gaussian Mixture Model (GMM) parameters.
arXiv Detail & Related papers (2022-10-17T08:02:33Z)
- Sparse high-dimensional linear regression with a partitioned empirical Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression.
Minimal prior assumptions on the parameters are used through the use of plug-in empirical Bayes estimates.
The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z)
- A Class of Two-Timescale Stochastic EM Algorithms for Nonconvex Latent Variable Models [21.13011760066456]
The Expectation-Maximization (EM) algorithm is a popular choice for learning latent variable models.
In this paper, we propose a general class of methods called Two-Time Methods.
arXiv Detail & Related papers (2022-03-18T22:46:34Z)
- Test Set Sizing Via Random Matrix Theory [91.3755431537592]
This paper uses techniques from Random Matrix Theory to find the ideal training-testing data split for a simple linear regression.
It defines "ideal" as satisfying the integrity metric, i.e. the empirical model error is the actual measurement noise.
This paper is the first to solve for the training and test size for any model in a way that is truly optimal.
arXiv Detail & Related papers (2021-12-11T13:18:33Z)
- Uncertainty Inspired RGB-D Saliency Detection [70.50583438784571]
We propose the first framework to employ uncertainty for RGB-D saliency detection by learning from the data labeling process.
Inspired by the saliency data labeling process, we propose a generative architecture to achieve probabilistic RGB-D saliency detection.
Results on six challenging RGB-D benchmark datasets show our approach's superior performance in learning the distribution of saliency maps.
arXiv Detail & Related papers (2020-09-07T13:01:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.