Multi-Horizon Forecasting for Limit Order Books: Novel Deep Learning
Approaches and Hardware Acceleration using Intelligent Processing Units
- URL: http://arxiv.org/abs/2105.10430v1
- Date: Fri, 21 May 2021 16:06:41 GMT
- Title: Multi-Horizon Forecasting for Limit Order Books: Novel Deep Learning
Approaches and Hardware Acceleration using Intelligent Processing Units
- Authors: Zihao Zhang, Stefan Zohren
- Abstract summary: We design multi-horizon forecasting models for limit order book (LOB) data by using deep learning techniques.
Our methods achieve comparable performance to state-of-the-art algorithms at short prediction horizons.
- Score: 3.04585143845864
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We design multi-horizon forecasting models for limit order book (LOB) data by
using deep learning techniques. Unlike standard structures where a single
prediction is made, we adopt encoder-decoder models with sequence-to-sequence
and attention mechanisms to generate a forecasting path. Our methods achieve
comparable performance to state-of-the-art algorithms at short prediction horizons.
Importantly, they outperform when generating predictions over long horizons by
leveraging the multi-horizon setup. Given that encoder-decoder models rely on
recurrent neural layers, they generally suffer from a slow training process. To
remedy this, we experiment with utilising novel hardware, so-called Intelligent
Processing Units (IPUs) produced by Graphcore. IPUs are specifically designed
for machine intelligence workloads with the aim of speeding up computation.
We show that in our setup this leads to significantly faster training
times when compared to training the same models on GPUs.
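For orientation, the following is a minimal PyTorch sketch of the kind of architecture the abstract describes: an LSTM encoder over a window of LOB snapshots, and a decoder that attends over the encoder states to emit one prediction per future horizon. The 40-feature input (ten levels of bid/ask prices and sizes), the 3-class output (up/stationary/down), and all layer sizes are conventional choices for LOB work, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class Seq2SeqLOB(nn.Module):
    """Illustrative encoder-decoder with additive attention for
    multi-horizon LOB forecasting. A hedged sketch, not the paper's model."""
    def __init__(self, n_features=40, hidden=64, n_classes=3, horizon=5):
        super().__init__()
        self.horizon = horizon
        self.encoder = nn.LSTM(n_features, hidden, batch_first=True)
        self.decoder = nn.LSTMCell(n_classes + hidden, hidden)
        self.attn = nn.Linear(2 * hidden, 1)      # additive attention score
        self.out = nn.Linear(hidden, n_classes)   # up/stationary/down logits

    def forward(self, x):                          # x: (batch, seq_len, n_features)
        enc_out, (h, c) = self.encoder(x)          # enc_out: (B, T, H)
        h, c = h.squeeze(0), c.squeeze(0)
        y_prev = x.new_zeros(x.size(0), self.out.out_features)
        preds = []
        for _ in range(self.horizon):
            # Score every encoder state against the current decoder state.
            query = h.unsqueeze(1).expand(-1, enc_out.size(1), -1)
            scores = self.attn(torch.cat([enc_out, query], dim=-1)).softmax(dim=1)
            context = (scores * enc_out).sum(dim=1)            # (B, H)
            h, c = self.decoder(torch.cat([y_prev, context], dim=-1), (h, c))
            logits = self.out(h)
            preds.append(logits)
            y_prev = logits.softmax(dim=-1)        # feed the prediction back in
        return torch.stack(preds, dim=1)           # (B, horizon, n_classes)

model = Seq2SeqLOB()
logits = model(torch.randn(32, 100, 40))           # 100 LOB snapshots per sample
print(logits.shape)                                # torch.Size([32, 5, 3])
```

For the IPU side, Graphcore's PopTorch wrapper is the usual route for training such a PyTorch model on IPUs (roughly, wrapping the model and optimizer with poptorch.trainingModel); the reported speed-ups come from the hardware, not from changes to the model itself.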
Related papers
- Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration [54.897493351694195]
We propose a novel parallel decoding approach, namely "hidden transfer", which decodes multiple successive tokens simultaneously in a single forward pass.
In terms of acceleration metrics, we outperform all the single-model acceleration techniques, including Medusa and Self-Speculative decoding.
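The summary above is terse, so here is a deliberately toy PyTorch sketch of the general idea of drafting several successive tokens in one forward pass. This is not the paper's hidden-transfer mechanism, and every name in it is invented for illustration.

```python
import torch
import torch.nn as nn

class MultiTokenHead(nn.Module):
    """Toy parallel decoding: k linear heads draft the next k tokens from a
    single hidden state. Hypothetical sketch, not the paper's method."""
    def __init__(self, hidden=256, vocab=1000, k=4):
        super().__init__()
        self.heads = nn.ModuleList(nn.Linear(hidden, vocab) for _ in range(k))

    def forward(self, h_last):                     # h_last: (batch, hidden)
        # One head per future position; a real system would verify these
        # drafts against the full model before accepting them.
        return torch.stack([head(h_last) for head in self.heads], dim=1)

drafts = MultiTokenHead()(torch.randn(2, 256))
print(drafts.argmax(-1).shape)                     # torch.Size([2, 4]) draft tokens
```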
arXiv Detail & Related papers (2024-04-18T09:17:06Z)
- Decoder Tuning: Efficient Language Understanding as Decoding [84.68266271483022]
We present Decoder Tuning (DecT), which, in contrast, optimizes task-specific decoder networks on the output side.
Through gradient-based optimization, DecT can be trained within several seconds and requires only one PLM query per sample.
We conduct extensive natural language understanding experiments and show that DecT significantly outperforms state-of-the-art algorithms with a $200\times$ speed-up.
arXiv Detail & Related papers (2022-12-16T11:15:39Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
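snnTorch is a public package, so a minimal usage sketch based on its documented Leaky integrate-and-fire API is shown below; the IPU-optimized release mentioned above presumably exposes the same interface on Graphcore hardware.

```python
import torch
import torch.nn as nn
import snntorch as snn

fc = nn.Linear(784, 10)
lif = snn.Leaky(beta=0.9)              # leaky integrate-and-fire neuron
mem = lif.init_leaky()                 # initialise the membrane potential

spikes = []
for step in range(25):                 # unroll over discrete time steps
    cur = fc(torch.rand(1, 784))       # input current at this step
    spk, mem = lif(cur, mem)           # spike output + updated membrane
    spikes.append(spk)
print(torch.stack(spikes).sum())       # total spikes over the window
```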
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Benchmarking GPU and TPU Performance with Graph Neural Networks [0.0]
This work analyzes and compares GPU and TPU performance when training a Graph Neural Network (GNN) developed to solve a real-life pattern recognition problem.
Characterizing the new class of models acting on sparse data may prove helpful in optimizing the design of deep learning libraries and future AI accelerators.
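Cross-accelerator comparisons of this kind usually come down to timing a fixed training step on each device. A generic, hypothetical harness (PyTorch-flavoured; the paper's own benchmark code is not reproduced here):

```python
import time
import torch

def time_training_step(model, batch, optimizer, n_warmup=10, n_iters=100):
    """Average wall-clock seconds per training step. Assumes model(batch)
    returns a per-sample loss; purely an illustrative harness."""
    device = next(model.parameters()).device
    for i in range(n_warmup + n_iters):
        if i == n_warmup:                      # start timing after warm-up
            if device.type == "cuda":
                torch.cuda.synchronize()       # flush queued GPU work first
            start = time.perf_counter()
        optimizer.zero_grad()
        loss = model(batch).mean()
        loss.backward()
        optimizer.step()
    if device.type == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / n_iters
```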
arXiv Detail & Related papers (2022-10-21T21:03:40Z)
- Accelerating Machine Learning Training Time for Limit Order Book Prediction [0.0]
Financial firms are interested in simulation to discover whether a given algorithm involving financial machine learning will operate profitably.
For this task, hardware acceleration is expected to reduce the time a financial machine learning researcher needs to obtain results.
Our subject is a published limit order book algorithm for predicting stock market direction, whose machine learning training process can be time-intensive.
In the studied configuration, hardware acceleration leads to significantly faster training times, allowing more efficient and extensive model development.
arXiv Detail & Related papers (2022-06-17T22:52:56Z)
- FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using in total around 40% of the available hardware resources.
It reduces the classification time by three orders of magnitude, with a small 4.5% impact on accuracy, compared to its full-precision software counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
- A Survey of Deep Learning Techniques for Dynamic Branch Prediction [0.0]
Branch prediction is an architectural feature that speeds up the execution of branch instructions on pipelined processors.
Recent advancements in Deep Learning (DL) in the post-Moore's-Law era are accelerating areas of automated chip design.
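The natural starting point for such a survey is the classic perceptron branch predictor of Jimenez and Lin (2001), which is simple enough to sketch in full. Real hardware indexes a table of perceptrons by branch address; this single-branch toy omits that.

```python
class PerceptronPredictor:
    """Single-branch perceptron predictor (after Jimenez & Lin, 2001)."""
    def __init__(self, history_len=16):
        self.theta = int(1.93 * history_len + 14)   # standard training threshold
        self.w = [0] * (history_len + 1)            # w[0] is the bias weight
        self.hist = [1] * history_len               # +1 = taken, -1 = not taken

    def predict(self):
        y = self.w[0] + sum(w * x for w, x in zip(self.w[1:], self.hist))
        return y, y >= 0                            # (confidence, taken?)

    def update(self, taken):
        y, pred = self.predict()
        t = 1 if taken else -1
        # Train only on mispredictions or low-confidence predictions.
        if pred != taken or abs(y) <= self.theta:
            self.w[0] += t
            self.w[1:] = [w + t * x for w, x in zip(self.w[1:], self.hist)]
        self.hist = [t] + self.hist[:-1]            # shift in the new outcome
```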
arXiv Detail & Related papers (2021-12-30T03:44:27Z)
- MQBench: Towards Reproducible and Deployable Model Quantization Benchmark [53.12623958951738]
MQBench is a first attempt to evaluate, analyze, and benchmark the reproducibility and deployability of model quantization algorithms.
We choose multiple platforms for real-world deployment, including CPU, GPU, ASIC, and DSP, and evaluate extensive state-of-the-art quantization algorithms.
We conduct a comprehensive analysis and find considerable intuitive or counter-intuitive insights.
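MQBench's own harness is not reproduced here, but the post-training quantization flows it benchmarks look roughly like PyTorch's eager-mode API below; this is a minimal sketch, and MQBench layers hardware-aligned quantizer configurations on top of flows like this.

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()      # fp32 -> int8 boundary
        self.fc = nn.Linear(16, 4)
        self.dequant = torch.quantization.DeQuantStub()  # int8 -> fp32 boundary

    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))

model = TinyNet().eval()
model.qconfig = torch.quantization.get_default_qconfig("fbgemm")  # x86 backend
torch.quantization.prepare(model, inplace=True)   # insert calibration observers
model(torch.randn(8, 16))                         # calibration pass on sample data
torch.quantization.convert(model, inplace=True)   # swap in int8 kernels
print(model(torch.randn(1, 16)))
```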
arXiv Detail & Related papers (2021-11-05T23:38:44Z)
- Low-Precision Hardware Architectures Meet Recommendation Model Inference at Scale [11.121380180647769]
In this paper, we share our search strategies for adapting reference recommendation models to low-precision hardware.
We also discuss the design and development of a tool chain to maintain our models' accuracy throughout their lifespan.
We believe these lessons from the trenches promote better co-design between hardware architecture and software engineering.
arXiv Detail & Related papers (2021-05-26T16:42:33Z)
- Efficient Algorithms for Device Placement of DNN Graph Operators [12.871398348743591]
Modern machine learning workloads use large models with complex structures that are very expensive to execute.
The devices that execute complex models are becoming increasingly heterogeneous, as we see a flourishing of domain-specific accelerators offered alongside CPUs.
Recent work has shown that significant gains can be obtained with model parallelism, i.e., partitioning a neural network's computational graph onto multiple devices.
In this paper, we identify and isolate the structured optimization problem at the core of device placement of DNN operators, for both inference and training, especially in modern pipelined settings.
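To make the structured optimization problem concrete: in its simplest form, placing a linear chain of operators onto k pipeline stages to minimise the bottleneck stage cost is a textbook dynamic program. A toy version follows; real placement, as the paper treats it, also models memory limits and inter-device communication, which this sketch ignores.

```python
from itertools import accumulate

def chain_partition(costs, k):
    """Minimum bottleneck cost when splitting a chain of operator costs
    into k contiguous pipeline stages. O(n^2 k) dynamic program."""
    n = len(costs)
    prefix = [0] + list(accumulate(costs))
    seg = lambda i, j: prefix[j] - prefix[i]          # cost of ops i..j-1
    INF = float("inf")
    dp = [[INF] * (k + 1) for _ in range(n + 1)]      # dp[j][s]: first j ops, s stages
    dp[0][0] = 0
    for j in range(1, n + 1):
        for s in range(1, k + 1):
            for i in range(s - 1, j):                 # last stage covers ops i..j-1
                dp[j][s] = min(dp[j][s], max(dp[i][s - 1], seg(i, j)))
    return dp[n][k]

print(chain_partition([4, 2, 7, 1, 3, 5], k=3))       # -> 8 (e.g. [4,2] [7,1] [3,5])
```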
arXiv Detail & Related papers (2020-06-29T22:45:01Z)
- Predictive Coding Approximates Backprop along Arbitrary Computation Graphs [68.8204255655161]
We develop a strategy to translate core machine learning architectures into their predictive coding equivalents.
Our models perform equivalently to backprop on challenging machine learning benchmarks.
Our method raises the potential that standard machine learning algorithms could in principle be directly implemented in neural circuitry.
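For reference, the predictive coding dynamics this line of work builds on are usually written as follows (notation ours: x_l are layer activities, eps_l prediction errors, f the activation):

```latex
\hat{x}_l = W_l\, f(x_{l-1}), \qquad
\epsilon_l = x_l - \hat{x}_l, \qquad
\dot{x}_l = -\epsilon_l + f'(x_l) \odot \bigl(W_{l+1}^{\top} \epsilon_{l+1}\bigr), \qquad
\Delta W_l \propto \epsilon_l\, f(x_{l-1})^{\top}
```

At the equilibrium of the activity dynamics, these local weight updates match the gradients backprop would compute, which is the correspondence the paper establishes for arbitrary computation graphs.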
arXiv Detail & Related papers (2020-06-07T15:35:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the generated summaries (including all information) and is not responsible for any consequences.