Trading with the Momentum Transformer: An Intelligent and Interpretable
Architecture
- URL: http://arxiv.org/abs/2112.08534v1
- Date: Thu, 16 Dec 2021 00:04:12 GMT
- Title: Trading with the Momentum Transformer: An Intelligent and Interpretable
Architecture
- Authors: Kieran Wood, Sven Giegerich, Stephen Roberts, Stefan Zohren
- Abstract summary: We introduce the Momentum Transformer, an attention-based architecture which outperforms the benchmarks.
We observe remarkable structure in the attention patterns, with significant peaks of importance at momentum turning points.
Through the addition of an interpretable variable selection network, we observe how CPD helps our model to move away from trading predominantly on daily returns data.
- Score: 2.580765958706854
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning architectures, specifically Deep Momentum Networks (DMNs)
[1904.04912], have been found to be an effective approach to momentum and
mean-reversion trading. However, some of the key challenges in recent years
involve learning long-term dependencies, degradation of performance when
considering returns net of transaction costs and adapting to new market
regimes, notably during the SARS-CoV-2 crisis. Attention mechanisms, or
Transformer-based architectures, are a solution to such challenges because they
allow the network to focus on significant time steps in the past and
longer-term patterns. We introduce the Momentum Transformer, an attention-based
architecture which outperforms the benchmarks, and is inherently interpretable,
providing us with greater insights into our deep learning trading strategy. Our
model is an extension to the LSTM-based DMN, which directly outputs position
sizing by optimising the network on a risk-adjusted performance metric, such as
Sharpe ratio. We find an attention-LSTM hybrid Decoder-Only Temporal Fusion
Transformer (TFT) style architecture is the best performing model. In terms of
interpretability, we observe remarkable structure in the attention patterns,
with significant peaks of importance at momentum turning points. The time
series is thus segmented into regimes and the model tends to focus on previous
time steps in similar regimes. We find changepoint detection (CPD) [2105.13727],
another technique for responding to regime change, can complement multi-headed
attention, especially when we run CPD at multiple timescales. Through the
addition of an interpretable variable selection network, we observe how CPD
helps our model to move away from trading predominantly on daily returns data.
We note that the model can intelligently switch between, and blend, classical
strategies, basing its decisions on patterns in the data.
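The core mechanism above, a network that outputs positions directly and is trained on a risk-adjusted metric, can be sketched in a few lines. Below is a minimal, hypothetical PyTorch rendering of a DMN-style position sizer with a negative-Sharpe loss; the class names, network sizes, and 252-day annualisation are illustrative assumptions rather than the paper's implementation, and the sketch ignores transaction costs.

```python
import torch
import torch.nn as nn

class PositionSizer(nn.Module):
    """Illustrative LSTM mapping feature windows to positions in [-1, 1]."""
    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                        # x: (batch, time, n_features)
        h, _ = self.lstm(x)
        return torch.tanh(self.head(h)).squeeze(-1)   # positions, (batch, time)

def sharpe_loss(positions, next_returns):
    """Negative annualised Sharpe ratio of the resulting strategy returns."""
    strategy_returns = positions * next_returns   # element-wise strategy P&L
    mean, std = strategy_returns.mean(), strategy_returns.std()
    return -(mean / (std + 1e-9)) * (252.0 ** 0.5)  # assumed daily data

model = PositionSizer(n_features=8)               # feature count is made up
x = torch.randn(32, 63, 8)                        # 63-day lookback, assumed
r = torch.randn(32, 63) * 0.01                    # next-step returns
loss = sharpe_loss(model(x), r)
loss.backward()
```

Replacing the LSTM body with attention layers, as the Momentum Transformer does, leaves this outer training objective unchanged.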
Related papers
- Todyformer: Towards Holistic Dynamic Graph Transformers with Structure-Aware Tokenization [6.799413002613627]
Todyformer is a novel Transformer-based neural network tailored for dynamic graphs.
It unifies the local encoding capacity of Message-Passing Neural Networks (MPNNs) with the global encoding of Transformers.
We show that Todyformer consistently outperforms the state-of-the-art methods for downstream tasks.
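As a rough illustration of that local/global split, the hypothetical PyTorch block below mixes neighbour information through a dense adjacency matrix (the MPNN-style local step) and then applies self-attention over each node's token sequence (the global step); every name here is an assumption, not Todyformer's code.

```python
import torch
import torch.nn as nn

class LocalGlobalBlock(nn.Module):
    """Hypothetical block: MPNN-style local encoding, then global attention."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.local = nn.Linear(dim, dim)          # neighbour feature transform
        self.global_attn = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, batch_first=True)

    def forward(self, x, adj):
        # x: (nodes, tokens, dim) token sequences; adj: (nodes, nodes), dense
        local = torch.einsum("ij,jtd->itd", adj, self.local(x))  # message passing
        return self.global_attn(x + local)   # attention along each token sequence
```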
arXiv Detail & Related papers (2024-02-02T23:05:30Z)
- Rethinking Urban Mobility Prediction: A Super-Multivariate Time Series Forecasting Approach [71.67506068703314]
Long-term urban mobility predictions play a crucial role in the effective management of urban facilities and services.
Traditionally, urban mobility data has been structured as videos, treating longitude and latitude as fundamental pixels.
In our research, we introduce a fresh perspective on urban mobility prediction.
Instead of oversimplifying urban mobility data as traditional video data, we regard it as a complex time series.
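That representational shift is simple to state in code; in the sketch below the array names and grid sizes are made up.

```python
import numpy as np

T, H, W = 168, 32, 32                        # a week of hourly frames, assumed
video_like = np.random.rand(T, H, W)         # video-style mobility tensor
multivariate = video_like.reshape(T, H * W)  # one time series per grid cell
```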
arXiv Detail & Related papers (2023-12-04T07:39:05Z)
- Copula Variational LSTM for High-dimensional Cross-market Multivariate Dependence Modeling [46.75628526959982]
We make the first attempt to integrate variational sequential neural learning with copula-based dependence modeling.
Our variational neural network WPVC-VLSTM models variational sequential dependence degrees and structures across time series.
It outperforms benchmarks including linear models, volatility models, deep neural networks, and variational recurrent networks in cross-market portfolio forecasting.
arXiv Detail & Related papers (2023-05-09T08:19:08Z)
- Global-to-Local Modeling for Video-based 3D Human Pose and Shape Estimation [53.04781510348416]
Video-based 3D human pose and shape estimations are evaluated by intra-frame accuracy and inter-frame smoothness.
We propose to structurally decouple the modeling of long-term and short-term correlations in an end-to-end framework, Global-to-Local Transformer (GLoT)
Our GLoT surpasses previous state-of-the-art methods with the lowest model parameters on popular benchmarks, i.e., 3DPW, MPI-INF-3DHP, and Human3.6M.
arXiv Detail & Related papers (2023-03-26T14:57:49Z)
- Stock Trend Prediction: A Semantic Segmentation Approach [3.718476964451589]
We present a novel approach to predict long-term daily stock price change trends with fully 2D-convolutional encoder-decoders.
Our hierarchical structure of CNNs makes it capable of capturing both long and short-term temporal relationships effectively.
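A hypothetical minimal version of such a fully 2D-convolutional encoder-decoder, with assumed channel counts and a three-class trend output, looks like this; it sketches the idea rather than the paper's architecture.

```python
import torch.nn as nn

class TrendSegmenter(nn.Module):
    """Sketch: encode a chart-like image, decode per-pixel trend logits."""
    def __init__(self, n_classes: int = 3):    # e.g. up / flat / down, assumed
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, n_classes, 4, stride=2, padding=1),
        )

    def forward(self, x):                      # x: (batch, 1, height, width)
        return self.decoder(self.encoder(x))   # per-pixel trend logits
```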
arXiv Detail & Related papers (2023-03-09T01:29:09Z)
- FormerTime: Hierarchical Multi-Scale Representations for Multivariate Time Series Classification [53.55504611255664]
FormerTime is a hierarchical representation model for improving the classification capacity for the multivariate time series classification task.
It exhibits three merits: (1) learning hierarchical multi-scale representations from time series data, (2) inheriting the strength of both transformers and convolutional networks, and (3) tackling the efficiency challenges incurred by the self-attention mechanism.
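One hypothetical way to combine those three merits is a stage that shortens the sequence with a strided convolution before attending, so each successive stage operates at a coarser scale over fewer tokens; the names and sizes below are assumptions, not FormerTime's design.

```python
import torch.nn as nn

class HierarchicalStage(nn.Module):
    """Sketch: convolutional downsampling, then attention at the coarser scale."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.down = nn.Conv1d(dim, dim, kernel_size=3, stride=2, padding=1)
        self.attend = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, batch_first=True)

    def forward(self, x):                      # x: (batch, time, dim)
        x = self.down(x.transpose(1, 2)).transpose(1, 2)  # halve the time axis
        return self.attend(x)                  # cheaper attention on fewer tokens
```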
arXiv Detail & Related papers (2023-02-20T07:46:14Z)
- Augmented Bilinear Network for Incremental Multi-Stock Time-Series Classification [83.23129279407271]
We propose a method to efficiently retain the knowledge available in a neural network pre-trained on a set of securities.
In our method, the prior knowledge encoded in a pre-trained neural network is maintained by keeping existing connections fixed.
This knowledge is adjusted for the new securities by a set of augmented connections, which are optimized using the new data.
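The freeze-and-augment mechanism can be sketched as below, assuming a single pre-trained linear layer for simplicity; the class name and zero-initialised augmentation path are illustrative choices, not the paper's exact bilinear formulation.

```python
import torch.nn as nn

class AugmentedLayer(nn.Module):
    """Sketch: frozen pre-trained path plus a trainable augmented path."""
    def __init__(self, pretrained: nn.Linear):
        super().__init__()
        self.pretrained = pretrained
        for p in self.pretrained.parameters():
            p.requires_grad = False            # keep prior knowledge fixed
        self.augment = nn.Linear(pretrained.in_features,
                                 pretrained.out_features, bias=False)
        nn.init.zeros_(self.augment.weight)    # initially leaves outputs unchanged

    def forward(self, x):
        return self.pretrained(x) + self.augment(x)  # frozen + trainable paths
```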
arXiv Detail & Related papers (2022-07-23T18:54:10Z)
- Detecting and adapting to crisis pattern with context based Deep Reinforcement Learning [6.224519494738852]
We present an innovative DRL framework consisting of two sub-networks, fed respectively with the past performances and standard deviations of portfolio strategies as well as additional contextual features.
Results on the test set show this approach substantially outperforms traditional portfolio optimization methods like Markowitz and is able to detect and anticipate crises like the Covid one.
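A hedged sketch of that two-sub-network layout: one branch for strategy performance statistics, one for contextual features, merged into allocation weights. Names, sizes, and the softmax head are assumptions.

```python
import torch
import torch.nn as nn

class TwoBranchPolicy(nn.Module):
    """Sketch: performance branch + context branch -> portfolio weights."""
    def __init__(self, n_strats: int, n_context: int, hidden: int = 32):
        super().__init__()
        self.perf = nn.Sequential(nn.Linear(2 * n_strats, hidden), nn.ReLU())
        self.ctx = nn.Sequential(nn.Linear(n_context, hidden), nn.ReLU())
        self.head = nn.Linear(2 * hidden, n_strats)

    def forward(self, past_perf, past_std, context):
        p = self.perf(torch.cat([past_perf, past_std], dim=-1))
        c = self.ctx(context)
        return torch.softmax(self.head(torch.cat([p, c], dim=-1)), dim=-1)
```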
arXiv Detail & Related papers (2020-09-07T12:11:08Z)
- On Robustness and Transferability of Convolutional Neural Networks [147.71743081671508]
Modern deep convolutional networks (CNNs) are often criticized for not generalizing under distributional shifts.
We study the interplay between out-of-distribution and transfer performance of modern image classification CNNs for the first time.
We find that increasing both the training set and model sizes significantly improve the distributional shift robustness.
arXiv Detail & Related papers (2020-07-16T18:39:04Z)
- Deep Stock Predictions [58.720142291102135]
We consider the design of a trading strategy that performs portfolio optimization using Long Short Term Memory (LSTM) neural networks.
We then customize the loss function used to train the LSTM to increase the profit earned.
We find the LSTM model with the customized loss function to have an improved performance in the trading bot over a regressive baseline such as ARIMA.
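A minimal sketch of a profit-oriented loss of the kind described, assuming position and realised-return tensors; the turnover penalty and cost level are illustrative additions rather than the paper's exact loss.

```python
import torch

def profit_loss(positions, realised_returns, cost: float = 1e-4):
    """Negative net profit: reward P&L, penalise turnover (assumed cost proxy)."""
    pnl = (positions * realised_returns).sum(dim=-1)     # per-sample profit
    turnover = positions.diff(dim=-1).abs().sum(dim=-1)  # trading activity
    return -(pnl - cost * turnover).mean()
```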
arXiv Detail & Related papers (2020-06-08T23:37:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.