Real-time Forecasting of Time Series in Financial Markets Using Sequentially Trained Many-to-one LSTMs

Kelum Gajamannage*, Yonggi Park

Department of Mathematics and Statistics, Texas A&M University - Corpus Christi, 6300 Ocean Dr., Corpus Christi, TX 78412, USA
Email addresses: kelum.gajamannage@tamucc.edu (Kelum Gajamannage), yonggi.park@tamucc.edu (Yonggi Park)

Abstract

Financial markets are highly complex and volatile; thus, learning about such markets for the sake of making predictions is vital for issuing early alerts about crashes and subsequent recoveries. People have used learning tools from diverse fields such as financial mathematics and machine learning in an attempt to make trustworthy predictions of such markets. However, the accuracy of such techniques was not adequate until artificial neural network (ANN) frameworks were developed. Moreover, making accurate real-time predictions of financial time series is highly sensitive to the ANN architecture in use and the procedure used to train it. Long short-term memory (LSTM) is a member of the recurrent neural network family that has been widely utilized for time series prediction. Specifically, we train two LSTMs with a known length, say $T$ time steps, of previous data and predict only one time step ahead. At each iteration, while one LSTM is employed to find the best number of epochs, the second LSTM is trained only for that best number of epochs to make predictions. We treat the current prediction as part of the training set for the next prediction and retrain the same LSTM. While classic training schemes yield larger errors as predictions are made further into the test period, our approach maintains superior accuracy because training continues as the scheme proceeds through the testing period. The forecasting accuracy of our approach is validated using three time series from each of three diverse financial markets: stock, cryptocurrency, and commodity. The results are compared with those of an extended Kalman filter, an autoregressive model, and an autoregressive integrated moving average model.
1. Introduction

Financial markets refer broadly to any marketplace that enables the trading of securities, commodities, and other fungible assets; financial security markets include the stock market, the cryptocurrency market, etc.
A cryptocurrency market exchanges digital or virtual currencies between peers without the need for a third party such as a bank (Squarepants, 2022), whereas a commodity market trades raw materials, such as gold and oil, rather than manufactured products. These markets are both highly complex and volatile due to diverse economic, social, and political conditions (Qiu et al., 2020).
Learning such markets for the sake of making predictions is vital because it aids market analysts in issuing early alerts about crashes and subsequent recoveries, so that investors can either take better precautions against future crashes or gain more profit from future recoveries.
Since it is inefficient to rely only on a trader's personal experience and intuition for the analysis and judgment of such markets, traders need smart trading recommendations derived from scientific research methods.
The classical methods of making predictions on time series data are mostly linear statistical approaches, such as the linear parametric autoregressive (AR), moving average (MA), and autoregressive integrated moving average (ARIMA) models (Zhao et al., 2018), which assume linear relationships between the current output and previous outputs.
Thus, they often do not capture non-linear relationships in the data and cannot cope with certain complex time series.
Because financial time series are nonstationary, nonlinear, and contaminated with high noise (Bontempi et al., 2013), traditional statistical models have limitations in predicting financial time series with high precision.
Purely data-driven approaches such as Artificial Neural Networks (ANNs) are adopted to forecast nonlinear and nonstationary time series data with both high efficiency and better accuracy, and they have become popular predictors due to their adaptive self-learning (Gajamannage et al., 2021).
Recurrent Neural Networks (RNNs) are powerful and robust types of ANNs that are among the most promising algorithms in use because of their internal memory (Park et al., 2022).
This internal memory remembers its inputs and helps the RNN find solutions for a vast variety of problems (Ma & Principe, 2018). An RNN is optimized with respect to its weights to fit the training data using a technique called backpropagation, which requires the gradient of the RNN. However, the gradient of an RNN may vanish or explode during the optimization routine, which hampers the RNN's ability to learn long data sequences (Allen-Zhu et al., 2019).
As a solution to these two problems (Le & Zuidema, 2016), the LSTM architecture (Hochreiter & Schmidhuber, 1997), a special type of RNN, is often used. LSTM performs faithful learning in applications such as speech recognition (Tian et al., 2017; Kim et al., 2017) and text processing (Shih et al., 2018; Simistira et al., 2015).
Moreover, LSTM is also suitable for complex data sequences, such as stock time series extracted from financial markets, because it has internal memory, is customizable, and is free from gradient-related issues. For that, we train this LSTM with a known length, say $T$ time steps, of previous data while setting the loss function to be the mean square error between labels and predictions.
This real-time LSTM model is capable of incorporating every new future observation of the time series into the ongoing training process to make predictions.
Since we use a sequence of observed time series to predict only one time step ahead, the prediction accuracy increases significantly.
Moreover, the $T-1$ previous observations along with the current prediction are used to predict the next time step, so the prediction error associated with the current prediction is further minimized as the scheme runs through iterations. While classic training schemes yield larger errors as predictions are made further into the test period, our approach maintains superior accuracy because training continues as the scheme proceeds through the testing period. In Sec. 2, first, we present the notion of real-time time series prediction.
Then, we provide the mathematical formulation of the many-to-one LSTM architecture for sequential training.
Finally, for the state-of-the-art time series prediction methods, we provide the formulations of one nonlinear statistical approach, the extended Kalman filter (EKF), and two linear statistical approaches, AR and ARIMA. Sec. 3 provides a detailed analysis of the performance of our LSTM architecture against that of EKF, AR, and ARIMA using three financial stocks (Apple, Microsoft, Google), three cryptocurrencies (Bitcoin, Ethereum, Cardano), and three commodities (oil, natural gas, gold).

2.1. Real-time time series prediction
For fixed-length input sequential data, the model is set to predict only one future time step per iteration, and the process runs until the required prediction length is reached.
This real-time prediction approach is capable of incorporating every new data point of the time series into the ongoing training process to make predictions for the next time step.
Let the currently observed time series be $[x(1), \dots, x(T)]$ for some $T$, the unobserved future portion of the time series be $[x(T+1), \dots, x(T+N)]$ for some $N < T$, and the time series model be $F$; see Fig. 1. For the first iteration, we train the time series forecasting model with $x(T) = F\big(X^{(1)}\big)$, where $X^{(1)} = [x(1), \dots, x(T-1)]$. Then, we predict for the time step $(T+1)$, denoted by $\hat{x}(T+1)$, as $\hat{x}(T+1) = F\big(X^{(2)}\big)$, where $X^{(2)} = [x(2), \dots, x(T)]$. In the second iteration, we train the same model $F$ with $\hat{x}(T+1) = F\big(X^{(2)}\big)$ and predict for the time step $(T+2)$, denoted by $\hat{x}(T+2)$, as $\hat{x}(T+2) = F\big(X^{(3)}\big)$, where $X^{(3)} = [x(3), \dots, \hat{x}(T+1)]$. We keep on doing this process until the predictions are performed for all $N$ time steps.
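To make the scheme concrete, the following is a minimal sketch of this rolling one-step-ahead loop in Python; the callables `fit` and `predict` are hypothetical placeholders for any one-step forecasting model $F$ (such as the LSTM of Sec. 2.2) and are not part of the paper's implementation.

```python
import numpy as np

def real_time_forecast(series, fit, predict, N):
    # `series` holds the observed history [x(1), ..., x(T)].
    # `fit(window, label)` trains the model F on one training pair and
    # `predict(window)` returns a one-step-ahead forecast (both assumed).
    history = list(series)
    T = len(history)
    for t in range(N):
        X_train = np.array(history[t : T - 1 + t])   # window X^(t+1), length T-1
        label = history[T - 1 + t]                   # x(T+t): observed or previously predicted
        fit(X_train, label)                          # train F on the latest window
        X_next = np.array(history[t + 1 : T + t])    # next window X^(t+2)
        history.append(predict(X_next))              # append the prediction x-hat(T+t+1)
    return history[T:]                               # [x-hat(T+1), ..., x-hat(T+N)]
```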
2.2. Many-to-one LSTM architecture with sequential training
Since we make predictions only one time step ahead at a time for an input time series, the LSTM architecture implemented here is of the many-to-one type; see Fig. 2. An LSTM consists of a series of nonlinear recurrent modules, denoted as $M^{(t)}_j$ for $t = 1, \dots, T$ and $j = 1, \dots, N$ in Fig. 2, where each module processes data related to one time step.
LSTM introduces a memory cell, a special type of hidden state that has the same shape as the hidden state and is engineered to record additional information. Each recurrent module in an LSTM filters information through four hidden layers: three of them are gates, namely, the forget gate, input gate, and output gate, and the other is called the cell state, which maintains and updates long-term memory; see Fig. 2(b). The forget gate resets the content of the memory cell by deciding what information should be forgotten or retained. This gate produces a value between zero and one, where zero means completely forgetting the previous hidden state and one means completely retaining it.
Information from the previous hidden state, i.e., $h^{(t-1)}$, and the information from the current input, i.e., $x^{(t)}$, are passed through the sigmoid function, denoted as $\sigma$, according to
$$f^{(t)} = \sigma\big(W_f \cdot [h^{(t-1)}, x^{(t)}] + b_f\big), \qquad (1)$$
where $W_f$ and $b_f$ are the weight matrix and bias vector, respectively.
The input gate, consisting of two components, decides what new information is to be stored in the cell state.
The first component is a sigmoid layer that decides which values are to be updated based on the previous hidden state and the information from the current input such that
$$i^{(t)} = \sigma\big(W_i \cdot [h^{(t-1)}, x^{(t)}] + b_i\big), \qquad (2)$$

Table 1: Notations used in this paper and their descriptions.

  $t$ : Index for time steps
  $T$ : Length of the training period
  $N$ : Forecasting length
  $L$ : Number of epochs
  $K$ : Number of stacked LSTMs
  $L^{(t)}_l$ : Training loss at the $l$-th epoch of the $t$-th iteration
  $\sigma$ : Sigmoid function in LSTM
  $p$ : Order of the AR model
  $q$ : Number of past innovations in the MA model
  $E$ : Relative root mean square error
  $x(t)$ : The observation at the $t$-th time step, where $1 \le t \le T$
  $X^{(t)}$ : $t$-th input training window
  $\hat{x}(t)$ : The prediction at the $t$-th time step, where $T < t \le T + N$
  $w(t)$ : White Gaussian noise vector with zero mean in EKF
  $y(t)$ : Observation vector at the $t$-th time step in EKF
  $v(t)$ : White Gaussian noise vector with zero mean in EKF
  $\alpha_i$, $1 \le i \le p$ : Parameters of the AR model
  $\epsilon(t)$ : White Gaussian noise vector with zero mean in the AR model
  $\beta_i$, $1 \le i \le q$ : Parameters of the MA model
  $a(t)$ : $t$-th past innovation of the MA model
  $c$ : Bias vector in ARIMA
  $b_i, b_f, b_c$ : Bias vectors in LSTM
  $f$ : System dynamics in EKF
  $h$ : Measurement function in EKF
  $L^i$ : $i$-th level lag operator
  $\Delta^D(\cdot)$ : $D$-th differential time series
  $Q(t), R(t), P(t)$ : Covariance matrices of $w(t)$, $v(t)$, $x(t)$, respectively, in EKF
  $J_f, J_h$ : Jacobian matrices of $f(\cdot)$, $h(\cdot)$, respectively, in EKF
  $W_i, W_f, W_o$ : Weight matrices in LSTM

Abbreviation : Description

  LSTM : Long Short-Term Memory
  KF : Kalman Filter
  EKF : Extended Kalman Filter
  AR : AutoRegressive
  MA : Moving Average
  ARMA : AutoRegressive Moving Average
  ARIMA : AutoRegressive Integrated Moving Average

Figure 1: Real-time time series prediction scheme where the currently observed time series, $[x(1), \dots, x(T)]$ for some $T$, is shown in blue. The unobserved future time series is $[x(T+1), \dots, x(T+N)]$ for some $N < T$. For the first iteration, we train the time series forecasting model with $x(T) = F\big(X^{(1)}\big)$, where $X^{(1)} = [x(1), \dots, x(T-1)]$. Then, we predict for the time step $(T+1)$ as $\hat{x}(T+1) = F\big(X^{(2)}\big)$, where $X^{(2)} = [x(2), \dots, x(T)]$. In the second iteration, we train the same model with $\hat{x}(T+1) = F\big(X^{(2)}\big)$ and predict for the time step $(T+2)$ as $\hat{x}(T+2) = F\big(X^{(3)}\big)$, where $X^{(3)} = [x(3), \dots, \hat{x}(T+1)]$.
where $W_i$ and $b_i$ are the weight matrix and bias vector, respectively.
The next component is a tanh layer that creates a vector of new candidate values, $\tilde{c}^{(t)}$, based on the previous hidden state and the information from the current input as
$$\tilde{c}^{(t)} = \tanh\big(W_c \cdot [h^{(t-1)}, x^{(t)}] + b_c\big), \qquad (3)$$
where $W_c$ and $b_c$ are the weight matrix and bias vector, respectively.
Figure 2: $K$-stacked LSTMs for many-to-one forecasting of a single-feature time series, where each LSTM is a collection of recurrent modules, denoted as $M$'s. (a) The left figure shows a folded version of the artificial neural network (ANN), whereas the right figure shows its unfolded version; the input to the ANN is $X = [x(1), \dots, x(t), \dots, x(T-1)]$ and the output from it is $\hat{x}(T)$. (b) Each $M$ filters information through four hidden layers, three of which are gates, namely, forget, input, and output; the other is called the cell state. The forget gate resets the content of the memory cell, the input gate decides what new information is stored in the memory cell, the cell state stores long-term information in the memory, and the output gate sends out a filtered version of the memory cell's stored information from the $M$.
The cell state updates the LSTM's memory with new long-term information.
For that, first, it multiplies pointwise the old cell state $c^{(t-1)}$ by the forget state $f^{(t)}$, i.e., $f^{(t)} \odot c^{(t-1)}$, to ensure that the information retained from the old cell state is what is allowed by the forget state. Then, it adds $i^{(t)} \odot \tilde{c}^{(t)}$,
$$c^{(t)} = f^{(t)} \odot c^{(t-1)} + i^{(t)} \odot \tilde{c}^{(t)}, \qquad (4)$$
as the information from the current input state that is found relevant by the ANN.
The output gate determines the value of the next hidden state using information from the current cell state, current input state, and previous hidden state. First, a sigmoid layer decides how much of the current input and the previous hidden state will be output.
Then, the current cell state is passed through the tanh layer to scale the cell state value between -1 and 1.
Thus, the output $h^{(t)}$ is
$$h^{(t)} = o^{(t)} \odot \tanh\big(c^{(t)}\big), \quad \text{with} \quad o^{(t)} = \sigma\big(W_o \cdot [h^{(t-1)}, x^{(t)}] + b_o\big), \qquad (5)$$
where $W_o$ and $b_o$ are the weight matrix and bias vector, respectively.
Based upon $h^{(t)}$, the network decides which information from the current hidden state should be carried to the next hidden state, where the next hidden state is used for prediction. The input gate decides what relevant information can be added from the current cell state, and the output gate finalizes the input to the next hidden state.
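As a concrete reference for Eqs. (1)-(5), one recurrent module $M$ can be sketched in Python/NumPy as follows; the dictionary-based parameter layout and vector dimensions are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_module(x_t, h_prev, c_prev, W, b):
    # W and b hold the weight matrices W_f, W_i, W_c, W_o and the
    # bias vectors b_f, b_i, b_c, b_o under the keys "f", "i", "c", "o".
    z = np.concatenate([h_prev, x_t])        # [h^(t-1), x^(t)]
    f = sigmoid(W["f"] @ z + b["f"])         # forget gate, Eq. (1)
    i = sigmoid(W["i"] @ z + b["i"])         # input gate, Eq. (2)
    c_tilde = np.tanh(W["c"] @ z + b["c"])   # candidate values, Eq. (3)
    c = f * c_prev + i * c_tilde             # cell state update, Eq. (4)
    o = sigmoid(W["o"] @ z + b["o"])         # output gate
    h = o * np.tanh(c)                       # hidden state, Eq. (5)
    return h, c
```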
Training an LSTM is the process of minimizing a relevant reconstruction error function, also called the loss function, with respect to the weights and bias vectors of Eqns. (1)-(5). Such a minimization problem is implemented in four steps: first, forward propagation of the input data through the ANN to get the output; second, calculation of the loss between the forecasted output and the true output; third, calculation of the derivatives of the loss function with respect to the LSTM's weights and bias vectors using backpropagation through time (BTT) (Werbos, 1990); and fourth, adjustment of the weights and bias vectors by a gradient descent method (Gruslys et al., 2016). BTT unrolls backward all the dependencies of the output on the weights of the ANN (Manneschi & Vasilaki, 2020), which is represented from left to right in Fig. 2(a).
The loss of the $t$-th iteration is
$$L^{(t)} = \frac{\big\|x^{(T+t-1)} - \hat{x}^{(T+t-1)}\big\|^2_F}{\big\|x^{(T+t-1)}\big\|^2_F}, \qquad (6)$$
where $\|\cdot\|_F$ denotes the Frobenius norm and $\hat{x}^{(T+t-1)}$ is the output of the LSTM for the input $X^{(t-1)}$.
We use BTT to compute the derivatives of Eqn. (6) with respect to the weights and bias vectors.
We update the weights using the gradient descent-based method called Adaptive Moment Estimation (ADAM) (Kingma & Ba, 2015).
ADAM is an iterative optimization algorithm used in recent machine learning applications to minimize loss functions, where it employs averages of both the first moment of the gradients and the second moment of the gradients in its computations.
It generally converges faster than standard gradient descent methods and saves memory by not accumulating the intermediate weights.
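For reference, a single ADAM step takes the standard form below; this is a sketch with the usual default hyperparameters, which the paper does not report.

```python
import numpy as np

def adam_update(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # w: weights, g: gradient of the loss, t: step count (starting at 1).
    m = b1 * m + (1 - b1) * g            # first-moment average
    v = b2 * v + (1 - b2) * g**2         # second-moment average
    m_hat = m / (1 - b1**t)              # bias-corrected first moment
    v_hat = v / (1 - b2**t)              # bias-corrected second moment
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```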
To ensure better convergence of the loss function, we integrate epochs into the training process in a unique way that we explain here for the $t$-th iteration. However, if the loss function is non-convex or exhibits semi-convergence, choosing the best number of epochs is challenging.
Fig. 3 illustrates the non-convex behavior of an LSTM's loss function when trained with the closing prices of the Apple stock.
Here, we input a sequence of 1227 days of prices into the LSTM and generate the price for the 1228-th day, where the loss is computed as the relative mean square error between the predicted price and the observed price for the 1228-th day. We assume that the two LSTMs corresponding to the $(t-1)$-th iteration are given for the $t$-th iteration.
For the $t$-th iteration, we train LSTM$_1$ with the input $X^{(t)}$ and the label $x^{(T+t-1)}$ for a fixed number of epochs, say $L$. Here, we record LSTM$_1$'s optimum weights and bias vectors corresponding to each epoch.
We reformulate LSTM$_2$ with the weights and bias vectors corresponding to the least loss among the $L$ epochs.
Finally, we redefine LSTM$_1$ as LSTM$_2$ and proceed to the $(t+1)$-th iteration.
Algorithm 1 summarizes the training and prediction procedure of our sequentially trained many-to-one LSTM scheme.
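A minimal sketch of one iteration of this dual-LSTM procedure, assuming TensorFlow/Keras; the layer width and data shapes are illustrative choices that the paper does not specify, and LSTM$_2$ here simply receives the weights recorded at the least-loss epoch, matching the reformulation step described above.

```python
import numpy as np
import tensorflow as tf

def build_lstm(window_len):
    # Many-to-one LSTM; 32 units is an illustrative choice.
    model = tf.keras.Sequential([
        tf.keras.layers.LSTM(32, input_shape=(window_len, 1)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")  # ADAM and MSE loss, as in Sec. 2.2
    return model

def dual_lstm_iteration(lstm1, lstm2, window, label, L=100):
    # Train LSTM1 for L epochs, recording the loss and weights of every epoch.
    X = np.asarray(window, dtype="float32").reshape(1, -1, 1)
    y = np.array([[label]], dtype="float32")
    losses, snapshots = [], []
    for _ in range(L):
        hist = lstm1.fit(X, y, epochs=1, verbose=0)
        losses.append(hist.history["loss"][0])
        snapshots.append(lstm1.get_weights())
    # Reformulate LSTM2 with the weights of the least-loss epoch and predict.
    best = int(np.argmin(losses))
    lstm2.set_weights(snapshots[best])
    return lstm2.predict(X, verbose=0)[0, 0]
```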
Figure 3: Non-convex behavior of the loss function. We apply our training scheme to train an LSTM with closing prices of the Apple stock. We input a sequence of 1227 days of prices into the LSTM and generate the price for the 1228-th day. The loss is computed as the relative mean square error between the predicted price and the observed price for the 1228-th day. We proceed with this single-day training 60 times, also called epochs, where the loss for the $l$-th epoch is denoted as $L^{(t)}_l$.
2.3. State-of-the-art methods

Here, we present three state-of-the-art time series prediction methods, namely, the extended Kalman filter (EKF), autoregression (AR), and the autoregressive integrated moving average (ARIMA), which we use to validate the performance of our LSTM scheme. EKF is a nonlinear version of the standard Kalman filter, whose formulation is based on the linearization of both the state and the observation equations.
In an EKF, the state Jacobian and the measurement Jacobian replace the state transition matrix and the measurement matrix, respectively, of a linear KF (Valade et al., 2017).
This process essentially linearizes the non-linear function around the current estimate.
Linearization enables the propagation of both the state and state covariance in an approximately linear format.
Here, the extended Kalman filter is presented in three steps, namely, dynamic process, model forecast step, and data assimilation step.
Algorithm 1 (excerpt): at the $l$-th epoch of the $t$-th iteration, compute $\hat{x}^{(T+t-1)} = \mathrm{LSTM}_1\big(X^{(t)}\big)$; minimize $L^{(t)}_l$ using BTT with respect to the weights $W_f$, $W_i$, $W_c$, and $W_o$ and the bias vectors $b_f$, $b_i$, $b_c$, and $b_o$ of the composite representation of the functions in Eqns. (1)-(5); record $L^{(t)}_l$; and update the weights and bias vectors of LSTM$_1$ using the gradient descent-based method ADAM.
Model Forecast Step. Let the initial estimates of the state and the covariance be $x^{(0|0)}$ and $P^{(0|0)}$, respectively.
The state and the covariance matrix are propagated to the next step using
$$\hat{x}^{(t+1)} \approx f\big(\hat{x}^{(t)}\big) \qquad (7)$$
and
$$P^{(t+1)} = J_f\big(\hat{x}^{(t)}\big)\, P^{(t)}\, \big[J_f\big(\hat{x}^{(t)}\big)\big]^T + Q^{(t)}, \qquad (8)$$
respectively.
Data Assimilation Step.
The measurement at the $(t+1)$-th step is given by
$$y^{(t+1)} \approx h\big(\hat{x}^{(t+1)}\big). \qquad (9)$$
Use the difference between the actual measurement and the predicted measurement to correct the state at the $(t+1)$-th step.
To correct the state, the filter must compute the Kalman gain.
First, the filter computes the measurement prediction covariance (innovation) as
$$S^{(t+1)} = J_h\big(\hat{x}^{(t+1)}\big)\, P^{(t+1)}\, \big[J_h\big(\hat{x}^{(t+1)}\big)\big]^T + R^{(t+1)}. \qquad (10)$$
Then, the filter computes the Kalman gain as
$$K_g^{(t+1)} = P^{(t+1)}\, \big[J_h\big(\hat{x}^{(t+1)}\big)\big]^T\, \big[S^{(t+1)}\big]^{-1}. \qquad (11)$$
The filter corrects the predicted estimate by using the observation.
The estimate, after the correction using the observation $y^{(t+1)}$, is
$$\hat{x}^{(t+1|t+1)} = \hat{x}^{(t+1)} + K_g^{(t+1)}\Big(y^{(t+1)} - h\big(\hat{x}^{(t+1)}\big)\Big). \qquad (12)$$
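A minimal NumPy sketch of one predict-correct cycle corresponding to Eqs. (7)-(12); the covariance correction in the last line follows the standard EKF form, which the extracted text does not show explicitly.

```python
import numpy as np

def ekf_step(x, P, f, h, Jf, Jh, Q, R, y):
    # f, h: state and measurement functions; Jf, Jh return their Jacobians.
    # Model forecast step, Eqs. (7)-(8)
    x_pred = f(x)
    F = Jf(x)
    P_pred = F @ P @ F.T + Q
    # Data assimilation step, Eqs. (9)-(12)
    H = Jh(x_pred)
    S = H @ P_pred @ H.T + R                # innovation covariance, Eq. (10)
    K = P_pred @ H.T @ np.linalg.inv(S)     # Kalman gain, Eq. (11)
    x_new = x_pred + K @ (y - h(x_pred))    # corrected estimate, Eq. (12)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred   # standard covariance correction
    return x_new, P_new
```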
The AR model predicts the value for the current time step, $x^{(t)}$, based on a linear relationship between the $p$ most recent observations, $x^{(t-1)}, x^{(t-2)}, \dots, x^{(t-p)}$, where $p$ is known as the order of the model (Geurts et al., 1977).
Let $\alpha_1, \dots, \alpha_p \in \mathbb{R}$ be the coefficients; the order-$p$ AR model is given by
$$x^{(t)} = \sum_{i=1}^{p} \alpha_i\, x^{(t-i)} + \epsilon^{(t)},$$
where $\epsilon^{(t)}$ is a white Gaussian noise process with zero mean.
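For illustration, the AR($p$) coefficients can be estimated by ordinary least squares, as in the sketch below; the paper does not state its fitting procedure, so least squares is an assumption here.

```python
import numpy as np

def fit_ar(series, p):
    # Regress x(t) on [x(t-1), ..., x(t-p)] for t = p, ..., len(series)-1.
    series = np.asarray(series, dtype=float)
    X = np.column_stack(
        [series[p - i - 1 : len(series) - i - 1] for i in range(p)]
    )
    y = series[p:]
    alpha, *_ = np.linalg.lstsq(X, y, rcond=None)
    return alpha

def predict_ar(series, alpha):
    # One-step-ahead prediction from the p most recent observations.
    p = len(alpha)
    recent = np.asarray(series, dtype=float)[-1 : -p - 1 : -1]
    return float(np.dot(alpha, recent))
```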
The MA model captures serial autocorrelation in a time series $x^{(1)}, \dots, x^{(t)}, \dots, x^{(T)}$ by expressing the conditional mean of $x^{(t)}$ as a function of the past innovations $a^{(t)}, a^{(t-1)}, \dots, a^{(t-q)}$.
In this case, a combined ARMA model can sometimes be a parsimonious choice.
An ARMA model expresses the conditional mean of $x^{(t)}$ as a function of both recent observations, $x^{(t-1)}, x^{(t-2)}, \dots, x^{(t-p)}$, and recent innovations, $a^{(t)}, a^{(t-1)}, \dots, a^{(t-q)}$.
The ARIMA process generates nonstationary series that are integrated of order $D$, where such a nonstationary process can be made stationary by taking $D$ differences.
A series that can be modeled as a stationary ARMA($p, q$) process after being differenced $D$ times is denoted by ARIMA($p, D, q$), which is given by
$$\Delta^D x^{(t)} = \mu + \sum_{i=1}^{p} \alpha_i\, \Delta^D x^{(t-i)} + a^{(t)} + \sum_{i=1}^{q} \beta_i\, a^{(t-i)}, \qquad (20)$$
where $\Delta^D x^{(t)}$ denotes the $D$-th differenced time series, the $a^{(t)}$'s are uncorrelated innovation processes with zero mean, and $\mu$ is the unconditional mean of $x^{(t)}$ for all $t$ (Newbold, 1983).
With the lag operator $L^i x^{(t)} = x^{(t-i)}$, the ARIMA model can be written as
$$\Big(1 - \sum_{i=1}^{p} \alpha_i L^i\Big)(1 - L)^D\, x^{(t)} = c + \Big(1 + \sum_{i=1}^{q} \beta_i L^i\Big) a^{(t)}.$$
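In practice, such a model can be fit with an off-the-shelf implementation; below is a sketch assuming the statsmodels library, using the ARIMA order (10, 0, 2) reported for Apple in Sec. 3 and a placeholder series.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

series = np.cumsum(np.random.randn(500))   # placeholder price series
model = ARIMA(series, order=(10, 0, 2))    # ARIMA(p, D, q)
fitted = model.fit()
x_hat = fitted.forecast(steps=1)           # one-step-ahead prediction
```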
3. Performance analysis

The performance analysis of the LSTM is conducted using nine financial time series obtained from three markets, namely, stocks, cryptocurrencies, and commodities.
Setting the LSTM to run for a specific number of epochs and then using that trained network to make predictions often does not achieve the best training, and hence does not produce accurate predictions, since the loss function undergoes semi-convergence, as shown in Fig. 3. To avoid this issue, first, we train the LSTM for 100 epochs; second, we compute the best number of epochs, i.e., the number associated with the least loss; and finally, we train a new LSTM for that many epochs.
Moreover, the parameter choices for the training length and prediction length are shown in Table 3.
Now, we incorporate the same one-day recursive prediction procedure of Fig. 1 into the other three state-of-the-art methods, namely, EKF, AR, and ARIMA, to predict the above financial time series.
After a trial-and-error process, we found that the best $p$'s of AR are 300, 400, and 400 for Apple, Microsoft, and Google, respectively; and the best $(p, D, q)$'s of ARIMA are $(10, 0, 2)$, $(10, 2, 1)$, and $(0, 1, 1)$ for Apple, Microsoft, and Google, respectively.
Then, the best $p$'s of AR were found to be 100, 100, and 300 for Bitcoin, Ethereum, and Cardano, respectively; and the best $(p, D, q)$'s of ARIMA were found to be $(6, 0, 2)$, $(6, 1, 1)$, and $(8, 2, 1)$ for Bitcoin, Ethereum, and Cardano, respectively.
Time series : Training length ($T$) : Prediction length ($N$)

  Apple : 1228 : 30
  Microsoft : 1228 : 30
  Google : 1228 : 30
  Bitcoin : 1064 : 30
  Ethereum : 1064 : 30
  Cardano : 1064 : 30
  Oil : 8248 : 200
  Natural gas : 5802 : 150
  Gold : 816 : 30

Table 3: Parameter choices for the training length and prediction length used in the real-time many-to-one LSTMs.
Finally, the best $p$'s of AR were 200, 200, and 100 for Oil, Natural gas, and Gold, respectively; and the best $(p, D, q)$'s of ARIMA were $(4, 1, 1)$, $(10, 1, 2)$, and $(8, 2, 0)$ for Oil, Natural gas, and Gold, respectively.
Since some of the predictions closely mimic the observed time series and overlap with them, we compute the absolute difference between the observations and the predictions; see Figs. 4(c), 4(f), and 4(i). We observe that all four methods are capable of capturing the pattern of the time series, with the order from the best to the worst prediction performance being LSTM, ARIMA, AR, and EKF. Similarly, for the cryptocurrencies, we compute the absolute difference between the observations and the predictions, see Figs. 5(c), 5(f), and 5(i), since some of the predictions are similar to the observations.
We observe that mostly LSTM, ARIMA, and AR are capable of capturing the pattern of the time series. The order from the best to the worst prediction performance is LSTM, AR, ARIMA, and EKF.
Figure 4: Price prediction of three stocks, Apple (first row), Microsoft (second row), and Google (third row), using our real-time many-to-one LSTM (blue), EKF (yellow), AR (green), and ARIMA (purple). The first, second, and third columns show the entire time series; the observed and predicted time series for the last 30 days of the prediction period; and the absolute difference between the observed and predicted time series, respectively.
Figure 5: Price prediction of three cryptocurrencies, Bitcoin (first row), Ethereum (second row), and Cardano (third row), using our real-time many-to-one LSTM (blue), EKF (yellow), AR (green), and ARIMA (purple). The first, second, and third columns show the entire time series; the observed and predicted time series for the last 30 days of the prediction period; and the absolute difference between the observed and predicted time series, respectively.
Figure 6: Price prediction of three commodities, Oil (first row), Natural gas (second row), and Gold (third row), using our real-time many-to-one LSTM (blue), EKF (yellow), AR (green), and ARIMA (purple). The first, second, and third columns show the entire time series; the observed and predicted time series for the last 30 days of the prediction period; and the absolute difference between the observed and predicted time series, respectively.
Table 4: Prediction error, quantified as the mean of the relative difference between the predicted and the observed time series over the prediction period, for the four methods LSTM, EKF, AR, and ARIMA. The analysis is conducted on three stocks, Apple, Microsoft, and Google; three cryptocurrencies, Bitcoin, Ethereum, and Cardano; and three commodities, oil, natural gas, and gold.
The prediction performance is computed as the mean of the relative absolute difference, i.e., $E$, between the prediction and the observed time series.
Since EKF, AR, and ARIMA are independent of epochs, we represent their $E$ as a straight line.
We observe that the performance of the LSTM improves from the worst to the best as the number of epochs is increased.
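A sketch of this error measure follows; the text describes $E$ as the mean of the relative absolute difference while Table 1 calls $E$ a relative root mean square error, so the per-point relative absolute form below is an assumption.

```python
import numpy as np

def relative_absolute_error(observed, predicted):
    # Mean of |observation - prediction| / |observation| over the prediction period.
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(observed - predicted) / np.abs(observed)))
```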
4. Discussion

The classical methods of solving temporal chaotic systems are mostly linear models that assume linear relationships between a system's previous outputs for stationary time series. Thus, they often do not capture non-linear relationships in the data and cannot cope with certain non-stationary signals.
Because financial time series are often nonstationary, nonlinear, and contaminated with noise (Bontempi et al., 2013), traditional statistical models encounter limitations in predicting them with high precision.
In this paper, we have presented a real-time forecasting technique for financial markets using a sequentially trained many-to-one LSTM. We applied this technique to time series obtained from the stock, cryptocurrency, and commodity markets, and then compared its performance against that of three state-of-the-art methods, namely, EKF, AR, and ARIMA.
Here, we train a many-to-one LSTM with sequential data sampled using a moving window approach such that the succeeding window is shifted forward by one data instance from the preceding window.
Figure 7: Prediction performance of the real-time many-to-one LSTM with respect to different numbers of epochs. The first row shows the mean of the relative absolute difference, denoted as $E$, between the prediction and the observed time series for Apple, Bitcoin, and Gold. Note that EKF, AR, and ARIMA are independent of epochs; however, we represent their $E$ as a straight line in the first row. The second, third, and fourth rows show the prediction and the observed time series of the prices in the prediction periods of Apple, Bitcoin, and Gold for 10 epochs, 20 epochs, and 50 epochs, respectively.
The performance analysis of this study covers the LSTM applied to nine time series obtained from three financial markets: stocks (Apple, Microsoft, Google), cryptocurrencies (Bitcoin, Ethereum, Cardano), and commodities (gold, crude oil, natural gas). We observed that the LSTM performs considerably better than the other three methods on all nine datasets, with the performance of EKF being significantly weak.
The average prediction errors of LSTM are 0.05, 0.22, and 0.14 for stocks, cryptocurrencies, and commodities, respectively.
The reason for that is that while prediction on less volatile time series, such as those in the stock market, is easy, prediction on highly volatile time series, such as those in the cryptocurrency market, is challenging. In future work, we plan to extend this sequentially trained many-to-one LSTM to serve as a real-time fault detection technique in industrial production processes.
This real-time fault detection scheme will be capable of producing an early alarm about a shift in the production process so that the quality control team can take necessary actions.
Moreover, trajectories of collectively moving agents can be represented on a low-dimensional manifold that underlies a high-dimensional data cloud (Gajamannage et al., 2019; Gajamannage & Paffenroth, 2021; Gajamannage et al., 2015).
However, some segments of these trajectories are not tracked by multi-object tracking methods due to natural phenomena such as occlusions.
Thus, we are planning to utilize our LSTM architecture to make predictions for the fragmented segments of such trajectories.
We empirically validated that our real-time LSTM outperforms EKF, AR, and ARIMA.
In the future, we are planning to compare the performance of our real-time LSTM with that of other well-known ANN-based methods: Prophet, developed by Facebook (Taylor & Letham, 2018); DeepAR, developed by Amazon (Salinas et al., 2020); Temporal Fusion Transformer, developed by Google (Lim et al., 2021); and N-BEATS, developed by Element AI (Oreshkin et al., 2019).
Prophet was designed for automatic forecasting of univariate time series data.
DeepAR is a probabilistic forecasting model based on recurrent neural networks.
Temporal Fusion Transformer is a novel attention-based architecture that combines high-performance multi-horizon forecasting with interpretable insights into temporal dynamics.
We presented a nonlinear, real-time prediction technique for financial time series, built from a many-to-one LSTM that is sequentially trained with windows of data. We empirically justified that our LSTM possesses superior performance even for highly volatile time series such as those of cryptocurrencies and commodities.
Acknowledgments

The authors would like to thank the Google Cloud Platform for granting Research Credits to access its computing resources.
References

Werbos, P. J. (1990). Backpropagation Through Time: What It Does and How to Do It. Proceedings of the IEEE, 78, 1550-1560. doi:10.1109/5.58337.

Zhao, Y., Ge, L., Zhou, Y., Sun, Z., Zheng, E., Wang, X., Huang, Y., & Cheng, H. (2018). A new Seasonal Difference Space-Time Autoregressive Integrated Moving Average (SD-STARIMA) model and spatiotemporal trend prediction analysis for Hemorrhagic Fever with Renal Syndrome (HFRS).