Probing the limit of hydrologic predictability with the Transformer
network
- URL: http://arxiv.org/abs/2306.12384v1
- Date: Wed, 21 Jun 2023 17:06:54 GMT
- Title: Probing the limit of hydrologic predictability with the Transformer
network
- Authors: Jiangtao Liu, Yuchen Bian and Chaopeng Shen
- Abstract summary: We show that a vanilla Transformer architecture is not competitive against LSTM on the widely benchmarked CAMELS dataset.
A recurrence-free variant of Transformer can obtain mixed comparisons with LSTM, producing the same Kling-Gupta efficiency coefficient (KGE) along with other metrics.
While the Transformer results are not higher than current state-of-the-art, we still learned some valuable lessons.
- Score: 7.326504492614808
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For a number of years since its introduction to hydrology, recurrent neural
networks like long short-term memory (LSTM) have proven remarkably difficult to
surpass in terms of daily hydrograph metrics on known, comparable benchmarks.
Outside of hydrology, Transformers have now become the model of choice for
sequential prediction tasks, making them a natural architecture to investigate.
Here, we first show that a vanilla Transformer architecture is not competitive
against LSTM on the widely benchmarked CAMELS dataset, lagging especially
behind on high-flow metrics driven by short-term processes. However, a
recurrence-free variant of Transformer can obtain mixed comparisons with LSTM,
producing the same Kling-Gupta efficiency coefficient (KGE), along with other
metrics. The lack of advantages for the Transformer is linked to the Markovian
nature of the hydrologic prediction problem. Similar to LSTM, the Transformer
can also merge multiple forcing datasets to improve model performance. While the
Transformer results are not higher than current state-of-the-art, we still
learned some valuable lessons: (1) the vanilla Transformer architecture is not
suitable for hydrologic modeling; (2) the proposed recurrence-free modification
can improve Transformer performance so future work can continue to test more of
such modifications; and (3) the prediction limits on the dataset should be
close to the current state-of-the-art model. As a non-recurrent model, the
Transformer may bear scale advantages for learning from bigger datasets and
storing knowledge. This work serves as a reference point for future
modifications of the model.
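The KGE metric referenced in the abstract is a standard hydrologic skill score (Gupta et al., 2009) that decomposes model error into correlation, variability, and bias terms. A minimal sketch of how it is typically computed, assuming NumPy is available (the function name `kge` and the toy arrays are illustrative, not from the paper):

```python
import numpy as np

def kge(sim, obs):
    """Kling-Gupta efficiency:
    KGE = 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2),
    where r is the Pearson correlation between simulated and observed
    flows, alpha the ratio of their standard deviations, and beta the
    ratio of their means. KGE = 1 indicates a perfect simulation."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    r = np.corrcoef(sim, obs)[0, 1]      # linear correlation
    alpha = sim.std() / obs.std()        # variability ratio
    beta = sim.mean() / obs.mean()       # bias ratio
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

# Toy check: a perfect simulation scores KGE = 1
obs = np.array([1.0, 2.0, 3.0, 4.0])
print(kge(obs, obs))
```

Because KGE separates correlation from variability and bias, two models can share the same KGE (as the LSTM and the recurrence-free Transformer do here) while differing on individual components or on other metrics such as high-flow errors.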
Related papers
- Learning on Transformers is Provable Low-Rank and Sparse: A One-layer Analysis [63.66763657191476]
We show that efficient numerical training and inference algorithms, such as low-rank computation, perform impressively for learning Transformer-based adaptation.
We analyze how magnitude-based pruning affects generalization while improving adaptation.
We conclude that proper magnitude-based pruning has only a slight effect on the testing performance.
arXiv Detail & Related papers (2024-06-24T23:00:58Z) - Transformers versus LSTMs for electronic trading [0.0]
This study investigates whether Transformer-based models can be applied to financial time series prediction and outperform LSTM.
A new LSTM-based model called DLSTM is built, and a new architecture for the Transformer-based model is designed to adapt to financial prediction.
The experimental results show that the Transformer-based model has only a limited advantage in absolute price sequence prediction.
arXiv Detail & Related papers (2023-09-20T15:25:43Z) - Fourier Transformer: Fast Long Range Modeling by Removing Sequence
Redundancy with FFT Operator [24.690247474891958]
Fourier Transformer is able to significantly reduce computational costs while retaining the ability to inherit from various large pretrained models.
Our model achieves state-of-the-art performances among all transformer-based models on the long-range modeling benchmark LRA.
For generative seq-to-seq tasks including CNN/DailyMail and ELI5, by inheriting the BART weights our model outperforms the standard BART.
arXiv Detail & Related papers (2023-05-24T12:33:06Z) - Temporal Fusion Transformers for Streamflow Prediction: Value of
Combining Attention with Recurrence [0.0]
This work tests the hypothesis that combining recurrence with attention can improve streamflow prediction.
We set up the Temporal Fusion Transformer (TFT) architecture, a model that combines both of these aspects and has never been applied in hydrology before.
Our results demonstrate that TFT indeed exceeds the performance benchmark set by the LSTM and Transformers for streamflow prediction.
arXiv Detail & Related papers (2023-05-21T03:58:16Z) - Full Stack Optimization of Transformer Inference: a Survey [58.55475772110702]
Transformer models achieve superior accuracy across a wide range of applications.
The amount of compute and bandwidth required for inference of recent Transformer models is growing at a significant rate.
There has been an increased focus on making Transformer models more efficient.
arXiv Detail & Related papers (2023-02-27T18:18:13Z) - Robust representations of oil wells' intervals via sparse attention
mechanism [2.604557228169423]
We introduce a class of efficient Transformers named Regularized Transformers (Reguformers).
The focus of our experiments is on oil & gas data, namely well logs.
To evaluate our models on such problems, we work with an industry-scale open dataset consisting of well logs from more than 20 wells.
arXiv Detail & Related papers (2022-12-29T09:56:33Z) - Learning Bounded Context-Free-Grammar via LSTM and the
Transformer:Difference and Explanations [51.77000472945441]
Long Short-Term Memory (LSTM) and Transformers are two popular neural architectures used for natural language processing tasks.
In practice, it is often observed that Transformer models have better representation power than LSTM.
We study such practical differences between LSTM and Transformer and propose an explanation based on their latent space decomposition patterns.
arXiv Detail & Related papers (2021-12-16T19:56:44Z) - TCCT: Tightly-Coupled Convolutional Transformer on Time Series
Forecasting [6.393659160890665]
We propose the concept of the tightly-coupled convolutional Transformer (TCCT) and three TCCT architectures.
Our experiments on real-world datasets show that our TCCT architectures can greatly improve the performance of existing state-of-the-art Transformer models.
arXiv Detail & Related papers (2021-08-29T08:49:31Z) - Visformer: The Vision-friendly Transformer [105.52122194322592]
We propose a new architecture named Visformer, which is abbreviated from 'Vision-friendly Transformer'.
With the same computational complexity, Visformer outperforms both the Transformer-based and convolution-based models in terms of ImageNet classification accuracy.
arXiv Detail & Related papers (2021-04-26T13:13:03Z) - Bayesian Transformer Language Models for Speech Recognition [59.235405107295655]
State-of-the-art neural language models (LMs) represented by Transformers are highly complex.
This paper proposes a full Bayesian learning framework for Transformer LM estimation.
arXiv Detail & Related papers (2021-02-09T10:55:27Z) - Applying the Transformer to Character-level Transduction [68.91664610425114]
The transformer has been shown to outperform recurrent neural network-based sequence-to-sequence models in various word-level NLP tasks.
We show that with a large enough batch size, the transformer does indeed outperform recurrent models for character-level tasks.
arXiv Detail & Related papers (2020-05-20T17:25:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.