Mini-Batch Learning Strategies for modeling long term temporal
dependencies: A study in environmental applications
- URL: http://arxiv.org/abs/2210.08347v1
- Date: Sat, 15 Oct 2022 17:44:21 GMT
- Title: Mini-Batch Learning Strategies for modeling long term temporal
dependencies: A study in environmental applications
- Authors: Shaoming Xu, Ankush Khandelwal, Xiang Li, Xiaowei Jia, Licheng Liu,
Jared Willard, Rahul Ghosh, Kelly Cutler, Michael Steinbach, Christopher
Duffy, John Nieber, Vipin Kumar
- Abstract summary: In environmental applications, recurrent neural networks (RNNs) are often used to model physical variables with long temporal dependencies.
Due to mini-batch training, temporal relationships between training segments within the batch (intra-batch) as well as between batches (inter-batch) are not considered.
We propose two strategies to enforce both intra- and inter-batch temporal dependency.
- Score: 20.979235183394994
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In many environmental applications, recurrent neural networks (RNNs) are
often used to model physical variables with long temporal dependencies.
However, due to mini-batch training, temporal relationships between training
segments within the batch (intra-batch) as well as between batches
(inter-batch) are not considered, which can lead to limited performance.
Stateful RNNs aim to address this issue by passing hidden states between
batches. Since Stateful RNNs ignore intra-batch temporal dependency, there
exists a trade-off between training stability and capturing temporal
dependency. In this paper, we provide a quantitative comparison of different
Stateful RNN modeling strategies, and propose two strategies to enforce both
intra- and inter-batch temporal dependency. First, we extend Stateful RNNs by
defining a batch as a temporally ordered set of training segments, which
enables intra-batch sharing of temporal information. While this approach
significantly improves performance, it leads to much longer training times
because training becomes highly sequential. To address this issue, we further
propose a new strategy that augments each training segment with the value of
the target variable at the timestep immediately preceding the start of the
segment. In other words, we provide an initial value of the target variable as
additional input so that the network can focus on learning changes relative to
that initial value. By using this strategy, samples can be passed in any order
(mini-batch training), which significantly reduces training time while
maintaining performance. In a demonstration of our approach on hydrological
modeling, we observe that the most significant gains in predictive accuracy
occur when these methods are applied to state variables whose values change
slowly, such as soil water and snowpack, rather than to faster-changing flux
variables such as streamflow.
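Editor's illustrative sketch (not the authors' code): the second strategy can be approximated with a standard PyTorch LSTM by appending the target value observed just before each segment as an extra input feature. Segment length, feature counts, the toy data, and the LSTM backbone below are assumptions.
```python
# Minimal sketch of target-value augmentation: every training segment gets the
# observed target from the timestep immediately before its start as an extra
# input feature, so segments can be shuffled into ordinary mini-batches without
# passing hidden states between batches.
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

class AugmentedSegments(Dataset):
    def __init__(self, x, y, seg_len):
        # x: (T, n_features) driver series, y: (T,) target series
        self.x, self.y, self.seg_len = x, y, seg_len
        # start segments at t >= 1 so an initial target value always exists
        self.starts = list(range(1, len(x) - seg_len + 1, seg_len))

    def __len__(self):
        return len(self.starts)

    def __getitem__(self, i):
        s = self.starts[i]
        seg_x = self.x[s:s + self.seg_len]          # (seg_len, n_features)
        seg_y = self.y[s:s + self.seg_len]          # (seg_len,)
        y0 = self.y[s - 1].expand(self.seg_len, 1)  # initial target, repeated over time
        return torch.cat([seg_x, y0], dim=-1), seg_y

class SegmentRNN(nn.Module):
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.rnn = nn.LSTM(n_features + 1, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):
        h, _ = self.rnn(x)  # hidden state starts from zeros; no inter-batch state passing
        return self.head(h).squeeze(-1)

# Each segment carries its own initial condition, so segments can be shuffled
# across batches like i.i.d. samples, restoring ordinary mini-batch training.
T, n_features, seg_len = 2000, 5, 50
x, y = torch.randn(T, n_features), torch.randn(T)
loader = DataLoader(AugmentedSegments(x, y, seg_len), batch_size=32, shuffle=True)
model = SegmentRNN(n_features)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
for xb, yb in loader:
    opt.zero_grad()
    loss_fn(model(xb), yb).backward()
    opt.step()
```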
Related papers
- Simulation-Free Training of Neural ODEs on Paired Data [20.36333430055869]
We employ the flow matching framework for simulation-free training of NODEs.
We show that applying flow matching directly between paired data can often lead to an ill-defined flow.
We propose a simple extension that applies flow matching in the embedding space of data pairs.
arXiv Detail & Related papers (2024-10-30T11:18:27Z) - Transferable Post-training via Inverse Value Learning [83.75002867411263]
We propose modeling changes at the logits level during post-training using a separate neural network (i.e., the value network)
After training this network on a small base model using demonstrations, this network can be seamlessly integrated with other pre-trained models during inference.
We demonstrate that the resulting value network has broad transferability across pre-trained models of different parameter sizes.
arXiv Detail & Related papers (2024-10-28T13:48:43Z) - PRES: Toward Scalable Memory-Based Dynamic Graph Neural Networks [22.47336262812308]
Memory-based Dynamic Graph Neural Networks (MDGNNs) are a family of dynamic graph neural networks that leverage a memory module to extract, distill, and long-term temporal dependencies.
This paper studies the efficient training of MDGNNs at scale, focusing on the temporal discontinuity in training MDGNNs with large temporal batch sizes.
arXiv Detail & Related papers (2024-02-06T01:34:56Z) - Time-series Generation by Contrastive Imitation [87.51882102248395]
We study a generative framework that seeks to combine the strengths of both: Motivated by a moment-matching objective to mitigate compounding error, we optimize a local (but forward-looking) transition policy.
At inference, the learned policy serves as the generator for iterative sampling, and the learned energy serves as a trajectory-level measure for evaluating sample quality.
arXiv Detail & Related papers (2023-11-02T16:45:25Z) - Message Propagation Through Time: An Algorithm for Sequence Dependency
Retention in Time Series Modeling [14.49997340857179]
This paper proposes the Message Propagation Through Time (MPTT) algorithm for time series modeling.
MPTT incorporates long temporal dependencies while preserving faster training times relative to the stateful solutions.
Experimental results demonstrate that MPTT outperforms seven strategies on four climate datasets.
arXiv Detail & Related papers (2023-09-28T22:38:18Z) - Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z) - Effective and Efficient Training for Sequential Recommendation using
Recency Sampling [91.02268704681124]
We propose a novel Recency-based Sampling of Sequences training objective.
We show that models enhanced with our method can achieve performance exceeding or very close to the state-of-the-art BERT4Rec.
arXiv Detail & Related papers (2022-07-06T13:06:31Z)
- EGRU: Event-based GRU for activity-sparse inference and learning [0.8260432715157026]
We propose a model that reformulates Gated Recurrent Units (GRU) as an event-based activity-sparse model.
We show that the Event-based GRU (EGRU) demonstrates competitive performance compared to state-of-the-art recurrent network models in real-world tasks.
arXiv Detail & Related papers (2022-06-13T14:07:56Z) - Existence and Estimation of Critical Batch Size for Training Generative
Adversarial Networks with Two Time-Scale Update Rule [0.2741266294612775]
Previous results have shown that a two time-scale update rule (TTUR) using different learning rates is useful for training generative adversarial networks (GANs) in theory and in practice.
This paper studies the relationship between batch size and the number of steps needed for training GANs with TTURs based on constant learning rates.
arXiv Detail & Related papers (2022-01-28T08:52:01Z) - What training reveals about neural network complexity [80.87515604428346]
This work explores the hypothesis that the complexity of the function a deep neural network (NN) is learning can be deduced by how fast its weights change during training.
Our results support the hypothesis that good training behavior can be a useful bias towards good generalization.
arXiv Detail & Related papers (2021-06-08T08:58:00Z) - Connecting the Dots: Multivariate Time Series Forecasting with Graph
Neural Networks [91.65637773358347]
We propose a general graph neural network framework designed specifically for multivariate time series data.
Our approach automatically extracts the uni-directed relations among variables through a graph learning module.
Our proposed model outperforms the state-of-the-art baseline methods on 3 of 4 benchmark datasets.
arXiv Detail & Related papers (2020-05-24T04:02:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.