Mini-Batch Learning Strategies for modeling long term temporal
dependencies: A study in environmental applications
- URL: http://arxiv.org/abs/2210.08347v1
- Date: Sat, 15 Oct 2022 17:44:21 GMT
- Title: Mini-Batch Learning Strategies for modeling long term temporal
dependencies: A study in environmental applications
- Authors: Shaoming Xu, Ankush Khandelwal, Xiang Li, Xiaowei Jia, Licheng Liu,
Jared Willard, Rahul Ghosh, Kelly Cutler, Michael Steinbach, Christopher
Duffy, John Nieber, Vipin Kumar
- Abstract summary: In environmental applications, recurrent neural networks (RNNs) are often used to model physical variables with long temporal dependencies.
Due to mini-batch training, temporal relationships between training segments within the batch (intra-batch) as well as between batches (inter-batch) are not considered.
We propose two strategies to enforce both intra- and inter-batch temporal dependency.
- Score: 20.979235183394994
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In many environmental applications, recurrent neural networks (RNNs) are
often used to model physical variables with long temporal dependencies.
However, due to mini-batch training, temporal relationships between training
segments within the batch (intra-batch) as well as between batches
(inter-batch) are not considered, which can lead to limited performance.
Stateful RNNs aim to address this issue by passing hidden states between
batches. Since Stateful RNNs ignore intra-batch temporal dependency, there
exists a trade-off between training stability and capturing temporal
dependency. In this paper, we provide a quantitative comparison of different
Stateful RNN modeling strategies, and propose two strategies to enforce both
intra- and inter-batch temporal dependency. First, we extend Stateful RNNs by
defining a batch as a temporally ordered set of training segments, which
enables intra-batch sharing of temporal information. While this approach
significantly improves performance, it leads to much longer training times
because training becomes highly sequential. To address this issue, we further
propose a new strategy that augments each training segment with the value of
the target variable at the timestep immediately preceding the start of the
segment. In other words, we provide an initial value of the target variable as
additional input so that the network can focus on learning changes relative to
that initial value. By using this strategy, samples can be passed in any order
(mini-batch training), which significantly reduces training time while
maintaining performance. In a demonstration of our approach on hydrological
modeling, we observe that the most significant gains in predictive accuracy
occur when these methods are applied to state variables whose values change
slowly, such as soil water and snowpack, rather than to faster-changing flux
variables such as streamflow.
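Editor's illustrative sketch (not the authors' code): the second strategy can be approximated with a standard PyTorch LSTM by appending the target value observed just before each segment as an extra input feature. Segment length, feature counts, the toy data, and the LSTM backbone below are assumptions.
```python
# Minimal sketch of target-value augmentation: every training segment gets the
# observed target from the timestep immediately before its start as an extra
# input feature, so segments can be shuffled into ordinary mini-batches without
# passing hidden states between batches.
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

class AugmentedSegments(Dataset):
    def __init__(self, x, y, seg_len):
        # x: (T, n_features) driver series, y: (T,) target series
        self.x, self.y, self.seg_len = x, y, seg_len
        # start segments at t >= 1 so an initial target value always exists
        self.starts = list(range(1, len(x) - seg_len + 1, seg_len))

    def __len__(self):
        return len(self.starts)

    def __getitem__(self, i):
        s = self.starts[i]
        seg_x = self.x[s:s + self.seg_len]          # (seg_len, n_features)
        seg_y = self.y[s:s + self.seg_len]          # (seg_len,)
        y0 = self.y[s - 1].expand(self.seg_len, 1)  # initial target, repeated over time
        return torch.cat([seg_x, y0], dim=-1), seg_y

class SegmentRNN(nn.Module):
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.rnn = nn.LSTM(n_features + 1, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):
        h, _ = self.rnn(x)  # hidden state starts from zeros; no inter-batch state passing
        return self.head(h).squeeze(-1)

# Each segment carries its own initial condition, so segments can be shuffled
# across batches like i.i.d. samples, restoring ordinary mini-batch training.
T, n_features, seg_len = 2000, 5, 50
x, y = torch.randn(T, n_features), torch.randn(T)
loader = DataLoader(AugmentedSegments(x, y, seg_len), batch_size=32, shuffle=True)
model = SegmentRNN(n_features)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
for xb, yb in loader:
    opt.zero_grad()
    loss_fn(model(xb), yb).backward()
    opt.step()
```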
Related papers
- Simulation-Free Training of Neural ODEs on Paired Data [20.36333430055869]
We employ the flow matching framework for simulation-free training of NODEs.
We show that applying flow matching directly between paired data can often lead to an ill-defined flow.
We propose a simple extension that applies flow matching in the embedding space of data pairs.
arXiv Detail & Related papers (2024-10-30T11:18:27Z) - Transferable Post-training via Inverse Value Learning [83.75002867411263]
We propose modeling changes at the logits level during post-training using a separate neural network (i.e., the value network)
After training this network on a small base model using demonstrations, this network can be seamlessly integrated with other pre-trained models during inference.
We demonstrate that the resulting value network has broad transferability across pre-trained models of different parameter sizes.
arXiv Detail & Related papers (2024-10-28T13:48:43Z) - PRES: Toward Scalable Memory-Based Dynamic Graph Neural Networks [22.47336262812308]
Memory-based Dynamic Graph Neural Networks (MDGNNs) are a family of dynamic graph neural networks that leverage a memory module to extract, distill, and long-term temporal dependencies.
This paper studies the efficient training of MDGNNs at scale, focusing on the temporal discontinuity in training MDGNNs with large temporal batch sizes.
arXiv Detail & Related papers (2024-02-06T01:34:56Z) - Time-series Generation by Contrastive Imitation [87.51882102248395]
We study a generative framework that seeks to combine the strengths of both: Motivated by a moment-matching objective to mitigate compounding error, we optimize a local (but forward-looking) transition policy.
At inference, the learned policy serves as the generator for iterative sampling, and the learned energy serves as a trajectory-level measure for evaluating sample quality.
arXiv Detail & Related papers (2023-11-02T16:45:25Z) - Message Propagation Through Time: An Algorithm for Sequence Dependency
Retention in Time Series Modeling [14.49997340857179]
This paper proposes the Message Propagation Through Time (MPTT) algorithm for time series modeling.
MPTT incorporates long temporal dependencies while preserving faster training times relative to the stateful solutions.
Experimental results demonstrate that MPTT outperforms seven strategies on four climate datasets.
arXiv Detail & Related papers (2023-09-28T22:38:18Z) - Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z) - Effective and Efficient Training for Sequential Recommendation using
Recency Sampling [91.02268704681124]
We propose a novel Recency-based Sampling of Sequences training objective.
We show that models enhanced with our method can achieve performance exceeding or very close to the state-of-the-art BERT4Rec.
arXiv Detail & Related papers (2022-07-06T13:06:31Z)
- EGRU: Event-based GRU for activity-sparse inference and learning [0.8260432715157026]
We propose a model that reformulates Gated Recurrent Units (GRU) as an event-based activity-sparse model.
We show that the Event-based GRU (EGRU) demonstrates competitive performance compared to state-of-the-art recurrent network models in real-world tasks.
arXiv Detail & Related papers (2022-06-13T14:07:56Z) - Existence and Estimation of Critical Batch Size for Training Generative
Adversarial Networks with Two Time-Scale Update Rule [0.2741266294612775]
Previous results have shown that a two time-scale update rule (TTUR) using different learning rates is useful for training generative adversarial networks (GANs) in theory and in practice.
This paper studies the relationship between batch size and the number of steps needed for training GANs with TTURs based on constant learning rates.
arXiv Detail & Related papers (2022-01-28T08:52:01Z) - What training reveals about neural network complexity [80.87515604428346]
This work explores the hypothesis that the complexity of the function a deep neural network (NN) is learning can be deduced by how fast its weights change during training.
Our results support the hypothesis that good training behavior can be a useful bias towards good generalization.
arXiv Detail & Related papers (2021-06-08T08:58:00Z) - Connecting the Dots: Multivariate Time Series Forecasting with Graph
Neural Networks [91.65637773358347]
We propose a general graph neural network framework designed specifically for multivariate time series data.
Our approach automatically extracts the uni-directed relations among variables through a graph learning module.
Our proposed model outperforms the state-of-the-art baseline methods on 3 of 4 benchmark datasets.
arXiv Detail & Related papers (2020-05-24T04:02:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.