Flipped Classroom: Effective Teaching for Time Series Forecasting
- URL: http://arxiv.org/abs/2210.08959v1
- Date: Mon, 17 Oct 2022 11:53:25 GMT
- Title: Flipped Classroom: Effective Teaching for Time Series Forecasting
- Authors: Philipp Teutsch and Patrick Mäder
- Abstract summary: Sequence-to-sequence models based on LSTM and GRU are among the most popular choices for forecasting time series data.
The two most common training strategies in this context are teacher forcing (TF) and free running (FR).
We propose several new curricula and systematically evaluate their performance in two experimental sets.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sequence-to-sequence models based on LSTM and GRU are among the most popular
choices for forecasting time series data, reaching state-of-the-art performance.
Training such models can be delicate though. The two most common training
strategies within this context are teacher forcing (TF) and free running (FR).
TF can be used to help the model converge faster but may provoke an exposure
bias issue due to a discrepancy between the training and inference phases. FR helps
to avoid this but does not necessarily lead to better results, since it tends
to make training slow and unstable instead. Scheduled sampling was the
first approach to tackle these issues by picking the best from both worlds and
combining them into a curriculum learning (CL) strategy. Although scheduled
sampling seems to be a convincing alternative to FR and TF, we found that, even
if parametrized carefully, scheduled sampling may lead to premature termination
of training when applied to time series forecasting. To mitigate the
problems of the above approaches, we formalize CL strategies along the training
scale as well as the training iteration scale. We propose several new curricula and
systematically evaluate their performance in two experimental sets. For our
experiments, we utilize six datasets generated from prominent chaotic systems.
We found that the newly proposed increasing training scale curricula, combined with
a probabilistic iteration scale curriculum, consistently outperform previous
training strategies, yielding an NRMSE improvement of up to 81% over FR or TF
training. For some datasets we additionally observe a reduced number of
training iterations. We observed that all models trained with the new curricula
yield higher prediction stability, allowing for longer prediction horizons.
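To make the difference between the training strategies concrete, the following is a minimal PyTorch sketch, not the authors' implementation: a GRU encoder-decoder whose decoder is fed either the ground truth (teacher forcing) or its own prediction (free running) at each step, with the teacher-forcing probability decayed over training iterations in the spirit of scheduled sampling. The class name, layer sizes, decay constant k, and the synthetic sine data are illustrative assumptions.

```python
import math
import torch
import torch.nn as nn


class Seq2SeqForecaster(nn.Module):
    """GRU encoder-decoder that predicts `horizon` future steps."""

    def __init__(self, dim=1, hidden=32, horizon=10):
        super().__init__()
        self.encoder = nn.GRU(dim, hidden, batch_first=True)
        self.decoder_cell = nn.GRUCell(dim, hidden)
        self.head = nn.Linear(hidden, dim)
        self.horizon = horizon

    def forward(self, context, target=None, tf_prob=0.0):
        _, h = self.encoder(context)      # h: (1, batch, hidden)
        h = h.squeeze(0)
        step_input = context[:, -1, :]    # last observed value starts decoding
        outputs = []
        for t in range(self.horizon):
            h = self.decoder_cell(step_input, h)
            pred = self.head(h)
            outputs.append(pred)
            # Iteration-scale curriculum: with probability tf_prob feed the
            # ground truth (teacher forcing), otherwise feed the model's own
            # prediction (free running). tf_prob=1 is pure TF, tf_prob=0 pure FR.
            if target is not None and torch.rand(1).item() < tf_prob:
                step_input = target[:, t, :]
            else:
                step_input = pred
        return torch.stack(outputs, dim=1)


def tf_probability(iteration, k=1000.0):
    """Inverse-sigmoid decay of the teacher-forcing ratio (k is an assumed constant)."""
    return k / (k + math.exp(iteration / k))


# Toy training loop on a synthetic sine series (illustrative only).
model = Seq2SeqForecaster()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
for it in range(200):
    grid = torch.linspace(0, 6.28, 60).reshape(1, 60, 1)
    series = torch.sin(grid + torch.rand(1))          # batch of one phase-shifted sine
    context, target = series[:, :50, :], series[:, 50:, :]
    prediction = model(context, target, tf_prob=tf_probability(it))
    loss = loss_fn(prediction, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

How the feedback probability is scheduled over training, and whether gradients flow through the fed-back predictions, are tuning choices; the paper's contribution is a systematic formalization and evaluation of such curricula on both the training and the iteration scale.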
Related papers
- Efficient Continual Pre-training by Mitigating the Stability Gap [68.49269649759005]
We study the behavior of Large Language Models (LLMs) during continual pre-training.
We propose three effective strategies to enhance LLM performance within a fixed compute budget.
Our strategies improve the average medical task performance of the OpenLlama-3B model from 36.2% to 40.7% with only 40% of the original training budget.
arXiv Detail & Related papers (2024-06-21T02:28:37Z) - An Emulator for Fine-Tuning Large Language Models using Small Language
Models [91.02498576056057]
We introduce emulated fine-tuning (EFT), a principled and practical method for sampling from a distribution that approximates the result of pre-training and fine-tuning at different scales.
We show that EFT enables test-time adjustment of competing behavioral traits like helpfulness and harmlessness without additional training.
Finally, a special case of emulated fine-tuning, which we call LM up-scaling, avoids resource-intensive fine-tuning of large pre-trained models by ensembling them with small fine-tuned models.
arXiv Detail & Related papers (2023-10-19T17:57:16Z) - Examining the Effect of Pre-training on Time Series Classification [21.38211396933795]
This study investigates the impact of pre-training on the subsequent fine-tuning process.
We conducted a thorough examination of 150 classification datasets.
We find that pre-training can only help improve the optimization process for models that fit the data poorly.
Adding more pre-training data does not improve generalization, but it can strengthen the advantage of pre-training on the original data volume.
arXiv Detail & Related papers (2023-09-11T06:26:57Z) - RanPAC: Random Projections and Pre-trained Models for Continual Learning [59.07316955610658]
Continual learning (CL) aims to learn different tasks (such as classification) in a non-stationary data stream without forgetting old ones.
We propose a concise and effective approach for CL with pre-trained models.
arXiv Detail & Related papers (2023-07-05T12:49:02Z) - TWINS: A Fine-Tuning Framework for Improved Transferability of
Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks.
We propose a novel statistics-based approach, Two-WIng NormliSation (TWINS) fine-tuning framework.
TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z) - Effective and Efficient Training for Sequential Recommendation using
Recency Sampling [91.02268704681124]
We propose a novel Recency-based Sampling of Sequences training objective.
We show that the models enhanced with our method can achieve performances exceeding or very close to the state-of-the-art BERT4Rec.
arXiv Detail & Related papers (2022-07-06T13:06:31Z) - Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z) - Deep Ensembles for Low-Data Transfer Learning [21.578470914935938]
We study different ways of creating ensembles from pre-trained models.
We show that the nature of pre-training itself is a performant source of diversity.
We propose a practical algorithm that efficiently identifies a subset of pre-trained models for any downstream dataset.
arXiv Detail & Related papers (2020-10-14T07:59:00Z) - Bridging the Gap Between Training and Inference for Spatio-Temporal
Forecasting [16.06369357595426]
We propose a novel curriculum learning based strategy named Temporal Progressive Growing Sampling to bridge the gap between training and inference for spatio-temporal sequence forecasting.
Experimental results demonstrate that our proposed method better models long term dependencies and outperforms baseline approaches on two competitive datasets.
arXiv Detail & Related papers (2020-05-19T10:14:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.