The effects of regularisation on RNN models for time series forecasting:
Covid-19 as an example
- URL: http://arxiv.org/abs/2105.05932v1
- Date: Sun, 9 May 2021 10:50:57 GMT
- Title: The effects of regularisation on RNN models for time series forecasting:
Covid-19 as an example
- Authors: Marcus Carpenter, Chunbo Luo, Xiao-Si Wang
- Abstract summary: This paper presents a model with greater flexibility than the other proposed neural networks.
To improve performance on small data, six regularisation methods were tested.
Applying Dropout to a GRU model trained on only 28 days of data reduced the RMSE by 23%.
- Score: 2.5397218862229254
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Many research papers that propose models to predict the course of the
COVID-19 pandemic either use handcrafted statistical models or large neural
networks. Even though large neural networks are more powerful than simpler
statistical models, they are especially hard to train on small datasets. This
paper not only presents a model with greater flexibility than the other proposed
neural networks, but also presents a model that is effective on smaller
datasets. To improve performance on small data, six regularisation methods were
tested. The results show that the GRU combined with 20% Dropout achieved the
lowest RMSE scores. The main finding was that models with less access to data
relied more on the regulariser. Applying Dropout to a GRU model trained on only
28 days of data reduced the RMSE by 23%.
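To make the headline result concrete, below is a minimal sketch (not the authors' released code) of a GRU forecaster with 20% Dropout trained on a short daily series. The 7-day window, layer width, Keras framing, and training settings are illustrative assumptions; only the GRU + 20% Dropout combination comes from the abstract.

```python
# Minimal sketch of the abstract's setup (assumptions: Keras, a 7-day input
# window, 32 GRU units; the paper specifies only GRU + 20% Dropout).
import numpy as np
import tensorflow as tf

WINDOW = 7  # days of history per training sample (assumption)

def make_windows(series, window=WINDOW):
    """Slice a 1-D daily series into (window, next-day target) pairs."""
    X = np.stack([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X[..., np.newaxis], y  # add the feature axis the RNN expects

# Placeholder for e.g. 28 days of scaled daily case counts (the paper's
# smallest training setting).
series = np.random.rand(28).astype("float32")
X, y = make_windows(series)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(WINDOW, 1)),
    tf.keras.layers.GRU(32),       # recurrent encoder; width is an assumption
    tf.keras.layers.Dropout(0.2),  # the 20% Dropout regulariser from the abstract
    tf.keras.layers.Dense(1),      # next-day forecast
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=200, verbose=0)
rmse = float(np.sqrt(model.evaluate(X, y, verbose=0)))  # RMSE is the paper's metric
```

Dropout placed after the GRU layer is one common choice; since the paper compared six regularisers, recurrent dropout or weight penalties could be slotted in at the same point.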
Related papers
- Just How Flexible are Neural Networks in Practice? [89.80474583606242]
It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters.
In practice, however, we only find solutions reachable via the training procedure, including the gradient-based optimizer and regularizers, which limits the network's effective flexibility.
arXiv Detail & Related papers (2024-06-17T12:24:45Z)
- Learning Defect Prediction from Unrealistic Data [57.53586547895278]
Pretrained models of code have become popular choices for code understanding and generation tasks.
Such models tend to be large and require commensurate volumes of training data.
It has become popular to train models with far larger but less realistic datasets, such as functions with artificially injected bugs.
Models trained on such data tend to only perform well on similar data, while underperforming on real world programs.
arXiv Detail & Related papers (2023-11-02T01:51:43Z)
- Scaling Relationship on Learning Mathematical Reasoning with Large Language Models [75.29595679428105]
We investigate how the pre-training loss, supervised data amount, and augmented data amount influence the reasoning performance of a supervised LLM.
We find that rejection samples from multiple models push LLaMA-7B to an accuracy of 49.3% on GSM8K, significantly outperforming the supervised fine-tuning (SFT) accuracy of 35.9%.
arXiv Detail & Related papers (2023-08-03T15:34:01Z)
- Are Sample-Efficient NLP Models More Robust? [90.54786862811183]
We investigate the relationship between sample efficiency (the amount of data needed to reach a given in-distribution (ID) accuracy) and robustness (how models fare on out-of-distribution (OOD) evaluation).
We find that higher sample efficiency is only correlated with better average OOD robustness on some modeling interventions and tasks, but not others.
These results suggest that general-purpose methods for improving sample efficiency are unlikely to yield universal OOD robustness improvements, since such improvements are highly dataset- and task-dependent.
arXiv Detail & Related papers (2022-10-12T17:54:59Z)
- Ensemble Machine Learning Model Trained on a New Synthesized Dataset Generalizes Well for Stress Prediction Using Wearable Devices [3.006016887654771]
We investigate the generalization ability of models built on datasets containing a small number of subjects, recorded in single study protocols.
We propose and evaluate the use of ensemble techniques, combining gradient boosting with an artificial neural network, to measure predictive power on new, unseen data (a minimal ensemble sketch follows this entry).
arXiv Detail & Related papers (2022-09-30T00:20:57Z)
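As a hedged illustration of the ensembling idea in the entry above (not the paper's pipeline), the sketch below averages the predicted probabilities of a gradient-boosting model and a small neural network; the features, labels, and hyperparameters are all placeholders.

```python
# Soft-voting sketch: average class probabilities from gradient boosting and
# a small neural network. All data here is synthetic placeholder material.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X_train = rng.random((200, 8))     # stand-in for wearable-sensor features
y_train = rng.integers(0, 2, 200)  # stand-in for binary stress labels

gb = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
nn = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                   random_state=0).fit(X_train, y_train)

def ensemble_proba(X):
    """Mean of the two models' class-probability estimates."""
    return (gb.predict_proba(X) + nn.predict_proba(X)) / 2.0

stress_prob = ensemble_proba(rng.random((5, 8)))[:, 1]  # P(stress) for new samples
```

Averaging probabilities (soft voting) is only one way to combine the two model families; stacking a meta-learner on top of their outputs is a common alternative.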
- Comparing Test Sets with Item Response Theory [53.755064720563]
We evaluate 29 datasets using predictions from 18 pretrained Transformer models on individual test examples.
We find that Quoref, HellaSwag, and MC-TACO are best suited for distinguishing among state-of-the-art models.
We also observe that the span selection task format, which is used for QA datasets like QAMR or SQuAD2.0, is effective in differentiating between strong and weak models.
arXiv Detail & Related papers (2021-06-01T22:33:53Z)
- ALT-MAS: A Data-Efficient Framework for Active Testing of Machine Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model-under-test using a Bayesian neural network (BNN); a rough sketch of this estimation idea follows this entry.
arXiv Detail & Related papers (2021-04-11T12:14:04Z)
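The following is a rough sketch of that estimation idea under a common simplification: approximating the BNN with Monte-Carlo Dropout and scoring a model-under-test by its agreement with labels sampled from the approximate posterior. The architectures, data, and the MC-Dropout stand-in are all assumptions, not ALT-MAS itself.

```python
# Hedged sketch: use an (approximate) BNN's sampled predictions to estimate
# the accuracy of a model-under-test on unlabeled inputs. MC Dropout stands
# in for a full BNN; everything below is illustrative, not the paper's code.
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)
X_unlabeled = rng.normal(size=(100, 8)).astype("float32")  # placeholder inputs

def tiny_classifier(dropout_rate):
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(8,)),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dropout(dropout_rate),
        tf.keras.layers.Dense(2, activation="softmax"),
    ])

bnn = tiny_classifier(0.5)               # would be trained on the small labeled set
model_under_test = tiny_classifier(0.0)  # the black-box model being evaluated

def estimate_accuracy(bnn, mut, X, n_samples=50):
    """Average agreement between the model-under-test and BNN label samples."""
    mut_labels = np.argmax(mut.predict(X, verbose=0), axis=1)
    agreement = [
        np.mean(np.argmax(bnn(X, training=True).numpy(), axis=1) == mut_labels)
        for _ in range(n_samples)  # training=True keeps Dropout on (MC Dropout)
    ]
    return float(np.mean(agreement))

print(estimate_accuracy(bnn, model_under_test, X_unlabeled))
```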
- Improving Neural Networks for Time Series Forecasting using Data Augmentation and AutoML [0.0]
This paper presents an easy-to-implement data augmentation method to significantly improve the performance of neural networks.
It shows that data augmentation, when paired with Automated Machine Learning techniques such as Neural Architecture Search, can help to find the best neural architecture for a given time series (a simple augmentation sketch follows this entry).
arXiv Detail & Related papers (2021-03-02T19:20:49Z)
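One easy-to-implement augmentation of the kind the entry above alludes to is jittering: enlarging a small training set with noise-perturbed copies of each series. The method, noise level, and copy count below are illustrative assumptions; the paper's own augmentation may differ.

```python
# Jittering sketch: pad a small time-series training set with noisy copies.
# The noise scale and copy count are illustrative choices.
import numpy as np

def jitter(series, n_copies=5, sigma=0.03, seed=0):
    """Return the original series stacked with Gaussian-perturbed copies."""
    rng = np.random.default_rng(seed)
    noisy = [series + rng.normal(0.0, sigma, size=series.shape)
             for _ in range(n_copies)]
    return np.stack([series, *noisy])

augmented = jitter(np.sin(np.linspace(0.0, 10.0, 100)))  # shape: (6, 100)
```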
- A Comparison of LSTM and BERT for Small Corpus [0.0]
Recent advances in NLP have shown that transfer learning helps achieve state-of-the-art results on new tasks by tuning pre-trained models instead of training from scratch.
In this paper we focus on a real-life scenario that scientists in academia and industry face frequently: given a small dataset, can we use a large pre-trained model like BERT and get better results than simple models?
Our experimental results show that bidirectional LSTM models can achieve significantly better results than a BERT model on a small dataset, and that these simple models train in much less time than it takes to tune their pre-trained counterparts.
arXiv Detail & Related papers (2020-09-11T14:01:14Z)
- Forecasting Industrial Aging Processes with Machine Learning Methods [0.0]
We evaluate a wider range of data-driven models, comparing some traditional stateless models to more complex recurrent neural networks.
Our results show that recurrent models produce near-perfect predictions when trained on larger datasets.
arXiv Detail & Related papers (2020-02-05T13:06:44Z)
- Model Extraction Attacks against Recurrent Neural Networks [1.2891210250935146]
We study the threats of model extraction attacks against recurrent neural networks (RNNs).
We discuss whether a model with higher accuracy can be extracted from a long short-term memory (LSTM) network using a simple RNN.
We then show that such a higher-accuracy model can be extracted efficiently, especially by configuring the loss function and using a more complex architecture (a toy extraction sketch follows this entry).
arXiv Detail & Related papers (2020-02-01T01:47:50Z)
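To fix ideas, here is a toy version of the extraction setting in the entry above: query a victim LSTM for soft labels, then fit a simpler substitute RNN on the query-label pairs. The architectures, query budget, and cross-entropy-on-soft-labels loss are assumptions, not the paper's exact attack.

```python
# Toy extraction sketch (illustrative only): the attacker queries a black-box
# victim LSTM and distils its soft labels into a simpler substitute RNN.
import numpy as np
import tensorflow as tf

TIMESTEPS, FEATURES, CLASSES = 10, 4, 3  # arbitrary toy dimensions

def sequence_classifier(recurrent_layer):
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(TIMESTEPS, FEATURES)),
        recurrent_layer,
        tf.keras.layers.Dense(CLASSES, activation="softmax"),
    ])

victim = sequence_classifier(tf.keras.layers.LSTM(32))           # model under attack
substitute = sequence_classifier(tf.keras.layers.SimpleRNN(32))  # attacker's simple RNN

# The attacker only sees the victim's outputs on chosen query inputs.
queries = np.random.rand(512, TIMESTEPS, FEATURES).astype("float32")
soft_labels = victim.predict(queries, verbose=0)

# Training on soft labels with cross-entropy is one loss-function choice the
# summary alludes to; temperature-scaled KL divergence would be another.
substitute.compile(optimizer="adam", loss="categorical_crossentropy")
substitute.fit(queries, soft_labels, epochs=5, verbose=0)
```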
This list is automatically generated from the titles and abstracts of the papers on this site.