Pretraining & Reinforcement Learning: Sharpening the Axe Before Cutting the Tree
- URL: http://arxiv.org/abs/2110.02497v1
- Date: Wed, 6 Oct 2021 04:25:14 GMT
- Title: Pretraining & Reinforcement Learning: Sharpening the Axe Before Cutting the Tree
- Authors: Saurav Kadavath, Samuel Paradis, Brian Yao
- Abstract summary: Pretraining is a common technique in deep learning for increasing performance and reducing training time.
We evaluate the effectiveness of pretraining for RL tasks, with and without distracting backgrounds, using both large, publicly available datasets and case-by-case generated datasets.
Results suggest filters learned during training on less relevant datasets render pretraining ineffective, while filters learned during training on the in-distribution datasets reliably reduce RL training time and improve performance after 80k RL training steps.
- Score: 2.0142516017086165
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pretraining is a common technique in deep learning for increasing performance
and reducing training time, with promising experimental results in deep
reinforcement learning (RL). However, pretraining requires a relevant dataset
for training. In this work, we evaluate the effectiveness of pretraining for RL
tasks, with and without distracting backgrounds, using both large, publicly
available datasets with minimal relevance and case-by-case generated datasets
labeled via self-supervision. Results suggest that filters learned during
training on less relevant datasets render pretraining ineffective, while
filters learned during training on the in-distribution datasets reliably reduce
RL training time and improve performance after 80k RL training steps. We
further investigate, given a limited number of environment steps, how to
optimally divide the available steps into pretraining and RL training to
maximize RL performance. Our code is available on GitHub.
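To make the pretrain-then-RL recipe concrete, below is a minimal PyTorch sketch of splitting a fixed step budget between self-supervised pretraining of a visual encoder and the subsequent RL phase. Everything here (the budget constants, make_frames, the toy pretext label, the action dimension) is an illustrative assumption, not the authors' released code.

```python
import torch
import torch.nn as nn

TOTAL_STEPS = 100_000        # overall step budget (illustrative)
SPLIT = 0.2                  # fraction of the budget spent on pretraining (a tunable choice)
PRETRAIN_STEPS = int(SPLIT * TOTAL_STEPS)
RL_STEPS = TOTAL_STEPS - PRETRAIN_STEPS

# Shared visual encoder whose filters are learned during pretraining.
encoder = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
    nn.Flatten(),
)

def make_frames(batch=32):
    """Stand-in for case-by-case generated frames with self-supervised labels."""
    x = torch.rand(batch, 3, 84, 84)
    y = (x.mean(dim=(1, 2, 3)) > 0.5).long()  # toy pretext label
    return x, y

# Phase 1: pretrain the encoder on the (in-distribution) dataset.
head = nn.Linear(64 * 9 * 9, 2)
opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()), lr=3e-4)
for _ in range(PRETRAIN_STEPS // 1000):  # loop shortened here purely for illustration
    x, y = make_frames()
    loss = nn.functional.cross_entropy(head(encoder(x)), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Phase 2: reuse the pretrained filters to initialise the RL agent's network,
# then spend the remaining RL_STEPS on, e.g., DQN or PPO updates.
policy = nn.Sequential(encoder, nn.Linear(64 * 9 * 9, 4))  # 4 = toy action dimension
```

The interesting design choice is where to place SPLIT: too few pretraining steps leave the filters uninformative, while too many leave too little of the budget for RL, which is the trade-off the abstract's final experiment investigates.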
Related papers
- Small Dataset, Big Gains: Enhancing Reinforcement Learning by Offline Pre-Training with Model Based Augmentation [59.899714450049494]
Offline pre-training can produce sub-optimal policies and lead to degraded online reinforcement learning performance.
We propose a model-based data augmentation strategy to maximize the benefits of offline reinforcement learning pre-training and reduce the scale of data needed to be effective.
arXiv Detail & Related papers (2023-12-15T14:49:41Z)
- Pre-training with Synthetic Data Helps Offline Reinforcement Learning [4.531082205797088]
We show that language is not essential for improved performance.
We then consider pre-training Conservative Q-Learning (CQL), a popular offline DRL algorithm.
Surprisingly, pre-training with simple synthetic data for a small number of updates can also improve CQL.
arXiv Detail & Related papers (2023-10-01T19:32:14Z)
- Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks [91.15120211190519]
This paper aims to understand the nature of noise in pre-training datasets and to mitigate its impact on downstream tasks.
We propose a light-weight black-box tuning method (NMTune) that affinely transforms the feature space to mitigate the malignant effect of noise.
arXiv Detail & Related papers (2023-09-29T06:18:15Z)
- Zero-Shot Reinforcement Learning from Low Quality Data [5.079602839359521]
Zero-shot reinforcement learning (RL) promises to provide agents that can perform any task in an environment after an offline, reward-free pre-training phase.
Here, we explore how the performance of zero-shot RL methods degrades when trained on small homogeneous datasets.
We propose fixes inspired by conservatism, a well-established feature of performant single-task offline RL algorithms.
arXiv Detail & Related papers (2023-09-26T18:20:20Z)
- When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale [12.94829977468838]
Large volumes of text data have contributed significantly to the development of large language models.
To date, efforts to prune datasets down to a higher-quality subset have relied on hand-crafted heuristics encoded as rule-based filters.
We take a wider view and explore scalable estimates of data quality that can be used to measure the quality of pretraining data.
arXiv Detail & Related papers (2023-09-08T19:34:05Z)
- D4: Improving LLM Pretraining via Document De-Duplication and Diversification [38.84592304799403]
We show that careful data selection via pre-trained model embeddings can speed up training.
We also show that repeating data intelligently consistently outperforms baseline training.
arXiv Detail & Related papers (2023-08-23T17:58:14Z)
- Knowledge Distillation as Efficient Pre-training: Faster Convergence, Higher Data-efficiency, and Better Transferability [53.27240222619834]
Knowledge Distillation as Efficient Pre-training aims to efficiently transfer the learned feature representation from pre-trained models to new student models for future downstream tasks.
Our method performs comparably with supervised pre-training counterparts on 3 downstream tasks and 9 downstream datasets, while requiring 10x less data and 5x less pre-training time.
arXiv Detail & Related papers (2022-03-10T06:23:41Z)
- Self-Supervised Pretraining Improves Self-Supervised Pretraining [83.1423204498361]
Self-supervised pretraining requires expensive and lengthy computation, large amounts of data, and is sensitive to data augmentation.
This paper explores Hierarchical PreTraining (HPT), which decreases convergence time and improves accuracy by initializing the pretraining process with an existing pretrained model.
We show HPT converges up to 80x faster, improves accuracy across tasks, and improves the robustness of the self-supervised pretraining process to changes in the image augmentation policy or amount of pretraining data.
arXiv Detail & Related papers (2021-03-23T17:37:51Z)
- Efficient Conditional Pre-training for Transfer Learning [71.01129334495553]
We propose efficient filtering methods to select relevant subsets from the pre-training dataset.
We validate our techniques by pre-training on ImageNet in both the unsupervised and supervised settings.
We improve standard ImageNet pre-training by 1-3% by tuning available models on our subsets and pre-training on a dataset filtered from a larger-scale dataset (a toy relevance-filtering sketch follows after this list).
arXiv Detail & Related papers (2020-11-20T06:16:15Z)
- Predicting Training Time Without Training [120.92623395389255]
We tackle the problem of predicting the number of optimization steps that a pre-trained deep network needs to converge to a given value of the loss function.
We leverage the fact that the training dynamics of a deep network during fine-tuning are well approximated by those of a linearized model.
We are able to predict the time it takes to fine-tune a model to a given loss without having to perform any training.
arXiv Detail & Related papers (2020-08-28T04:29:54Z)
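Several of the entries above (the conditional pre-training and data-pruning papers in particular) revolve around scoring candidate pretraining examples for relevance or quality, which echoes the main paper's finding that in-distribution data is what makes pretraining pay off. The sketch below shows one generic way such a relevance filter could look, keeping the candidates whose embeddings are most similar to a small in-distribution reference set; the function names, the cosine-similarity score, and the keep fraction are assumptions for illustration, not any of these papers' actual methods.

```python
# Illustrative relevance filter: rank candidate pretraining images by cosine
# similarity of their embeddings to a small in-distribution reference set,
# then keep only the top fraction for pretraining.
import torch
import torch.nn as nn

def embed(x: torch.Tensor, encoder: nn.Module) -> torch.Tensor:
    """Return L2-normalised embeddings for a batch of images."""
    with torch.no_grad():
        z = encoder(x)
    return nn.functional.normalize(z, dim=1)

def select_relevant(candidates, reference, encoder, keep_frac=0.25):
    """Keep the keep_frac of candidates most similar to the reference set."""
    zc = embed(candidates, encoder)          # (N, d)
    zr = embed(reference, encoder)           # (M, d)
    scores = (zc @ zr.T).max(dim=1).values   # best cosine match per candidate
    k = max(1, int(keep_frac * len(candidates)))
    idx = scores.topk(k).indices
    return candidates[idx]

# Toy usage with random tensors standing in for image datasets.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
candidates = torch.rand(1000, 3, 32, 32)   # large, loosely relevant pool
reference = torch.rand(64, 3, 32, 32)      # small in-distribution sample
subset = select_relevant(candidates, reference, encoder)
print(subset.shape)                        # torch.Size([250, 3, 32, 32])
```

In practice the embeddings would come from a pretrained encoder (e.g., an ImageNet model) rather than the random linear map used here; the random encoder only keeps the example self-contained.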
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.