Offline Q-Learning on Diverse Multi-Task Data Both Scales And
Generalizes
- URL: http://arxiv.org/abs/2211.15144v2
- Date: Mon, 17 Apr 2023 18:45:23 GMT
- Title: Offline Q-Learning on Diverse Multi-Task Data Both Scales And
Generalizes
- Authors: Aviral Kumar, Rishabh Agarwal, Xinyang Geng, George Tucker, Sergey
Levine
- Abstract summary: Offline Q-learning algorithms exhibit strong performance that scales with model capacity.
We train a single policy on 40 games with near-human performance using networks of up to 80 million parameters.
Compared to return-conditioned supervised approaches, offline Q-learning scales similarly with model capacity and has better performance, especially when the dataset is suboptimal.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The potential of offline reinforcement learning (RL) is that high-capacity
models trained on large, heterogeneous datasets can lead to agents that
generalize broadly, analogously to similar advances in vision and NLP. However,
recent works argue that offline RL methods encounter unique challenges to
scaling up model capacity. Drawing on the learnings from these works, we
re-examine previous design choices and find that with appropriate choices:
ResNets, cross-entropy-based distributional backups, and feature normalization,
offline Q-learning algorithms exhibit strong performance that scales with model
capacity. Using multi-task Atari as a testbed for scaling and generalization,
we train a single policy on 40 games with near-human performance using networks
of up to 80 million parameters, finding that model performance scales favorably
with capacity. In contrast to prior work, we extrapolate beyond dataset
performance even when trained entirely on a large (400M transitions) but highly
suboptimal dataset (51% human-level performance). Compared to
return-conditioned supervised approaches, offline Q-learning scales similarly
with model capacity and has better performance, especially when the dataset is
suboptimal. Finally, we show that offline Q-learning with a diverse dataset is
sufficient to learn powerful representations that facilitate rapid transfer to
novel games and fast online learning on new variations of a training game,
improving over existing state-of-the-art representation learning approaches.
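The cross-entropy-based distributional backup named among the design choices follows the categorical (C51-style) formulation: the scalar Bellman target is replaced by the shifted return distribution projected onto a fixed support, and the Q-network is trained with cross-entropy against that projected target. A minimal NumPy sketch of the projection step, with function names and shapes that are illustrative rather than taken from the paper:

```python
import numpy as np

def project_distribution(next_probs, rewards, dones, gamma, support):
    """Project the Bellman-shifted categorical distribution back onto the
    fixed support (the C51-style projection used in cross-entropy
    distributional backups).

    Shapes: next_probs (B, N), rewards (B,), dones (B,), support (N,)."""
    v_min, v_max = support[0], support[-1]
    n_atoms = len(support)
    delta_z = (v_max - v_min) / (n_atoms - 1)
    # Shifted support Tz = r + gamma * z, clipped to the valid value range.
    tz = rewards[:, None] + gamma * (1.0 - dones[:, None]) * support[None, :]
    tz = np.clip(tz, v_min, v_max)
    b = (tz - v_min) / delta_z              # fractional atom index
    lower = np.floor(b).astype(int)
    upper = np.ceil(b).astype(int)
    target = np.zeros_like(next_probs)
    for i in range(next_probs.shape[0]):
        for j in range(n_atoms):
            if lower[i, j] == upper[i, j]:
                # Tz landed exactly on an atom: all mass goes there.
                target[i, lower[i, j]] += next_probs[i, j]
            else:
                # Split mass between the two neighboring atoms.
                target[i, lower[i, j]] += next_probs[i, j] * (upper[i, j] - b[i, j])
                target[i, upper[i, j]] += next_probs[i, j] * (b[i, j] - lower[i, j])
    return target
```

The cross-entropy loss is then taken between the network's predicted categorical distribution and this projected target, in place of the usual squared TD error.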
Related papers
- Tackling Long-Horizon Tasks with Model-based Offline Reinforcement Learning [6.345851712811528]
We introduce a novel model-based offline RL method, Lower Expectile Q-learning (LEQ), which enhances long-horizon task performance.
Our empirical results show that LEQ significantly outperforms previous model-based offline RL methods on long-horizon tasks.
LEQ achieves performance comparable to the state-of-the-art model-based and model-free offline RL methods on the NeoRL benchmark and the D4RL MuJoCo Gym tasks.
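The lower-expectile objective behind LEQ is not spelled out in this summary, but lower-expectile value estimation is typically implemented with the asymmetric squared loss from expectile regression: with an expectile level tau < 0.5, overestimation is penalized more heavily than underestimation, yielding a pessimistic value estimate. An illustrative sketch, not the paper's actual code:

```python
import numpy as np

def expectile_loss(pred, target, tau=0.1):
    """Asymmetric squared (expectile) loss. With tau < 0.5, residuals where
    pred > target receive the larger weight (1 - tau), so minimizing this
    loss drives pred toward a lower expectile of the target distribution."""
    u = target - pred
    weight = np.where(u < 0, 1.0 - tau, tau)
    return np.mean(weight * u ** 2)
```

For example, with tau = 0.1 an overestimate of 1 costs nine times as much as an underestimate of the same size.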
arXiv Detail & Related papers (2024-06-30T13:44:59Z)
- Diffusion-based Neural Network Weights Generation [85.6725307453325]
We propose an efficient and adaptive transfer learning scheme through dataset-conditioned pretrained weights sampling.
Specifically, we use a latent diffusion model with a variational autoencoder that can reconstruct the neural network weights.
arXiv Detail & Related papers (2024-02-28T08:34:23Z)
- Small Dataset, Big Gains: Enhancing Reinforcement Learning by Offline Pre-Training with Model Based Augmentation [59.899714450049494]
Offline pre-training can produce sub-optimal policies and lead to degraded online reinforcement learning performance.
We propose a model-based data augmentation strategy to maximize the benefits of offline reinforcement learning pre-training and reduce the scale of data needed to be effective.
arXiv Detail & Related papers (2023-12-15T14:49:41Z)
- Bad Students Make Great Teachers: Active Learning Accelerates Large-Scale Visual Understanding [9.655434542591815]
Power-law scaling indicates that large-scale training with uniform sampling is prohibitively slow.
Active learning methods aim to increase data efficiency by prioritizing learning on the most relevant examples.
arXiv Detail & Related papers (2023-12-08T19:26:13Z)
- Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model [74.62272538148245]
We show that for arbitrary pairings of pretrained models, one model extracts significant data context unavailable in the other.
We investigate if it is possible to transfer such "complementary" knowledge from one model to another without performance degradation.
arXiv Detail & Related papers (2023-10-26T17:59:46Z)
- Model-Based Reinforcement Learning with Multi-Task Offline Pretraining [59.82457030180094]
We present a model-based RL method that learns to transfer potentially useful dynamics and action demonstrations from offline data to a novel task.
The main idea is to use the world models not only as simulators for behavior learning but also as tools to measure the task relevance.
We demonstrate the advantages of our approach compared with the state-of-the-art methods in Meta-World and DeepMind Control Suite.
arXiv Detail & Related papers (2023-06-06T02:24:41Z)
- eP-ALM: Efficient Perceptual Augmentation of Language Models [70.47962271121389]
We propose to direct effort toward efficient adaptation of existing models, augmenting Language Models with perception.
Existing approaches for adapting pretrained models for vision-language tasks still rely on several key components that hinder their efficiency.
We show that by freezing more than 99% of total parameters, training only one linear projection layer, and prepending only one trainable token, our approach (dubbed eP-ALM) significantly outperforms other baselines on VQA and Captioning.
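The eP-ALM recipe sketched above (freeze nearly all parameters and train only a single linear projection) can be illustrated with a toy NumPy example in which a frozen random feature extractor stands in for the pretrained backbone; all names, dimensions, and the regression target are illustrative assumptions, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def frozen_encoder(x, w_frozen):
    """Stand-in for a frozen pretrained backbone: its weights never change."""
    return np.tanh(x @ w_frozen)

d_in, d_feat, d_lm = 8, 16, 4
w_frozen = rng.normal(size=(d_in, d_feat))   # frozen backbone weights
w_proj = np.zeros((d_feat, d_lm))            # the ONLY trainable parameters

x = rng.normal(size=(32, d_in))
y = rng.normal(size=(32, d_lm))  # toy targets standing in for LM-space embeddings

lr = 0.1
for _ in range(200):
    h = frozen_encoder(x, w_frozen)      # frozen forward pass (no gradient)
    pred = h @ w_proj                    # trainable linear projection
    grad = h.T @ (pred - y) / len(x)     # gradient w.r.t. w_proj only
    w_proj -= lr * grad
```

Only `w_proj` is updated, so the trainable fraction of parameters stays tiny regardless of how large the frozen backbone is.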
arXiv Detail & Related papers (2023-03-20T19:20:34Z)
- Learn, Unlearn and Relearn: An Online Learning Paradigm for Deep Neural Networks [12.525959293825318]
We introduce Learn, Unlearn, and Relearn (LURE), an online learning paradigm for deep neural networks (DNNs).
LURE interchanges between the unlearning phase, which selectively forgets the undesirable information in the model, and the relearning phase, which emphasizes learning on generalizable features.
We show that our training paradigm provides consistent performance gains across datasets in both classification and few-shot settings.
arXiv Detail & Related papers (2023-03-18T16:45:54Z)
- On the Feasibility of Cross-Task Transfer with Model-Based Reinforcement Learning [45.73223325256312]
We investigate whether internal models learned by modern model-based RL algorithms can be leveraged to solve new, distinctly different tasks faster.
We propose Model-Based Cross-Task Transfer (XTRA), a framework for sample-efficient online RL with scalable pretraining and finetuning of learned world models.
arXiv Detail & Related papers (2022-10-19T17:57:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.