Towards Deeper Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2106.01151v1
- Date: Wed, 2 Jun 2021 13:41:02 GMT
- Title: Towards Deeper Deep Reinforcement Learning
- Authors: Johan Bjorck, Carla P. Gomes, Kilian Q. Weinberger
- Abstract summary: In contrast with computer vision and natural language processing, state-of-the-art reinforcement learning algorithms often use only small MLPs.
We show that dataset size is not the limiting factor, and instead argue that instability from the actor in SAC taking gradients through the critic is the culprit.
- Score: 42.960199987696306
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In computer vision and natural language processing, innovations in model
architecture that lead to increases in model capacity have reliably translated
into gains in performance. In stark contrast with this trend, state-of-the-art
reinforcement learning (RL) algorithms often use only small MLPs, and gains in
performance typically originate from algorithmic innovations. It is natural to
hypothesize that small datasets in RL necessitate simple models to avoid
overfitting; however, this hypothesis is untested. In this paper we investigate
how RL agents are affected by exchanging the small MLPs with larger modern
networks with skip connections and normalization, focusing specifically on soft
actor-critic (SAC) algorithms. We verify, empirically, that naïvely adopting
such architectures leads to instabilities and poor performance, likely
contributing to the popularity of simple models in practice. However, we show
that dataset size is not the limiting factor, and instead argue that intrinsic
instability from the actor in SAC taking gradients through the critic is the
culprit. We demonstrate that a simple smoothing method can mitigate this issue,
which enables stable training with large modern architectures. After smoothing,
larger models yield dramatic performance improvements for state-of-the-art
agents -- suggesting that more "easy" gains may be had by focusing on model
architectures in addition to algorithmic innovations.
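The abstract does not name the smoothing method. Purely as an illustration, the sketch below assumes spectral normalization, a common way to bound the Lipschitz constant of a linear layer and so smooth the gradients that pass through it: the weight matrix is divided by an estimate of its largest singular value, obtained by power iteration. The function names and the 2x2 example matrix are hypothetical and are not taken from the paper.

```python
import math
import random


def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def transpose(W):
    """Return the transpose of W as a list of rows."""
    return [list(col) for col in zip(*W)]


def normalize(v):
    """Scale v to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def spectral_norm(W, n_iters=100, seed=0):
    """Estimate the largest singular value of W by power iteration."""
    rng = random.Random(seed)
    Wt = transpose(W)
    v = normalize([rng.gauss(0.0, 1.0) for _ in range(len(W[0]))])
    for _ in range(n_iters):
        u = normalize(matvec(W, v))   # estimate of the left singular vector
        v = normalize(matvec(Wt, u))  # estimate of the right singular vector
    return math.sqrt(sum(x * x for x in matvec(W, v)))


def spectrally_normalize(W, n_iters=100):
    """Rescale W so that its largest singular value is approximately 1."""
    sigma = spectral_norm(W, n_iters)
    return [[w / sigma for w in row] for row in W]


# Hypothetical weight matrix with singular values 3 and 0.5.
W = [[3.0, 0.0], [0.0, 0.5]]
W_sn = spectrally_normalize(W)
```

After rescaling, the layer `x -> W_sn x` is approximately 1-Lipschitz, so gradients flowing from a critic back to an actor through such a layer cannot be amplified by it. Whether this matches the smoothing used in the paper is an assumption.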
Related papers
- Self-Improvement in Language Models: The Sharpening Mechanism [70.9248553790022]
We offer a new perspective on the capabilities of self-improvement through a lens we refer to as sharpening.
Motivated by the observation that language models are often better at verifying response quality than they are at generating correct responses, we formalize self-improvement as using the model itself as a verifier during post-training.
We analyze two natural families of self-improvement algorithms based on SFT and RLHF.
arXiv Detail & Related papers (2024-12-02T20:24:17Z)
- Advancing Neural Network Performance through Emergence-Promoting Initialization Scheme [0.0]
Emergence in machine learning refers to the spontaneous appearance of capabilities that arise from the scale and structure of training data.
We introduce a novel yet straightforward neural network initialization scheme that aims at achieving greater potential for emergence.
We demonstrate substantial improvements in both model accuracy and training speed, with and without batch normalization.
arXiv Detail & Related papers (2024-07-26T18:56:47Z)
- Expressive and Generalizable Low-rank Adaptation for Large Models via Slow Cascaded Learning [55.5715496559514]
LoRA Slow Cascade Learning (LoRASC) is an innovative technique designed to enhance LoRA's expressiveness and generalization capabilities.
Our approach augments expressiveness through a cascaded learning strategy that enables a mixture-of-low-rank adaptation, thereby increasing the model's ability to capture complex patterns.
arXiv Detail & Related papers (2024-07-01T17:28:59Z)
- Efficiently Robustify Pre-trained Models [18.392732966487582]
The robustness of large-scale models in real-world settings is still a less-explored topic.
We first benchmark the performance of these models under different perturbations and datasets.
We then discuss how existing robustification schemes based on complete model fine-tuning might not be a scalable option for very large-scale networks.
arXiv Detail & Related papers (2023-09-14T08:07:49Z)
- A Neuromorphic Architecture for Reinforcement Learning from Real-Valued Observations [0.34410212782758043]
Reinforcement Learning (RL) provides a powerful framework for decision-making in complex environments.
This paper presents a novel Spiking Neural Network (SNN) architecture for solving RL problems with real-valued observations.
arXiv Detail & Related papers (2023-07-06T12:33:34Z)
- When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL).
Our follow-up derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
- RLFlow: Optimising Neural Network Subgraph Transformation with World Models [0.0]
We propose a model-based agent which learns to optimise the architecture of neural networks by performing a sequence of subgraph transformations to reduce model runtime.
We show our approach can match the performance of state of the art on common convolutional networks and outperform those by up to 5% on transformer-style architectures.
arXiv Detail & Related papers (2022-05-03T11:52:54Z)
- Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z)
- The Self-Simplifying Machine: Exploiting the Structure of Piecewise Linear Neural Networks to Create Interpretable Models [0.0]
We introduce novel methodology toward simplification and increased interpretability of Piecewise Linear Neural Networks for classification tasks.
Our methods include the use of a trained, deep network to produce a well-performing, single-hidden-layer network without further training.
On these methods, we conduct preliminary studies of model performance, as well as a case study on Wells Fargo's Home Lending dataset.
arXiv Detail & Related papers (2020-12-02T16:02:14Z)
- Efficient Model-Based Reinforcement Learning through Optimistic Policy Search and Planning [93.1435980666675]
We show how optimistic exploration can be easily combined with state-of-the-art reinforcement learning algorithms.
Our experiments demonstrate that optimistic exploration significantly speeds up learning when there are penalties on actions.
arXiv Detail & Related papers (2020-06-15T18:37:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.