Towards Deeper Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2106.01151v1
- Date: Wed, 2 Jun 2021 13:41:02 GMT
- Title: Towards Deeper Deep Reinforcement Learning
- Authors: Johan Bjorck, Carla P. Gomes, Kilian Q. Weinberger
- Abstract summary: Unlike in computer vision and natural language processing, state-of-the-art reinforcement learning algorithms often use only small MLPs.
We show that dataset size is not the limiting factor, and instead argue that instability from the actor in SAC taking gradients through the critic is the culprit.
- Score: 42.960199987696306
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In computer vision and natural language processing, innovations in model
architecture that lead to increases in model capacity have reliably translated
into gains in performance. In stark contrast with this trend, state-of-the-art
reinforcement learning (RL) algorithms often use only small MLPs, and gains in
performance typically originate from algorithmic innovations. It is natural to
hypothesize that small datasets in RL necessitate simple models to avoid
overfitting; however, this hypothesis is untested. In this paper we investigate
how RL agents are affected by exchanging the small MLPs with larger modern
networks with skip connections and normalization, focusing specifically on soft
actor-critic (SAC) algorithms. We verify, empirically, that naïvely adopting
such architectures leads to instabilities and poor performance, likely
contributing to the popularity of simple models in practice. However, we show
that dataset size is not the limiting factor, and instead argue that intrinsic
instability from the actor in SAC taking gradients through the critic is the
culprit. We demonstrate that a simple smoothing method can mitigate this issue,
which enables stable training with large modern architectures. After smoothing,
larger models yield dramatic performance improvements for state-of-the-art
agents -- suggesting that more "easy" gains may be had by focusing on model
architectures in addition to algorithmic innovations.
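The abstract does not name the smoothing method, but spectral normalization is one standard way to smooth a critic whose landscape the actor differentiates through. The PyTorch sketch below is illustrative only: the class names, layer sizes, and residual layout are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

class ResidualBlock(nn.Module):
    """Skip connection + LayerNorm block, with spectral norm as the smoother."""
    def __init__(self, dim):
        super().__init__()
        # spectral_norm bounds each layer's spectral norm (Lipschitz constant),
        # smoothing the critic surface the actor takes gradients through.
        self.fc1 = spectral_norm(nn.Linear(dim, dim))
        self.fc2 = spectral_norm(nn.Linear(dim, dim))
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        h = self.fc2(torch.relu(self.fc1(x)))
        return self.norm(torch.relu(x + h))  # skip connection

class SmoothedCritic(nn.Module):
    """A deeper SAC-style Q-network built from modern components (illustrative sizes)."""
    def __init__(self, obs_dim, act_dim, width=256, depth=4):
        super().__init__()
        self.inp = nn.Linear(obs_dim + act_dim, width)
        self.blocks = nn.Sequential(*(ResidualBlock(width) for _ in range(depth)))
        self.out = nn.Linear(width, 1)

    def forward(self, obs, act):
        x = torch.relu(self.inp(torch.cat([obs, act], dim=-1)))
        return self.out(self.blocks(x))  # Q(s, a)
```

The relevance to SAC is that the actor loss, roughly `(alpha * log_pi - Q(s, pi(s))).mean()`, backpropagates through the critic, so bounding the critic's Lipschitz constant smooths the gradients the actor receives.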
Related papers
- Expressive and Generalizable Low-rank Adaptation for Large Models via Slow Cascaded Learning [55.5715496559514]
LoRA Slow Cascade Learning (LoRASC) is an innovative technique designed to enhance LoRA's expressiveness and generalization capabilities.
Our approach augments expressiveness through a cascaded learning strategy that enables a mixture-of-low-rank adaptation, thereby increasing the model's ability to capture complex patterns.
arXiv Detail & Related papers (2024-07-01T17:28:59Z)
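The LoRASC entry above describes a cascade of low-rank adaptations. As a hedged illustration of that general idea only (not LoRASC's actual algorithm or API), one could train a sequence of LoRA deltas, merging each into the frozen base weight before fitting the next; all names and the rank below are hypothetical.

```python
import torch
import torch.nn as nn

class LoRADelta(nn.Module):
    """One low-rank update A @ B added to a frozen weight; rank is illustrative."""
    def __init__(self, d_in, d_out, rank=8):
        super().__init__()
        self.A = nn.Parameter(torch.randn(d_in, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank, d_out))

    def delta(self):
        return self.A @ self.B  # (d_in, d_out) low-rank update

def cascade(base_weight, n_stages, fit_stage):
    """Hypothetical cascade loop: fit_stage trains one stage's delta on the task."""
    for _ in range(n_stages):
        stage = LoRADelta(*base_weight.shape)
        fit_stage(base_weight, stage)                       # train this stage
        base_weight = base_weight + stage.delta().detach()  # merge, then continue
    return base_weight
```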
- Efficiently Robustify Pre-trained Models [18.392732966487582]
The robustness of large-scale models in real-world settings is still a less-explored topic.
We first benchmark the performance of these models under different perturbations and datasets.
We then discuss how existing robustification schemes based on full model fine-tuning may not be a scalable option for very large networks.
arXiv Detail & Related papers (2023-09-14T08:07:49Z)
- A Neuromorphic Architecture for Reinforcement Learning from Real-Valued Observations [0.34410212782758043]
Reinforcement Learning (RL) provides a powerful framework for decision-making in complex environments.
This paper presents a novel Spiking Neural Network (SNN) architecture for solving RL problems with real-valued observations.
arXiv Detail & Related papers (2023-07-06T12:33:34Z)
- Unlocking the Potential of Federated Learning for Deeper Models [24.875271131226707]
Federated learning (FL) is a new paradigm for distributed machine learning that allows a global model to be trained across multiple clients.
We propose several technical guidelines based on reducing divergence, such as using wider models and reducing the receptive field.
These approaches can greatly improve the accuracy of FL on deeper models.
arXiv Detail & Related papers (2023-06-05T08:45:44Z)
- When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee in model-based RL (MBRL).
The bounds we derive reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically-varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
- RLFlow: Optimising Neural Network Subgraph Transformation with World Models [0.0]
We propose a model-based agent which learns to optimise the architecture of neural networks by performing a sequence of subgraph transformations to reduce model runtime.
We show our approach can match the performance of the state of the art on common convolutional networks and outperform it by up to 5% on transformer-style architectures.
arXiv Detail & Related papers (2022-05-03T11:52:54Z)
- Toward Fast, Flexible, and Robust Low-Light Image Enhancement [87.27326390675155]
We develop a new Self-Calibrated Illumination (SCI) learning framework for fast, flexible, and robust image brightening in real-world low-light scenarios.
Considering the computational burden of the cascaded pattern, we construct a self-calibrated module that realizes convergence between the results of each stage.
We comprehensively explore SCI's inherent properties, including operation-insensitive adaptability and model-irrelevant generality.
arXiv Detail & Related papers (2022-04-21T14:40:32Z)
- Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z)
- The Self-Simplifying Machine: Exploiting the Structure of Piecewise Linear Neural Networks to Create Interpretable Models [0.0]
We introduce a novel methodology for simplifying Piecewise Linear Neural Networks and increasing their interpretability in classification tasks.
Our methods include the use of a trained, deep network to produce a well-performing, single-hidden-layer network without further training.
On these methods, we conduct preliminary studies of model performance, as well as a case study on Wells Fargo's Home Lending dataset.
arXiv Detail & Related papers (2020-12-02T16:02:14Z)
- Efficient Model-Based Reinforcement Learning through Optimistic Policy Search and Planning [93.1435980666675]
We show how optimistic exploration can be easily combined with state-of-the-art reinforcement learning algorithms.
Our experiments demonstrate that optimistic exploration significantly speeds up learning when there are penalties on actions.
arXiv Detail & Related papers (2020-06-15T18:37:38Z)