Related papers: Towards Deeper Deep Reinforcement Learning

URL: http://arxiv.org/abs/2106.01151v1
Date: Wed, 2 Jun 2021 13:41:02 GMT
Title: Towards Deeper Deep Reinforcement Learning
Authors: Johan Bjorck, Carla P. Gomes, Kilian Q. Weinberger
Abstract summary: Unlike in computer vision and natural language processing, state-of-the-art reinforcement learning algorithms often use only small MLPs. We show that dataset size is not the limiting factor, and instead argue that intrinsic instability from the actor in SAC taking gradients through the critic is the culprit.
Score: 42.960199987696306
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In computer vision and natural language processing, innovations in model architecture that lead to increases in model capacity have reliably translated into gains in performance. In stark contrast with this trend, state-of-the-art reinforcement learning (RL) algorithms often use only small MLPs, and gains in performance typically originate from algorithmic innovations. It is natural to hypothesize that small datasets in RL necessitate simple models to avoid overfitting; however, this hypothesis is untested. In this paper we investigate how RL agents are affected by exchanging the small MLPs with larger modern networks with skip connections and normalization, focusing specifically on soft actor-critic (SAC) algorithms. We verify, empirically, that naively adopting such architectures leads to instabilities and poor performance, likely contributing to the popularity of simple models in practice. However, we show that dataset size is not the limiting factor, and instead argue that intrinsic instability from the actor in SAC taking gradients through the critic is the culprit. We demonstrate that a simple smoothing method can mitigate this issue, which enables stable training with large modern architectures. After smoothing, larger models yield dramatic performance improvements for state-of-the-art agents, suggesting that more "easy" gains may be had by focusing on model architectures in addition to algorithmic innovations.

Related papers

Normalizing Flows are Capable Models for RL [24.876149287707847]
We propose a single Normalizing Flow architecture which integrates seamlessly into reinforcement learning algorithms. Our approach leads to much simpler algorithms, and achieves higher performance in imitation learning, offline, goal conditioned RL and unsupervised RL.
arXiv Detail & Related papers (2025-05-29T15:06:22Z)
Self-Improvement in Language Models: The Sharpening Mechanism [70.9248553790022]
We offer a new perspective on the capabilities of self-improvement through a lens we refer to as sharpening. Motivated by the observation that language models are often better at verifying response quality than they are at generating correct responses, we formalize self-improvement as using the model itself as a verifier during post-training. We analyze two natural families of self-improvement algorithms based on SFT and RLHF.
arXiv Detail & Related papers (2024-12-02T20:24:17Z)
Advancing Neural Network Performance through Emergence-Promoting Initialization Scheme [0.0]
Emergence in machine learning refers to the spontaneous appearance of capabilities that arise from the scale and structure of training data. We introduce a novel yet straightforward neural network initialization scheme that aims at achieving greater potential for emergence. We demonstrate substantial improvements in both model accuracy and training speed, with and without batch normalization.
arXiv Detail & Related papers (2024-07-26T18:56:47Z)
Expressive and Generalizable Low-rank Adaptation for Large Models via Slow Cascaded Learning [55.5715496559514]
LoRA Slow Cascade Learning (LoRASC) is an innovative technique designed to enhance LoRA's expressiveness and generalization capabilities. Our approach augments expressiveness through a cascaded learning strategy that enables a mixture-of-low-rank adaptation, thereby increasing the model's ability to capture complex patterns.
arXiv Detail & Related papers (2024-07-01T17:28:59Z)
Efficiently Robustify Pre-trained Models [18.392732966487582]
The robustness of large-scale models in real-world settings is still a less-explored topic. We first benchmark the performance of these models under different perturbations and datasets. We then discuss how existing robustification schemes based on complete model fine-tuning might not be a scalable option for very large-scale networks.
arXiv Detail & Related papers (2023-09-14T08:07:49Z)
A Neuromorphic Architecture for Reinforcement Learning from Real-Valued Observations [0.34410212782758043]
Reinforcement Learning (RL) provides a powerful framework for decision-making in complex environments. This paper presents a novel Spiking Neural Network (SNN) architecture for solving RL problems with real-valued observations.
arXiv Detail & Related papers (2023-07-06T12:33:34Z)
Unlocking the Potential of Federated Learning for Deeper Models [24.875271131226707]
Federated learning (FL) is a new paradigm for distributed machine learning that allows a global model to be trained across multiple clients. We propose several technical guidelines based on reducing divergence, such as using wider models and reducing the receptive field. These approaches can greatly improve the accuracy of FL on deeper models.
arXiv Detail & Related papers (2023-06-05T08:45:44Z)
When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL). Our follow-up derived bounds reveal the relationship between model shifts and performance improvement. A further example demonstrates that learning models from a dynamically varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
RLFlow: Optimising Neural Network Subgraph Transformation with World Models [0.0]
We propose a model-based agent which learns to optimise the architecture of neural networks by performing a sequence of subgraph transformations to reduce model runtime. We show our approach can match the performance of state of the art on common convolutional networks and outperform those by up to 5% on transformer-style architectures.
arXiv Detail & Related papers (2022-05-03T11:52:54Z)
Toward Fast, Flexible, and Robust Low-Light Image Enhancement [87.27326390675155]
We develop a new Self-Calibrated Illumination (SCI) learning framework for fast, flexible, and robust brightening of images in real-world low-light scenarios. Considering the computational burden of the cascaded pattern, we construct the self-calibrated module which realizes the convergence between results of each stage. We comprehensively explore SCI's inherent properties, including operation-insensitive adaptability and model-irrelevant generality.
arXiv Detail & Related papers (2022-04-21T14:40:32Z)
Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks. This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z)
The Self-Simplifying Machine: Exploiting the Structure of Piecewise Linear Neural Networks to Create Interpretable Models [0.0]
We introduce novel methodology toward simplification and increased interpretability of Piecewise Linear Neural Networks for classification tasks. Our methods include the use of a trained, deep network to produce a well-performing, single-hidden-layer network without further training. On these methods, we conduct preliminary studies of model performance, as well as a case study on Wells Fargo's Home Lending dataset.
arXiv Detail & Related papers (2020-12-02T16:02:14Z)
Efficient Model-Based Reinforcement Learning through Optimistic Policy Search and Planning [93.1435980666675]
We show how optimistic exploration can be easily combined with state-of-the-art reinforcement learning algorithms. Our experiments demonstrate that optimistic exploration significantly speeds up learning when there are penalties on actions.
arXiv Detail & Related papers (2020-06-15T18:37:38Z)

This list is automatically generated from the titles and abstracts of the papers on this site.