Learning Dynamics and Generalization in Reinforcement Learning
- URL: http://arxiv.org/abs/2206.02126v1
- Date: Sun, 5 Jun 2022 08:49:16 GMT
- Title: Learning Dynamics and Generalization in Reinforcement Learning
- Authors: Clare Lyle, Mark Rowland, Will Dabney, Marta Kwiatkowska, Yarin Gal
- Abstract summary: We show theoretically that temporal difference learning encourages agents to fit non-smooth components of the value function early in training.
We show that neural networks trained using temporal difference algorithms on dense reward tasks exhibit weaker generalization between states than randomly initialized networks and networks trained with policy gradient methods.
- Score: 59.530058000689884
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Solving a reinforcement learning (RL) problem poses two competing challenges:
fitting a potentially discontinuous value function, and generalizing well to
new observations. In this paper, we analyze the learning dynamics of temporal
difference algorithms to gain novel insight into the tension between these two
objectives. We show theoretically that temporal difference learning encourages
agents to fit non-smooth components of the value function early in training,
and at the same time induces the second-order effect of discouraging
generalization. We corroborate these findings in deep RL agents trained on a
range of environments, finding that neural networks trained using temporal
difference algorithms on dense reward tasks exhibit weaker generalization
between states than randomly initialized networks and networks trained with
policy gradient methods. Finally, we investigate how post-training policy
distillation may avoid this pitfall, and show that this approach improves
generalization to novel environments in the ProcGen suite and improves
robustness to input perturbations.
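To make the distillation step concrete, below is a minimal sketch of post-training policy distillation in the spirit the abstract describes, assuming a PyTorch-style discrete-action setup; the architecture, hyperparameters, and names such as make_mlp and distill are illustrative assumptions, not the authors' code.

```python
# Hedged sketch of post-training policy distillation (illustrative, not the
# authors' implementation). A freshly initialized student is regressed onto a
# frozen TD-trained teacher's action distribution over a batch of states.
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_mlp(obs_dim, n_actions, hidden=256):
    return nn.Sequential(
        nn.Linear(obs_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, n_actions),
    )

def distill(teacher, states, obs_dim, n_actions, steps=1000, lr=3e-4):
    """Fit a fresh student to the frozen teacher's policy outputs.

    The student never sees TD targets; it only matches the teacher's
    action distribution, sidestepping the TD learning dynamics that the
    paper finds discourage generalization between states.
    """
    student = make_mlp(obs_dim, n_actions)
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    teacher.eval()
    for _ in range(steps):
        idx = torch.randint(0, states.shape[0], (256,))
        batch = states[idx]
        with torch.no_grad():
            target = F.softmax(teacher(batch), dim=-1)   # teacher's policy
        log_probs = F.log_softmax(student(batch), dim=-1)
        loss = F.kl_div(log_probs, target, reduction="batchmean")
        opt.zero_grad()
        loss.backward()
        opt.step()
    return student
```

Here `teacher` would be the TD-trained network and `states` a set of observations collected by running it; the student starts from a random initialization, which the abstract reports already generalizes between states better than the TD-trained weights.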
Related papers
- Compositional Curvature Bounds for Deep Neural Networks [7.373617024876726]
A key challenge that threatens the widespread use of neural networks in safety-critical applications is their vulnerability to adversarial attacks.
We study the second-order behavior of continuously differentiable deep neural networks, focusing on robustness against adversarial perturbations.
We introduce a novel algorithm to analytically compute provable upper bounds on the second derivative of neural networks.
arXiv Detail & Related papers (2024-06-07T17:50:15Z)
- Zero-Sum Positional Differential Games as a Framework for Robust Reinforcement Learning: Deep Q-Learning Approach [2.3020018305241337]
This paper is the first to propose considering robust reinforcement learning (RRL) problems within the framework of positional differential game theory.
Namely, we prove that under Isaacs's condition, the same Q-function can be utilized as an approximate solution of both minimax and maximin Bellman equations.
We present the Isaacs Deep Q-Network algorithms and demonstrate their superiority compared to other baseline RRL and Multi-Agent RL algorithms in various environments.
arXiv Detail & Related papers (2024-05-03T12:21:43Z)
- On the generalization of learning algorithms that do not converge [54.122745736433856]
Generalization analyses of deep learning typically assume that the training converges to a fixed point.
Recent results indicate that in practice, the weights of deep neural networks optimized with gradient descent often oscillate indefinitely.
arXiv Detail & Related papers (2022-08-16T21:22:34Z)
- Dynamics Generalization via Information Bottleneck in Deep Reinforcement Learning [90.93035276307239]
We propose an information theoretic regularization objective and an annealing-based optimization method to achieve better generalization ability in RL agents.
We demonstrate the extreme generalization benefits of our approach in different domains ranging from maze navigation to robotic tasks.
This work provides a principled way to improve generalization in RL by gradually removing information that is redundant for task-solving.
arXiv Detail & Related papers (2020-08-03T02:24:20Z)
- Feature Purification: How Adversarial Training Performs Robust Deep Learning [66.05472746340142]
We present a principle we call Feature Purification: one of the causes of adversarial examples is the accumulation of certain small, dense mixtures in the hidden weights during the training of a neural network.
We present both experiments on the CIFAR-10 dataset to illustrate this principle, and a theoretical result proving that for certain natural classification tasks, training a two-layer neural network with ReLU activation via randomly initialized gradient descent indeed satisfies this principle.
arXiv Detail & Related papers (2020-05-20T16:56:08Z)
- Learn2Perturb: an End-to-end Feature Perturbation Learning to Improve Adversarial Robustness [79.47619798416194]
Learn2Perturb is an end-to-end feature perturbation learning approach for improving the adversarial robustness of deep neural networks.
Inspired by Expectation-Maximization, an alternating back-propagation algorithm is introduced to train the network and noise parameters consecutively.
arXiv Detail & Related papers (2020-03-02T18:27:35Z)
- Robust Reinforcement Learning via Adversarial training with Langevin Dynamics [51.234482917047835]
We introduce a sampling perspective to tackle the challenging task of training robust Reinforcement Learning (RL) agents.
We present a novel, scalable two-player RL algorithm, which is a sampling variant of the two-player policy gradient method; a toy sketch of the Langevin-style noisy updates appears after this list.
arXiv Detail & Related papers (2020-02-14T14:59:14Z)
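As a toy illustration of the Langevin-dynamics idea in the last entry above (not that paper's algorithm), the sketch below runs noisy gradient descent-ascent on a simple convex-concave zero-sum payoff; the objective, step size, and temperature are made-up assumptions chosen so the iterates settle near the saddle point.

```python
# Toy sketch: both players take SGLD-style noisy gradient steps instead of
# plain descent/ascent. The payoff, step size, and inverse temperature are
# illustrative assumptions, not taken from the cited paper.
import numpy as np

rng = np.random.default_rng(0)

def payoff(x, y):
    # Convex-concave zero-sum payoff with a unique saddle point at (0, 0):
    # the protagonist picks x to minimize it, the adversary picks y to maximize it.
    return 0.5 * x ** 2 + x * y - 0.5 * y ** 2

def grads(x, y):
    return x + y, x - y   # d(payoff)/dx, d(payoff)/dy

x, y = 2.0, -2.0
step, beta = 1e-2, 50.0                      # step size and inverse temperature
noise_scale = np.sqrt(2.0 * step / beta)     # standard Langevin noise magnitude
for _ in range(5000):
    gx, gy = grads(x, y)
    x -= step * gx + noise_scale * rng.standard_normal()   # protagonist: noisy descent
    y += step * gy + noise_scale * rng.standard_normal()   # adversary: noisy ascent

print(f"final payoff {payoff(x, y):.4f} near saddle point x={x:.3f}, y={y:.3f}")
```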