Understanding the Evolution of Linear Regions in Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2210.13611v1
- Date: Mon, 24 Oct 2022 21:22:12 GMT
- Title: Understanding the Evolution of Linear Regions in Deep Reinforcement Learning
- Authors: Setareh Cohen, Nam Hee Kim, David Rolnick, Michiel van de Panne
- Abstract summary: We study how observed region counts and their densities evolve during deep reinforcement learning.
We find that the region density increases only moderately throughout training, as measured along fixed trajectories coming from the final policy.
Our findings suggest that the complexity of deep reinforcement learning policies does not principally emerge from a significant growth in the complexity of functions observed on-and-around trajectories of the policy.
- Score: 21.53394095184201
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Policies produced by deep reinforcement learning are typically characterised
by their learning curves, but they remain poorly understood in many other
respects. ReLU-based policies result in a partitioning of the input space into
piecewise linear regions. We seek to understand how observed region counts and
their densities evolve during deep reinforcement learning using empirical
results that span a range of continuous control tasks and policy network
dimensions. Intuitively, we may expect that during training, the region density
increases in the areas that are frequently visited by the policy, thereby
affording fine-grained control. We use recent theoretical and empirical results
for the linear regions induced by neural networks in supervised learning
settings for grounding and comparison of our results. Empirically, we find that
the region density increases only moderately throughout training, as measured
along fixed trajectories coming from the final policy. However, the
trajectories themselves also increase in length during training, and thus the
region densities decrease as seen from the perspective of the current
trajectory. Our findings suggest that the complexity of deep reinforcement
learning policies does not principally emerge from a significant growth in the
complexity of functions observed on-and-around trajectories of the policy.
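The region counting referred to in the abstract can be approximated empirically by sampling points along a path in state space and counting how often the network's ReLU activation pattern changes, since each distinct pattern corresponds to one linear region. Below is a minimal, self-contained PyTorch sketch of that idea; the toy network, function names, and straight-line sampling scheme are illustrative assumptions, not the paper's exact methodology (the paper measures regions along trajectories of the policy).

```python
import torch
import torch.nn as nn

class ReLUPolicy(nn.Module):
    """Toy ReLU policy: any such network partitions the state space into
    piecewise linear regions, one per distinct ReLU activation pattern."""
    def __init__(self, obs_dim=8, act_dim=2, hidden=64):
        super().__init__()
        self.fc1 = nn.Linear(obs_dim, hidden)
        self.fc2 = nn.Linear(hidden, hidden)
        self.out = nn.Linear(hidden, act_dim)

    def forward(self, x):
        h1 = torch.relu(self.fc1(x))
        h2 = torch.relu(self.fc2(h1))
        # Return the action and the binary ReLU activation pattern.
        return self.out(h2), torch.cat([h1 > 0, h2 > 0], dim=-1)

def regions_along_segment(policy, start, end, n_points=10_000):
    """Count distinct activation patterns crossed on the straight line from
    `start` to `end`; dividing by the segment length gives a region density."""
    ts = torch.linspace(0.0, 1.0, n_points).unsqueeze(1)
    points = start + ts * (end - start)
    with torch.no_grad():
        _, patterns = policy(points)
    # A new linear region begins wherever the pattern differs from the previous sample.
    changes = (patterns[1:] != patterns[:-1]).any(dim=1).sum().item()
    return changes + 1

policy = ReLUPolicy()
print(regions_along_segment(policy, torch.zeros(8), torch.ones(8)))
```

Applying the same count before and after training, along the same fixed segments, is one way to see whether region density actually grows where the policy operates.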
Related papers
- Characterizing stable regions in the residual stream of LLMs [0.0]
We identify stable regions in the residual stream of Transformers, where the model's output remains insensitive to small activation changes.
These regions emerge during training and become more defined as training progresses or model size increases.
arXiv Detail & Related papers (2024-09-25T17:27:02Z)
- Discovering Behavioral Modes in Deep Reinforcement Learning Policies Using Trajectory Clustering in Latent Space [0.0]
We introduce a new approach for investigating the behavior modes of DRL policies.
Specifically, we use Pairwise Controlled Manifold Approximation Projection (PaCMAP) for dimensionality reduction and TRACLUS for trajectory clustering.
Our methodology helps identify diverse behavior patterns and suboptimal choices by the policy, thus allowing for targeted improvements.
arXiv Detail & Related papers (2024-02-20T11:50:50Z)
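The dimensionality-reduction step named in the entry above can be sketched with the real pacmap package; the trajectory data here is synthetic and TRACLUS is not included, so a generic DBSCAN clustering is used as a rough stand-in for the paper's trajectory-clustering step.

```python
import numpy as np
import pacmap                       # pip install pacmap
from sklearn.cluster import DBSCAN  # stand-in for TRACLUS, not the paper's method

# Hypothetical input: one row of latent features per visited state,
# e.g. activations collected while rolling out a trained DRL policy.
latent_states = np.random.randn(5000, 32).astype(np.float32)

# Reduce to 2-D with PaCMAP so trajectory structure can be inspected visually.
embedding = pacmap.PaCMAP(n_components=2, n_neighbors=10)
points_2d = embedding.fit_transform(latent_states)

# Cluster the embedded points; the paper clusters trajectory segments with
# TRACLUS instead, so treat this as a placeholder for that step.
labels = DBSCAN(eps=0.5, min_samples=20).fit_predict(points_2d)
print("clusters found:", len(set(labels)) - (1 if -1 in labels else 0))
```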
- A Survey Analyzing Generalization in Deep Reinforcement Learning [14.141453107129403]
We will formalize and analyze generalization in deep reinforcement learning.
We will explain the fundamental reasons why deep reinforcement learning policies encounter overfitting problems that limit their generalization capabilities.
arXiv Detail & Related papers (2024-01-04T16:45:01Z)
- Supported Trust Region Optimization for Offline Reinforcement Learning [59.43508325943592]
We propose Supported Trust Region optimization (STR) which performs trust region policy optimization with the policy constrained within the support of the behavior policy.
We show that, when assuming no approximation and sampling error, STR guarantees strict policy improvement until convergence to the optimal support-constrained policy in the dataset.
arXiv Detail & Related papers (2023-11-15T13:16:16Z)
- Representation-Driven Reinforcement Learning [57.44609759155611]
We present a representation-driven framework for reinforcement learning.
By representing policies as estimates of their expected values, we leverage techniques from contextual bandits to guide exploration and exploitation.
We demonstrate the effectiveness of this framework through its application to evolutionary and policy gradient-based approaches.
arXiv Detail & Related papers (2023-05-31T14:59:12Z)
- Adversarial Robust Deep Reinforcement Learning Requires Redefining Robustness [7.6146285961466]
We show that high sensitivity directions are more abundant in the deep neural policy landscape and can be found via more natural means in a black-box setting.
We show that vanilla training techniques intriguingly result in learning more robust policies compared to the policies learnt via the state-of-the-art adversarial training techniques.
arXiv Detail & Related papers (2023-01-17T16:54:33Z)
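The "high sensitivity directions" mentioned in the entry above can be probed in a black-box way with a simple finite-difference search; the sketch below is a generic probe under that assumption, not the paper's specific algorithm, and `policy` stands for any callable mapping state tensors to action tensors.

```python
import torch

def most_sensitive_direction(policy, state, n_probes=256, eps=1e-2):
    """Black-box probe: sample random unit directions in state space and keep
    the one along which the policy output changes the most per unit step."""
    with torch.no_grad():
        base = policy(state)
        best_dir, best_rate = None, -1.0
        for _ in range(n_probes):
            d = torch.randn_like(state)
            d = d / d.norm()
            rate = (policy(state + eps * d) - base).norm().item() / eps
            if rate > best_rate:
                best_dir, best_rate = d, rate
    return best_dir, best_rate

# Usage: direction, rate = most_sensitive_direction(policy_net, state_tensor)
```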
- Representation Learning for Continuous Action Spaces is Beneficial for Efficient Policy Learning [64.14557731665577]
Deep reinforcement learning (DRL) breaks through the bottlenecks of traditional reinforcement learning (RL).
In this paper, we propose an efficient policy learning method in latent state and action spaces.
The effectiveness of the proposed method is demonstrated by MountainCar, CarRacing, and Cheetah experiments.
arXiv Detail & Related papers (2022-11-23T19:09:37Z)
- Learning Dynamics and Generalization in Reinforcement Learning [59.530058000689884]
We show theoretically that temporal difference learning encourages agents to fit non-smooth components of the value function early in training.
We show that neural networks trained using temporal difference algorithms on dense reward tasks exhibit weaker generalization between states than randomly initialized networks and networks trained with policy gradient methods.
arXiv Detail & Related papers (2022-06-05T08:49:16Z)
- Deep Reinforcement Learning Policies Learn Shared Adversarial Features Across MDPs [0.0]
We propose a framework to investigate the decision boundary and loss landscape similarities across states and across MDPs.
We conduct experiments in various games from the Arcade Learning Environment and discover that high sensitivity directions for neural policies are correlated across MDPs.
arXiv Detail & Related papers (2021-12-16T17:10:41Z)
- What training reveals about neural network complexity [80.87515604428346]
This work explores the hypothesis that the complexity of the function a deep neural network (NN) is learning can be deduced by how fast its weights change during training.
Our results support the hypothesis that good training behavior can be a useful bias towards good generalization.
arXiv Detail & Related papers (2021-06-08T08:58:00Z)
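One simple way to instrument the "how fast its weights change" quantity from the entry above is to record the norm of the parameter update at each optimisation step; this is a generic sketch under that assumption, not the paper's actual complexity measure.

```python
import torch

def train_and_track_weight_motion(model, optimizer, loss_fn, data_loader):
    """Run one epoch and record the L2 norm of the parameter update at every
    step; per the hypothesis above, slow weight motion is read as a sign the
    network is fitting a simpler function."""
    step_norms = []
    for inputs, targets in data_loader:
        before = [p.detach().clone() for p in model.parameters()]
        optimizer.zero_grad()
        loss_fn(model(inputs), targets).backward()
        optimizer.step()
        moved = sum((p.detach() - b).pow(2).sum()
                    for p, b in zip(model.parameters(), before))
        step_norms.append(moved.sqrt().item())
    return step_norms
```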
- Deep Reinforcement Learning with Robust and Smooth Policy [90.78795857181727]
We propose to learn a policy that behaves smoothly with respect to states.
We develop a new framework, Smooth Regularized Reinforcement Learning (SR^2L), where the policy is trained with smoothness-inducing regularization.
Such regularization effectively constrains the search space, and enforces smoothness in the learned policy.
arXiv Detail & Related papers (2020-03-21T00:10:29Z)
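A smoothness-inducing regularizer of the kind described in the entry above can be sketched as a penalty on how much the policy output moves under small state perturbations. The version below uses random perturbations purely to keep the sketch short, whereas SR^2L searches for a worst-case perturbation inside an epsilon-ball; the coefficient and epsilon are hypothetical hyperparameters.

```python
import torch

def smoothness_penalty(policy, states, eps=1e-2, n_samples=4):
    """Penalise how much the policy's output changes under small random state
    perturbations (a simplified stand-in for SR^2L's adversarial perturbation)."""
    base = policy(states)
    penalty = states.new_zeros(())
    for _ in range(n_samples):
        perturbed = states + eps * torch.randn_like(states)
        penalty = penalty + (policy(perturbed) - base).pow(2).mean()
    return penalty / n_samples

# Usage: total_loss = rl_loss + smooth_coef * smoothness_penalty(policy_net, batch_states)
```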