Discovering Behavioral Modes in Deep Reinforcement Learning Policies
Using Trajectory Clustering in Latent Space
- URL: http://arxiv.org/abs/2402.12939v1
- Date: Tue, 20 Feb 2024 11:50:50 GMT
- Title: Discovering Behavioral Modes in Deep Reinforcement Learning Policies
Using Trajectory Clustering in Latent Space
- Authors: Sindre Benjamin Remman and Anastasios M. Lekkas
- Abstract summary: We introduce a new approach for investigating the behavior modes of DRL policies.
Specifically, we use Pairwise Controlled Manifold Approximation Projection (PaCMAP) for dimensionality reduction and TRACLUS for trajectory clustering.
Our methodology helps identify diverse behavior patterns and suboptimal choices by the policy, thus allowing for targeted improvements.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding the behavior of deep reinforcement learning (DRL) agents is
crucial for improving their performance and reliability. However, the
complexity of their policies often makes them challenging to understand. In
this paper, we introduce a new approach for investigating the behavior modes of
DRL policies, which involves utilizing dimensionality reduction and trajectory
clustering in the latent space of neural networks. Specifically, we use
Pairwise Controlled Manifold Approximation Projection (PaCMAP) for
dimensionality reduction and TRACLUS for trajectory clustering to analyze the
latent space of a DRL policy trained on the Mountain Car control task. Our
methodology helps identify diverse behavior patterns and suboptimal choices by
the policy, thus allowing for targeted improvements. We demonstrate how our
approach, combined with domain knowledge, can enhance a policy's performance in
specific regions of the state space.
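For intuition, here is a minimal sketch of the pipeline the abstract describes, assuming the latent states are hidden-layer activations collected from the trained policy. The `pacmap` Python package provides PaCMAP; TRACLUS has no standard Python package, so DBSCAN over segment endpoints stands in for the trajectory-clustering step, and the synthetic data only keeps the example self-contained.

```python
import numpy as np
import pacmap
from sklearn.cluster import DBSCAN

# Synthetic stand-in data: in the paper, each trajectory would be the sequence
# of hidden-layer activations produced by the trained Mountain Car policy
# during one episode.
rng = np.random.default_rng(0)
trajectories = [rng.normal(size=(200, 64)).cumsum(axis=0) for _ in range(20)]

# Step 1: embed all latent states into 2D with PaCMAP.
flat = np.concatenate(trajectories, axis=0)
embedding = pacmap.PaCMAP(n_components=2).fit_transform(flat)

# Step 2: re-split the embedding into per-episode 2D trajectories.
splits = np.cumsum([len(t) for t in trajectories])[:-1]
embedded_trajs = np.split(embedding, splits)

# Step 3: cluster trajectory segments. TRACLUS partitions trajectories into
# line segments and density-clusters them; DBSCAN over (start, end) segment
# features is a crude stand-in used here only for illustration.
segments = np.concatenate([np.hstack([t[:-1], t[1:]]) for t in embedded_trajs])
labels = DBSCAN(eps=0.5, min_samples=10).fit_predict(segments)
print(f"found {len(set(labels) - {-1})} segment clusters")
```

Clusters of segments then correspond to candidate behavior modes, which can be inspected against the original state space to locate suboptimal regions.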
Related papers
- Adaptive trajectory-constrained exploration strategy for deep reinforcement learning [6.589742080994319]
Deep reinforcement learning (DRL) faces significant challenges in addressing the hard-exploration problems in tasks with sparse or deceptive rewards and large state spaces.
We propose an efficient adaptive trajectory-constrained exploration strategy for DRL.
We conduct experiments on two large 2D grid world mazes and several MuJoCo tasks.
arXiv Detail & Related papers (2023-12-27T07:57:15Z)
- Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765]
We investigate the challenge of parameterizing policies for reinforcement learning in high-dimensional continuous action spaces.
We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories.
We present a practical model-based RL method, which leverages the multimodal policy parameterization and learned world model.
arXiv Detail & Related papers (2023-07-20T09:05:46Z)
- Policy Distillation with Selective Input Gradient Regularization for Efficient Interpretability [6.037276428689637]
Saliency maps are frequently used to provide interpretability for deep neural networks.
Existing saliency map approaches are computationally expensive and cannot satisfy the real-time requirements of real-world scenarios.
We propose an approach of Distillation with selective Input Gradient Regularization (DIGR) which uses policy distillation and input gradient regularization to produce new policies.
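Under one reading of this summary, DIGR combines a distillation objective with a penalty on the student's input gradients. The sketch below shows that combination in PyTorch; the loss weights and penalty form are assumptions, and the paper's selective masking of which input regions are penalized is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def digr_loss(student, teacher, obs, lam=0.1):
    """Distillation loss plus an input-gradient penalty (illustrative form)."""
    obs = obs.clone().requires_grad_(True)
    s_logits = student(obs)
    with torch.no_grad():
        t_logits = teacher(obs)
    # Match the teacher's action distribution.
    distill = F.kl_div(F.log_softmax(s_logits, dim=-1),
                       F.softmax(t_logits, dim=-1), reduction="batchmean")
    # Penalize the student's input gradients; the paper applies this
    # *selectively* (only to chosen input regions), which is omitted here.
    grad = torch.autograd.grad(s_logits.sum(), obs, create_graph=True)[0]
    return distill + lam * grad.pow(2).mean()

# Tiny usage example with linear "policies" over 4-dimensional observations.
student, teacher = nn.Linear(4, 2), nn.Linear(4, 2)
loss = digr_loss(student, teacher, torch.randn(32, 4))
loss.backward()
```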
arXiv Detail & Related papers (2022-05-18T01:47:16Z)
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
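A minimal sketch of the two-policy rollout scheme the summary describes, assuming a Gymnasium-style environment: the guide policy acts for the first h steps, after which the exploration policy takes over. The curriculum that shrinks h is only indicated in a comment, since the summary does not give its exact schedule.

```python
def jsrl_rollout(env, guide_policy, explore_policy, h):
    """Guide policy controls the first h steps, then the exploration policy."""
    obs, _ = env.reset()
    transitions, t, done = [], 0, False
    while not done:
        policy = guide_policy if t < h else explore_policy
        action = policy(obs)
        next_obs, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        # Only data collection is shown; the RL update of explore_policy
        # happens outside this sketch.
        transitions.append((obs, action, reward, next_obs, done))
        obs, t = next_obs, t + 1
    return transitions

# Assumed curriculum: anneal the guide horizon toward zero as the exploration
# policy improves, e.g. iterate h over max_h, max_h - k, ..., 0 and train on
# the transitions collected at each stage.
```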
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
- Benchmarking Safe Deep Reinforcement Learning in Aquatic Navigation [78.17108227614928]
We propose a benchmark environment for Safe Reinforcement Learning focusing on aquatic navigation.
We consider value-based and policy-gradient Deep Reinforcement Learning (DRL) approaches.
We also propose a verification strategy that checks the behavior of the trained models over a set of desired properties.
arXiv Detail & Related papers (2021-12-16T16:53:56Z)
- Direct Random Search for Fine Tuning of Deep Reinforcement Learning Policies [5.543220407902113]
We show that a direct random search is very effective at fine-tuning DRL policies by directly optimizing them using deterministic rollouts.
Our results show that this method yields more consistent and higher performing agents on the environments we tested.
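As one concrete reading of "directly optimizing them using deterministic rollouts", here is a minimal greedy random-search loop over the policy's parameter vector; the perturbation scale and acceptance rule are illustrative assumptions, and the paper's exact variant may differ.

```python
import numpy as np

def random_search_finetune(theta, evaluate, iters=100, sigma=0.02, seed=0):
    """`evaluate(theta)` runs a deterministic rollout and returns its return."""
    rng = np.random.default_rng(seed)
    best_return = evaluate(theta)
    for _ in range(iters):
        # Perturb the pretrained parameters and evaluate the candidate.
        candidate = theta + sigma * rng.normal(size=theta.shape)
        r = evaluate(candidate)
        if r > best_return:  # greedy hill-climbing: keep only improvements
            theta, best_return = candidate, r
    return theta, best_return
```

Because rollouts are deterministic, each candidate needs only a single evaluation, which is what makes so simple a search competitive for fine-tuning.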
arXiv Detail & Related papers (2021-09-12T20:12:46Z)
- Policy Information Capacity: Information-Theoretic Measure for Task Complexity in Deep Reinforcement Learning [83.66080019570461]
We propose two environment-agnostic, algorithm-agnostic quantitative metrics for task difficulty.
We show that these metrics have higher correlations with normalized task solvability scores than a variety of alternatives.
These metrics can also be used for fast and compute-efficient optimizations of key design parameters.
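In the spirit of this summary, the sketch below shows one plausible plug-in estimate of the mutual information I(Theta; R) between sampled policy parameters and binarized episodic returns; the thresholding and estimator are illustrative choices, not the paper's exact definitions.

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -(p * np.log(p)).sum()

def policy_information_capacity(returns):
    """returns: (n_params, n_episodes) episodic returns for sampled parameters."""
    success = (returns > np.median(returns)).astype(int)  # assumed binarization
    h_marginal = entropy(np.bincount(success.ravel(), minlength=2) / success.size)
    h_conditional = np.mean(
        [entropy(np.bincount(row, minlength=2) / row.size) for row in success]
    )
    return h_marginal - h_conditional  # I(Theta; R) = H(R) - H(R | Theta)

rng = np.random.default_rng(0)
fake_returns = rng.normal(size=(50, 20))  # 50 parameter draws, 20 episodes each
print(policy_information_capacity(fake_returns))
```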
arXiv Detail & Related papers (2021-03-23T17:49:50Z)
- Optimal Control-Based Baseline for Guided Exploration in Policy Gradient Methods [8.718494948845711]
In this paper, a novel optimal control-based baseline function is presented for the policy gradient method in deep reinforcement learning.
We validate our baseline on robot learning tasks, showing its effectiveness in guided exploration.
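The summary does not specify the baseline's form; the sketch below only shows where a baseline b(s) enters a REINFORCE-style gradient estimate, with `optimal_control_value` as a hypothetical stand-in for a value supplied by an optimal controller (e.g. an LQR solution).

```python
import numpy as np

def optimal_control_value(state):
    """Hypothetical stand-in: value predicted by an optimal-control solution."""
    return 0.0

def policy_gradient_with_baseline(episode, grad_log_pi):
    """episode: list of (state, action, return_to_go) tuples;
    grad_log_pi(s, a) returns the score-function gradient as an array."""
    grads = [
        grad_log_pi(s, a) * (g - optimal_control_value(s))  # advantage vs. b(s)
        for s, a, g in episode
    ]
    return np.mean(grads, axis=0)
```

Subtracting b(s) leaves the gradient unbiased while reducing its variance; a well-chosen control-based baseline additionally steers exploration toward the controller's solution.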
arXiv Detail & Related papers (2020-11-04T00:11:56Z)
- Efficient Deep Reinforcement Learning via Adaptive Policy Transfer [50.51637231309424]
A Policy Transfer Framework (PTF) is proposed to accelerate Reinforcement Learning (RL).
Our framework learns when and which source policy is the best to reuse for the target policy and when to terminate it.
Experimental results show it significantly accelerates the learning process and surpasses state-of-the-art policy transfer methods.
arXiv Detail & Related papers (2020-02-19T07:30:57Z)
- Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it on correlated actions, and combine these critic estimated action values to control the variance of gradient estimation.
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
arXiv Detail & Related papers (2020-02-10T04:23:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.