Policy Supervectors: General Characterization of Agents by their Behaviour
- URL: http://arxiv.org/abs/2012.01244v1
- Date: Wed, 2 Dec 2020 14:43:16 GMT
- Title: Policy Supervectors: General Characterization of Agents by their Behaviour
- Authors: Anssi Kanervisto, Tomi Kinnunen, Ville Hautamäki
- Abstract summary: We propose policy supervectors for characterizing agents by the distribution of states they visit.
Policy supervectors can characterize policies regardless of their design philosophy and scale to thousands of policies on a single workstation machine.
We demonstrate the method's applicability by studying the evolution of policies during reinforcement learning, evolutionary training and imitation learning.
- Score: 18.488655590845163
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: By studying the underlying policies of decision-making agents, we can learn
about their shortcomings and potentially improve them. Traditionally, this has
been done either by examining the agent's implementation, its behaviour while
it is being executed, its performance with a reward/fitness function or by
visualizing the density of states the agent visits. However, these methods fail
to describe the policy's behaviour in complex, high-dimensional environments or
do not scale to thousands of policies, which is required when studying training
algorithms. We propose policy supervectors for characterizing agents by the
distribution of states they visit, adopting successful techniques from the area
of speech technology. Policy supervectors can characterize policies regardless
of their design philosophy (e.g. rule-based vs. neural networks) and scale to
thousands of policies on a single workstation machine. We demonstrate the
method's applicability by studying the evolution of policies during
reinforcement learning, evolutionary training and imitation learning,
providing insight into, e.g., how the search space of evolutionary algorithms
is reflected not only in the agents' parameters but also in their behaviour.
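As a rough illustration of the idea, here is a minimal sketch of a GMM-based supervector in the style of speaker recognition: fit a background GMM on states pooled from all policies, adapt its means towards one policy's visited states, and concatenate the adapted means. The component count, relevance factor and adaptation details are assumptions, not the paper's exact recipe.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_ubm(pooled_states, n_components=64, seed=0):
    """Background GMM fit on states pooled from all policies,
    analogous to a universal background model in speaker recognition."""
    ubm = GaussianMixture(n_components=n_components,
                          covariance_type="diag", random_state=seed)
    ubm.fit(pooled_states)
    return ubm

def policy_supervector(ubm, states, relevance=16.0):
    """MAP-adapt the background means towards one policy's visited
    states, then concatenate them into a fixed-length vector."""
    resp = ubm.predict_proba(states)              # (T, K) responsibilities
    n_k = resp.sum(axis=0)                        # soft count per component
    emp_means = (resp.T @ states) / np.maximum(n_k[:, None], 1e-8)
    alpha = (n_k / (n_k + relevance))[:, None]    # adaptation strength
    adapted = alpha * emp_means + (1.0 - alpha) * ubm.means_
    return adapted.ravel()                        # supervector, length K*D

# Usage: given a list of (T_i, D) state arrays, one per policy,
# distances between supervectors compare the policies' behaviours.
# ubm = fit_ubm(np.vstack(states_per_policy))
# vecs = np.stack([policy_supervector(ubm, s) for s in states_per_policy])
```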
Related papers
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, converged (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
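A minimal sketch of the learn-stochastic, deploy-deterministic pattern the summary describes; the linear policy and the fixed `sigma` (the exploration level being tuned) are illustrative assumptions.

```python
import numpy as np

class GaussianPolicy:
    """Stochastic linear-Gaussian policy for training; its
    deterministic version (the mean action) is what gets deployed."""
    def __init__(self, state_dim, action_dim, sigma=0.2, seed=0):
        rng = np.random.default_rng(seed)
        self.W = 0.01 * rng.standard_normal((action_dim, state_dim))
        self.sigma = sigma  # exploration level: the trade-off knob

    def act_train(self, state, rng):
        mean = self.W @ state
        return mean + self.sigma * rng.standard_normal(mean.shape)

    def act_deploy(self, state):
        return self.W @ state  # deterministic version of the policy
```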
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
- Reinforcement Learning Your Way: Agent Characterization through Policy Regularization [0.0]
We develop a method to imbue a characteristic behaviour into agents' policies through regularization of their objective functions.
Our method guides the agents' behaviour during learning, which results in an intrinsic characterization.
In future work, we intend to employ it to develop agents that optimize individual financial customers' investment portfolios based on their spending personalities.
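One way to read "regularization of their objective functions" is the hedged sketch below: add a penalty pulling measured behaviour statistics towards a target characteristic. The statistics and the weight `lam` are invented placeholders, not the paper's formulation.

```python
def characterized_loss(policy_loss, behaviour_stats, target_stats, lam=0.1):
    """Usual RL loss plus a penalty pulling measured behaviour
    statistics (e.g. risk taken, trade frequency) towards a target
    'characteristic' profile."""
    penalty = sum((b - t) ** 2 for b, t in zip(behaviour_stats, target_stats))
    return policy_loss + lam * penalty
```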
arXiv Detail & Related papers (2022-01-21T08:18:38Z)
- Direct Random Search for Fine Tuning of Deep Reinforcement Learning Policies [5.543220407902113]
We show that direct random search is very effective at fine-tuning DRL policies by optimizing them directly over deterministic rollouts.
Our results show that this method yields more consistent and higher-performing agents on the environments we tested.
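A hedged sketch of fine-tuning by direct random search: perturb the pretrained parameters, score each candidate with one deterministic rollout, and keep improvements. `rollout_return`, the step size and the iteration count are assumptions.

```python
import numpy as np

def random_search_finetune(params, rollout_return, iters=200, step=0.02,
                           seed=0):
    """Hill-climbing in parameter space: perturb the parameters,
    evaluate one deterministic rollout per candidate, keep the
    perturbation only if the return improves."""
    rng = np.random.default_rng(seed)
    best = rollout_return(params)            # deterministic evaluation
    for _ in range(iters):
        candidate = params + step * rng.standard_normal(params.shape)
        score = rollout_return(candidate)
        if score > best:                     # greedy accept
            params, best = candidate, score
    return params, best
```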
arXiv Detail & Related papers (2021-09-12T20:12:46Z)
- Goal-Conditioned Reinforcement Learning with Imagined Subgoals [89.67840168694259]
We propose to incorporate imagined subgoals into policy learning to facilitate learning of complex tasks.
Imagined subgoals are predicted by a separate high-level policy, which is trained simultaneously with the policy and its critic.
We evaluate our approach on complex robotic navigation and manipulation tasks and show that it outperforms existing methods by a large margin.
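A hedged sketch of how an imagined subgoal can enter the actor's training loss; the exact divergence and weighting follow the paper and are only approximated here, and every function name is a placeholder.

```python
def subgoal_regularized_actor_loss(states, goals, policy, subgoal_policy,
                                   base_actor_loss, divergence, lam=0.1):
    """Training-time sketch: a high-level policy imagines subgoals on
    the way to the goal; the goal-conditioned policy is pulled towards
    its own, easier-to-learn behaviour on those nearer subgoals."""
    subgoals = subgoal_policy(states, goals)        # imagined subgoals
    loss = base_actor_loss(policy, states, goals)   # ordinary actor loss
    reg = divergence(policy(states, goals), policy(states, subgoals))
    return loss + lam * reg
```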
arXiv Detail & Related papers (2021-07-01T15:30:59Z)
- Privacy-Constrained Policies via Mutual Information Regularized Policy Gradients [54.98496284653234]
We consider the task of training a policy that maximizes reward while minimizing disclosure of certain sensitive state variables through the actions.
We solve this problem by introducing a regularizer based on the mutual information between the sensitive state and the actions.
We develop a model-based estimator for optimization of privacy-constrained policies.
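The shape of the objective, with a naive histogram-based plug-in MI estimate standing in for the paper's model-based estimator (scalar variables, bin count and `lam` are assumptions):

```python
import numpy as np

def mutual_information(sensitive, actions, bins=8):
    """Plug-in MI estimate I(S_sens; A) from sampled trajectories via
    a 2-D histogram (scalar variables, illustration only)."""
    joint, _, _ = np.histogram2d(sensitive, actions, bins=bins)
    p = joint / joint.sum()
    ps = p.sum(axis=1, keepdims=True)
    pa = p.sum(axis=0, keepdims=True)
    mask = p > 0
    return float((p[mask] * np.log(p[mask] / (ps @ pa)[mask])).sum())

def privacy_constrained_objective(mean_return, sensitive, actions, lam=1.0):
    """Maximize J = E[return] - lam * I(sensitive state; actions)."""
    return mean_return - lam * mutual_information(sensitive, actions)
```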
arXiv Detail & Related papers (2020-12-30T03:22:35Z)
- Continuous Action Reinforcement Learning from a Mixture of Interpretable Experts [35.80418547105711]
We propose a policy scheme that retains a complex function approximator for its internal value predictions but constrains the policy to have a concise, hierarchical, and human-readable structure.
The main technical contribution of the paper is to address the challenges introduced by this non-differentiable state selection procedure.
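A toy version of the policy structure: a handful of human-readable experts and a hard, state-dependent gate. The argmax selection is exactly the non-differentiable step the paper has to work around; all names here are invented.

```python
import numpy as np

def mixture_of_experts_action(state, experts, gate_scores):
    """Hard gating: exactly one human-readable expert acts per state.
    The argmax makes the selection non-differentiable, which is the
    optimization challenge the paper addresses."""
    idx = int(np.argmax(gate_scores(state)))  # hard, inspectable choice
    return experts[idx](state)

# e.g. experts = [lambda s: -s, lambda s: 0.5 * s]  # concise, readable rules
```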
arXiv Detail & Related papers (2020-06-10T16:02:08Z)
- Learning Adaptive Exploration Strategies in Dynamic Environments Through Informed Policy Regularization [100.72335252255989]
We study the problem of learning exploration-exploitation strategies that effectively adapt to dynamic environments.
We propose a novel algorithm that regularizes the training of an RNN-based policy using informed policies trained to maximize the reward in each task.
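A hedged sketch of the regularizer's shape: the RNN policy's usual RL loss plus a divergence term towards a task-specific informed policy. The `kl` callable and the weight `beta` are placeholders.

```python
def informed_regularized_loss(rl_loss, rnn_policy, informed_policy, states,
                              kl, beta=0.5):
    """The exploratory RNN policy pays its usual RL loss plus a KL
    penalty towards an 'informed' policy trained with access to the
    current task; beta trades reward against imitation."""
    return rl_loss + beta * kl(rnn_policy(states), informed_policy(states))
```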
arXiv Detail & Related papers (2020-05-06T16:14:48Z)
- Preventing Imitation Learning with Adversarial Policy Ensembles [79.81807680370677]
Imitation learning can reproduce policies by observing experts, which poses a problem regarding policy privacy.
How can we protect against external observers cloning our proprietary policies?
We introduce a new reinforcement learning framework, where we train an ensemble of near-optimal policies.
arXiv Detail & Related papers (2020-01-31T01:57:16Z)
- Reward-Conditioned Policies [100.64167842905069]
Imitation learning requires near-optimal expert data.
Can we learn effective policies via supervised learning without demonstrations?
We show how such an approach can be derived as a principled method for policy search.
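A hedged sketch of the reward-conditioned recipe: relabel the agent's own trajectories with the returns they achieved, fit a return-conditioned policy by ordinary supervised learning, and condition on a high target return at test time. The flat-array encoding and the `fit` callable are assumptions.

```python
import numpy as np

def train_reward_conditioned(policy, states, actions, returns, fit):
    """Supervised learning on the agent's own rollouts: inputs are
    (state, achieved return), targets are the actions actually taken.
    Every trajectory is a valid demonstration for the return it got,
    so no expert data is needed."""
    inputs = np.concatenate([states, returns[:, None]], axis=1)
    fit(policy, inputs, actions)   # any regression/classification fit

def act(policy, state, target_return):
    """At test time, simply ask the policy for a high target return."""
    return policy(np.concatenate([state, [target_return]]))
```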
arXiv Detail & Related papers (2019-12-31T18:07:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.