Reinforcement Learning Your Way: Agent Characterization through Policy Regularization
- URL: http://arxiv.org/abs/2201.10003v1
- Date: Fri, 21 Jan 2022 08:18:38 GMT
- Title: Reinforcement Learning Your Way: Agent Characterization through Policy Regularization
- Authors: Charl Maree and Christian Omlin
- Abstract summary: We develop a method to imbue a characteristic behaviour into agents' policies through regularization of their objective functions.
Our method guides the agents' behaviour during learning, which results in an intrinsic characterization.
In future work, we intend to employ it to develop agents that optimize individual financial customers' investment portfolios based on their spending personalities.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The increased complexity of state-of-the-art reinforcement learning (RL)
algorithms has resulted in an opacity that inhibits explainability and
understanding. This has led to the development of several post-hoc
explainability methods that aim to extract information from learned policies,
thus aiding explainability. These methods rely on empirical observations of the
policy and thus aim to generalize a characterization of agents' behaviour. In
this study, we have instead developed a method to imbue a characteristic
behaviour into agents' policies through regularization of their objective
functions. Our method guides the agents' behaviour during learning, which
results in an intrinsic characterization; it connects the learning process with
model explanation. We provide a formal argument and empirical evidence for the
viability of our method. In future work, we intend to employ it to develop
agents that optimize individual financial customers' investment portfolios
based on their spending personalities.
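
The abstract describes the method only at a high level; one concrete reading of "regularization of the objective function" is a standard policy-gradient loss augmented with a penalty that pulls the policy toward a desired characteristic behaviour. The sketch below is a minimal illustration under assumptions of our own, not the authors' implementation: the `Policy` network, the hypothetical `characteristic_prior` over actions, the KL-divergence form of the penalty, and the coefficient `reg_weight` are placeholders for whatever characterization one wants to impose.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical, kl_divergence

# Minimal illustrative sketch (not the paper's implementation): a REINFORCE-style
# loss plus a KL penalty that pulls the policy toward a hypothetical
# "characteristic" action prior, so the preference is imposed during learning.

class Policy(nn.Module):
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, n_actions))

    def forward(self, obs):
        return Categorical(logits=self.net(obs))

def regularized_loss(policy, obs, actions, returns, characteristic_prior, reg_weight=0.1):
    """Policy-gradient loss plus a regularizer encoding a characteristic behaviour.

    `characteristic_prior` is an assumed Categorical over actions expressing the
    desired behaviour (e.g. a spending-personality preference); `reg_weight` trades
    off reward maximization against adherence to that behaviour.
    """
    dist = policy(obs)
    pg_loss = -(dist.log_prob(actions) * returns).mean()         # standard REINFORCE term
    reg_loss = kl_divergence(dist, characteristic_prior).mean()  # behaviour regularizer
    return pg_loss + reg_weight * reg_loss
```

For example, a hypothetical three-action agent that should favour action 0 could be trained with `regularized_loss(policy, obs_batch, act_batch, return_batch, Categorical(probs=torch.tensor([0.7, 0.2, 0.1])))`; larger `reg_weight` values trade reward for stronger adherence to the imposed behaviour.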
Related papers
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process.
We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z)
- Inverse Reinforcement Learning from Non-Stationary Learning Agents [11.203097744443898]
We study an inverse reinforcement learning problem that involves learning the reward function of a learning agent using trajectory data collected while this agent is learning its optimal policy.
We propose an inverse reinforcement learning method that allows us to estimate the policy parameters of the learning agent which can then be used to estimate its reward function.
arXiv Detail & Related papers (2024-10-18T03:02:44Z)
- REVEAL-IT: REinforcement learning with Visibility of Evolving Agent poLicy for InTerpretability [23.81322529587759]
REVEAL-IT is a novel framework for explaining the learning process of an agent in complex environments.
We visualize the policy structure and the agent's learning process for various training tasks.
A GNN-based explainer learns to highlight the most important section of the policy, providing a clearer and more robust explanation of the agent's learning process.
arXiv Detail & Related papers (2024-06-20T11:29:26Z)
- Vision-Language Models Provide Promptable Representations for Reinforcement Learning [67.40524195671479]
We propose a novel approach that uses the vast amounts of general and indexable world knowledge encoded in vision-language models (VLMs) pre-trained on Internet-scale data for embodied reinforcement learning (RL).
We show that our approach can use chain-of-thought prompting to produce representations of common-sense semantic reasoning, improving policy performance in novel scenes by 1.5 times.
arXiv Detail & Related papers (2024-02-05T00:48:56Z)
- Fidelity-Induced Interpretable Policy Extraction for Reinforcement Learning [6.622746736005175]
Deep Reinforcement Learning (DRL) has achieved remarkable success in sequential decision-making problems.
Existing DRL agents make decisions in an opaque fashion, hindering the user from establishing trust and scrutinizing weaknesses of the agents.
We propose a novel method, Fidelity-Induced Policy Extraction (FIPE).
arXiv Detail & Related papers (2023-09-12T10:03:32Z)
- Offline Reinforcement Learning with On-Policy Q-Function Regularization [57.09073809901382]
We deal with the (potentially catastrophic) extrapolation error induced by the distribution shift between the history dataset and the desired policy.
We propose two algorithms taking advantage of the estimated Q-function through regularizations, and demonstrate they exhibit strong performance on the D4RL benchmarks.
arXiv Detail & Related papers (2023-07-25T21:38:08Z)
- Symbolic Explanation of Affinity-Based Reinforcement Learning Agents with Markov Models [0.0]
We develop a policy regularization method that asserts the global intrinsic affinities of learned strategies.
These affinities provide a means of reasoning about a policy's behavior, thus making it inherently interpretable.
We demonstrate our method in personalized prosperity management, where individuals' spending behavior over time dictates their investment strategies.
arXiv Detail & Related papers (2022-08-26T12:41:06Z)
- Inverse Online Learning: Understanding Non-Stationary and Reactionary Policies [79.60322329952453]
We show how to develop interpretable representations of how agents make decisions.
By understanding the decision-making processes underlying a set of observed trajectories, we cast the policy inference problem as the inverse to this online learning problem.
We introduce a practical algorithm for retrospectively estimating such perceived effects, alongside the process through which agents update them.
Through application to the analysis of UNOS organ donation acceptance decisions, we demonstrate that our approach can bring valuable insights into the factors that govern decision processes and how they change over time.
arXiv Detail & Related papers (2022-03-14T17:40:42Z)
- Explaining Reinforcement Learning Policies through Counterfactual Trajectories [147.7246109100945]
A human developer must validate that an RL agent will perform well at test-time.
Our method conveys how the agent performs under distribution shifts by showing the agent's behavior across a wider trajectory distribution.
In a user study, we demonstrate that our method enables users to score better than baseline methods on one of two agent validation tasks.
arXiv Detail & Related papers (2022-01-29T00:52:37Z)
- What is Going on Inside Recurrent Meta Reinforcement Learning Agents? [63.58053355357644]
Recurrent meta reinforcement learning (meta-RL) agents are agents that employ a recurrent neural network (RNN) for the purpose of "learning a learning algorithm".
We shed light on the internal working mechanisms of these agents by reformulating the meta-RL problem using the Partially Observable Markov Decision Process (POMDP) framework.
arXiv Detail & Related papers (2021-04-29T20:34:39Z)
- Policy Supervectors: General Characterization of Agents by their Behaviour [18.488655590845163]
We propose policy supervectors for characterizing agents by the distribution of states they visit.
Policy supervectors can characterize policies regardless of their design philosophy and scale to thousands of policies on a single workstation.
We demonstrate the method's applicability by studying the evolution of policies during reinforcement learning, evolutionary training, and imitation learning (see the illustrative sketch after this list).
arXiv Detail & Related papers (2020-12-02T14:43:16Z)
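
The policy-supervector entry above characterizes agents by the distribution of states they visit. As a rough, hedged illustration of that general idea rather than the cited paper's method, the sketch below rolls out a policy in a Gymnasium-style environment, discretizes visited states into histogram bins, and compares two policies by the distance between their visitation histograms; the `discretize` binning scheme and the total-variation distance are assumptions made purely for illustration.

```python
import numpy as np

# Rough illustrative sketch (not the cited paper's implementation): characterize a
# policy by the empirical distribution of (discretized) states it visits, then
# compare two policies via a simple distance between those distributions.

def discretize(state, bins_per_dim=10, low=-1.0, high=1.0):
    """Map a continuous state vector to a single histogram-bin index (assumed scheme)."""
    clipped = np.clip(state, low, high)
    idx = ((clipped - low) / (high - low) * (bins_per_dim - 1)).astype(int)
    return int(np.ravel_multi_index(idx, [bins_per_dim] * len(idx)))

def visitation_distribution(env, policy, n_bins, episodes=20, max_steps=200):
    """Estimate the state-visitation histogram of `policy` in a Gymnasium-style `env`."""
    counts = np.zeros(n_bins)
    for _ in range(episodes):
        state, _ = env.reset()
        for _ in range(max_steps):
            counts[discretize(state) % n_bins] += 1
            action = policy(state)  # policy: callable mapping state -> action
            state, _, terminated, truncated, _ = env.step(action)
            if terminated or truncated:
                break
    return counts / counts.sum()

def behaviour_distance(p, q):
    """Total-variation distance between two visitation distributions."""
    return 0.5 * np.abs(p - q).sum()
```

Two agents trained on the same task could then be compared as `behaviour_distance(visitation_distribution(env, pi_a, n_bins), visitation_distribution(env, pi_b, n_bins))` with `n_bins = bins_per_dim ** state_dim`; the cited paper builds a more principled representation and distance, so this only conveys the underlying idea.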