Reinforcement Learning Your Way: Agent Characterization through Policy Regularization
- URL: http://arxiv.org/abs/2201.10003v1
- Date: Fri, 21 Jan 2022 08:18:38 GMT
- Title: Reinforcement Learning Your Way: Agent Characterization through Policy Regularization
- Authors: Charl Maree and Christian Omlin
- Abstract summary: We develop a method to imbue a characteristic behaviour into agents' policies through regularization of their objective functions.
Our method guides the agents' behaviour during learning, which results in an intrinsic characterization.
In future work, we intend to employ it to develop agents that optimize individual financial customers' investment portfolios based on their spending personalities.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The increased complexity of state-of-the-art reinforcement learning (RL)
algorithms has resulted in an opacity that inhibits explainability and
understanding. This has led to the development of several post-hoc
explainability methods that aim to extract information from learned policies,
thus aiding explainability. These methods rely on empirical observations of the
policy and thus aim to generalize a characterization of agents' behaviour. In
this study, we have instead developed a method to imbue a characteristic
behaviour into agents' policies through regularization of their objective
functions. Our method guides the agents' behaviour during learning, which
results in an intrinsic characterization; it connects the learning process with
model explanation. We provide a formal argument and empirical evidence for the
viability of our method. In future work, we intend to employ it to develop
agents that optimize individual financial customers' investment portfolios
based on their spending personalities.
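
The abstract describes the method only at a high level; one concrete reading of "regularization of the objective function" is a standard policy-gradient loss augmented with a penalty that pulls the policy toward a desired characteristic behaviour. The sketch below is a minimal illustration under assumptions of our own, not the authors' implementation: the `Policy` network, the hypothetical `characteristic_prior` over actions, the KL-divergence form of the penalty, and the coefficient `reg_weight` are placeholders for whatever characterization one wants to impose.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical, kl_divergence

# Minimal illustrative sketch (not the paper's implementation): a REINFORCE-style
# loss plus a KL penalty that pulls the policy toward a hypothetical
# "characteristic" action prior, so the preference is imposed during learning.

class Policy(nn.Module):
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, n_actions))

    def forward(self, obs):
        return Categorical(logits=self.net(obs))

def regularized_loss(policy, obs, actions, returns, characteristic_prior, reg_weight=0.1):
    """Policy-gradient loss plus a regularizer encoding a characteristic behaviour.

    `characteristic_prior` is an assumed Categorical over actions expressing the
    desired behaviour (e.g. a spending-personality preference); `reg_weight` trades
    off reward maximization against adherence to that behaviour.
    """
    dist = policy(obs)
    pg_loss = -(dist.log_prob(actions) * returns).mean()         # standard REINFORCE term
    reg_loss = kl_divergence(dist, characteristic_prior).mean()  # behaviour regularizer
    return pg_loss + reg_weight * reg_loss
```

For example, a hypothetical three-action agent that should favour action 0 could be trained with `regularized_loss(policy, obs_batch, act_batch, return_batch, Categorical(probs=torch.tensor([0.7, 0.2, 0.1])))`; larger `reg_weight` values trade reward for stronger adherence to the imposed behaviour.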
Related papers
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process.
We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z)
- Inverse Reinforcement Learning from Non-Stationary Learning Agents [11.203097744443898]
We study an inverse reinforcement learning problem that involves learning the reward function of a learning agent using trajectory data collected while this agent is learning its optimal policy.
We propose an inverse reinforcement learning method that allows us to estimate the policy parameters of the learning agent which can then be used to estimate its reward function.
arXiv Detail & Related papers (2024-10-18T03:02:44Z)
- REVEAL-IT: REinforcement learning with Visibility of Evolving Agent poLicy for InTerpretability [23.81322529587759]
REVEAL-IT is a novel framework for explaining the learning process of an agent in complex environments.
We visualize the policy structure and the agent's learning process for various training tasks.
A GNN-based explainer learns to highlight the most important section of the policy, providing a clearer and more robust explanation of the agent's learning process.
arXiv Detail & Related papers (2024-06-20T11:29:26Z)
- Vision-Language Models Provide Promptable Representations for Reinforcement Learning [67.40524195671479]
We propose a novel approach that uses the vast amounts of general and indexable world knowledge encoded in vision-language models (VLMs) pre-trained on Internet-scale data for embodied reinforcement learning (RL).
We show that our approach can use chain-of-thought prompting to produce representations of common-sense semantic reasoning, improving policy performance in novel scenes by 1.5 times.
arXiv Detail & Related papers (2024-02-05T00:48:56Z)
- Fidelity-Induced Interpretable Policy Extraction for Reinforcement Learning [6.622746736005175]
Deep Reinforcement Learning (DRL) has achieved remarkable success in sequential decision-making problems.
Existing DRL agents make decisions in an opaque fashion, hindering the user from establishing trust and scrutinizing weaknesses of the agents.
We propose a novel method, Fidelity-Induced Policy Extraction (FIPE).
arXiv Detail & Related papers (2023-09-12T10:03:32Z)
- Offline Reinforcement Learning with On-Policy Q-Function Regularization [57.09073809901382]
We deal with the (potentially catastrophic) extrapolation error induced by the distribution shift between the history dataset and the desired policy.
We propose two algorithms taking advantage of the estimated Q-function through regularizations, and demonstrate they exhibit strong performance on the D4RL benchmarks.
arXiv Detail & Related papers (2023-07-25T21:38:08Z)
- Symbolic Explanation of Affinity-Based Reinforcement Learning Agents with Markov Models [0.0]
We develop a policy regularization method that asserts the global intrinsic affinities of learned strategies.
These affinities provide a means of reasoning about a policy's behavior, thus making it inherently interpretable.
We demonstrate our method in personalized prosperity management, where individuals' spending behavior over time dictates their investment strategies.
arXiv Detail & Related papers (2022-08-26T12:41:06Z)
- Inverse Online Learning: Understanding Non-Stationary and Reactionary Policies [79.60322329952453]
We show how to develop interpretable representations of how agents make decisions.
By understanding the decision-making processes underlying a set of observed trajectories, we cast the policy inference problem as the inverse to this online learning problem.
We introduce a practical algorithm for retrospectively estimating such perceived effects, alongside the process through which agents update them.
Through application to the analysis of UNOS organ donation acceptance decisions, we demonstrate that our approach can bring valuable insights into the factors that govern decision processes and how they change over time.
arXiv Detail & Related papers (2022-03-14T17:40:42Z)
- Explaining Reinforcement Learning Policies through Counterfactual Trajectories [147.7246109100945]
A human developer must validate that an RL agent will perform well at test-time.
Our method conveys how the agent performs under distribution shifts by showing the agent's behavior across a wider trajectory distribution.
In a user study, we demonstrate that our method enables users to score better than baseline methods on one of two agent validation tasks.
arXiv Detail & Related papers (2022-01-29T00:52:37Z)
- What is Going on Inside Recurrent Meta Reinforcement Learning Agents? [63.58053355357644]
Recurrent meta reinforcement learning (meta-RL) agents are agents that employ a recurrent neural network (RNN) for the purpose of "learning a learning algorithm".
We shed light on the internal working mechanisms of these agents by reformulating the meta-RL problem using the Partially Observable Markov Decision Process (POMDP) framework.
arXiv Detail & Related papers (2021-04-29T20:34:39Z)
- Policy Supervectors: General Characterization of Agents by their Behaviour [18.488655590845163]
We propose policy supervectors for characterizing agents by the distribution of states they visit.
Policy supervectors can characterize policies regardless of their design philosophy and scale to thousands of policies on a single workstation.
We demonstrate the method's applicability by studying the evolution of policies during reinforcement learning, evolutionary training, and imitation learning (see the illustrative sketch after this list).
arXiv Detail & Related papers (2020-12-02T14:43:16Z)
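
The policy-supervector entry above characterizes agents by the distribution of states they visit. As a rough, hedged illustration of that general idea rather than the cited paper's method, the sketch below rolls out a policy in a Gymnasium-style environment, discretizes visited states into histogram bins, and compares two policies by the distance between their visitation histograms; the `discretize` binning scheme and the total-variation distance are assumptions made purely for illustration.

```python
import numpy as np

# Rough illustrative sketch (not the cited paper's implementation): characterize a
# policy by the empirical distribution of (discretized) states it visits, then
# compare two policies via a simple distance between those distributions.

def discretize(state, bins_per_dim=10, low=-1.0, high=1.0):
    """Map a continuous state vector to a single histogram-bin index (assumed scheme)."""
    clipped = np.clip(state, low, high)
    idx = ((clipped - low) / (high - low) * (bins_per_dim - 1)).astype(int)
    return int(np.ravel_multi_index(idx, [bins_per_dim] * len(idx)))

def visitation_distribution(env, policy, n_bins, episodes=20, max_steps=200):
    """Estimate the state-visitation histogram of `policy` in a Gymnasium-style `env`."""
    counts = np.zeros(n_bins)
    for _ in range(episodes):
        state, _ = env.reset()
        for _ in range(max_steps):
            counts[discretize(state) % n_bins] += 1
            action = policy(state)  # policy: callable mapping state -> action
            state, _, terminated, truncated, _ = env.step(action)
            if terminated or truncated:
                break
    return counts / counts.sum()

def behaviour_distance(p, q):
    """Total-variation distance between two visitation distributions."""
    return 0.5 * np.abs(p - q).sum()
```

Two agents trained on the same task could then be compared as `behaviour_distance(visitation_distribution(env, pi_a, n_bins), visitation_distribution(env, pi_b, n_bins))` with `n_bins = bins_per_dim ** state_dim`; the cited paper builds a more principled representation and distance, so this only conveys the underlying idea.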