On the Expressivity of Markov Reward
- URL: http://arxiv.org/abs/2111.00876v1
- Date: Mon, 1 Nov 2021 12:12:16 GMT
- Title: On the Expressivity of Markov Reward
- Authors: David Abel, Will Dabney, Anna Harutyunyan, Mark K. Ho, Michael L. Littman, Doina Precup, Satinder Singh
- Abstract summary: This paper is dedicated to understanding the expressivity of reward as a way to capture tasks that we would want an agent to perform.
We frame this study around three new abstract notions of "task" that might be desirable: (1) a set of acceptable behaviors, (2) a partial ordering over behaviors, or (3) a partial ordering over trajectories.
- Score: 89.96685777114456
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reward is the driving force for reinforcement-learning agents. This paper is
dedicated to understanding the expressivity of reward as a way to capture tasks
that we would want an agent to perform. We frame this study around three new
abstract notions of "task" that might be desirable: (1) a set of acceptable
behaviors, (2) a partial ordering over behaviors, or (3) a partial ordering
over trajectories. Our main results prove that while reward can express many of
these tasks, there exist instances of each task type that no Markov reward
function can capture. We then provide a set of polynomial-time algorithms that
construct a Markov reward function that allows an agent to optimize tasks of
each of these three types, and correctly determine when no such reward function
exists. We conclude with an empirical study that corroborates and illustrates
our theoretical findings.
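To make the first task type (a set of acceptable behaviors) concrete, the sketch below gives a minimal, brute-force illustration on a hypothetical two-state MDP. It is not the paper's polynomial-time construction; it simply checks whether a candidate Markov reward function gives every acceptable policy strictly higher start-state value than every unacceptable one (one natural reading of "capturing" such a task). The MDP, candidate reward, and acceptable set are made up for illustration.

```python
# Toy illustration (not the paper's polynomial-time algorithm): brute-force check of
# whether a candidate Markov reward function gives every acceptable policy strictly
# higher start-state value than every unacceptable policy in a tiny hypothetical MDP.
import itertools
import numpy as np

n_states, n_actions, gamma = 2, 2, 0.9
# P[s, a, s'] : deterministic transition probabilities of a hypothetical MDP
P = np.zeros((n_states, n_actions, n_states))
P[0, 0, 0] = 1.0   # in state 0, action 0 stays in state 0
P[0, 1, 1] = 1.0   # in state 0, action 1 moves to state 1
P[1, 0, 0] = 1.0   # in state 1, action 0 moves back to state 0
P[1, 1, 1] = 1.0   # in state 1, action 1 stays in state 1

def policy_value(pi, R, start=0):
    """Discounted start-state value of a deterministic policy pi (state -> action)
    under state-action reward R[s, a], via a linear solve of the Bellman equations."""
    P_pi = np.array([P[s, pi[s]] for s in range(n_states)])   # (S, S)
    r_pi = np.array([R[s, pi[s]] for s in range(n_states)])   # (S,)
    v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
    return v[start]

def realizes(R, acceptable):
    """True iff every acceptable policy strictly outvalues every unacceptable one."""
    all_policies = set(itertools.product(range(n_actions), repeat=n_states))
    worst_good = min(policy_value(pi, R) for pi in acceptable)
    best_bad = max(policy_value(pi, R) for pi in all_policies - set(acceptable))
    return worst_good > best_bad

# Hypothetical task: only policies that choose action 1 in state 0 are acceptable.
acceptable = [(1, 0), (1, 1)]
R = np.zeros((n_states, n_actions))
R[0, 1] = 1.0          # candidate reward: pay for leaving state 0
print(realizes(R, acceptable))   # True: this Markov reward captures the toy task
```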
Related papers
- Multi Task Inverse Reinforcement Learning for Common Sense Reward [21.145179791929337]
We show that inverse reinforcement learning, even when it succeeds in training an agent, does not learn a useful reward function.
That is, training a new agent with the learned reward does not impart the desired behaviors.
In contrast, we show that multi-task inverse reinforcement learning can be applied to learn a useful reward function.
arXiv Detail & Related papers (2024-02-17T19:49:00Z)
- Informativeness of Reward Functions in Reinforcement Learning [34.40155383189179]
We study the problem of designing informative reward functions so that the designed rewards speed up the agent's convergence.
Existing works have considered several different reward design formulations.
We propose a reward informativeness criterion that adapts w.r.t. the agent's current policy and can be optimized under specified structural constraints.
arXiv Detail & Related papers (2024-02-10T18:36:42Z)
- Reward Bonuses with Gain Scheduling Inspired by Iterative Deepening Search [8.071506311915396]
This paper introduces a novel method of adding intrinsic bonuses to a task-oriented reward function in order to facilitate efficient search in reinforcement learning.
Various bonuses have been designed to date; they are analogous to the depth-first and breadth-first search algorithms in graph theory.
Gain scheduling is applied to the designed bonuses, inspired by iterative deepening search, which is known to inherit the advantages of both search algorithms.
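One possible reading of this idea is sketched below with hypothetical bonus terms: a "depth-first-like" bonus for pushing further along the current episode, a "breadth-first-like" count-based novelty bonus, and a gain schedule that shifts weight between them over training, loosely echoing how iterative deepening raises its depth limit. This is an illustrative sketch, not the cited paper's actual bonus design.

```python
# Hedged sketch (not the cited paper's exact design): blending two intrinsic bonuses
# with a gain schedule, loosely inspired by iterative deepening search.
import math
from collections import defaultdict

visit_counts = defaultdict(int)   # simple count-based novelty proxy

def depth_bonus(step_in_episode):
    """'Depth-first-like' bonus: small reward for progressing deeper into an episode."""
    return 0.01 * step_in_episode

def breadth_bonus(state):
    """'Breadth-first-like' bonus: count-based novelty reward for unfamiliar states."""
    visit_counts[state] += 1
    return 1.0 / math.sqrt(visit_counts[state])

def gain(t, period=10_000):
    """Gain schedule: the weight on the depth-like bonus rises within each period,
    then resets, echoing iterative deepening's gradually increasing depth limit."""
    return (t % period) / period

def shaped_reward(task_reward, state, step_in_episode, t):
    w = gain(t)
    return task_reward + w * depth_bonus(step_in_episode) + (1.0 - w) * breadth_bonus(state)
```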
arXiv Detail & Related papers (2022-12-21T04:52:13Z)
- Unpacking Reward Shaping: Understanding the Benefits of Reward Engineering on Sample Complexity [114.88145406445483]
Reinforcement learning provides an automated framework for learning behaviors from high-level reward specifications.
In practice the choice of reward function can be crucial for good results.
arXiv Detail & Related papers (2022-10-18T04:21:25Z)
- Guarantees for Epsilon-Greedy Reinforcement Learning with Function Approximation [69.1524391595912]
Myopic exploration policies such as epsilon-greedy, softmax, or Gaussian noise fail to explore efficiently in some reinforcement learning tasks.
This paper presents a theoretical analysis of such policies and provides the first regret and sample-complexity bounds for reinforcement learning with myopic exploration.
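For readers unfamiliar with the term, "myopic" exploration roughly means randomizing around the current greedy policy with no long-horizon exploration plan. A generic epsilon-greedy action selection is sketched below; it is a standard illustration, not code from the cited paper.

```python
# Generic epsilon-greedy action selection (illustrative, not from the cited paper):
# with probability epsilon pick a uniformly random action, otherwise act greedily
# with respect to the current value estimates.
import numpy as np

def epsilon_greedy(q_values: np.ndarray, epsilon: float, rng: np.random.Generator) -> int:
    """q_values: estimated action values for the current state, shape (n_actions,)."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))   # explore: uniformly random action
    return int(np.argmax(q_values))               # exploit: greedy action

# Example usage with hypothetical value estimates
rng = np.random.default_rng(0)
action = epsilon_greedy(np.array([0.1, 0.5, 0.2]), epsilon=0.1, rng=rng)
```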
arXiv Detail & Related papers (2022-06-19T14:44:40Z)
- Identifiability in inverse reinforcement learning [0.0]
Inverse reinforcement learning attempts to reconstruct the reward function in a Markov decision problem.
In general, many reward functions are consistent with the same behavior, so the reward is not identifiable; we provide a resolution to this non-identifiability for problems with entropy regularization.
arXiv Detail & Related papers (2021-06-07T10:35:52Z)
- Replacing Rewards with Examples: Example-Based Policy Search via Recursive Classification [133.20816939521941]
In the standard Markov decision process formalism, users specify tasks by writing down a reward function.
In many scenarios, the user is unable to describe the task in words or numbers, but can readily provide examples of what the world would look like if the task were solved.
Motivated by this observation, we derive a control algorithm that aims to visit states that have a high probability of leading to successful outcomes, given only examples of successful outcome states.
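A heavily simplified sketch of the general idea follows: fit a classifier that separates user-provided success-state examples from states the agent visits, and treat its predicted success probability as a reward-like signal. This is an interpretation for illustration, not the cited paper's recursive-classification algorithm, and the states and data below are hypothetical.

```python
# Simplified sketch of example-based task specification (not the cited paper's
# recursive-classification algorithm): fit a classifier that separates example
# success states from agent-visited states, then use its predicted success
# probability as a surrogate reward.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_success_classifier(success_states: np.ndarray, visited_states: np.ndarray):
    X = np.vstack([success_states, visited_states])
    y = np.concatenate([np.ones(len(success_states)), np.zeros(len(visited_states))])
    return LogisticRegression().fit(X, y)

def surrogate_reward(clf, state: np.ndarray) -> float:
    """Predicted probability that `state` looks like a successful outcome."""
    return float(clf.predict_proba(state.reshape(1, -1))[0, 1])

# Hypothetical 2-D states: success examples near (1, 1), visited states near the origin
clf = fit_success_classifier(np.random.normal([1, 1], 0.1, (50, 2)),
                             np.random.normal([0, 0], 0.1, (200, 2)))
print(surrogate_reward(clf, np.array([0.9, 1.0])))   # high value near the examples
```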
arXiv Detail & Related papers (2021-03-23T16:19:55Z)
- Randomized Entity-wise Factorization for Multi-Agent Reinforcement Learning [59.62721526353915]
Multi-agent settings in the real world often involve tasks with varying types and quantities of agents and non-agent entities.
Our method aims to leverage these commonalities by asking the question: "What is the expected utility of each agent when only considering a randomly selected sub-group of its observed entities?"
arXiv Detail & Related papers (2020-06-07T18:28:41Z)
- Intrinsic Motivation for Encouraging Synergistic Behavior [55.10275467562764]
We study the role of intrinsic motivation as an exploration bias for reinforcement learning in sparse-reward synergistic tasks.
Our key idea is that a good guiding principle for intrinsic motivation in synergistic tasks is to take actions which affect the world in ways that would not be achieved if the agents were acting on their own.
arXiv Detail & Related papers (2020-02-12T19:34:51Z)