Outcome-Driven Reinforcement Learning via Variational Inference
- URL: http://arxiv.org/abs/2104.10190v1
- Date: Tue, 20 Apr 2021 18:16:21 GMT
- Title: Outcome-Driven Reinforcement Learning via Variational Inference
- Authors: Tim G. J. Rudner and Vitchyr H. Pong and Rowan McAllister and Yarin Gal and Sergey Levine
- Abstract summary: We discuss a new perspective on reinforcement learning, recasting it as the problem of inferring actions that achieve desired outcomes, rather than a problem of maximizing rewards.
To solve the resulting outcome-directed inference problem, we establish a novel variational inference formulation that allows us to derive a well-shaped reward function.
We empirically demonstrate that this method eliminates the need to design reward functions and leads to effective goal-directed behaviors.
- Score: 95.82770132618862
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While reinforcement learning algorithms provide automated acquisition of
optimal policies, practical application of such methods requires a number of
design decisions, such as manually designing reward functions that not only
define the task, but also provide sufficient shaping to accomplish it. In this
paper, we discuss a new perspective on reinforcement learning, recasting it as
the problem of inferring actions that achieve desired outcomes, rather than a
problem of maximizing rewards. To solve the resulting outcome-directed
inference problem, we establish a novel variational inference formulation that
allows us to derive a well-shaped reward function which can be learned directly
from environment interactions. From the corresponding variational objective, we
also derive a new probabilistic Bellman backup operator reminiscent of the
standard Bellman backup operator and use it to develop an off-policy algorithm
to solve goal-directed tasks. We empirically demonstrate that this method
eliminates the need to design reward functions and leads to effective
goal-directed behaviors.
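
To make the idea in the abstract concrete, below is a minimal, illustrative sketch of the general recipe it describes: instead of hand-designing a reward, derive it as the log-likelihood of the desired outcome under a dynamics model learned from environment interactions, and plug that reward into a goal-conditioned TD-style backup. This is not the paper's exact variational derivation or its probabilistic Bellman operator; the Gaussian dynamics model, the `GaussianDynamics`, `outcome_reward`, and `td_backup` names, and the standard discounted backup are all simplifying assumptions made here for illustration.

```python
# Illustrative sketch only (not the authors' implementation): a goal-conditioned
# TD backup whose reward is the log-likelihood of the desired outcome under a
# learned Gaussian dynamics model, in the spirit of the abstract's "well-shaped
# reward function ... learned directly from environment interactions".
import numpy as np

class GaussianDynamics:
    """Toy dynamics model p(s' | s, a) = N(s + W a, sigma^2 I), fit by least squares."""
    def __init__(self, state_dim, action_dim, sigma=0.1):
        self.W = np.zeros((state_dim, action_dim))
        self.sigma = sigma

    def fit(self, states, actions, next_states):
        # Least-squares fit of the action-effect matrix W on observed transitions.
        deltas = next_states - states                        # (N, state_dim)
        self.W = np.linalg.lstsq(actions, deltas, rcond=None)[0].T

    def log_prob(self, state, action, outcome):
        # Full Gaussian log-density: log N(outcome | s + W a, sigma^2 I).
        mean = state + self.W @ action
        d = outcome - mean
        k = d.size
        return -0.5 * (d @ d) / self.sigma**2 - 0.5 * k * np.log(2 * np.pi * self.sigma**2)

def outcome_reward(model, state, action, goal):
    """Shaped reward: log-likelihood that taking `action` in `state` produces `goal`."""
    return model.log_prob(state, action, goal)

def td_backup(q_next, model, state, action, goal, gamma=0.99):
    """One goal-conditioned backup: model-derived reward plus discounted bootstrap."""
    return outcome_reward(model, state, action, goal) + gamma * q_next

# Tiny usage example on random transition data.
rng = np.random.default_rng(0)
S, A = 4, 2
states = rng.normal(size=(100, S))
actions = rng.normal(size=(100, A))
next_states = states + actions @ rng.normal(size=(A, S)) * 0.1

model = GaussianDynamics(S, A)
model.fit(states, actions, next_states)

goal = np.zeros(S)
print(td_backup(q_next=0.0, model=model, state=states[0], action=actions[0], goal=goal))
```

In the paper's actual formulation the backup operator and reward follow from a variational objective rather than being chosen by hand as above; the sketch only conveys the structural point that the reward signal comes from a learned model of how likely an action is to produce the desired outcome.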
Related papers
- Behavior Alignment via Reward Function Optimization [23.92721220310242]
We introduce a new framework that integrates auxiliary rewards reflecting a designer's domain knowledge with the environment's primary rewards.
We evaluate our method's efficacy on a diverse set of tasks, from small-scale experiments to high-dimensional control challenges.
arXiv Detail & Related papers (2023-10-29T13:45:07Z)
- Reinforcement Learning with Non-Cumulative Objective [12.906500431427716]
In reinforcement learning, the objective is almost always defined as a cumulative function over the rewards along the process.
In this paper, we propose a modification to existing algorithms for optimizing such objectives.
arXiv Detail & Related papers (2023-07-11T01:20:09Z)
- Self-Supervised Reinforcement Learning that Transfers using Random Features [41.00256493388967]
We propose a self-supervised reinforcement learning method that enables the transfer of behaviors across tasks with different rewards.
Our method is self-supervised in that it can be trained on offline datasets without reward labels, but can then be quickly deployed on new tasks.
arXiv Detail & Related papers (2023-05-26T20:37:06Z)
- Direct Behavior Specification via Constrained Reinforcement Learning [12.679780444702573]
Constrained Markov decision processes (CMDPs) can be adapted to solve goal-based tasks while adhering to a set of behavioral constraints.
We evaluate this framework on a set of continuous control tasks relevant to the application of Reinforcement Learning for NPC design in video games.
arXiv Detail & Related papers (2021-12-22T21:12:28Z)
- MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning [65.52675802289775]
We show that an uncertainty-aware classifier can solve challenging reinforcement learning problems.
We propose a novel method for computing the normalized maximum likelihood (NML) distribution.
We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions.
arXiv Detail & Related papers (2021-07-15T08:19:57Z)
- Policy Gradient Bayesian Robust Optimization for Imitation Learning [49.881386773269746]
We derive a novel policy gradient-style robust optimization approach, PG-BROIL, to balance expected performance and risk.
Results suggest PG-BROIL can produce a family of behaviors ranging from risk-neutral to risk-averse.
arXiv Detail & Related papers (2021-06-11T16:49:15Z)
- Online reinforcement learning with sparse rewards through an active inference capsule [62.997667081978825]
This paper introduces an active inference agent which minimizes the novel free energy of the expected future.
Our model is capable of solving sparse-reward problems with a very high sample efficiency.
We also introduce a novel method for approximating the prior model from the reward function, which simplifies the expression of complex objectives.
arXiv Detail & Related papers (2021-06-04T10:03:36Z)
- Sequential Transfer in Reinforcement Learning with a Generative Model [48.40219742217783]
We show how to reduce the sample complexity for learning new tasks by transferring knowledge from previously-solved ones.
We derive PAC bounds on its sample complexity which clearly demonstrate the benefits of using this kind of prior knowledge.
We empirically verify our theoretical findings in simple simulated domains.
arXiv Detail & Related papers (2020-07-01T19:53:35Z)
- Automatic Curriculum Learning through Value Disagreement [95.19299356298876]
Continually solving new, unsolved tasks is the key to learning diverse behaviors.
In the multi-task domain, where an agent needs to reach multiple goals, the choice of training goals can largely affect sample efficiency.
We propose setting up an automatic curriculum for goals that the agent needs to solve.
We evaluate our method across 13 multi-goal robotic tasks and 5 navigation tasks, and demonstrate performance gains over current state-of-the-art methods.
arXiv Detail & Related papers (2020-06-17T03:58:25Z)