Knowledge is reward: Learning optimal exploration by predictive reward cashing
- URL: http://arxiv.org/abs/2109.08518v1
- Date: Fri, 17 Sep 2021 12:52:24 GMT
- Title: Knowledge is reward: Learning optimal exploration by predictive reward cashing
- Authors: Luca Ambrogioni
- Abstract summary: We exploit the inherent mathematical structure of Bayes-adaptive problems in order to dramatically simplify the problem.
The key to this simplification comes from the novel concept of cross-value.
This results in a new, denser reward structure that "cashes in" all future rewards that can be predicted from the current information state.
- Score: 5.279475826661643
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There is a strong link between the general concept of intelligence and the ability to collect and use information. The theory of Bayes-adaptive exploration offers an attractive optimality framework for training machines to perform complex information-gathering tasks. However, the computational complexity of the resulting optimal control problem has limited the theory's adoption in mainstream deep AI research. In this paper we exploit the inherent mathematical structure of Bayes-adaptive problems to simplify them dramatically, making the reward structure denser while simultaneously decoupling the learning of exploitation and exploration policies. The key to this simplification is the novel concept of cross-value (i.e. the value of being in an environment while acting optimally according to another), which we use to quantify the value of the currently available information. This results in a new, denser reward structure that "cashes in" all future rewards that can be predicted from the current information state. In a set of experiments we show that the approach makes it possible to learn challenging information-gathering tasks, without shaping or heuristic bonuses, in situations where standard RL algorithms fail.
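To make the cross-value idea concrete, here is a minimal toy sketch in a tabular setting. It is an illustrative reading of the abstract, not the paper's actual algorithm: the helper names (`value_of_policy`, `optimal_policy`, `cross_value`, `predictable_value`) and the belief-averaged construction are assumptions made only for illustration.

```python
# Toy, illustrative sketch of cross-values in a tabular Bayes-adaptive setting.
# This is NOT the paper's algorithm; the formulas below are assumptions used
# only to illustrate valuing the currently available information.
import numpy as np


def value_of_policy(P, R, policy, gamma=0.95):
    """Evaluate a deterministic policy in an MDP with dynamics P[s, a, s']
    and rewards R[s, a] by solving the linear Bellman equations."""
    n = P.shape[0]
    P_pi = P[np.arange(n), policy]                 # (S, S') under the policy
    R_pi = R[np.arange(n), policy]                 # (S,)
    return np.linalg.solve(np.eye(n) - gamma * P_pi, R_pi)


def optimal_policy(P, R, gamma=0.95, iters=1000):
    """Plain value iteration; returns the greedy deterministic policy."""
    V = np.zeros(P.shape[0])
    for _ in range(iters):
        Q = R + gamma * P @ V                      # (S, A)
        V = Q.max(axis=1)
    return Q.argmax(axis=1)


def cross_value(env, other, gamma=0.95):
    """Cross-value: the value of being in `env` while acting optimally
    according to `other` (each environment given as a (P, R) pair)."""
    pi_other = optimal_policy(*other, gamma=gamma)
    return value_of_policy(*env, pi_other, gamma=gamma)


def predictable_value(belief, envs, gamma=0.95):
    """Assumed proxy for the value of the current information state:
    the belief-weighted cross-value of committing, right now, to the
    policy that is optimal for the belief-averaged MDP."""
    P_bar = sum(b * P for b, (P, R) in zip(belief, envs))
    R_bar = sum(b * R for b, (P, R) in zip(belief, envs))
    return sum(b * cross_value(env, (P_bar, R_bar), gamma)
               for b, env in zip(belief, envs))
```

Under this assumed reading, a dense exploration reward could be derived from increases in `predictable_value` as new observations update the belief, which is one way to "cash in" rewards that are already predictable from the current information state.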
Related papers
- Accelerating Exploration with Unlabeled Prior Data [66.43995032226466]
We study how prior data without reward labels may be used to guide and accelerate exploration for an agent solving a new sparse reward task.
We propose a simple approach that learns a reward model from online experience, labels the unlabeled prior data with optimistic rewards, and then uses it concurrently alongside the online data for downstream policy and critic optimization (a minimal sketch of this relabeling scheme appears after this list).
arXiv Detail & Related papers (2023-11-09T00:05:17Z)
- CLARE: Conservative Model-Based Reward Learning for Offline Inverse Reinforcement Learning [26.05184273238923]
This work aims to tackle a major challenge in offline Inverse Reinforcement Learning (IRL).
We devise a principled algorithm (namely CLARE) that solves offline IRL efficiently by integrating "conservatism" into a learned reward function.
Our theoretical analysis provides an upper bound on the return gap between the learned policy and the expert policy.
arXiv Detail & Related papers (2023-02-09T17:16:29Z)
- Actively Learning Costly Reward Functions for Reinforcement Learning [56.34005280792013]
We show that it is possible to train agents in complex real-world environments orders of magnitude faster.
By enabling the application of reinforcement learning methods to new domains, we show that interesting and non-trivial solutions can be found.
arXiv Detail & Related papers (2022-11-23T19:17:20Z)
- MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning [65.52675802289775]
We show that an uncertainty-aware classifier can solve challenging reinforcement learning problems.
We propose a novel method for computing the normalized maximum likelihood (NML) distribution.
We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions.
arXiv Detail & Related papers (2021-07-15T08:19:57Z)
- Reinforcement Learning with Efficient Active Feature Acquisition [59.91808801541007]
In real life, information acquisition might correspond to performing a medical test on a patient.
We propose a model-based reinforcement learning framework that learns an active feature acquisition policy.
Key to its success is a novel sequential variational auto-encoder that learns high-quality representations from partially observed states.
arXiv Detail & Related papers (2020-11-02T08:46:27Z)
- Learning Guidance Rewards with Trajectory-space Smoothing [22.456737935789103]
Long-term temporal credit assignment is an important challenge in deep reinforcement learning.
Existing policy-gradient and Q-learning algorithms rely on dense environmental rewards that provide rich short-term supervision.
Recent works have proposed algorithms to learn dense "guidance" rewards that could be used in place of the sparse or delayed environmental rewards.
arXiv Detail & Related papers (2020-10-23T23:55:06Z)
- Sequential Transfer in Reinforcement Learning with a Generative Model [48.40219742217783]
We show how to reduce the sample complexity for learning new tasks by transferring knowledge from previously-solved ones.
We derive PAC bounds on its sample complexity which clearly demonstrate the benefits of using this kind of prior knowledge.
We empirically verify our theoretical findings in simple simulated domains.
arXiv Detail & Related papers (2020-07-01T19:53:35Z)
- Hierarchical Reinforcement Learning as a Model of Human Task Interleaving [60.95424607008241]
We develop a hierarchical model of supervisory control driven by reinforcement learning.
The model reproduces known empirical effects of task interleaving.
The results support hierarchical RL as a plausible model of task interleaving.
arXiv Detail & Related papers (2020-01-04T17:53:28Z)
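As referenced above, here is a minimal sketch of the optimistic relabeling idea described in the "Accelerating Exploration with Unlabeled Prior Data" entry: label reward-free prior transitions with a learned reward model plus an optimism bonus, then mix them with online data for policy and critic updates. The ensemble-disagreement bonus, the function names, and the batch layout are assumptions for illustration, not that paper's exact method.

```python
# Hedged sketch: optimistic relabeling of reward-free prior data, then mixing
# it with online data. The bonus form, names and batch layout are assumptions.
import numpy as np


def optimistic_rewards(reward_ensemble, obs, actions, k=1.0):
    """Label unlabeled transitions with mean + k * std over an ensemble of
    learned reward models (optimism in the face of disagreement)."""
    preds = np.stack([model(obs, actions) for model in reward_ensemble])
    return preds.mean(axis=0) + k * preds.std(axis=0)


def mixed_training_batch(online_batch, prior_batch, reward_ensemble,
                         frac_prior=0.5):
    """Concatenate online data (with true rewards) and relabeled prior data
    for downstream off-policy policy/critic optimization."""
    prior_batch = dict(prior_batch)                       # shallow copy
    prior_batch["rewards"] = optimistic_rewards(
        reward_ensemble, prior_batch["observations"], prior_batch["actions"])
    n_prior = int(frac_prior * len(online_batch["rewards"]))
    return {key: np.concatenate([online_batch[key], prior_batch[key][:n_prior]])
            for key in online_batch}
```

In this sketch the reward models would be fit online to the agent's own labeled transitions, and the optimism coefficient `k` controls how strongly the relabeled prior data steers exploration.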