Exploration and Incentives in Reinforcement Learning
- URL: http://arxiv.org/abs/2103.00360v1
- Date: Sun, 28 Feb 2021 00:15:53 GMT
- Title: Exploration and Incentives in Reinforcement Learning
- Authors: Max Simchowitz, Aleksandrs Slivkins
- Abstract summary: We consider complex exploration problems, where each agent faces the same (but unknown) MDP.
Agents control the choice of policies, whereas an algorithm can only issue recommendations.
We design an algorithm which explores all reachable states in the MDP.
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: How do you incentivize self-interested agents to $\textit{explore}$ when they
prefer to $\textit{exploit}$? We consider complex exploration problems, where
each agent faces the same (but unknown) MDP. In contrast with traditional
formulations of reinforcement learning, agents control the choice of policies,
whereas an algorithm can only issue recommendations. However, the algorithm
controls the flow of information, and can incentivize the agents to explore via
information asymmetry. We design an algorithm which explores all reachable
states in the MDP. We achieve provable guarantees similar to those for
incentivizing exploration in static, stateless exploration problems studied
previously.
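As a concrete toy illustration of the information-asymmetry principle (not the paper's MDP algorithm, which handles stateful exploration), here is a minimal sketch in a two-armed bandit setting: the planner mostly recommends the empirically best arm and hides occasional exploration recommendations in the stream. Because each agent sees only its own recommendation, it cannot tell whether it is one of the rare explorers or one of the many exploiters. All names and parameters are illustrative.

```python
import random

def run(num_agents=1000, explore_prob=0.05, seed=0):
    """Hidden exploration via recommendations in a 2-armed bandit (toy sketch)."""
    rng = random.Random(seed)
    true_means = [0.4, 0.6]              # unknown to planner and agents
    counts, sums = [0, 0], [0.0, 0.0]

    for _ in range(num_agents):
        # Planner's current exploit recommendation: the empirically best arm.
        means = [sums[a] / counts[a] if counts[a] else 0.0 for a in (0, 1)]
        exploit_arm = max((0, 1), key=lambda a: means[a])
        # Hide a uniformly random exploration recommendation with small
        # probability; agents cannot distinguish the two cases.
        arm = rng.randrange(2) if rng.random() < explore_prob else exploit_arm
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward

    return counts, [sums[a] / max(counts[a], 1) for a in (0, 1)]

if __name__ == "__main__":
    print(run())  # both arms get sampled; recommendations converge to arm 1
```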
Related papers
- Exploration and Persuasion [58.87314871998078]
We show how to incentivize self-interested agents to explore when they prefer to exploit.
Consider a population of self-interested agents that make decisions under uncertainty.
They "explore" to acquire new information and "exploit" this information to make good decisions.
Without intervention, agents tend to under-explore: exploration is costly to the individual, while its benefits are spread over many agents in the future.
arXiv Detail & Related papers (2024-10-22T15:13:13Z) - The StarCraft Multi-Agent Challenges+: Learning of Multi-Stage Tasks
and Environmental Factors without Precise Reward Functions [14.399479538886064]
We propose a novel benchmark called the StarCraft Multi-Agent Challenges+.
This benchmark targets the exploration capability of MARL algorithms: efficiently learning implicit multi-stage tasks and environmental factors as well as micro-control.
We investigate MARL algorithms under SMAC+ and observe that recent approaches work well in settings similar to the previous challenges but perform poorly in offensive scenarios.
arXiv Detail & Related papers (2022-07-05T12:43:54Z) - SEREN: Knowing When to Explore and When to Exploit [14.188362393915432]
We introduce the Selective Reinforcement Exploration Network (SEREN), which poses the exploration-exploitation trade-off as a game.
Using a form of policy known as impulse control, the switcher determines the best set of states at which to switch to the exploration policy.
We prove that SEREN converges quickly and induces a natural schedule towards pure exploitation.
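A minimal sketch of the switching idea, assuming discrete states, a count-based uncertainty proxy, and a fixed threshold; in the paper the switcher is itself a learned impulse-control policy rather than this heuristic:

```python
import numpy as np

class Switcher:
    """Decides per state whether to hand control to the exploration policy."""

    def __init__(self, n_states, threshold=0.5):
        self.visits = np.zeros(n_states)
        self.threshold = threshold

    def use_exploration(self, state):
        # Impulse-control flavor: intervene (switch policies) only at the
        # set of states whose uncertainty proxy exceeds the threshold.
        uncertainty = 1.0 / np.sqrt(1.0 + self.visits[state])
        self.visits[state] += 1
        return uncertainty > self.threshold

def act(state, switcher, exploit_policy, explore_policy):
    policy = explore_policy if switcher.use_exploration(state) else exploit_policy
    return policy(state)
```

As visit counts grow, the proxy decays below the threshold everywhere, which mirrors the natural schedule towards pure exploitation mentioned above.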
arXiv Detail & Related papers (2022-05-30T12:44:56Z) - Learning in Sparse Rewards settings through Quality-Diversity algorithms [1.4881159885040784]
This thesis addresses the problem of sparse rewards with Quality-Diversity (QD) algorithms.
The first part of the thesis focuses on learning a representation of the space in which the diversity of the policies is evaluated.
The thesis continues with the introduction of the SERENE algorithm, a method that can efficiently focus on the interesting parts of the search space.
arXiv Detail & Related papers (2022-03-02T11:02:34Z) - Exploring More When It Needs in Deep Reinforcement Learning [3.442899929543427]
We propose an exploration mechanism for Deep Reinforcement Learning, called Add Noise to Noise (AN2N), that explores more when the agent needs to.
We use cumulative rewards to identify the past states in which the agent performed poorly, and use cosine distance to measure whether the current state calls for more exploration.
We apply it to continuous control tasks such as HalfCheetah, Hopper, and Swimmer, achieving considerable improvements in performance and convergence speed.
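A hedged sketch of the described mechanism: remember states from low-return episodes and, when the current state is close in cosine distance to one of them, add extra Gaussian noise to the action. Class name, buffer size, thresholds, and noise scale are illustrative, not from the paper:

```python
import numpy as np

class AN2NNoise:
    """Extra action noise near states remembered from poorly performing episodes."""

    def __init__(self, sim_threshold=0.9, extra_sigma=0.3, max_bad=500):
        self.bad_states = []
        self.sim_threshold = sim_threshold
        self.extra_sigma = extra_sigma
        self.max_bad = max_bad

    def record_episode(self, states, episode_return, return_threshold):
        # Cumulative reward decides which past states count as "performed badly".
        if episode_return < return_threshold:
            self.bad_states.extend(states)
            self.bad_states = self.bad_states[-self.max_bad:]

    def perturb(self, state, action, rng):  # rng: np.random.Generator
        if not self.bad_states:
            return action
        bad = np.asarray(self.bad_states)
        sims = bad @ state / (np.linalg.norm(bad, axis=1)
                              * np.linalg.norm(state) + 1e-8)
        if sims.max() > self.sim_threshold:   # near a known trouble spot
            action = action + rng.normal(0.0, self.extra_sigma, size=action.shape)
        return action
```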
arXiv Detail & Related papers (2021-09-28T04:29:38Z) - Long-Term Exploration in Persistent MDPs [68.8204255655161]
In this paper, we propose an exploration method called Rollback-Explore (RbExplore), which utilizes the concept of the persistent Markov decision process.
We test our algorithm in the hard-exploration Prince of Persia game, without rewards and domain knowledge.
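A minimal sketch of the rollback idea, assuming a gym-style environment that can save and restore its full simulator state (the get_state/set_state methods below are hypothetical; console emulators expose equivalents). The snapshot-selection rule (least revisited) is a guess standing in for the paper's strategy:

```python
def rollback_explore(env, steps=10_000, rollback_every=100):
    """Persist simulator snapshots and restart exploration from rarely used ones."""
    pool = []                                    # [snapshot, rollback_count] pairs
    env.reset()
    pool.append([env.get_state(), 0])            # hypothetical save API

    for t in range(1, steps + 1):
        if t % rollback_every == 0:
            snap = min(pool, key=lambda p: p[1])     # least revisited snapshot
            snap[1] += 1
            env.set_state(snap[0])                   # hypothetical restore API
        _, _, done, _ = env.step(env.action_space.sample())
        pool.append([env.get_state(), 0])
        if done:
            env.reset()
    return pool
```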
arXiv Detail & Related papers (2021-09-21T13:47:04Z) - MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven
Reinforcement Learning [65.52675802289775]
We show that an uncertainty-aware classifier can solve challenging reinforcement learning problems.
We propose a novel method for computing the normalized maximum likelihood (NML) distribution.
We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions.
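A toy sketch of the conditional NML computation the abstract alludes to, via naive refitting with logistic regression: for a query point, append it to the data once with each candidate label, refit, and normalize the resulting likelihoods. The paper's contribution is a tractable, meta-learned approximation of exactly this expensive procedure; everything below is illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def cnml_probs(X, y, x_query):
    """Conditional NML label distribution for a binary classifier (naive version)."""
    probs = []
    for label in (0, 1):
        X_aug = np.vstack([X, x_query])          # append query with candidate label
        y_aug = np.append(y, label)
        clf = LogisticRegression(max_iter=1000).fit(X_aug, y_aug)
        probs.append(clf.predict_proba(x_query.reshape(1, -1))[0, label])
    z = sum(probs)
    return [p / z for p in probs]                # normalize over labels

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 2))
    y = (X[:, 0] > 0).astype(int)
    print(cnml_probs(X, y, np.array([3.0, 0.0])))   # well inside one class
    print(cnml_probs(X, y, np.array([0.0, 8.0])))   # far from the data
```

Far from the data, the refit model can accommodate either label, so the normalized probabilities drift towards 0.5, which is the count-like uncertainty signal that connects to exploration bonuses.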
arXiv Detail & Related papers (2021-07-15T08:19:57Z) - Reannealing of Decaying Exploration Based On Heuristic Measure in Deep
Q-Network [82.20059754270302]
We propose an algorithm based on the idea of reannealing, which aims to encourage exploration only when it is needed.
We perform an illustrative case study showing that it has the potential to both accelerate training and obtain a better policy.
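A hedged sketch of reannealing applied to a decaying epsilon-greedy schedule: decay epsilon as usual, but reset it upward when a heuristic signal says exploration is needed again. The stagnation test below is an assumption standing in for the paper's heuristic measure:

```python
class ReannealedEpsilon:
    """Epsilon-greedy schedule that re-anneals when learning appears to stall."""

    def __init__(self, eps0=1.0, decay=0.995, eps_min=0.05,
                 window=20, stagnation_tol=0.01):
        self.eps = eps0
        self.eps0, self.decay, self.eps_min = eps0, decay, eps_min
        self.window, self.tol = window, stagnation_tol
        self.returns = []

    def update(self, episode_return):
        self.returns.append(episode_return)
        self.eps = max(self.eps * self.decay, self.eps_min)   # usual decay
        if len(self.returns) >= 2 * self.window:
            recent = sum(self.returns[-self.window:]) / self.window
            older = sum(self.returns[-2 * self.window:-self.window]) / self.window
            if abs(recent - older) < self.tol:   # heuristic: progress has stalled
                self.eps = self.eps0             # reanneal: explore again
                self.returns.clear()
        return self.eps
```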
arXiv Detail & Related papers (2020-09-29T20:40:00Z) - Never Give Up: Learning Directed Exploration Strategies [63.19616370038824]
We propose a reinforcement learning agent to solve hard exploration games by learning a range of directed exploratory policies.
We construct an episodic memory-based intrinsic reward using k-nearest neighbors over the agent's recent experience to train the directed exploratory policies.
A self-supervised inverse dynamics model is used to train the embeddings of the nearest neighbour lookup, biasing the novelty signal towards what the agent can control.
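A simplified sketch of the episodic part of this intrinsic reward: embed the observation (the embedding function is assumed given here; in the paper it is trained with the self-supervised inverse dynamics model), look up the k nearest neighbours in this episode's memory, and pay more reward the farther the state is from everything seen so far. Constants are illustrative:

```python
import numpy as np

class EpisodicNovelty:
    """k-nearest-neighbour episodic novelty bonus over controllable embeddings."""

    def __init__(self, k=10, eps=1e-3):
        self.memory = []        # embeddings seen so far this episode
        self.k, self.eps = k, eps
        self.d_mean = 1.0       # running mean of squared kNN distances

    def reward(self, emb):
        if not self.memory:
            self.memory.append(emb)
            return 1.0
        mem = np.asarray(self.memory)
        d2 = np.sort(np.sum((mem - emb) ** 2, axis=1))[:self.k]
        self.d_mean = 0.99 * self.d_mean + 0.01 * d2.mean()
        kernel = self.eps / (d2 / (self.d_mean + 1e-8) + self.eps)
        self.memory.append(emb)
        # Far-away embeddings give a small kernel sum, hence a large bonus.
        return 1.0 / np.sqrt(kernel.sum() + 1e-8)

    def new_episode(self):
        self.memory.clear()     # episodic: novelty resets every episode
```

Per step one would add a scaled version of this bonus to the extrinsic reward; the lifelong novelty modulation used in the paper is omitted here.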
arXiv Detail & Related papers (2020-02-14T13:57:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.