Exploration and Incentives in Reinforcement Learning
- URL: http://arxiv.org/abs/2103.00360v1
- Date: Sun, 28 Feb 2021 00:15:53 GMT
- Title: Exploration and Incentives in Reinforcement Learning
- Authors: Max Simchowitz, Aleksandrs Slivkins
- Abstract summary: We consider complex exploration problems, where each agent faces the same (but unknown) MDP.
Agents control the choice of policies, whereas an algorithm can only issue recommendations.
We design an algorithm which explores all reachable states in the MDP.
- Score: 107.42240386544633
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: How do you incentivize self-interested agents to $\textit{explore}$ when they
prefer to $\textit{exploit}$? We consider complex exploration problems, where
each agent faces the same (but unknown) MDP. In contrast with traditional
formulations of reinforcement learning, agents control the choice of policies,
whereas an algorithm can only issue recommendations. However, the algorithm
controls the flow of information, and can incentivize the agents to explore via
information asymmetry. We design an algorithm which explores all reachable
states in the MDP. We achieve provable guarantees similar to those for
incentivizing exploration in static, stateless exploration problems studied
previously.
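The mechanism lends itself to a compact illustration. Below is a minimal, hypothetical Python sketch in which candidate policies are treated as bandit arms and a fixed exploration rate `eps` stands in for the incentive-compatibility constraint that information asymmetry buys; the class, names, and constants are illustrative and not from the paper.

```python
import random

# Toy sketch of incentivized exploration via information asymmetry.
# Candidate policies are treated as "arms"; the paper itself works at the
# level of MDP policies and reachable states, so this is a simplification.

class RecommendationPlanner:
    def __init__(self, n_policies, eps=0.1):
        self.counts = [0] * n_policies    # times each policy was recommended
        self.means = [0.0] * n_policies   # empirical mean return per policy
        self.eps = eps                    # stand-in for the incentive constraint

    def recommend(self):
        # Each agent sees only its own recommendation, so a small fraction
        # of "explore" recommendations is indistinguishable from "exploit".
        untried = [i for i, c in enumerate(self.counts) if c == 0]
        if untried and random.random() < self.eps:
            return random.choice(untried)  # explore an untried policy
        return max(range(len(self.means)), key=lambda i: self.means[i])

    def report(self, policy, ret):
        # The agent follows the recommendation and its return is observed.
        self.counts[policy] += 1
        self.means[policy] += (ret - self.means[policy]) / self.counts[policy]
```

The design point this sketch tries to capture: because agents cannot distinguish an exploratory recommendation from an exploitative one, following the recommendation remains a best response as long as exploration is rare enough.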
Related papers
- Principal-Agent Bandit Games with Self-Interested and Exploratory Learning Agents [16.514561132180134]
We study the repeated principal-agent bandit game, where the principal indirectly interacts with the unknown environment by proposing incentives for the agent to play arms.
Most existing work assumes the agent has full knowledge of the reward means and always behaves greedily, but in many online marketplaces, the agent needs to learn the unknown environment and sometimes explore.
Motivated by such settings, we model a self-interested learning agent with exploration behaviors who iteratively updates reward estimates and either selects an arm that maximizes the estimated reward plus incentive or explores arbitrarily with a certain probability.
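A minimal sketch of that agent model, assuming epsilon-style arbitrary exploration and running-mean reward estimates (the class and parameter names are illustrative):

```python
import random

class LearningAgent:
    """Self-interested agent that learns reward means while responding
    to the principal's incentives (a sketch, not the paper's protocol)."""

    def __init__(self, n_arms, epsilon=0.05):
        self.counts = [0] * n_arms
        self.means = [0.0] * n_arms   # iteratively updated reward estimates
        self.epsilon = epsilon

    def choose(self, incentives):
        # Explore arbitrarily with probability epsilon; otherwise pick the
        # arm maximizing estimated reward plus the offered incentive.
        if random.random() < self.epsilon:
            return random.randrange(len(self.means))
        return max(range(len(self.means)),
                   key=lambda a: self.means[a] + incentives[a])

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]
```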
arXiv Detail & Related papers (2024-12-20T20:04:50Z)
- MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization [91.80034860399677]
Reinforcement learning algorithms aim to balance exploiting the current best strategy with exploring new options that could lead to higher rewards.
We introduce a framework, MaxInfoRL, for balancing intrinsic and extrinsic exploration.
We show that our approach achieves sublinear regret in the simplified setting of multi-armed bandits.
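In that bandit setting the trade-off reduces to adding an information-gain bonus to the extrinsic estimate. A hedged sketch, where the bonus is a crude Gaussian-posterior proxy rather than MaxInfoRL's actual estimator, and `beta` is an assumed weighting:

```python
import math

def select_arm(means, counts, beta=1.0):
    """Pick the arm maximizing empirical mean plus an info-gain proxy."""
    def score(i):
        extrinsic = means[i]
        # Approximate information gain of one more pull; it shrinks as
        # the arm's pull count grows, so well-known arms lose their bonus.
        intrinsic = 0.5 * math.log(1.0 + 1.0 / (counts[i] + 1.0))
        return extrinsic + beta * intrinsic
    return max(range(len(means)), key=score)
```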
arXiv Detail & Related papers (2024-12-16T18:59:53Z)
- Exploration and Persuasion [58.87314871998078]
We show how to incentivize self-interested agents to explore when they prefer to exploit.
Consider a population of self-interested agents that make decisions under uncertainty.
They "explore" to acquire new information and "exploit" this information to make good decisions.
Left to their own devices, such agents tend to under-explore: exploration is costly, but its benefits are spread over many agents in the future.
arXiv Detail & Related papers (2024-10-22T15:13:13Z)
- SEREN: Knowing When to Explore and When to Exploit [14.188362393915432]
We introduce Selective Reinforcement Exploration Network (SEREN), which poses the exploration-exploitation trade-off as a game.
Using a form of policy known as impulse control, the switcher is able to determine the best set of states at which to switch to the exploration policy.
We prove that SEREN converges quickly and induces a natural schedule towards pure exploitation.
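A rough sketch of the switching mechanism, with a simple visit-count trigger standing in for the learned impulse-control policy (the trigger rule and threshold are assumptions, not SEREN's):

```python
class Switcher:
    """Decides, state by state, whether to hand control to exploration."""

    def __init__(self, threshold=5):
        self.visits = {}
        self.threshold = threshold

    def use_exploration(self, state):
        # Impulse control reduces here to a binary intervention: in
        # rarely visited states, switch to the exploration policy.
        self.visits[state] = self.visits.get(state, 0) + 1
        return self.visits[state] <= self.threshold

def act(state, exploit_policy, explore_policy, switcher):
    chosen = explore_policy if switcher.use_exploration(state) else exploit_policy
    return chosen(state)
```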
arXiv Detail & Related papers (2022-05-30T12:44:56Z)
- Learning in Sparse Rewards settings through Quality-Diversity algorithms [1.4881159885040784]
This thesis focuses on the problem of sparse rewards with Quality-Diversity (QD) algorithms.
The first part of the thesis focuses on learning a representation of the space in which the diversity of the policies is evaluated.
The thesis continues with the introduction of the SERENE algorithm, a method that can efficiently focus on the interesting parts of the search space.
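For context on the QD machinery the thesis builds on, here is a minimal MAP-Elites-style loop; the archive discretization, selection probability, and the `evaluate`/`mutate` interfaces are illustrative placeholders, not SERENE itself:

```python
import random

def map_elites(evaluate, mutate, random_solution, iterations=1000):
    """Quality-Diversity sketch: keep the best solution per behaviour cell.

    `evaluate` must return a (fitness, descriptor) pair; the descriptor
    keys the archive, so diversity survives even under sparse rewards.
    """
    archive = {}  # descriptor cell -> (fitness, solution)
    for _ in range(iterations):
        if archive and random.random() < 0.9:
            _, parent = random.choice(list(archive.values()))
            candidate = mutate(parent)
        else:
            candidate = random_solution()
        fitness, descriptor = evaluate(candidate)
        cell = tuple(round(d, 1) for d in descriptor)  # coarse grid
        if cell not in archive or fitness > archive[cell][0]:
            archive[cell] = (fitness, candidate)
    return archive
```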
arXiv Detail & Related papers (2022-03-02T11:02:34Z)
- Exploring More When It Needs in Deep Reinforcement Learning [3.442899929543427]
We propose a policy mechanism for Deep Reinforcement Learning, called Add Noise to Noise (AN2N), which explores more when the agent needs it.
We use cumulative rewards to evaluate in which past states the agent has not performed well, and use cosine distance to measure whether the current state needs more exploration.
We apply it to continuous control tasks such as HalfCheetah, Hopper, and Swimmer, achieving considerable improvements in performance and convergence speed.
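A sketch of that scheme, assuming vector-valued states and actions; the distance threshold, return threshold, and noise scale are illustrative assumptions:

```python
import numpy as np

def cosine_distance(a, b):
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

class AN2NSketch:
    """Add extra action noise near states where past episodes went badly."""

    def __init__(self, distance_threshold=0.1, extra_noise=0.3):
        self.bad_states = []   # states collected from low-return episodes
        self.distance_threshold = distance_threshold
        self.extra_noise = extra_noise

    def record_episode(self, states, episode_return, return_threshold):
        # Cumulative reward decides which past states were handled poorly.
        if episode_return < return_threshold:
            self.bad_states.extend(states)

    def perturb(self, state, action):
        # Cosine distance decides whether the current state needs more
        # exploration; if so, add noise on top of the policy's own noise.
        if any(cosine_distance(state, s) < self.distance_threshold
               for s in self.bad_states):
            action = action + np.random.normal(0.0, self.extra_noise,
                                               size=np.shape(action))
        return action
```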
arXiv Detail & Related papers (2021-09-28T04:29:38Z)
- Long-Term Exploration in Persistent MDPs [68.8204255655161]
We propose an exploration method called Rollback-Explore (RbExplore), which utilizes the concept of the persistent Markov decision process.
We test our algorithm in the hard-exploration Prince of Persia game, without rewards and domain knowledge.
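The rollback idea can be sketched against a gym-style environment, assuming hypothetical `env.save()`/`env.restore()` snapshot methods and random selection over visited states in place of RbExplore's similarity-based machinery:

```python
import random

def rollback_explore(env, rollouts=50, steps_per_rollout=100):
    """Grow a frontier of visited states by repeatedly rolling back."""
    visited = [env.save()]  # snapshots of states reached so far
    for _ in range(rollouts):
        env.restore(random.choice(visited))   # roll back to a known state
        for _ in range(steps_per_rollout):
            _, _, done, _ = env.step(env.action_space.sample())
            if done:
                break
            visited.append(env.save())
    return visited
```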
arXiv Detail & Related papers (2021-09-21T13:47:04Z)
- Reannealing of Decaying Exploration Based On Heuristic Measure in Deep Q-Network [82.20059754270302]
We propose an algorithm based on the idea of reannealing, which aims to encourage exploration only when it is needed.
We perform an illustrative case study showing that it has potential to both accelerate training and obtain a better policy.
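One way such reannealing could look, using stagnating episode returns as an assumed stand-in for the paper's heuristic measure:

```python
class ReannealedEpsilon:
    """Epsilon decays as usual but is reset upward when learning stalls."""

    def __init__(self, start=1.0, floor=0.05, decay=0.999,
                 reanneal_to=0.5, patience=50):
        self.eps = start
        self.floor, self.decay = floor, decay
        self.reanneal_to, self.patience = reanneal_to, patience
        self.best_return = float("-inf")
        self.stale = 0

    def update(self, episode_return):
        self.eps = max(self.floor, self.eps * self.decay)
        if episode_return > self.best_return:
            self.best_return, self.stale = episode_return, 0
        else:
            self.stale += 1
        if self.stale >= self.patience:   # heuristic "needs exploration"
            self.eps = max(self.eps, self.reanneal_to)
            self.stale = 0
        return self.eps
```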
arXiv Detail & Related papers (2020-09-29T20:40:00Z)
- Never Give Up: Learning Directed Exploration Strategies [63.19616370038824]
We propose a reinforcement learning agent to solve hard exploration games by learning a range of directed exploratory policies.
We construct an episodic memory-based intrinsic reward using k-nearest neighbors over the agent's recent experience to train the directed exploratory policies.
A self-supervised inverse dynamics model is used to train the embeddings of the nearest-neighbor lookup, biasing the novelty signal towards what the agent can control.
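A sketch of that episodic intrinsic reward, following the general NGU recipe (an inverse kernel sum over the k nearest embedded neighbors in the episode's memory); the kernel constants here are illustrative:

```python
import numpy as np

def episodic_intrinsic_reward(embedding, memory, k=10, eps=1e-3, c=1e-3):
    """Higher reward for embeddings far from anything seen this episode."""
    if not memory:
        return 1.0
    dists = sorted(float(np.sum((embedding - m) ** 2)) for m in memory)[:k]
    mean_d = float(np.mean(dists)) + 1e-8
    # Inverse-kernel similarity: near-duplicate states score high, so
    # their reciprocal (the novelty bonus) is small.
    similarity = sum(eps / (d / mean_d + eps) for d in dists)
    return 1.0 / np.sqrt(similarity + c)
```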
arXiv Detail & Related papers (2020-02-14T13:57:22Z)