Exploration and Incentives in Reinforcement Learning
- URL: http://arxiv.org/abs/2103.00360v1
- Date: Sun, 28 Feb 2021 00:15:53 GMT
- Title: Exploration and Incentives in Reinforcement Learning
- Authors: Max Simchowitz, Aleksandrs Slivkins
- Abstract summary: We consider complex exploration problems, where each agent faces the same (but unknown) MDP.
Agents control the choice of policies, whereas an algorithm can only issue recommendations.
We design an algorithm which explores all reachable states in the MDP.
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: How do you incentivize self-interested agents to $\textit{explore}$ when they
prefer to $\textit{exploit}$? We consider complex exploration problems, where
each agent faces the same (but unknown) MDP. In contrast with traditional
formulations of reinforcement learning, agents control the choice of policies,
whereas an algorithm can only issue recommendations. However, the algorithm
controls the flow of information, and can incentivize the agents to explore via
information asymmetry. We design an algorithm which explores all reachable
states in the MDP. We achieve provable guarantees similar to those for
incentivizing exploration in static, stateless exploration problems studied
previously.
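As a concrete toy illustration of the information-asymmetry principle (not the paper's MDP algorithm, which handles stateful exploration), here is a minimal sketch in a two-armed bandit setting: the planner mostly recommends the empirically best arm and hides occasional exploration recommendations in the stream. Because each agent sees only its own recommendation, it cannot tell whether it is one of the rare explorers or one of the many exploiters. All names and parameters are illustrative.

```python
import random

def run(num_agents=1000, explore_prob=0.05, seed=0):
    """Hidden exploration via recommendations in a 2-armed bandit (toy sketch)."""
    rng = random.Random(seed)
    true_means = [0.4, 0.6]              # unknown to planner and agents
    counts, sums = [0, 0], [0.0, 0.0]

    for _ in range(num_agents):
        # Planner's current exploit recommendation: the empirically best arm.
        means = [sums[a] / counts[a] if counts[a] else 0.0 for a in (0, 1)]
        exploit_arm = max((0, 1), key=lambda a: means[a])
        # Hide a uniformly random exploration recommendation with small
        # probability; agents cannot distinguish the two cases.
        arm = rng.randrange(2) if rng.random() < explore_prob else exploit_arm
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward

    return counts, [sums[a] / max(counts[a], 1) for a in (0, 1)]

if __name__ == "__main__":
    print(run())  # both arms get sampled; recommendations converge to arm 1
```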
Related papers
- Exploration and Persuasion [58.87314871998078]
We show how to incentivize self-interested agents to explore when they prefer to exploit.
Consider a population of self-interested agents that make decisions under uncertainty.
They "explore" to acquire new information and "exploit" this information to make good decisions.
Without intervention, agents tend to under-explore: exploration is costly to the individual, while its benefits are spread over many agents in the future.
arXiv Detail & Related papers (2024-10-22T15:13:13Z) - The StarCraft Multi-Agent Challenges+: Learning of Multi-Stage Tasks
and Environmental Factors without Precise Reward Functions [14.399479538886064]
We propose a novel benchmark called the StarCraft Multi-Agent Challenges+.
This benchmark targets the exploration capability of MARL algorithms: efficiently learning implicit multi-stage tasks and environmental factors as well as micro-control.
We investigate MARL algorithms under SMAC+ and observe that recent approaches work well in settings similar to the previous challenges but perform poorly in offensive scenarios.
arXiv Detail & Related papers (2022-07-05T12:43:54Z) - SEREN: Knowing When to Explore and When to Exploit [14.188362393915432]
We introduce the Selective Reinforcement Exploration Network (SEREN), which poses the exploration-exploitation trade-off as a game.
Using a form of policy known as impulse control, the switcher determines the best set of states at which to switch to the exploration policy.
We prove that SEREN converges quickly and induces a natural schedule towards pure exploitation.
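A minimal sketch of the switching idea, assuming discrete states, a count-based uncertainty proxy, and a fixed threshold; in the paper the switcher is itself a learned impulse-control policy rather than this heuristic:

```python
import numpy as np

class Switcher:
    """Decides per state whether to hand control to the exploration policy."""

    def __init__(self, n_states, threshold=0.5):
        self.visits = np.zeros(n_states)
        self.threshold = threshold

    def use_exploration(self, state):
        # Impulse-control flavor: intervene (switch policies) only at the
        # set of states whose uncertainty proxy exceeds the threshold.
        uncertainty = 1.0 / np.sqrt(1.0 + self.visits[state])
        self.visits[state] += 1
        return uncertainty > self.threshold

def act(state, switcher, exploit_policy, explore_policy):
    policy = explore_policy if switcher.use_exploration(state) else exploit_policy
    return policy(state)
```

As visit counts grow, the proxy decays below the threshold everywhere, which mirrors the natural schedule towards pure exploitation mentioned above.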
arXiv Detail & Related papers (2022-05-30T12:44:56Z) - Learning in Sparse Rewards settings through Quality-Diversity algorithms [1.4881159885040784]
This thesis addresses the problem of sparse rewards with Quality-Diversity (QD) algorithms.
The first part of the thesis focuses on learning a representation of the space in which the diversity of the policies is evaluated.
The thesis continues with the introduction of the SERENE algorithm, a method that can efficiently focus on the interesting parts of the search space.
arXiv Detail & Related papers (2022-03-02T11:02:34Z) - Exploring More When It Needs in Deep Reinforcement Learning [3.442899929543427]
We propose an exploration mechanism for Deep Reinforcement Learning, called Add Noise to Noise (AN2N), that explores more when the agent needs to.
We use cumulative rewards to identify the past states in which the agent performed poorly, and use cosine distance to measure whether the current state calls for more exploration.
We apply it to continuous control tasks such as HalfCheetah, Hopper, and Swimmer, achieving considerable improvements in performance and convergence speed.
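A hedged sketch of the described mechanism: remember states from low-return episodes and, when the current state is close in cosine distance to one of them, add extra Gaussian noise to the action. Class name, buffer size, thresholds, and noise scale are illustrative, not from the paper:

```python
import numpy as np

class AN2NNoise:
    """Extra action noise near states remembered from poorly performing episodes."""

    def __init__(self, sim_threshold=0.9, extra_sigma=0.3, max_bad=500):
        self.bad_states = []
        self.sim_threshold = sim_threshold
        self.extra_sigma = extra_sigma
        self.max_bad = max_bad

    def record_episode(self, states, episode_return, return_threshold):
        # Cumulative reward decides which past states count as "performed badly".
        if episode_return < return_threshold:
            self.bad_states.extend(states)
            self.bad_states = self.bad_states[-self.max_bad:]

    def perturb(self, state, action, rng):  # rng: np.random.Generator
        if not self.bad_states:
            return action
        bad = np.asarray(self.bad_states)
        sims = bad @ state / (np.linalg.norm(bad, axis=1)
                              * np.linalg.norm(state) + 1e-8)
        if sims.max() > self.sim_threshold:   # near a known trouble spot
            action = action + rng.normal(0.0, self.extra_sigma, size=action.shape)
        return action
```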
arXiv Detail & Related papers (2021-09-28T04:29:38Z) - Long-Term Exploration in Persistent MDPs [68.8204255655161]
In this paper, we propose an exploration method called Rollback-Explore (RbExplore), which utilizes the concept of the persistent Markov decision process.
We test our algorithm in the hard-exploration Prince of Persia game, without rewards and domain knowledge.
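A minimal sketch of the rollback idea, assuming a gym-style environment that can save and restore its full simulator state (the get_state/set_state methods below are hypothetical; console emulators expose equivalents). The snapshot-selection rule (least revisited) is a guess standing in for the paper's strategy:

```python
def rollback_explore(env, steps=10_000, rollback_every=100):
    """Persist simulator snapshots and restart exploration from rarely used ones."""
    pool = []                                    # [snapshot, rollback_count] pairs
    env.reset()
    pool.append([env.get_state(), 0])            # hypothetical save API

    for t in range(1, steps + 1):
        if t % rollback_every == 0:
            snap = min(pool, key=lambda p: p[1])     # least revisited snapshot
            snap[1] += 1
            env.set_state(snap[0])                   # hypothetical restore API
        _, _, done, _ = env.step(env.action_space.sample())
        pool.append([env.get_state(), 0])
        if done:
            env.reset()
    return pool
```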
arXiv Detail & Related papers (2021-09-21T13:47:04Z) - MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven
Reinforcement Learning [65.52675802289775]
We show that an uncertainty-aware classifier can solve challenging reinforcement learning problems.
We propose a novel method for computing the normalized maximum likelihood (NML) distribution.
We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions.
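A toy sketch of the conditional NML computation the abstract alludes to, via naive refitting with logistic regression: for a query point, append it to the data once with each candidate label, refit, and normalize the resulting likelihoods. The paper's contribution is a tractable, meta-learned approximation of exactly this expensive procedure; everything below is illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def cnml_probs(X, y, x_query):
    """Conditional NML label distribution for a binary classifier (naive version)."""
    probs = []
    for label in (0, 1):
        X_aug = np.vstack([X, x_query])          # append query with candidate label
        y_aug = np.append(y, label)
        clf = LogisticRegression(max_iter=1000).fit(X_aug, y_aug)
        probs.append(clf.predict_proba(x_query.reshape(1, -1))[0, label])
    z = sum(probs)
    return [p / z for p in probs]                # normalize over labels

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 2))
    y = (X[:, 0] > 0).astype(int)
    print(cnml_probs(X, y, np.array([3.0, 0.0])))   # well inside one class
    print(cnml_probs(X, y, np.array([0.0, 8.0])))   # far from the data
```

Far from the data, the refit model can accommodate either label, so the normalized probabilities drift towards 0.5, which is the count-like uncertainty signal that connects to exploration bonuses.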
arXiv Detail & Related papers (2021-07-15T08:19:57Z) - Reannealing of Decaying Exploration Based On Heuristic Measure in Deep
Q-Network [82.20059754270302]
We propose an algorithm based on the idea of reannealing, which aims to encourage exploration only when it is needed.
We perform an illustrative case study showing that it has the potential to both accelerate training and obtain a better policy.
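A hedged sketch of reannealing applied to a decaying epsilon-greedy schedule: decay epsilon as usual, but reset it upward when a heuristic signal says exploration is needed again. The stagnation test below is an assumption standing in for the paper's heuristic measure:

```python
class ReannealedEpsilon:
    """Epsilon-greedy schedule that re-anneals when learning appears to stall."""

    def __init__(self, eps0=1.0, decay=0.995, eps_min=0.05,
                 window=20, stagnation_tol=0.01):
        self.eps = eps0
        self.eps0, self.decay, self.eps_min = eps0, decay, eps_min
        self.window, self.tol = window, stagnation_tol
        self.returns = []

    def update(self, episode_return):
        self.returns.append(episode_return)
        self.eps = max(self.eps * self.decay, self.eps_min)   # usual decay
        if len(self.returns) >= 2 * self.window:
            recent = sum(self.returns[-self.window:]) / self.window
            older = sum(self.returns[-2 * self.window:-self.window]) / self.window
            if abs(recent - older) < self.tol:   # heuristic: progress has stalled
                self.eps = self.eps0             # reanneal: explore again
                self.returns.clear()
        return self.eps
```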
arXiv Detail & Related papers (2020-09-29T20:40:00Z) - Never Give Up: Learning Directed Exploration Strategies [63.19616370038824]
We propose a reinforcement learning agent to solve hard exploration games by learning a range of directed exploratory policies.
We construct an episodic memory-based intrinsic reward using k-nearest neighbors over the agent's recent experience to train the directed exploratory policies.
A self-supervised inverse dynamics model is used to train the embeddings of the nearest neighbour lookup, biasing the novelty signal towards what the agent can control.
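A simplified sketch of the episodic part of this intrinsic reward: embed the observation (the embedding function is assumed given here; in the paper it is trained with the self-supervised inverse dynamics model), look up the k nearest neighbours in this episode's memory, and pay more reward the farther the state is from everything seen so far. Constants are illustrative:

```python
import numpy as np

class EpisodicNovelty:
    """k-nearest-neighbour episodic novelty bonus over controllable embeddings."""

    def __init__(self, k=10, eps=1e-3):
        self.memory = []        # embeddings seen so far this episode
        self.k, self.eps = k, eps
        self.d_mean = 1.0       # running mean of squared kNN distances

    def reward(self, emb):
        if not self.memory:
            self.memory.append(emb)
            return 1.0
        mem = np.asarray(self.memory)
        d2 = np.sort(np.sum((mem - emb) ** 2, axis=1))[:self.k]
        self.d_mean = 0.99 * self.d_mean + 0.01 * d2.mean()
        kernel = self.eps / (d2 / (self.d_mean + 1e-8) + self.eps)
        self.memory.append(emb)
        # Far-away embeddings give a small kernel sum, hence a large bonus.
        return 1.0 / np.sqrt(kernel.sum() + 1e-8)

    def new_episode(self):
        self.memory.clear()     # episodic: novelty resets every episode
```

Per step one would add a scaled version of this bonus to the extrinsic reward; the lifelong novelty modulation used in the paper is omitted here.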
arXiv Detail & Related papers (2020-02-14T13:57:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.