(Almost) Free Incentivized Exploration from Decentralized Learning Agents
- URL: http://arxiv.org/abs/2110.14628v1
- Date: Wed, 27 Oct 2021 17:55:19 GMT
- Title: (Almost) Free Incentivized Exploration from Decentralized Learning Agents
- Authors: Chengshuai Shi, Haifeng Xu, Wei Xiong, Cong Shen
- Abstract summary: Incentivized exploration in multi-armed bandits (MAB) has attracted increasing interest and seen much progress in recent years.
We study incentivized exploration with multiple and long-term strategic agents.
An important observation of this work is that strategic agents' intrinsic need to learn benefits (rather than harms) the principal's exploration by providing "free pulls".
- Score: 27.012893220438702
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Incentivized exploration in multi-armed bandits (MAB) has
attracted increasing interest and seen much progress in recent years; in this
setting, a principal offers bonuses to agents to explore on her behalf.
However, almost all existing studies are confined to temporary, myopic
agents. In this work, we break this barrier and study incentivized
exploration with multiple, long-term strategic agents, whose more complicated
behaviors often appear in real-world applications. An important observation
of this work is that strategic agents' intrinsic need to learn benefits
(rather than harms) the principal's exploration by providing "free pulls".
Moreover, it turns out
that increasing the population of agents significantly lowers the principal's
burden of incentivizing. The key and somewhat surprising insight revealed from
our results is that when there are sufficiently many learning agents involved,
the exploration process of the principal can be (almost) free. Our main results
are built upon three novel components which may be of independent interest: (1)
a simple yet provably effective incentive-provision strategy; (2) a carefully
crafted best arm identification algorithm for rewards aggregated under unequal
confidences; (3) a high-probability finite-time lower bound of UCB algorithms.
Experimental results are provided to complement the theoretical analysis.
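To make the setting concrete, here is a minimal toy simulation of the dynamic the abstract describes: decentralized UCB agents supply "free pulls" through their own learning, and a hypothetical principal pays a bonus only when an under-explored arm would otherwise go unpulled. All names, parameter choices, and the bonus rule below are illustrative assumptions, not the paper's actual incentive-provision strategy or best arm identification algorithm.

```python
import numpy as np

# Toy sketch: N decentralized agents run UCB1 for their own learning
# ("free pulls" for the principal), and the principal pays a bonus only
# when the arm she still wants explored would otherwise go unpulled.
# Everything here is an assumed simplification of the abstract's setting.

rng = np.random.default_rng(0)
K, N, T = 5, 10, 2000               # arms, agents, rounds (assumed)
mu = rng.uniform(0.0, 1.0, K)       # true Bernoulli means, unknown to all

counts = np.zeros((N, K))           # per-agent pull counts
sums = np.zeros((N, K))             # per-agent reward sums
bonus_paid = 0.0                    # principal's cumulative incentive cost

for t in range(1, T + 1):
    for i in range(N):
        n = np.maximum(counts[i], 1)
        # Optimistic mean 1.0 for arms this agent has never pulled.
        means = np.where(counts[i] > 0, sums[i] / n, 1.0)
        ucb = means + np.sqrt(2.0 * np.log(t) / n)   # UCB1 index
        choice = int(np.argmax(ucb))

        # Principal's side: if the globally least-pulled arm is still
        # under-explored, pay just enough bonus to flip this agent to it.
        pulls = counts.sum(axis=0)
        target = int(np.argmin(pulls))
        if target != choice and pulls[target] < np.sqrt(t):
            bonus_paid += max(ucb[choice] - ucb[target], 0.0)
            choice = target

        reward = rng.binomial(1, mu[choice])
        counts[i, choice] += 1
        sums[i, choice] += reward

pooled = counts.sum(axis=0)
est = sums.sum(axis=0) / np.maximum(pooled, 1)
print("best arm (pooled estimate):", int(np.argmax(est)),
      "true best:", int(np.argmax(mu)))
print(f"total bonus paid: {bonus_paid:.2f} over {N * T} agent pulls")
```

Because the agents' own UCB exploration already covers most arms, the printed bonus total stays small relative to the N * T total pulls, a toy analogue of the paper's "(almost) free" message.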
Related papers
- Exploration and Persuasion [58.87314871998078]
We show how to incentivize self-interested agents to explore when they prefer to exploit.
Consider a population of self-interested agents that make decisions under uncertainty.
They "explore" to acquire new information and "exploit" this information to make good decisions.
They prefer to exploit because exploration is costly, while its benefits are spread over many agents in the future.
arXiv Detail & Related papers (2024-10-22T15:13:13Z)
- Estimating and Incentivizing Imperfect-Knowledge Agents with Hidden Rewards [4.742123770879715]
In practice, incentive providers often cannot observe the reward realizations of incentivized agents.
This paper explores a repeated adverse selection game between a self-interested learning agent and a learning principal.
We introduce an estimator whose only input is the history of the principal's incentives and the agent's choices.
arXiv Detail & Related papers (2023-08-13T08:12:01Z)
- Curiosity-Driven Multi-Agent Exploration with Mixed Objectives [7.247148291603988]
Intrinsic rewards have been increasingly used to mitigate the sparse reward problem in single-agent reinforcement learning.
Curiosity-driven exploration is a simple yet efficient approach that quantifies the novelty of an observation as the prediction error of the agent's curiosity module (a minimal sketch of this prediction-error reward appears after this list).
We show here, however, that naively using this curiosity-driven approach to guide exploration in sparse-reward cooperative multi-agent environments does not consistently lead to improved results.
arXiv Detail & Related papers (2022-10-29T02:45:38Z)
- Strategically Efficient Exploration in Competitive Multi-agent Reinforcement Learning [25.041622707261897]
This work seeks to understand the role of optimistic exploration in non-cooperative multi-agent settings.
We will show that, in zero-sum games, optimistic exploration can cause the learner to waste time sampling parts of the state space that are irrelevant to strategic play.
To address this issue, we introduce a formal notion of strategically efficient exploration in Markov games, and use this to develop two strategically efficient learning algorithms for finite Markov games.
arXiv Detail & Related papers (2021-07-30T15:22:59Z)
- Cooperative Exploration for Multi-Agent Deep Reinforcement Learning [127.4746863307944]
We propose cooperative multi-agent exploration (CMAE) for deep reinforcement learning, in which agents share a common exploration goal.
The goal is selected from multiple projected state spaces via a normalized entropy-based technique.
We demonstrate that CMAE consistently outperforms baselines on various tasks.
arXiv Detail & Related papers (2021-07-23T20:06:32Z)
- Explore and Control with Adversarial Surprise [78.41972292110967]
Reinforcement learning (RL) provides a framework for learning goal-directed policies given user-specified rewards.
We propose a new unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences.
We show that our method leads to the emergence of complex skills by exhibiting clear phase transitions.
arXiv Detail & Related papers (2021-07-12T17:58:40Z)
- Exploration and Incentives in Reinforcement Learning [107.42240386544633]
We consider complex exploration problems, where each agent faces the same (but unknown) MDP.
Agents control the choice of policies, whereas an algorithm can only issue recommendations.
We design an algorithm which explores all reachable states in the MDP.
arXiv Detail & Related papers (2021-02-28T00:15:53Z)
- Fast active learning for pure exploration in reinforcement learning [48.98199700043158]
We show that bonuses that scale with $1/n$ bring faster learning rates, improving the known upper bounds with respect to the dependence on the horizon (a toy comparison of this bonus decay appears after this list).
We also show that with an improved analysis of the stopping time, we can improve by a factor $H$ the sample complexity in the best-policy identification setting.
arXiv Detail & Related papers (2020-07-27T11:28:32Z)
- Learning to Incentivize Other Learning Agents [73.03133692589532]
We show how to equip RL agents with the ability to give rewards directly to other agents, using a learned incentive function.
Such agents significantly outperform standard RL and opponent-shaping agents in challenging general-sum Markov games.
Our work points toward more opportunities and challenges along the path to ensuring the common good in a multi-agent future.
arXiv Detail & Related papers (2020-06-10T20:12:38Z)
- Intrinsic Exploration as Multi-Objective RL [29.124322674133]
Intrinsic motivation enables reinforcement learning (RL) agents to explore when rewards are very sparse.
We propose a framework based on multi-objective RL where both exploration and exploitation are being optimized as separate objectives.
This formulation strikes the balance between exploration and exploitation at the policy level, resulting in advantages over traditional methods.
arXiv Detail & Related papers (2020-04-06T02:37:29Z)
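As referenced in the curiosity-driven entry above, the idea of novelty as prediction error can be sketched in a few lines. The linear forward model, dimensions, and learning rate below are assumptions for illustration, not the paper's curiosity module.

```python
import numpy as np

# Sketch of curiosity as prediction error: the intrinsic reward for a
# transition (s, a, s') is how badly a learned forward model predicts s'.
# The linear model and plain SGD update are illustrative assumptions.

rng = np.random.default_rng(1)
state_dim, action_dim, lr = 4, 2, 0.05
W = rng.normal(scale=0.1, size=(state_dim, state_dim + action_dim))

def curiosity_step(s, a, s_next):
    """Return the intrinsic reward and update the forward model."""
    global W
    x = np.concatenate([s, a])
    err = s_next - W @ x              # prediction error of the model
    W += lr * np.outer(err, x)        # one SGD step toward predicting s'
    return float(err @ err)           # novelty = squared prediction error

# Revisiting the same transition drives its intrinsic reward toward zero:
s = rng.normal(size=state_dim)
a = rng.normal(size=action_dim)
s_next = rng.normal(size=state_dim)
print([round(curiosity_step(s, a, s_next), 4) for _ in range(6)])
```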
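And for the "Fast active learning" entry, the toy comparison below only illustrates how much faster a $1/n$ bonus vanishes with the visit count $n$ than the classical $1/\sqrt{n}$ one; the constants are arbitrary assumptions, not the paper's analysis.

```python
import math

# A bonus that scales as 1/n shrinks much faster than the classical
# 1/sqrt(n) bonus as the visit count n grows (arbitrary unit constants).
for n in (1, 10, 100, 1000, 10000):
    print(f"n={n:6d}   1/sqrt(n)={1 / math.sqrt(n):8.4f}   1/n={1 / n:8.5f}")
```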