Learning to Incentivize Information Acquisition: Proper Scoring Rules Meet Principal-Agent Model
- URL: http://arxiv.org/abs/2303.08613v2
- Date: Sun, 6 Aug 2023 19:25:02 GMT
- Title: Learning to Incentivize Information Acquisition: Proper Scoring Rules Meet Principal-Agent Model
- Authors: Siyu Chen, Jibang Wu, Yifan Wu, Zhuoran Yang
- Abstract summary: We study the incentivized information acquisition problem, where a principal hires an agent to gather information on her behalf.
We design a provably sample efficient algorithm that tailors the UCB algorithm to our model.
Our algorithm features a delicate estimation procedure for the optimal profit of the principal, and a conservative correction scheme that ensures that the desired actions of the agent are incentivized.
- Score: 64.94131130042275
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the incentivized information acquisition problem, where a principal
hires an agent to gather information on her behalf. Such a problem is modeled
as a Stackelberg game between the principal and the agent, where the principal
announces a scoring rule that specifies the payment, and the agent then
chooses an effort level that maximizes her own profit and reports the
information. We study the online setting of such a problem from the principal's
perspective, i.e., designing the optimal scoring rule by repeatedly interacting
with the strategic agent. We design a provably sample efficient algorithm that
tailors the UCB algorithm (Auer et al., 2002) to our model, which achieves a
sublinear $T^{2/3}$-regret after $T$ iterations. Our algorithm features a
delicate estimation procedure for the optimal profit of the principal, and a
conservative correction scheme that ensures that the desired actions of the agent
are incentivized. Furthermore, a key feature of our regret bound is that it is
independent of the number of states of the environment.
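For intuition, the snippet below is a minimal, self-contained sketch of the interaction the abstract describes, not the paper's algorithm: the principal scales a quadratic (Brier) proper scoring rule as the payment, and the agent best-responds with the effort level that maximizes her expected payment minus effort cost. The binary state, the two effort levels, their costs, and their signal accuracies are made-up assumptions for illustration.

```python
# Illustrative sketch (NOT the paper's UCB-based algorithm): the principal posts
# a scaled proper scoring rule as payment; the agent picks the effort level that
# maximizes expected payment minus effort cost.
import numpy as np

STATES = [0, 1]  # binary state of the environment; symmetric 50/50 prior assumed


def brier_score(report, state):
    """Quadratic (Brier) score: a classic strictly proper scoring rule."""
    outcome = np.eye(len(STATES))[state]
    return 1.0 - np.sum((np.asarray(report) - outcome) ** 2)


# Hypothetical effort technology: higher effort -> more accurate signal, higher cost.
EFFORTS = {0: {"cost": 0.00, "accuracy": 0.50},
           1: {"cost": 0.05, "accuracy": 0.90}}


def agent_best_response(scale):
    """Effort level maximizing expected scaled score minus effort cost."""
    best_effort, best_profit = None, -np.inf
    for effort, spec in EFFORTS.items():
        acc = spec["accuracy"]
        # With a strictly proper scoring rule the agent reports her posterior
        # truthfully, so (by symmetry, conditioning on state 0) her expected
        # score depends only on the signal accuracy.
        exp_score = acc * brier_score([acc, 1 - acc], 0) \
            + (1 - acc) * brier_score([1 - acc, acc], 0)
        profit = scale * exp_score - spec["cost"]
        if profit > best_profit:
            best_effort, best_profit = effort, profit
    return best_effort


# Naive offline search by the principal: the cheapest tested payment scale that
# still incentivizes high effort. The paper instead learns this online, via a
# UCB-style estimation procedure and a conservative correction scheme.
for scale in np.linspace(0.0, 1.0, 21):
    if agent_best_response(scale) == 1:
        print(f"smallest tested scale that incentivizes effort 1: {scale:.2f}")
        break
```

The sketch prints the smallest tested payment scale that makes high effort profitable; the paper's contribution is learning such an incentivizing scoring rule online, without knowing the agent's costs or signal accuracies.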
Related papers
- Contractual Reinforcement Learning: Pulling Arms with Invisible Hands [68.77645200579181]
We propose a theoretical framework for aligning the economic interests of different stakeholders in online learning problems through contract design.
For the planning problem, we design an efficient dynamic programming algorithm to determine the optimal contracts against the far-sighted agent.
For the learning problem, we introduce a generic design of no-regret learning algorithms to untangle the challenges from robust design of contracts to the balance of exploration and exploitation.
arXiv Detail & Related papers (2024-07-01T16:53:00Z) - Incentivized Learning in Principal-Agent Bandit Games [62.41639598376539]
This work considers a repeated principal-agent bandit game, where the principal can only interact with her environment through the agent.
The principal can influence the agent's decisions by offering incentives that are added on top of the agent's own rewards.
We present nearly optimal learning algorithms for the principal's regret in both multi-armed and linear contextual settings.
arXiv Detail & Related papers (2024-03-06T16:00:46Z) - Principal-Agent Reward Shaping in MDPs [50.914110302917756]
Principal-agent problems arise when one party acts on behalf of another, leading to conflicts of interest.
We study a two-player Stackelberg game where the principal and the agent have different reward functions, and the agent chooses an MDP policy for both players.
Our results cover trees and deterministic decision processes with a finite horizon.
arXiv Detail & Related papers (2023-12-30T18:30:44Z) - Regret Analysis of Repeated Delegated Choice [8.384985977301174]
We present a study on a repeated delegated choice problem, which is the first to consider an online learning variant of Kleinberg and Kleinberg, EC'18.
We explore two dimensions of the problem setup: whether the agent behaves myopically or strategizes across the rounds, and whether the solutions yield deterministic or stochastic utility.
arXiv Detail & Related papers (2023-10-07T17:54:36Z) - Learning Optimal Contracts: How to Exploit Small Action Spaces [37.92189925462977]
We study principal-agent problems in which a principal commits to an outcome-dependent payment scheme.
We design an algorithm that learns an approximately-optimal contract with high probability.
It can also be employed to provide a $\tilde{\mathcal{O}}(T^{4/5})$ regret bound in the related online learning setting.
arXiv Detail & Related papers (2023-09-18T14:18:35Z) - Online Learning with Costly Features in Non-stationary Environments [6.009759445555003]
In sequential decision-making problems, maximizing long-term rewards is the primary goal.
In real-world problems, collecting beneficial information is often costly.
We develop an algorithm whose regret is sublinear in time.
arXiv Detail & Related papers (2023-07-18T16:13:35Z) - Strategic Apple Tasting [35.25249063553063]
Algorithmic decision-making in high-stakes domains often involves assigning decisions to agents with incentives to strategically modify their input to the algorithm.
We formalize this setting as an online learning problem with apple-tasting feedback.
Our goal is to achieve sublinear strategic regret, which compares the performance of the principal to that of the best fixed policy in hindsight.
arXiv Detail & Related papers (2023-06-09T20:46:31Z) - MERMAIDE: Learning to Align Learners using Model-Based Meta-Learning [62.065503126104126]
We study how a principal can efficiently and effectively intervene on the rewards of a previously unseen learning agent in order to induce desirable outcomes.
This is relevant to many real-world settings like auctions or taxation, where the principal may not know the learning behavior nor the rewards of real people.
We introduce MERMAIDE, a model-based meta-learning framework to train a principal that can quickly adapt to out-of-distribution agents.
arXiv Detail & Related papers (2023-04-10T15:44:50Z) - Learning in Stackelberg Games with Non-myopic Agents [60.927889817803745]
We study Stackelberg games where a principal repeatedly interacts with a non-myopic long-lived agent, without knowing the agent's payoff function.
We provide a general framework that reduces learning in the presence of non-myopic agents to robust bandit optimization in the presence of myopic agents.
arXiv Detail & Related papers (2022-08-19T15:49:30Z) - Consequences of Misaligned AI [12.879600368339393]
This paper argues that we should view the design of reward functions as an interactive and dynamic process.
We show how modifying the setup to allow reward functions that reference the full state or allowing the principal to update the proxy objective over time can lead to higher utility solutions.
arXiv Detail & Related papers (2021-02-07T19:34:04Z)