Prior-Agnostic Incentive-Compatible Exploration
- URL: http://arxiv.org/abs/2602.20465v1
- Date: Tue, 24 Feb 2026 01:53:08 GMT
- Title: Prior-Agnostic Incentive-Compatible Exploration
- Authors: Ramya Ramalingam, Osbert Bastani, Aaron Roth
- Abstract summary: In bandit settings, optimizing long-term regret metrics requires exploration. We show that (weighted) swap regret bounds on their own suffice to cause agents to faithfully follow forecasts. We instantiate our abstract bounds with concrete algorithms for guaranteeing adaptive and weighted regret in bandit settings.
- Score: 32.22947381651758
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In bandit settings, optimizing long-term regret metrics requires exploration, which corresponds to sometimes taking myopically sub-optimal actions. When a long-lived principal merely recommends actions to be executed by a sequence of different agents (as in an online recommendation platform), this creates an incentive misalignment: exploration is "worth it" for the principal but not for the agents. Prior work studies regret minimization under the constraint of Bayesian Incentive-Compatibility in a static stochastic setting with a fixed and common prior shared amongst the agents and the algorithm designer. We show that (weighted) swap regret bounds on their own suffice to cause agents to faithfully follow forecasts in an approximate Bayes Nash equilibrium, even in dynamic environments in which agents have conflicting prior beliefs and the mechanism designer has no knowledge of any agent's beliefs. To obtain these bounds, it is necessary to assume that the agents have some degree of uncertainty not just about the rewards, but about their arrival time -- i.e., their relative position in the sequence of agents served by the algorithm. We instantiate our abstract bounds with concrete algorithms for guaranteeing adaptive and weighted regret in bandit settings.
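For reference, a standard (unweighted) formulation of the external and swap regret notions invoked above is sketched below. The notation is illustrative rather than taken from the paper: $A$ is the action set, $a_t$ the action recommended at round $t$, $r_t$ the reward function, and $\phi$ a swap function that remaps actions; the weighted variants studied in the paper additionally reweight rounds.

```latex
% External regret: compare cumulative reward to the best fixed action in hindsight.
\mathrm{Reg}_{\mathrm{ext}}(T) = \max_{a^{\ast} \in A} \sum_{t=1}^{T} \left( r_t(a^{\ast}) - r_t(a_t) \right)

% Swap regret: compare to the best per-action remapping \phi : A \to A,
% i.e. every round in which action a was played is re-evaluated as if \phi(a) had been played.
\mathrm{Reg}_{\mathrm{swap}}(T) = \max_{\phi : A \to A} \sum_{t=1}^{T} \left( r_t(\phi(a_t)) - r_t(a_t) \right)
```

Since constant maps are valid swap functions, swap regret upper-bounds external regret, so a swap regret bound is the stronger guarantee that the abstract builds on.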
Related papers
- Steering No-Regret Agents in MFGs under Model Uncertainty [19.845081182511713]
We study the design of steering rewards in Mean-Field Games with density-independent transitions. We establish sub-linear regret guarantees for the cumulative gaps between the agents' behaviors and the desired ones. Our work presents an effective framework for steering agents' behaviors in large-population systems under uncertainty.
arXiv Detail & Related papers (2025-03-12T12:02:02Z) - Multi-Armed Bandits with Abstention [62.749500564313834]
We introduce a novel extension of the canonical multi-armed bandit problem that incorporates an additional strategic element: abstention.
In this enhanced framework, the agent is not only tasked with selecting an arm at each time step, but also has the option to abstain from accepting the instantaneous reward before observing it.
arXiv Detail & Related papers (2024-02-23T06:27:12Z) - Replication-proof Bandit Mechanism Design with Bayesian Agents [11.758708370032469]
We study the problem of designing replication-proof bandit mechanisms when agents strategically register or replicate their own arms. We consider Bayesian agents who only know the distribution from which their own arms' mean rewards are sampled, unlike the original setting of Shin et al. (2022).
arXiv Detail & Related papers (2023-12-28T08:36:35Z) - Robust and Performance Incentivizing Algorithms for Multi-Armed Bandits with Strategic Agents [52.75161794035767]
We introduce a class of bandit algorithms that meet the two objectives of performance incentivization and robustness simultaneously. We show that settings where the principal has no information about the arms' performance characteristics can be handled by combining ideas from second-price auctions with our algorithms.
arXiv Detail & Related papers (2023-12-13T06:54:49Z) - Byzantine-Resilient Decentralized Multi-Armed Bandits [23.34196562182705]
We develop an algorithm that fuses an information mixing step among agents with a truncation of inconsistent and extreme values. This framework can be used to model attackers in computer networks, instigators of offensive content into recommender systems, or manipulators of financial markets.
arXiv Detail & Related papers (2023-10-11T09:09:50Z) - Estimating and Incentivizing Imperfect-Knowledge Agents with Hidden Rewards [4.742123770879715]
In practice, incentive providers often cannot observe the reward realizations of incentivized agents.
This paper explores a repeated adverse selection game between a self-interested learning agent and a learning principal.
We introduce an estimator whose only input is the history of the principal's incentives and the agent's choices.
arXiv Detail & Related papers (2023-08-13T08:12:01Z) - Bandit Social Learning: Exploration under Myopic Behavior [54.767961587919075]
We study social learning dynamics motivated by reviews on online platforms. Agents collectively follow a simple multi-armed bandit protocol, but each agent acts myopically, without regard to exploration. We derive stark learning failures for any such behavior, and provide matching positive results.
arXiv Detail & Related papers (2023-02-15T01:57:57Z) - Formalizing the Problem of Side Effect Regularization [81.97441214404247]
We propose a formal criterion for side effect regularization via the assistance game framework.
In these games, the agent solves a partially observable Markov decision process.
We show that this POMDP is solved by trading off the proxy reward with the agent's ability to achieve a range of future tasks.
arXiv Detail & Related papers (2022-06-23T16:36:13Z) - Robust Allocations with Diversity Constraints [65.3799850959513]
We show that the Nash Welfare rule that maximizes the product of agent values is uniquely positioned to be robust when diversity constraints are introduced.
We also show that the guarantees achieved by Nash Welfare are nearly optimal within a widely studied class of allocation rules.
arXiv Detail & Related papers (2021-09-30T11:09:31Z) - Trustworthy Preference Completion in Social Choice [36.91054060923998]
Since it is impractical to ask agents to provide linear orders over all alternatives, preference completion must be performed on the resulting partial rankings.
A trust-based anchor-kNN algorithm is proposed to find the $k$ nearest trustworthy neighbors of an agent using trust-oriented Kendall-Tau distances.
A common voting rule over these $k$ trustworthy neighboring agents, based on certainty and conflict, is then used to perform the trustworthy preference completion.
arXiv Detail & Related papers (2020-12-14T03:03:13Z) - Lenient Regret for Multi-Armed Bandits [72.56064196252498]
We consider the Multi-Armed Bandit (MAB) problem, where an agent sequentially chooses actions and observes rewards for the actions it took.
While the majority of algorithms try to minimize the regret, i.e., the cumulative difference between the reward of the best action and that of the agent's chosen action, this criterion might lead to undesirable results.
We suggest a new, more lenient regret criterion that ignores suboptimality gaps smaller than some $\epsilon$ (see the sketch after this list).
arXiv Detail & Related papers (2020-08-10T08:30:52Z) - Adaptive Discretization against an Adversary: Lipschitz bandits, Dynamic Pricing, and Auction Tuning [56.23358327635815]
Lipschitz bandits is a prominent version of multi-armed bandits that studies large, structured action spaces. A central theme here is the adaptive discretization of the action space, which gradually "zooms in" on the more promising regions. We provide the first algorithm for adaptive discretization in the adversarial version, and derive instance-dependent regret bounds.
arXiv Detail & Related papers (2020-06-22T16:06:25Z) - VCG Mechanism Design with Unknown Agent Values under Stochastic Bandit Feedback [104.06766271716774]
We study a multi-round welfare-maximising mechanism design problem in instances where agents do not know their values.
We first define three notions of regret for the welfare, the individual utilities of each agent and that of the mechanism.
Our framework also provides flexibility to control the pricing scheme so as to trade-off between the agent and seller regrets.
arXiv Detail & Related papers (2020-04-19T18:00:58Z)
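To make the lenient-regret entry above concrete, here is a minimal, self-contained sketch (not code from any of the papers listed): it runs an illustrative epsilon-greedy bandit and tracks both the standard cumulative pseudo-regret and a lenient variant that forgives suboptimality gaps below a tolerance. The arm means, noise level, exploration rate, and all parameter values are assumptions made purely for illustration.

```python
# Illustrative sketch only: contrasts standard cumulative pseudo-regret with a
# "lenient" variant that ignores suboptimality gaps smaller than a tolerance.
# Arm means, noise, exploration rate, and horizon are arbitrary assumptions.
import random


def run_bandit(arm_means, horizon, tolerance, seed=0):
    rng = random.Random(seed)
    best_mean = max(arm_means)
    counts = [0] * len(arm_means)
    estimates = [0.0] * len(arm_means)
    standard_regret = 0.0
    lenient_regret = 0.0

    for _ in range(horizon):
        # Explore uniformly with probability 0.1 (and while some arm is unpulled);
        # otherwise exploit the empirically best arm -- a stand-in for any bandit rule.
        if rng.random() < 0.1 or 0 in counts:
            arm = rng.randrange(len(arm_means))
        else:
            arm = max(range(len(arm_means)), key=lambda a: estimates[a])

        reward = arm_means[arm] + rng.gauss(0.0, 0.1)
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

        gap = best_mean - arm_means[arm]
        standard_regret += gap                           # every gap counts
        lenient_regret += gap if gap > tolerance else 0  # small gaps are forgiven

    return standard_regret, lenient_regret


if __name__ == "__main__":
    std, lenient = run_bandit([0.50, 0.48, 0.30], horizon=5000, tolerance=0.05)
    print(f"standard regret: {std:.1f}, lenient regret: {lenient:.1f}")
```

With the example means above, the 0.02 gap between the top two arms falls below the 0.05 tolerance and is forgiven by the lenient criterion, while pulls of the clearly inferior third arm still count against both measures.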
This list is automatically generated from the titles and abstracts of the papers on this site.