Envious Explore and Exploit
- URL: http://arxiv.org/abs/2502.12798v1
- Date: Tue, 18 Feb 2025 12:00:35 GMT
- Title: Envious Explore and Exploit
- Authors: Omer Ben-Porat, Yotam Gafni, Or Markovetzki,
- Abstract summary: We study the societal effects of explore-and-exploit mechanisms using the economic notion of envy.
We present a multi-armed bandit-like model in which every round consists of several sessions, and rewards are realized once per round.
On the downside, doing so also generates envy, as late-to-arrive users enjoy the information gathered by early-to-arrive users.
- Score: 8.029049649310213
- License:
- Abstract: Explore-and-exploit tradeoffs play a key role in recommendation systems (RSs), aiming at serving users better by learning from previous interactions. Despite their commercial success, the societal effects of explore-and-exploit mechanisms are not well understood, especially regarding the utility discrepancy they generate between different users. In this work, we measure such discrepancy using the economic notion of envy. We present a multi-armed bandit-like model in which every round consists of several sessions, and rewards are realized once per round. We call the latter property reward consistency, and show that the RS can leverage this property for better societal outcomes. On the downside, doing so also generates envy, as late-to-arrive users enjoy the information gathered by early-to-arrive users. We examine the generated envy under several arrival order mechanisms and virtually any anonymous algorithm, i.e., any algorithm that treats all similar users similarly without leveraging their identities. We provide tight envy bounds on uniform arrival and upper bound the envy for nudged arrival, in which the RS can affect the order of arrival by nudging its users. Furthermore, we study the efficiency-fairness trade-off by devising an algorithm that allows constant envy and approximates the optimal welfare in restricted settings. Finally, we validate our theoretical results empirically using simulations.
Related papers
- Neural Dueling Bandits [58.90189511247936]
We use a neural network to estimate the reward function using preference feedback for the previously selected arms.
We then extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution.
arXiv Detail & Related papers (2024-07-24T09:23:22Z) - Adversarial Bandits with Multi-User Delayed Feedback: Theory and
Application [17.64363983613468]
We formulate an adversarial MAB problem with multi-user delayed feedback and design a modified EXP3 algorithm MUD-EXP3.
In this paper, we consider that the delayed feedback results are from multiple users and are unrestricted on internal distribution.
arXiv Detail & Related papers (2023-10-17T12:08:15Z) - Leveraging heterogeneous spillover in maximizing contextual bandit rewards [10.609670658904562]
We present a framework that allows contextual multi-armed bandits to account for such heterogeneous spillovers.
Our framework leads to significantly higher rewards than existing state-of-the-art solutions.
arXiv Detail & Related papers (2023-10-16T10:34:41Z) - Bandit Social Learning: Exploration under Myopic Behavior [58.75758600464338]
We study social learning dynamics motivated by reviews on online platforms.
Agents collectively follow a simple multi-armed bandit protocol, but each agent acts myopically, without regards to exploration.
We derive stark learning failures for any such behavior, and provide matching positive results.
arXiv Detail & Related papers (2023-02-15T01:57:57Z) - Multi-Armed Bandit Problem with Temporally-Partitioned Rewards: When
Partial Feedback Counts [53.579515853222986]
We study a novel bandit setting, namely Multi-Armed Bandit with Temporally-Partitioned Rewards (TP-MAB)
This setting is a natural extension of delayed-feedback bandits to the case in which rewards may be dilated over a finite-time span after the pull.
We provide two algorithms to address TP-MAB problems, namely, TP-UCB-FR and TP-UCB-EW.
arXiv Detail & Related papers (2022-06-01T15:56:59Z) - Incentivizing Combinatorial Bandit Exploration [87.08827496301839]
Consider a bandit algorithm that recommends actions to self-interested users in a recommendation system.
Users are free to choose other actions and need to be incentivized to follow the algorithm's recommendations.
While the users prefer to exploit, the algorithm can incentivize them to explore by leveraging the information collected from the previous users.
arXiv Detail & Related papers (2022-06-01T13:46:25Z) - Adaptive Algorithms for Multi-armed Bandit with Composite and Anonymous
Feedback [32.62857394584907]
We study the multi-armed bandit (MAB) problem with composite and anonymous feedback.
We propose adaptive algorithms for both the adversarial and non- adversarial cases.
arXiv Detail & Related papers (2020-12-13T12:25:41Z) - DARTS-: Robustly Stepping out of Performance Collapse Without Indicators [74.21019737169675]
Differentiable architecture search suffers from long-standing performance instability.
indicators such as Hessian eigenvalues are proposed as a signal to stop searching before the performance collapses.
In this paper, we undertake a more subtle and direct approach to resolve the collapse.
arXiv Detail & Related papers (2020-09-02T12:54:13Z) - Maximizing Information Gain in Partially Observable Environments via
Prediction Reward [64.24528565312463]
This paper tackles the challenge of using belief-based rewards for a deep RL agent.
We derive the exact error between negative entropy and the expected prediction reward.
This insight provides theoretical motivation for several fields using prediction rewards.
arXiv Detail & Related papers (2020-05-11T08:13:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.