Related papers: Merit-based Fair Combinatorial Semi-Bandit with Unrestricted Feedback Delays

Merit-based Fair Combinatorial Semi-Bandit with Unrestricted Feedback Delays

URL: http://arxiv.org/abs/2407.15439v1
Date: Mon, 22 Jul 2024 07:36:27 GMT
Title: Merit-based Fair Combinatorial Semi-Bandit with Unrestricted Feedback Delays
Authors: Ziqun Chen, Kechao Cai, Zhuoyue Chen, Jinbei Zhang, John C. S. Lui,
Abstract summary: We study the semi-bandit problem with unrestricted feedback delays under merit-based fairness constraints. This is motivated by applications such as crowdsourcing, and online advertising, where immediate feedback is not immediately available. We present new bandit algorithms to select arms under unrestricted feedback delays based on their merits.
Score: 25.757803459592104
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We study the stochastic combinatorial semi-bandit problem with unrestricted feedback delays under merit-based fairness constraints. This is motivated by applications such as crowdsourcing, and online advertising, where immediate feedback is not immediately available and fairness among different choices (or arms) is crucial. We consider two types of unrestricted feedback delays: reward-independent delays where the feedback delays are independent of the rewards, and reward-dependent delays where the feedback delays are correlated with the rewards. Furthermore, we introduce merit-based fairness constraints to ensure a fair selection of the arms. We define the reward regret and the fairness regret and present new bandit algorithms to select arms under unrestricted feedback delays based on their merits. We prove that our algorithms all achieve sublinear expected reward regret and expected fairness regret, with a dependence on the quantiles of the delay distribution. We also conduct extensive experiments using synthetic and real-world data and show that our algorithms can fairly select arms with different feedback delays.

Related papers

Biased Dueling Bandits with Stochastic Delayed Feedback [6.167074802065416]
We present two algorithms designed to handle situations involving delay. Our first algorithm, requiring complete delay distribution information, achieves the optimal regret bound for the dueling bandit problem when there is no delay. The second algorithm is tailored for situations where the distribution is unknown, but only the expected value of delay is available.
arXiv Detail & Related papers (2024-08-26T19:49:12Z)
Neural Dueling Bandits [58.90189511247936]
We use a neural network to estimate the reward function using preference feedback for the previously selected arms. We then extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution.
arXiv Detail & Related papers (2024-07-24T09:23:22Z)
Adversarial Bandits with Multi-User Delayed Feedback: Theory and Application [17.64363983613468]
We formulate an adversarial MAB problem with multi-user delayed feedback and design a modified EXP3 algorithm MUD-EXP3. In this paper, we consider that the delayed feedback results are from multiple users and are unrestricted on internal distribution.
arXiv Detail & Related papers (2023-10-17T12:08:15Z)
Non-stationary Delayed Combinatorial Semi-Bandit with Causally Related Rewards [7.0997346625024]
We formalize a non-stationary and delayed semi-bandit problem with causally related rewards. We develop a policy that learns the structural dependencies from delayed feedback and utilizes that to optimize the decision-making. We evaluate our method via numerical analysis using synthetic and real-world datasets to detect the regions that contribute the most to the spread of Covid-19 in Italy.
arXiv Detail & Related papers (2023-07-18T09:22:33Z)
Delayed Feedback in Generalised Linear Bandits Revisited [5.349852254138085]
We study the phenomenon of delayed rewards in generalised linear bandits in a theoretical manner. We show that a natural adaptation of an optimistic algorithm to the delayed feedback achieves a regret bound where the penalty for the delays is independent of the horizon.
arXiv Detail & Related papers (2022-07-21T23:35:01Z)
Multi-Armed Bandit Problem with Temporally-Partitioned Rewards: When Partial Feedback Counts [53.579515853222986]
We study a novel bandit setting, namely Multi-Armed Bandit with Temporally-Partitioned Rewards (TP-MAB) This setting is a natural extension of delayed-feedback bandits to the case in which rewards may be dilated over a finite-time span after the pull. We provide two algorithms to address TP-MAB problems, namely, TP-UCB-FR and TP-UCB-EW.
arXiv Detail & Related papers (2022-06-01T15:56:59Z)
Dare not to Ask: Problem-Dependent Guarantees for Budgeted Bandits [66.02233330016435]
We provide problem-dependent guarantees on both the regret and the asked feedback. We present a new algorithm called BuFALU for which we derive problem-dependent regret and cumulative feedback bounds.
arXiv Detail & Related papers (2021-10-12T03:24:57Z)
Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions [54.25616645675032]
We study the Multi-Armed Bandit (MAB) problem with random delays in the feedback received by the algorithm. We consider two settings: the reward-dependent delay setting, where realized delays may depend on the rewards, and the reward-independent delay setting. Our main contribution is algorithms that achieve near-optimal regret in each of the settings.
arXiv Detail & Related papers (2021-06-04T12:26:06Z)
Confidence-Budget Matching for Sequential Budgeted Learning [69.77435313099366]
We formalize decision-making problems with querying budget. We consider multi-armed bandits, linear bandits, and reinforcement learning problems. We show that CBM based algorithms perform well in the presence of adversity.
arXiv Detail & Related papers (2021-02-05T19:56:31Z)
Stochastic bandits with arm-dependent delays [102.63128271054741]
We propose a simple but efficient UCB-based algorithm called the PatientBandits. We provide both problems-dependent and problems-independent bounds on the regret as well as performance lower bounds.
arXiv Detail & Related papers (2020-06-18T12:13:58Z)

This list is automatically generated from the titles and abstracts of the papers in this site.