Improved Regret Bounds for Online Fair Division with Bandit Learning
- URL: http://arxiv.org/abs/2501.07022v1
- Date: Mon, 13 Jan 2025 02:48:58 GMT
- Title: Improved Regret Bounds for Online Fair Division with Bandit Learning
- Authors: Benjamin Schiffer, Shirley Zhang,
- Abstract summary: We study online fair division when there are a finite number of item types and the player values for the items are drawn randomly from distributions with unknown means.
In this setting, a sequence of indivisible items arrives according to a random online process, and each item must be allocated to a single player.
We show that it is possible to with high probability guarantee proportionality constraint satisfaction and achieve $tildeO(sqrtT)$ regret.
- Score: 0.0
- License:
- Abstract: We study online fair division when there are a finite number of item types and the player values for the items are drawn randomly from distributions with unknown means. In this setting, a sequence of indivisible items arrives according to a random online process, and each item must be allocated to a single player. The goal is to maximize expected social welfare while maintaining that the allocation satisfies proportionality in expectation. When player values are normalized, we show that it is possible to with high probability guarantee proportionality constraint satisfaction and achieve $\tilde{O}(\sqrt{T})$ regret. To achieve this result, we present an upper confidence bound (UCB) algorithm that uses two rounds of linear optimization. This algorithm highlights fundamental aspects of proportionality constraints that allow for a UCB algorithm despite the presence of many (potentially tight) constraints. This result improves upon the previous best regret rate of $\tilde{O}(T^{2/3})$.
Related papers
- Continuous K-Max Bandits [54.21533414838677]
We study the $K$-Max multi-armed bandits problem with continuous outcome distributions and weak value-index feedback.
This setting captures critical applications in recommendation systems, distributed computing, server scheduling, etc.
Our key contribution is the computationally efficient algorithm DCK-UCB, which combines adaptive discretization with bias-corrected confidence bounds.
arXiv Detail & Related papers (2025-02-19T06:37:37Z) - Honor Among Bandits: No-Regret Learning for Online Fair Division [20.38824614301761]
We consider the problem of online fair division of indivisible goods to players when there are a finite number of types of goods and player values are drawn from distributions with unknown means.
Our main result is the design of an explore-then-commit algorithm that achieves $tildeO(T2/3)$ regret while maintaining either fairness constraint.
arXiv Detail & Related papers (2024-07-01T20:44:52Z) - No-Regret is not enough! Bandits with General Constraints through Adaptive Regret Minimization [26.415300249303748]
We show that it is possible to circumvent the issue of sublinear violations of constraints by requiring the primal and dual algorithm to be weakly adaptive.
In the first case, we show that the algorithm guarantees sublinear regret. In the latter case, we establish a tight competitive ratio of $rho/(1+rho)$.
This results allow us to obtain new result for the problem of contextual bandits with linear constraints.
arXiv Detail & Related papers (2024-05-10T16:22:33Z) - Variance-Dependent Regret Bounds for Non-stationary Linear Bandits [52.872628573907434]
We propose algorithms that utilize the variance of the reward distribution as well as the $B_K$, and show that they can achieve tighter regret upper bounds.
We introduce two novel algorithms: Restarted Weighted$textOFUL+$ and Restarted $textSAVE+$.
Notably, when the total variance $V_K$ is much smaller than $K$, our algorithms outperform previous state-of-the-art results on non-stationary linear bandits under different settings.
arXiv Detail & Related papers (2024-03-15T23:36:55Z) - Contextual Bandits with Stage-wise Constraints [46.412612374393895]
We study contextual bandits in the presence of a stage-wise constraint (a constraint at each round)
We propose an upper-confidence bound algorithm for the problem and prove a $T$-round regret bound for it.
arXiv Detail & Related papers (2024-01-15T23:58:21Z) - Combinatorial Stochastic-Greedy Bandit [79.1700188160944]
We propose a novelgreedy bandit (SGB) algorithm for multi-armed bandit problems when no extra information other than the joint reward of the selected set of $n$ arms at each time $tin [T]$ is observed.
SGB adopts an optimized-explore-then-commit approach and is specifically designed for scenarios with a large set of base arms.
arXiv Detail & Related papers (2023-12-13T11:08:25Z) - Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement
Learning: Adaptivity and Computational Efficiency [90.40062452292091]
We present the first computationally efficient algorithm for linear bandits with heteroscedastic noise.
Our algorithm is adaptive to the unknown variance of noise and achieves an $tildeO(d sqrtsum_k = 1K sigma_k2 + d)$ regret.
We also propose a variance-adaptive algorithm for linear mixture Markov decision processes (MDPs) in reinforcement learning.
arXiv Detail & Related papers (2023-02-21T00:17:24Z) - First-Order Regret in Reinforcement Learning with Linear Function
Approximation: A Robust Estimation Approach [57.570201404222935]
We show that it is possible to obtain regret scaling as $mathcalO(sqrtV_1star K)$ in reinforcement learning with large state spaces.
We demonstrate that existing techniques based on at least squares estimation are insufficient to obtain this result.
arXiv Detail & Related papers (2021-12-07T00:29:57Z) - Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB)
arXiv Detail & Related papers (2020-06-17T22:32:19Z) - Differentiable Linear Bandit Algorithm [6.849358422233866]
Upper Confidence Bound is arguably the most commonly used method for linear multi-arm bandit problems.
We introduce a gradient estimator, which allows the confidence bound to be learned via gradient ascent.
We show that the proposed algorithm achieves a $tildemathcalO(hatbetasqrtdT)$ upper bound of $T$-round regret, where $d$ is the dimension of arm features and $hatbeta$ is the learned size of confidence bound.
arXiv Detail & Related papers (2020-06-04T16:43:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.