Fairness and Welfare Quantification for Regret in Multi-Armed Bandits
- URL: http://arxiv.org/abs/2205.13930v1
- Date: Fri, 27 May 2022 12:12:56 GMT
- Title: Fairness and Welfare Quantification for Regret in Multi-Armed Bandits
- Authors: Siddharth Barman, Arindam Khan, Arnab Maiti and Ayush Sawarni
- Abstract summary: We extend the notion of regret with a welfarist perspective.
We study Nash regret, defined as the difference between the -- a priori unknown -- optimal mean (among the arms) and the algorithm's performance.
- Score: 13.094997642327371
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We extend the notion of regret with a welfarist perspective. Focussing on the
classic multi-armed bandit (MAB) framework, the current work quantifies the
performance of bandit algorithms by applying a fundamental welfare function,
namely the Nash social welfare (NSW) function. This corresponds to equating the
algorithm's performance to the geometric mean of its expected rewards and leads
us to the study of Nash regret, defined as the difference between the -- a
priori unknown -- optimal mean (among the arms) and the algorithm's
performance. Since NSW is known to satisfy fairness axioms, our approach
complements the utilitarian considerations of average (cumulative) regret,
wherein the algorithm is evaluated via the arithmetic mean of its expected
rewards.
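To make the two notions concrete, here is a sketch of the definitions as
described in the abstract (the symbols $I_t$, $\mu_{I_t}$, and $\mu^*$ are
introduced here for illustration; the paper's exact definitions, including the
conditioning inside the expectations, may differ in detail). Writing $I_t$ for
the arm pulled in round $t$, $\mu_{I_t}$ for its mean reward, and
$\mu^* = \max_i \mu_i$ for the optimal mean: average (cumulative) regret is
$\mu^* - \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}[\mu_{I_t}]$ (arithmetic mean of
expected rewards), while Nash regret is
$\mu^* - \left( \prod_{t=1}^{T} \mathbb{E}[\mu_{I_t}] \right)^{1/T}$ (geometric
mean of expected rewards).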
This work develops an algorithm that, given the horizon of play $T$, achieves
a Nash regret of $O\left( \sqrt{\frac{k \log T}{T}} \right)$, where $k$
denotes the number of arms in the MAB instance. Since, for any algorithm, the
Nash regret is at least as large as its average regret (by the AM-GM
inequality), the known lower bound on average regret holds for Nash regret as
well. Therefore, our Nash regret guarantee is essentially tight. In addition,
we develop an anytime algorithm with a Nash regret guarantee of
$O\left( \sqrt{\frac{k \log T}{T}} \log T \right)$.
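As a quick numerical check of the AM-GM relation invoked above, the following
sketch compares the two regret notions for an arbitrary sequence of per-round
expected rewards. It is not the paper's algorithm; the helper names and the
reward sequence are invented for illustration, and per-round expected rewards
are assumed to be strictly positive so the geometric mean is well defined.

    # Toy comparison of average regret vs. Nash regret for a fixed sequence
    # of per-round expected rewards (illustrative only, not the paper's algorithm).
    import math

    def average_regret(opt_mean, expected_rewards):
        """Utilitarian notion: optimal mean minus the arithmetic mean of expected rewards."""
        return opt_mean - sum(expected_rewards) / len(expected_rewards)

    def nash_regret(opt_mean, expected_rewards):
        """Welfarist notion: optimal mean minus the geometric mean of expected rewards."""
        log_gm = sum(math.log(r) for r in expected_rewards) / len(expected_rewards)
        return opt_mean - math.exp(log_gm)

    mu_star = 0.9                   # optimal mean among the arms (assumed known here)
    rewards = [0.9, 0.2, 0.9, 0.9]  # expected reward of the arm pulled in each round
    ar = average_regret(mu_star, rewards)
    nr = nash_regret(mu_star, rewards)
    print(f"average regret = {ar:.4f}, Nash regret = {nr:.4f}")
    assert nr >= ar - 1e-12         # AM-GM: geometric mean <= arithmetic mean

Running this prints an average regret of 0.175 and a Nash regret of about
0.28, illustrating that the Nash regret of a reward sequence is never smaller
than its average regret.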
Related papers
- No-Regret Learning for Fair Multi-Agent Social Welfare Optimization [35.44934658941197]
We consider the problem of online multi-agent Nash social welfare (NSW) maximization.
We show that no algorithm can achieve sublinear regret.
We also show that logarithmic regret is possible whenever there exists one agent who is indifferent about different arms.
arXiv Detail & Related papers (2024-05-31T08:21:11Z) - Incentive-compatible Bandits: Importance Weighting No More [14.344759978208957]
We study the problem of incentive-compatible online learning with bandit feedback.
In this work we propose the first incentive-compatible algorithms that enjoy $O(\sqrt{KT})$ regret bounds.
arXiv Detail & Related papers (2024-05-10T13:57:13Z) - Variance-Dependent Regret Bounds for Non-stationary Linear Bandits [52.872628573907434]
We propose algorithms that utilize the variance of the reward distribution as well as the $B_K$, and show that they can achieve tighter regret upper bounds.
We introduce two novel algorithms: Restarted Weighted$\text{OFUL}^{+}$ and Restarted $\text{SAVE}^{+}$.
Notably, when the total variance $V_K$ is much smaller than $K$, our algorithms outperform previous state-of-the-art results on non-stationary linear bandits under different settings.
arXiv Detail & Related papers (2024-03-15T23:36:55Z) - Nash Regret Guarantees for Linear Bandits [12.494184403263338]
We develop an algorithm that achieves a Nash regret of $O\left( \sqrt{\frac{d\nu}{T}} \log( T |X|)\right)$.
In addition, addressing linear bandit instances in which the set of arms $X$ is not necessarily finite, we obtain a Nash regret upper bound of $O\left( \frac{d^{\frac{5}{4}}\nu^{\frac{1}{2}}}{\sqrt{T}} \log(T)\right)$.
arXiv Detail & Related papers (2023-10-03T12:58:10Z) - Learning Two-Player Mixture Markov Games: Kernel Function Approximation
and Correlated Equilibrium [157.0902680672422]
We consider learning Nash equilibria in two-player zero-sum Markov Games with nonlinear function approximation.
We propose a novel online learning algorithm to find a Nash equilibrium by minimizing the duality gap.
arXiv Detail & Related papers (2022-08-10T14:21:54Z) - Misspecified Gaussian Process Bandit Optimization [59.30399661155574]
Kernelized bandit algorithms have shown strong empirical and theoretical performance for this problem.
We introduce a misspecified kernelized bandit setting where the unknown function can be $\epsilon$-uniformly approximated by a function with a bounded norm in some Reproducing Kernel Hilbert Space (RKHS).
We show that our algorithm achieves optimal dependence on $\epsilon$ with no prior knowledge of misspecification.
arXiv Detail & Related papers (2021-11-09T09:00:02Z) - Optimal Order Simple Regret for Gaussian Process Bandits [6.84224661918159]
We show that a pure exploration algorithm attains a simple regret bound that is significantly tighter than the existing bounds.
We show that this regret bound is optimal up to logarithmic factors for cases where a lower bound for the kernel is known.
arXiv Detail & Related papers (2021-08-20T16:49:32Z) - Thresholded Lasso Bandit [70.17389393497125]
Thresholded Lasso bandit is an algorithm that estimates the vector defining the reward function as well as its sparse support.
We establish non-asymptotic regret upper bounds scaling as $\mathcal{O}( \log d + \sqrt{T} )$ in general, and as $\mathcal{O}( \log d + \log T )$ under the so-called margin condition.
arXiv Detail & Related papers (2020-10-22T19:14:37Z) - Impact of Representation Learning in Linear Bandits [83.17684841392754]
We study how representation learning can improve the efficiency of bandit problems.
We present a new algorithm which achieves $\widetilde{O}(T\sqrt{kN} + \sqrt{dkNT})$ regret, where $N$ is the number of rounds we play for each bandit.
arXiv Detail & Related papers (2020-10-13T16:35:30Z) - Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z)