The price of unfairness in linear bandits with biased feedback
- URL: http://arxiv.org/abs/2203.09784v1
- Date: Fri, 18 Mar 2022 08:03:20 GMT
- Title: The price of unfairness in linear bandits with biased feedback
- Authors: Solenne Gaucher (LMO, CELESTE), Alexandra Carpentier, Christophe
Giraud (LMO, CELESTE)
- Abstract summary: We study the problem of sequential decision making with biased linear bandit feedback.
We show that the worst-case regret is higher than the $dT^{1/2}\log(T)$ regret rate obtained under unbiased feedback.
Interestingly, the gap-dependent rates reveal the existence of non-trivial instances where the problem is no more difficult than its unbiased counterpart.
- Score: 62.25313751895011
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Artificial intelligence is increasingly used in a wide range of decision
making scenarios with higher and higher stakes. At the same time, recent work
has highlighted that these algorithms can be dangerously biased, and that their
results often need to be corrected to avoid leading to unfair decisions. In
this paper, we study the problem of sequential decision making with biased
linear bandit feedback. At each round, a player selects an action described by
a covariate and by a sensitive attribute. She receives a reward corresponding
to the covariates of the action that she has chosen, but only observes a biased
evaluation of this reward, where the bias depends on the sensitive attribute.
To tackle this problem, we design a Fair Phased Elimination algorithm. We
establish an upper bound on its worst-case regret, showing that it is smaller
than $C\kappa_*^{1/3}\log(T)^{1/3}T^{2/3}$, where $C$ is a numerical constant and
$\kappa_*$ is an explicit geometrical constant characterizing the difficulty of
bias estimation. This worst-case regret is higher than the $dT^{1/2}\log(T)$ regret
rate obtained under unbiased feedback. We show that this rate cannot be
improved for all instances: we obtain lower bounds on the worst-case regret
for some sets of actions, showing that this rate is tight up to a
sub-logarithmic factor. We also obtain gap-dependent upper bounds on the
regret, and establish matching lower bounds for some problem instances.
Interestingly, the gap-dependent rates reveal the existence of non-trivial
instances where the problem is no more difficult than its unbiased counterpart.
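To make the feedback model concrete, the following minimal Python sketch simulates the setting described in the abstract: each action carries a covariate vector and a binary sensitive attribute, the reward is linear in the covariates, and the player only observes an evaluation shifted by a group-dependent bias. All names, the additive form of the bias, and the noise level are illustrative assumptions, not the authors' implementation.

import numpy as np

# Illustrative sketch of the biased linear bandit feedback model (assumed
# form; not the authors' code). Each action has a covariate in R^d and a
# binary sensitive attribute; the player earns x @ theta but only observes
# a version of that reward shifted by a group-dependent bias omega.
rng = np.random.default_rng(0)
d = 5
actions = rng.normal(size=(20, d))        # covariates of the 20 actions
attributes = rng.integers(0, 2, size=20)  # sensitive attribute of each action
theta = rng.normal(size=d)                # unknown reward parameter
omega = np.array([0.0, 0.5])              # hypothetical per-group bias

def pull(k):
    """Play action k: return its true reward and the biased observation."""
    true_reward = actions[k] @ theta
    observed = true_reward + omega[attributes[k]] + 0.1 * rng.normal()
    return true_reward, observed

Under this model, any learner must estimate the bias term alongside the reward parameter from the biased observations alone, which is where the geometrical constant $\kappa_*$ in the regret bound comes into play.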
Related papers
- An Adaptive Approach for Infinitely Many-armed Bandits under Generalized Rotting Constraints [29.596684377841182]
In this study, we consider the infinitely many-armed bandit problem in a rested rotting setting, where the mean reward of an arm may decrease with each pull and otherwise remains unchanged.
We introduce an algorithm that utilizes UCB with an adaptive sliding window, designed to manage the bias and variance trade-off arising due to rotting rewards.
arXiv Detail & Related papers (2024-04-22T14:11:54Z)
- Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback [58.66941279460248]
Learning from human feedback plays an important role in aligning generative models, such as large language models (LLMs).
We study a model within this problem domain: contextual dueling bandits with adversarial feedback, where the true preference label can be flipped by an adversary.
We propose a robust contextual dueling bandit algorithm based on uncertainty-weighted maximum likelihood estimation.
arXiv Detail & Related papers (2024-04-16T17:59:55Z)
- Bandit-Feedback Online Multiclass Classification: Variants and Tradeoffs [32.29254118429081]
We show that the optimal mistake bound under bandit feedback is at most $O(k)$ times higher than the optimal mistake bound in the full information case.
We also present nearly optimal bounds of $\widetilde{\Theta}(k)$ on the gap between randomized and deterministic learners.
arXiv Detail & Related papers (2024-02-12T07:20:05Z)
- Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits [53.281230333364505]
This paper studies the problem of contextual dueling bandits, where the binary comparison of dueling arms is generated from a generalized linear model (GLM).
We propose a new SupLinUCB-type algorithm that enjoys computational efficiency and a variance-aware regret bound $\tilde{O}\big(d\sqrt{\sum_{t=1}^{T}\sigma_t^2} + d\big)$.
Our regret bound naturally aligns with the intuitive expectation: in scenarios where the comparison is deterministic, the algorithm only suffers from an $\tilde{O}(d)$ regret.
arXiv Detail & Related papers (2023-10-02T08:15:52Z)
- Borda Regret Minimization for Generalized Linear Dueling Bandits [65.09919504862496]
We study the Borda regret minimization problem for dueling bandits, which aims to identify the item with the highest Borda score.
We propose a rich class of generalized linear dueling bandit models, which cover many existing models.
Our algorithm achieves an $\tilde{O}(d^{2/3} T^{2/3})$ regret, which is also optimal.
arXiv Detail & Related papers (2023-03-15T17:59:27Z)
- The Best of Both Worlds: Reinforcement Learning with Logarithmic Regret and Policy Switches [84.54669549718075]
We study the problem of regret minimization for episodic Reinforcement Learning (RL).
We focus on learning with general function classes and general model classes.
We show that a logarithmic regret bound is realizable by algorithms with $O(\log T)$ switching cost.
arXiv Detail & Related papers (2022-03-03T02:55:55Z)
- Linear Contextual Bandits with Adversarial Corruptions [91.38793800392108]
We study the linear contextual bandit problem in the presence of adversarial corruption.
We present a variance-aware algorithm that is adaptive to the level of adversarial contamination $C$.
arXiv Detail & Related papers (2021-10-25T02:53:24Z)
- Minimax Regret for Stochastic Shortest Path with Adversarial Costs and Known Transition [37.6975819766632]
We study the stochastic shortest path problem with adversarial costs and known transition.
We show that the minimax regret is $\widetilde{O}(\sqrt{DT_{\star} K})$ and $\widetilde{O}(\sqrt{DT_{\star} SAK})$ for the full-information setting and the bandit feedback setting, respectively.
arXiv Detail & Related papers (2020-12-07T20:55:28Z)