$\alpha$-Fair Contextual Bandits
- URL: http://arxiv.org/abs/2310.14164v1
- Date: Sun, 22 Oct 2023 03:42:59 GMT
- Title: $\alpha$-Fair Contextual Bandits
- Authors: Siddhant Chaudhary and Abhishek Sinha
- Abstract summary: Contextual bandit algorithms are at the core of many applications, including recommender systems, clinical trials, and optimal portfolio selection.
One of the most popular problems studied in the contextual bandit literature is to maximize the sum of the rewards in each round.
In this paper, we consider the $\alpha$-Fair Contextual Bandits problem, where the objective is to maximize the global $\alpha$-fair utility function.
- Score: 10.74025233418392
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Contextual bandit algorithms are at the core of many applications, including
recommender systems, clinical trials, and optimal portfolio selection. One of
the most popular problems studied in the contextual bandit literature is to
maximize the sum of the rewards in each round by ensuring a sublinear regret
against the best fixed context-dependent policy. However, in many applications,
the cumulative reward is not the right objective - the bandit algorithm must be
fair in order to avoid the echo-chamber effect and comply with the regulatory
requirements. In this paper, we consider the $\alpha$-Fair Contextual Bandits
problem, where the objective is to maximize the global $\alpha$-fair utility
function - a non-decreasing concave function of the cumulative rewards in the
adversarial setting. The problem is challenging due to the non-separability of
the objective across rounds. We design an efficient algorithm that guarantees
an approximately sublinear regret in the full-information and bandit feedback
settings.
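For reference, the abstract does not spell out the utility itself; a commonly used form of the $\alpha$-fair utility (the standard $\alpha$-fairness family of Mo and Walrand; the paper may parametrize it differently) is
$$U_\alpha(x) = \begin{cases} \dfrac{x^{1-\alpha}}{1-\alpha}, & \alpha \ge 0,\ \alpha \neq 1,\\[4pt] \log x, & \alpha = 1, \end{cases}$$
applied to the reward accumulated by each arm (or context group), so the global objective takes the form $\sum_{i} U_\alpha\!\big(\sum_{t=1}^{T} r_{t,i}\big)$ rather than the separable sum $\sum_{t}\sum_{i} r_{t,i}$. Setting $\alpha = 0$ recovers plain reward maximization, $\alpha = 1$ gives proportional fairness, and $\alpha \to \infty$ approaches max-min fairness, which is why the objective does not decompose across rounds.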
Related papers
- Neural Dueling Bandits [58.90189511247936]
We use a neural network to estimate the reward function using preference feedback for the previously selected arms.
We then extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution.
arXiv Detail & Related papers (2024-07-24T09:23:22Z) - Thompson Exploration with Best Challenger Rule in Best Arm
Identification [66.33448474838342]
We study the fixed-confidence best arm identification problem in the bandit framework.
We propose a novel policy that combines Thompson sampling with a computationally efficient approach known as the best challenger rule.
arXiv Detail & Related papers (2023-10-01T01:37:02Z) - Optimal Contextual Bandits with Knapsacks under Realizibility via
Regression Oracles [14.634964681825197]
We study the contextual bandit with knapsacks (CBwK) problem, where each action leads to a random reward but also costs a random resource consumption in a vector form.
We propose the first universal and optimal algorithmic framework for CBwK by reducing it to online regression.
arXiv Detail & Related papers (2022-10-21T09:28:53Z) - Contextual bandits with concave rewards, and an application to fair
ranking [108.48223948875685]
We present the first algorithm with provably vanishing regret for Contextual Bandits with Concave Rewards (CBCR)
We derive a novel reduction from the CBCR regret to the regret of a scalar-reward problem.
Motivated by fairness in recommendation, we describe a special case of CBCR with rankings and fairness-aware objectives.
arXiv Detail & Related papers (2022-10-18T16:11:55Z) - Dual Instrumental Method for Confounded Kernelized Bandits [0.0]
The contextual bandit problem is a framework with wide applications in various fields.
We propose a confounded bandit problem where the noise becomes a latent confounder that affects both contexts and rewards.
We show that a dual instrumental variable regression can correctly identify the true reward function.
arXiv Detail & Related papers (2022-09-07T15:25:57Z) - Contextual Bandits with Smooth Regret: Efficient Learning in Continuous
Action Spaces [14.366265951396587]
We design efficient general-purpose contextual bandit algorithms for large -- or even continuous -- action spaces.
We propose a smooth regret notion for contextual bandits, which dominates previously proposed alternatives.
Our algorithms can be used to recover the previous minimax/Pareto optimal guarantees under the standard regret.
arXiv Detail & Related papers (2022-07-12T21:27:09Z) - Risk-Aware Algorithms for Combinatorial Semi-Bandits [7.716156977428555]
We study the multi-armed bandit problem under semi-bandit feedback.
We consider the problem of maximizing the Conditional Value-at-Risk (CVaR), a risk measure that takes into account only the worst-case rewards.
We propose new algorithms that maximize the CVaR of the rewards obtained from the super arms of the bandit.
arXiv Detail & Related papers (2021-12-02T11:29:43Z) - Linear Contextual Bandits with Adversarial Corruptions [91.38793800392108]
We study the linear contextual bandit problem in the presence of adversarial corruption.
We present a variance-aware algorithm that is adaptive to the level of adversarial contamination $C$.
arXiv Detail & Related papers (2021-10-25T02:53:24Z) - Anti-Concentrated Confidence Bonuses for Scalable Exploration [57.91943847134011]
Intrinsic rewards play a central role in handling the exploration-exploitation trade-off.
We introduce anti-concentrated confidence bounds for efficiently approximating the elliptical bonus.
We develop a practical variant for deep reinforcement learning that is competitive with contemporary intrinsic rewards on Atari benchmarks.
arXiv Detail & Related papers (2021-10-21T15:25:15Z) - Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB)
arXiv Detail & Related papers (2020-06-17T22:32:19Z)