Contextual Multinomial Logit Bandits with General Value Functions
- URL: http://arxiv.org/abs/2402.08126v2
- Date: Sun, 18 Feb 2024 20:32:08 GMT
- Title: Contextual Multinomial Logit Bandits with General Value Functions
- Authors: Mengxiao Zhang, Haipeng Luo
- Abstract summary: Contextual multinomial logit (MNL) bandits capture many real-world assortment recommendation problems such as online retailing/advertising.
We consider contextual MNL bandits with a general value function class that contains the ground truth, borrowing ideas from a recent trend of studies on contextual bandits.
Specifically, we consider both the stochastic and the adversarial settings, and propose a suite of algorithms, each with a different computation-regret trade-off.
- Score: 47.06746975822902
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Contextual multinomial logit (MNL) bandits capture many real-world assortment
recommendation problems such as online retailing/advertising. However, prior
work has only considered (generalized) linear value functions, which greatly
limits its applicability. Motivated by this fact, in this work, we consider
contextual MNL bandits with a general value function class that contains the
ground truth, borrowing ideas from a recent trend of studies on contextual
bandits. Specifically, we consider both the stochastic and the adversarial
settings, and propose a suite of algorithms, each with different
computation-regret trade-off. When applied to the linear case, our results not
only are the first ones with no dependence on a certain problem-dependent
constant that can be exponentially large, but also enjoy other advantages such
as computational efficiency, dimension-free regret bounds, or the ability to
handle completely adversarial contexts and rewards.
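As a concrete illustration of the MNL model that underlies these assortment problems, the following minimal sketch computes choice probabilities for an offered assortment; the item values here stand in for the outputs of whatever value function class is assumed, and the specific numbers are illustrative, not taken from the paper.

```python
import math

def mnl_choice_probs(values, assortment):
    """Choice probabilities under the multinomial logit (MNL) model.

    Given item values v_i (e.g., outputs of a learned value function) and an
    offered assortment S, the probability of the user choosing item i in S is
        exp(v_i) / (1 + sum_{j in S} exp(v_j)),
    where the constant 1 accounts for the outside "no purchase" option.
    """
    denom = 1.0 + sum(math.exp(values[j]) for j in assortment)
    probs = {j: math.exp(values[j]) / denom for j in assortment}
    probs["no_purchase"] = 1.0 / denom
    return probs
```

For example, offering two items of equal value splits probability evenly between them and the no-purchase option.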
Related papers
- Sparse Nonparametric Contextual Bandits [2.0072624123275533]
We study the problem of simultaneously learning relevant features and minimising regret in contextual bandit problems.
We introduce and analyse a new class of contextual bandit problems, called sparse nonparametric contextual bandits.
We find that sparsity always enables better regret bounds, as long as the horizon is large enough relative to the sparsity and the number of actions.
arXiv Detail & Related papers (2025-03-20T17:44:56Z) - Contextual Online Decision Making with Infinite-Dimensional Functional Regression [19.06054415343443]
Contextual sequential decision-making problems play a crucial role in machine learning.
We provide a universal admissible algorithm framework for dealing with all kinds of contextual online decision-making problems.
arXiv Detail & Related papers (2025-01-30T14:05:20Z) - Neural Dueling Bandits [58.90189511247936]
We use a neural network to estimate the reward function using preference feedback for the previously selected arms.
We then extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution.
arXiv Detail & Related papers (2024-07-24T09:23:22Z) - A conversion theorem and minimax optimality for continuum contextual bandits [70.71582850199871]
We study the contextual continuum bandits problem, where the learner sequentially receives a side information vector and has to choose an action in a convex set.
The goal is to minimize all the underlying functions for the received contexts, leading to the contextual notion of regret.
We show that any algorithm achieving a sub-linear static regret can be extended to achieve a sub-linear contextual regret.
arXiv Detail & Related papers (2024-06-09T10:12:08Z) - Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback [58.66941279460248]
Learning from human feedback plays an important role in aligning generative models such as large language models (LLMs).
We study a model within this problem domain--contextual dueling bandits with adversarial feedback, where the true preference label can be flipped by an adversary.
We propose an algorithm, termed robust contextual dueling bandit, based on uncertainty-weighted maximum likelihood estimation.
arXiv Detail & Related papers (2024-04-16T17:59:55Z) - On the Optimal Regret of Locally Private Linear Contextual Bandit [18.300225068036642]
We show that it is possible to achieve an $\tilde{O}(\sqrt{T})$ regret upper bound for locally private linear contextual bandit.
Our solution relies on several new algorithmic and analytical ideas.
arXiv Detail & Related papers (2024-04-15T02:00:24Z) - $\alpha$-Fair Contextual Bandits [10.74025233418392]
Contextual bandit algorithms are at the core of many applications, including recommender systems, clinical trials, and optimal portfolio selection.
One of the most popular problems studied in the contextual bandit literature is to maximize the sum of the rewards in each round.
In this paper, we consider the $\alpha$-Fair Contextual Bandits problem, where the objective is to maximize the global $\alpha$-fair utility function.
arXiv Detail & Related papers (2023-10-22T03:42:59Z) - Thompson Exploration with Best Challenger Rule in Best Arm Identification [66.33448474838342]
We study the fixed-confidence best arm identification problem in the bandit framework.
We propose a novel policy that combines Thompson sampling with a computationally efficient approach known as the best challenger rule.
arXiv Detail & Related papers (2023-10-01T01:37:02Z) - Optimal Contextual Bandits with Knapsacks under Realizability via Regression Oracles [14.634964681825197]
We study the contextual bandit with knapsacks (CBwK) problem, where each action leads to a random reward but also costs a random resource consumption in a vector form.
We propose the first universal and optimal algorithmic framework for CBwK by reducing it to online regression.
arXiv Detail & Related papers (2022-10-21T09:28:53Z) - Contextual bandits with concave rewards, and an application to fair ranking [108.48223948875685]
We present the first algorithm with provably vanishing regret for Contextual Bandits with Concave Rewards (CBCR).
We derive a novel reduction from the CBCR regret to the regret of a scalar-reward problem.
Motivated by fairness in recommendation, we describe a special case of CBCR with rankings and fairness-aware objectives.
arXiv Detail & Related papers (2022-10-18T16:11:55Z) - On Kernelized Multi-Armed Bandits with Constraints [16.102401271318012]
We study a bandit problem with a general unknown reward function and a general unknown constraint function.
We propose a general framework for both algorithm design and performance analysis.
We demonstrate the superior performance of our proposed algorithms via numerical experiments.
arXiv Detail & Related papers (2022-03-29T14:02:03Z) - Efficient and Optimal Algorithms for Contextual Dueling Bandits under Realizability [59.81339109121384]
We study the $K$-armed contextual dueling bandit problem, a sequential decision making setting in which the learner uses contextual information to make two decisions, but only observes preference-based feedback suggesting that one decision was better than the other.
We provide a new algorithm that achieves the optimal regret rate for a new notion of best response regret, which is a strictly stronger performance measure than those considered in prior works.
arXiv Detail & Related papers (2021-11-24T07:14:57Z) - Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles [112.89548995091182]
We provide the first universal and optimal reduction from contextual bandits to online regression.
Our algorithm requires no distributional assumptions beyond realizability, and works even when contexts are chosen adversarially.
arXiv Detail & Related papers (2020-02-12T11:33:46Z)
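The regression-oracle reduction in this last entry selects actions via an inverse-gap-weighting rule. A minimal sketch of that distribution follows, assuming reward predictions come from some regression oracle; the particular predictions and the exploration parameter `gamma` below are illustrative, not values from the paper.

```python
def inverse_gap_weighting(preds, gamma):
    """Action distribution from regression-oracle predictions.

    preds: list of predicted rewards, one per action.
    gamma: exploration parameter; larger gamma concentrates more
           probability on the predicted best action.

    Every suboptimal action a gets probability 1 / (K + gamma * gap(a)),
    where gap(a) is the predicted reward gap to the best action, and the
    remaining mass goes to the predicted best action.
    """
    K = len(preds)
    best = max(range(K), key=lambda a: preds[a])
    probs = [0.0] * K
    for a in range(K):
        if a != best:
            probs[a] = 1.0 / (K + gamma * (preds[best] - preds[a]))
    probs[best] = 1.0 - sum(probs)
    return probs
```

Note that the distribution always places positive mass on every action, so exploration never stops entirely; increasing `gamma` shrinks the mass on actions with large predicted gaps.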
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.