Multi-facet Contextual Bandits: A Neural Network Perspective
- URL: http://arxiv.org/abs/2106.03039v1
- Date: Sun, 6 Jun 2021 05:48:44 GMT
- Title: Multi-facet Contextual Bandits: A Neural Network Perspective
- Authors: Yikun Ban, Jingrui He, Curtiss B. Cook
- Abstract summary: We study a novel problem of multi-facet bandits involving a group of bandits, each characterizing the users' needs from one unique aspect.
In each round, for the given user, we need to select one arm from each bandit, such that the combination of all arms maximizes the final reward.
This problem can find immediate applications in E-commerce, healthcare, etc.
- Score: 34.96188300126833
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Contextual multi-armed bandit has been shown to be an effective tool in
recommender systems. In this paper, we study a novel problem of multi-facet
bandits involving a group of bandits, each characterizing the users' needs from
one unique aspect. In each round, for the given user, we need to select one arm
from each bandit, such that the combination of all arms maximizes the final
reward. This problem can find immediate applications in E-commerce, healthcare,
etc. To address this problem, we propose a novel algorithm, named MuFasa, which
utilizes an assembled neural network to jointly learn the underlying reward
functions of multiple bandits. It estimates an Upper Confidence Bound (UCB)
linked with the expected reward to balance between exploitation and
exploration. Under mild assumptions, we provide the regret analysis of MuFasa.
It can achieve the near-optimal $\widetilde{\mathcal{O}}((K+1)\sqrt{T})$
regret bound where $K$ is the number of bandits and $T$ is the number of played
rounds. Furthermore, we conduct extensive experiments to show that MuFasa
outperforms strong baselines on real-world data sets.
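As a rough illustration of the selection step described above, the following is a minimal sketch under assumed names (JointRewardNet, select_combination, beta) and a simplified architecture: a single shared network scores the concatenated contexts of one arm per bandit, and a gradient-norm exploration bonus stands in for the paper's UCB term. This is not MuFasa's exact assembled architecture or confidence bound.

```python
# Hypothetical, simplified sketch of a MuFasa-style selection step.
# JointRewardNet, select_combination, and the gradient-norm bonus weighted by
# beta are illustrative stand-ins, not the paper's exact assembled network or UCB.
import itertools
import torch
import torch.nn as nn

class JointRewardNet(nn.Module):
    """Scores the concatenation of one arm context per bandit."""
    def __init__(self, dim_per_bandit: int, num_bandits: int, hidden: int = 64):
        super().__init__()
        self.f = nn.Sequential(
            nn.Linear(dim_per_bandit * num_bandits, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.f(x)

def select_combination(net: JointRewardNet, arm_sets, beta: float = 1.0):
    """Pick one arm per bandit by maximizing predicted reward plus a UCB-style bonus.

    arm_sets: list of K tensors, each of shape (n_k, d) with one bandit's arm contexts.
    Brute-force enumeration of all combinations is only feasible for small arm sets.
    """
    best_score, best_combo = float("-inf"), None
    for combo in itertools.product(*(range(len(a)) for a in arm_sets)):
        x = torch.cat([arm_sets[k][i] for k, i in enumerate(combo)]).unsqueeze(0)
        net.zero_grad()
        pred = net(x).sum()          # scalar predicted joint reward
        pred.backward()              # gradients w.r.t. network parameters
        grad_sq = sum((p.grad ** 2).sum() for p in net.parameters())
        score = pred.item() + beta * grad_sq.sqrt().item()   # exploration bonus
        if score > best_score:
            best_score, best_combo = score, combo
    return best_combo
```

Training the network on the observed combination-reward pairs between rounds would complete the loop; that part is omitted here.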
Related papers
- Neural Combinatorial Clustered Bandits for Recommendation Systems [12.800116749927266]
We use deep neural networks to estimate unknown reward functions.
Unlike prior neural bandit works, NeUClust uses a neural network to estimate the super arm reward and select the super arm.
NeUClust achieves better regret and reward than other contextual matrix and neural bandit algorithms.
arXiv Detail & Related papers (2024-10-18T16:37:28Z)
- Neural Dueling Bandits [58.90189511247936]
We use a neural network to estimate the reward function using preference feedback for the previously selected arms.
We then extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution.
arXiv Detail & Related papers (2024-07-24T09:23:22Z)
- Communication-Constrained Bandits under Additive Gaussian Noise [111.06688156723018]
We study a distributed multi-armed bandit where a client supplies the learner with communication-constrained feedback.
We propose a multi-phase bandit algorithm, $\mathtt{UE}\text{-UCB++}$, which matches this lower bound up to a minor additive factor.
arXiv Detail & Related papers (2023-04-25T09:31:20Z)
- Optimal Algorithms for Latent Bandits with Cluster Structure [50.44722775727619]
We consider the problem of latent bandits with cluster structure where there are multiple users, each with an associated multi-armed bandit problem.
We propose LATTICE, which exploits the latent cluster structure to provide the minimax optimal regret of $\widetilde{O}(\sqrt{(\mathsf{M}+\mathsf{N})\mathsf{T}})$.
arXiv Detail & Related papers (2023-01-17T17:49:04Z)
- Dueling Bandits: From Two-dueling to Multi-dueling [40.4378338001229]
We study a general multi-dueling bandit problem, where an agent compares multiple options simultaneously and aims to minimize the regret due to selecting suboptimal arms.
This setting generalizes the traditional two-dueling bandit problem and finds many real-world applications involving subjective feedback on multiple options.
arXiv Detail & Related papers (2022-11-16T22:00:54Z)
- Contextual Combinatorial Multi-output GP Bandits with Group Constraints [11.317136648551537]
In federated multi-armed bandit problems, maximizing global reward while satisfying minimum privacy requirements to protect clients is the main goal.
We consider a contextual bandit setting with groups and changing action sets, where similar base arms arrive in groups and, in each round, a set of base arms called a super arm must be chosen to maximize super arm reward while satisfying reward constraints for the groups from which the base arms were chosen.
We then propose a novel double-UCB GP-bandit algorithm, called Thresholded Combinatorial Upper Confidence Bounds (TCGP-UCB), which balances maximizing cumulative super arm reward with satisfying group reward constraints.
arXiv Detail & Related papers (2021-11-29T18:39:09Z)
- Top-$k$ eXtreme Contextual Bandits with Arm Hierarchy [71.17938026619068]
We study the top-$k$ extreme contextual bandits problem, where the total number of arms can be enormous.
We first propose an algorithm for the non-extreme realizable setting, utilizing the Inverse Gap Weighting strategy (a minimal sketch of this weighting appears after this list).
We show that our algorithm has a regret guarantee of $O(k\sqrt{(A-k+1)T \log(|\mathcal{F}|T)})$.
arXiv Detail & Related papers (2021-02-15T19:10:52Z)
- Recurrent Submodular Welfare and Matroid Blocking Bandits [22.65352007353614]
A recent line of research studies the multi-armed bandit (MAB) problem.
We develop new algorithmic ideas that allow us to obtain a $(1 - \frac{1}{e})$-approximation for any matroid.
A key ingredient is the technique of correlated (interleaved) scheduling.
arXiv Detail & Related papers (2021-01-30T21:51:47Z)
- Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z)
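The Inverse Gap Weighting strategy referenced in the Top-$k$ eXtreme Contextual Bandits entry above can be illustrated with a minimal, generic single-arm sketch. The names igw_sample and gamma are assumptions for illustration; the paper's top-$k$ extreme extension with arm hierarchy is considerably more involved.

```python
# Minimal sketch of Inverse Gap Weighting (IGW) sampling for a single arm pull;
# igw_sample and the exploration parameter gamma are assumed names for illustration.
import numpy as np

def igw_sample(predicted_rewards: np.ndarray, gamma: float, rng=None) -> int:
    """Sample an arm index; suboptimal arms get probability 1 / (A + gamma * gap)."""
    rng = np.random.default_rng() if rng is None else rng
    A = len(predicted_rewards)
    best = int(np.argmax(predicted_rewards))
    gaps = predicted_rewards[best] - predicted_rewards   # gap to the greedy arm
    probs = 1.0 / (A + gamma * gaps)
    probs[best] = 0.0
    probs[best] = 1.0 - probs.sum()      # remaining probability mass on the greedy arm
    return int(rng.choice(A, p=probs))
```

Larger values of gamma concentrate probability on the greedy arm, while smaller values spread probability more evenly and explore more.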