Online learning with Corrupted context: Corrupted Contextual Bandits
- URL: http://arxiv.org/abs/2006.15194v1
- Date: Fri, 26 Jun 2020 19:53:26 GMT
- Title: Online learning with Corrupted context: Corrupted Contextual Bandits
- Authors: Djallel Bouneffouf
- Abstract summary: We consider a novel variant of the contextual bandit problem.
This problem is motivated by certain on-line settings including clinical trial and ad recommendation applications.
We propose to combine the standard contextual bandit approach with a classical multi-armed bandit mechanism.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a novel variant of the contextual bandit problem (i.e., the
multi-armed bandit with side-information, or context, available to a
decision-maker) where the context used at each decision may be corrupted
("useless context"). This new problem is motivated by certain on-line settings
including clinical trial and ad recommendation applications. In order to
address the corrupted-context setting, we propose to combine the standard
contextual bandit approach with a classical multi-armed bandit mechanism.
Unlike standard contextual bandit methods, we are able to learn from all
iterations, even those with corrupted context, by improving the computation of
the expected reward for each arm. Promising empirical results are obtained on several
real-life datasets.
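To make the proposed combination concrete, here is a minimal Python sketch of the hybrid idea: a LinUCB-style contextual learner per arm plus a context-free UCB1 fallback, with the context-free statistics updated on every round so that corrupted iterations still improve the per-arm expected-reward estimates. The class and parameter names are illustrative, the sketch assumes the learner is told when a context is corrupted, and the paper's actual algorithm may differ in its details.

```python
import numpy as np

class HybridCorruptedContextBandit:
    """Sketch: LinUCB-style ridge regression per arm, plus a context-free
    UCB1 fallback that is used whenever the context is corrupted."""

    def __init__(self, n_arms, dim, alpha=1.0):
        self.n_arms, self.alpha = n_arms, alpha
        self.A = [np.eye(dim) for _ in range(n_arms)]    # per-arm design matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # per-arm reward vectors
        self.counts = np.zeros(n_arms)                   # context-free statistics
        self.means = np.zeros(n_arms)
        self.t = 0

    def select(self, context, corrupted):
        self.t += 1
        if corrupted:
            # UCB1 on the running per-arm means; untried arms get priority.
            if self.counts.min() == 0:
                return int(np.argmin(self.counts))
            bonus = np.sqrt(2.0 * np.log(self.t) / self.counts)
            return int(np.argmax(self.means + bonus))
        # LinUCB: ridge estimate plus an exploration bonus.
        scores = np.empty(self.n_arms)
        for a in range(self.n_arms):
            A_inv = np.linalg.inv(self.A[a])
            theta = A_inv @ self.b[a]
            scores[a] = context @ theta + self.alpha * np.sqrt(context @ A_inv @ context)
        return int(np.argmax(scores))

    def update(self, arm, context, corrupted, reward):
        # The context-free estimate learns from *every* round ...
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]
        # ... while the contextual model only sees uncorrupted contexts.
        if not corrupted:
            self.A[arm] += np.outer(context, context)
            self.b[arm] += reward * context
```

The design point mirrors the abstract: the fallback estimator consumes every round, so a corrupted context costs the agent its side-information but not the reward observation.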
Related papers
- Neural Dueling Bandits [58.90189511247936]
We use a neural network to estimate the reward function using preference feedback for the previously selected arms (a simplified sketch of this preference signal follows the entry).
We then extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution.
arXiv Detail & Related papers (2024-07-24T09:23:22Z)
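The dueling-bandits entry above estimates rewards from pairwise preference feedback. As a hedged illustration of that signal (not the paper's method), the sketch below fits a linear reward model to Bradley-Terry preferences; the paper itself uses a neural network, and all names here are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class PreferenceRewardModel:
    """Reward estimation from pairwise preferences (Bradley-Terry model).
    A linear model keeps the sketch short; the paper fits a neural net."""

    def __init__(self, dim, lr=0.1):
        self.w = np.zeros(dim)
        self.lr = lr

    def predict(self, x):
        return x @ self.w  # estimated latent reward of an arm's features

    def update(self, x_first, x_second, first_preferred):
        # P(first wins) = sigmoid(r(x_first) - r(x_second)); take one SGD
        # step on the logistic loss of the observed binary preference.
        p = sigmoid(self.predict(x_first) - self.predict(x_second))
        grad = (p - float(first_preferred)) * (x_first - x_second)
        self.w -= self.lr * grad
```

A dueling round then reduces to choosing two arms by current predicted reward plus exploration and feeding the observed winner back through `update`.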
- Information Capacity Regret Bounds for Bandits with Mediator Feedback [55.269551124587224]
We introduce the policy set capacity as an information-theoretic measure for the complexity of the policy set.
Adopting the classical EXP4 algorithm (sketched after this entry), we provide new regret bounds depending on the policy set capacity.
For a selection of policy set families, we prove nearly-matching lower bounds, scaling similarly with the capacity.
arXiv Detail & Related papers (2024-02-15T19:18:47Z)
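The mediator-feedback entry above builds on the classical EXP4 algorithm of Auer et al. Below is a minimal sketch of EXP4 itself, not of the paper's capacity-dependent analysis; the function signatures are illustrative.

```python
import numpy as np

def exp4(expert_advice, pull, n_experts, n_arms, T, gamma=0.1, seed=0):
    """Minimal EXP4 (Auer et al., 2002): exponential weights over a finite
    set of experts (policies).  expert_advice(t) -> (n_experts, n_arms)
    matrix of per-expert arm distributions; pull(t, arm) -> reward in [0, 1]."""
    rng = np.random.default_rng(seed)
    log_w = np.zeros(n_experts)                           # log-weights, for stability
    for t in range(T):
        advice = expert_advice(t)
        q = np.exp(log_w - log_w.max())
        q /= q.sum()                                      # distribution over experts
        p = (1 - gamma) * (q @ advice) + gamma / n_arms   # induced arm distribution
        arm = rng.choice(n_arms, p=p)
        reward = pull(t, arm)
        r_hat = np.zeros(n_arms)
        r_hat[arm] = reward / p[arm]                      # importance-weighted estimate
        log_w += (gamma / n_arms) * (advice @ r_hat)      # exponential-weights update
    return log_w
```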
- Neural Contextual Bandits for Personalized Recommendation [49.85090929163639]
This tutorial investigates contextual bandits as a powerful framework for personalized recommendations.
We focus on the exploration perspective of contextual bandits to alleviate the "Matthew Effect" in recommender systems.
In addition to conventional linear contextual bandits, we also cover neural contextual bandits.
arXiv Detail & Related papers (2023-12-21T17:03:26Z)
- Follow-ups Also Matter: Improving Contextual Bandits via Post-serving Contexts [31.33919659549256]
We present a novel contextual bandit problem with post-serving contexts.
Our algorithm, poLinUCB, achieves tight regret under standard assumptions.
Extensive empirical tests on both synthetic and real-world datasets demonstrate the significant benefit of utilizing post-serving contexts.
arXiv Detail & Related papers (2023-09-25T06:22:28Z)
- Online learning in bandits with predicted context [8.257280652461159]
We consider the contextual bandit problem where at each time, the agent only has access to a noisy version of the context.
This setting is motivated by a wide range of applications where the true context for decision-making is unobserved.
We propose the first online algorithm in this setting with sublinear regret guarantees under mild conditions.
arXiv Detail & Related papers (2023-07-26T02:33:54Z)
- Contexts can be Cheap: Solving Stochastic Contextual Bandits with Linear Bandit Algorithms [39.70492757288025]
We address the contextual linear bandit problem, where a decision maker is provided a context.
We show that the contextual problem can be solved as a linear bandit problem.
Our results imply a $O(d\sqrt{T\log T})$ high-probability regret bound for contextual linear bandits.
arXiv Detail & Related papers (2022-11-08T22:18:53Z)
- Syndicated Bandits: A Framework for Auto Tuning Hyper-parameters in Contextual Bandit Algorithms [74.55200180156906]
The contextual bandit problem models the trade-off between exploration and exploitation.
We show our Syndicated Bandits framework can achieve the optimal regret upper bounds.
arXiv Detail & Related papers (2021-06-05T22:30:21Z)
- Robust Stochastic Linear Contextual Bandits Under Adversarial Attacks [81.13338949407205]
Recent works show that optimal bandit algorithms are vulnerable to adversarial attacks and can fail completely in the presence of attacks.
Existing robust bandit algorithms only work for the non-contextual setting under attacks on rewards.
We provide the first robust bandit algorithm for linear contextual bandit setting under a fully adaptive and omniscient attack.
arXiv Detail & Related papers (2021-06-05T22:20:34Z)
- Contextual Bandit with Missing Rewards [27.066965426355257]
We consider a novel variant of the contextual bandit problem where the reward associated with each context-based decision may not always be observed.
This new problem is motivated by certain online settings including clinical trial and ad recommendation applications.
We propose to combine the standard contextual bandit approach with an unsupervised learning mechanism such as clustering (one possible realization is sketched after this entry).
arXiv Detail & Related papers (2020-07-13T13:29:51Z)
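The missing-rewards entry above pairs a contextual bandit with clustering so that rounds whose rewards go unobserved still contribute through their contexts. The sketch below shows one hedged way to realize this, online k-means with cluster-mean imputation and an epsilon-greedy policy; it is an assumption-laden illustration, not the paper's exact algorithm.

```python
import numpy as np

class ClusteredImputationBandit:
    """Sketch: contexts are clustered online; when a reward is missing,
    the running mean reward of the context's cluster stands in for it."""

    def __init__(self, n_arms, dim, n_clusters=5, eps=0.1, seed=0):
        self.rng = np.random.default_rng(seed)
        self.eps = eps
        self.n_arms = n_arms
        self.centroids = self.rng.normal(size=(n_clusters, dim))
        self.cluster_counts = np.zeros(n_clusters)
        self.pulls = np.zeros((n_clusters, n_arms))
        self.means = np.zeros((n_clusters, n_arms))  # per-cluster, per-arm reward means

    def _cluster(self, x):
        return int(np.argmin(np.linalg.norm(self.centroids - x, axis=1)))

    def select(self, x):
        if self.rng.random() < self.eps:              # epsilon-greedy exploration
            return int(self.rng.integers(self.n_arms))
        return int(np.argmax(self.means[self._cluster(x)]))

    def update(self, x, arm, reward=None):
        c = self._cluster(x)
        # Online k-means step: every context refines the clustering,
        # whether or not its reward was observed.
        self.cluster_counts[c] += 1
        self.centroids[c] += (x - self.centroids[c]) / self.cluster_counts[c]
        # A missing reward is imputed with the cluster's running mean.
        if reward is None:
            reward = self.means[c, arm]
        self.pulls[c, arm] += 1
        self.means[c, arm] += (reward - self.means[c, arm]) / self.pulls[c, arm]
```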
- Robustness Guarantees for Mode Estimation with an Application to Bandits [131.21717367564963]
We introduce a theory for multi-armed bandits where the values of interest are the modes of the reward distributions instead of the means.
We show in simulations that our algorithms are robust to perturbation of the arms by adversarial noise sequences.
arXiv Detail & Related papers (2020-03-05T21:29:27Z)