Related papers: Cascading Bandits Robust to Adversarial Corruptions

Cascading Bandits Robust to Adversarial Corruptions

URL: http://arxiv.org/abs/2502.08077v1
Date: Wed, 12 Feb 2025 02:44:41 GMT
Title: Cascading Bandits Robust to Adversarial Corruptions
Authors: Jize Xie, Cheng Chen, Zhiyong Wang, Shuai Li,
Abstract summary: We study how to resist adversarial corruptions in cascading bandits.<n>We show that both algorithms can achieve logarithmic regret when the algorithm is not under attack.
Score: 15.150320217724571
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Online learning to rank sequentially recommends a small list of items to users from a large candidate set and receives the users' click feedback. In many real-world scenarios, users browse the recommended list in order and click the first attractive item without checking the rest. Such behaviors are usually formulated as the cascade model. Many recent works study algorithms for cascading bandits, an online learning to rank framework in the cascade model. However, the performance of existing methods may drop significantly if part of the user feedback is adversarially corrupted (e.g., click fraud). In this work, we study how to resist adversarial corruptions in cascading bandits. We first formulate the ``\textit{Cascading Bandits with Adversarial Corruptions}" (CBAC) problem, which assumes that there is an adaptive adversary that may manipulate the user feedback. Then we propose two robust algorithms for this problem, which assume the corruption level is known and agnostic, respectively. We show that both algorithms can achieve logarithmic regret when the algorithm is not under attack, and the regret increases linearly with the corruption level. The experimental results also verify the robustness of our methods.

Related papers

Nearly-Optimal Bandit Learning in Stackelberg Games with Side Information [57.287431079644705]
We study the problem of online learning in Stackelberg games with side information between a leader and a sequence of followers. We provide learning algorithms for the leader which achieve $O(T1/2)$ regret under bandit feedback.
arXiv Detail & Related papers (2025-01-31T22:40:57Z)
Learning from Imperfect Human Feedback: a Tale from Corruption-Robust Dueling [35.54611331654877]
This paper studies Learning from Imperfect Human Feedback (LIHF), addressing the potential irrationality or imperfect perception when learning from comparative human feedback. Building on evidences that human's imperfection decays over time (i.e., humans learn to improve), we cast this problem as a concave-utility continuous-action dueling bandit but under a restricted form of corruption. We show how this framework can be easily applied to obtain corruption-robust guarantees for other gradient-based dueling bandit algorithms.
arXiv Detail & Related papers (2024-05-18T07:18:43Z)
Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback [58.66941279460248]
Learning from human feedback plays an important role in aligning generative models, such as large language models (LLM) We study a model within this problem domain--contextual dueling bandits with adversarial feedback, where the true preference label can be flipped by an adversary. We propose an algorithm namely robust contextual dueling bandit (algo), which is based on uncertainty-weighted maximum likelihood estimation.
arXiv Detail & Related papers (2024-04-16T17:59:55Z)
Online Corrupted User Detection and Regret Minimization [49.536254494829436]
In real-world online web systems, multiple users usually arrive sequentially into the system. We present an important online learning problem named LOCUD to learn and utilize unknown user relations from disrupted behaviors. We devise a novel online detection algorithm OCCUD based on RCLUB-WCU's inferred user relations.
arXiv Detail & Related papers (2023-10-07T10:20:26Z)
Robust Lipschitz Bandits to Adversarial Corruptions [61.85150061213987]
Lipschitz bandit is a variant of bandits that deals with a continuous arm set defined on a metric space. In this paper, we introduce a new problem of Lipschitz bandits in the presence of adversarial corruptions. Our work presents the first line of robust Lipschitz bandit algorithms that can achieve sub-linear regret under both types of adversary.
arXiv Detail & Related papers (2023-05-29T18:16:59Z)
Nearly Optimal Algorithms for Linear Contextual Bandits with Adversarial Corruptions [98.75618795470524]
We study the linear contextual bandit problem in the presence of adversarial corruption. We propose a new algorithm based on the principle of optimism in the face of uncertainty.
arXiv Detail & Related papers (2022-05-13T17:58:58Z)
A Robust Phased Elimination Algorithm for Corruption-Tolerant Gaussian Process Bandits [118.22458816174144]
We propose a novel robust elimination-type algorithm that runs in epochs, combines exploration with infrequent switching to select a small subset of actions, and plays each action for multiple time instants. Our algorithm, GP Robust Phased Elimination (RGP-PE), successfully balances robustness to corruptions with exploration and exploitation. We perform the first empirical study of robustness in the corrupted GP bandit setting, and show that our algorithm is robust against a variety of adversarial attacks.
arXiv Detail & Related papers (2022-02-03T21:19:36Z)
Linear Contextual Bandits with Adversarial Corruptions [91.38793800392108]
We study the linear contextual bandit problem in the presence of adversarial corruption. We present a variance-aware algorithm that is adaptive to the level of adversarial contamination $C$.
arXiv Detail & Related papers (2021-10-25T02:53:24Z)
Improved Corruption Robust Algorithms for Episodic Reinforcement Learning [43.279169081740726]
We study episodic reinforcement learning under unknown adversarial corruptions in both the rewards and the transition probabilities of the underlying system. We propose new algorithms which, compared to the existing results, achieve strictly better regret bounds in terms of total corruptions. Our results follow from a general algorithmic framework that combines corruption-robust policy elimination meta-algorithms, and plug-in reward-free exploration sub-algorithms.
arXiv Detail & Related papers (2021-02-13T07:04:23Z)

This list is automatically generated from the titles and abstracts of the papers in this site.