Online Learning to Rank under Corruption: A Robust Cascading Bandits Approach
- URL: http://arxiv.org/abs/2511.03074v1
- Date: Tue, 04 Nov 2025 23:39:37 GMT
- Title: Online Learning to Rank under Corruption: A Robust Cascading Bandits Approach
- Authors: Fatemeh Ghaffari, Siddarth Sitaraman, Xutong Liu, Xuchuang Wang, Mohammad Hajiesmaili,
- Abstract summary: Online learning to rank studies how to recommend a short ranked list of items from a large pool and improves future rankings based on user clicks. This setting is commonly modeled as cascading bandits, where the objective is to maximize the likelihood that the user clicks on at least one of the presented items. We propose MSUCB, a robust algorithm that incorporates a novel mean-of-medians estimator, applied here to the bandits-with-corruption setting.
- Score: 15.847551488328866
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Online learning to rank (OLTR) studies how to recommend a short ranked list of items from a large pool and improves future rankings based on user clicks. This setting is commonly modeled as cascading bandits, where the objective is to maximize the likelihood that the user clicks on at least one of the presented items across as many timesteps as possible. However, such systems are vulnerable to click fraud and other manipulations (i.e., corruption), where bots or paid click farms inject corrupted feedback that misleads the learning process and degrades user experience. In this paper, we propose MSUCB, a robust algorithm that incorporates a novel mean-of-medians estimator, which, to our knowledge, is applied to the bandits-with-corruption setting for the first time. This estimator behaves like a standard mean in the absence of corruption, so no cost is paid for robustness. Under corruption, the median step filters out outliers and corrupted samples, keeping the estimate close to its true value. Updating this estimate at every round further accelerates empirical convergence in experiments. Hence, MSUCB achieves optimal logarithmic regret in the absence of corruption and degrades gracefully under corruption, with regret increasing only by an additive term tied to the total corruption. Comprehensive and extensive experiments on real-world datasets further demonstrate that our approach consistently outperforms prior methods while maintaining strong robustness. In particular, it achieves a \(97.35\%\) and a \(91.60\%\) regret improvement over two state-of-the-art methods.
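The abstract does not spell out the exact construction of the mean-of-medians estimator, but the idea it names (partition the samples into groups, take each group's median, then average the group medians) can be sketched as follows. This is an illustrative sketch only, assuming continuous-valued reward samples; the group count and grouping scheme are arbitrary choices, not the paper's.

```python
import statistics

def mean_of_medians(samples, num_groups=5):
    """Illustrative mean-of-medians location estimate.

    Split the samples into `num_groups` interleaved groups, take the
    median of each group, and average the medians. With no corruption
    each group's median concentrates near the true mean, so the
    estimate behaves like a plain average; corrupted samples are
    filtered out by the per-group median step as long as they do not
    dominate any single group.
    """
    groups = [samples[i::num_groups] for i in range(num_groups)]
    medians = [statistics.median(g) for g in groups if g]
    return sum(medians) / len(medians)
```

In a cascading-bandit loop one would maintain such an estimate per item and refresh it every round; that integration with confidence bounds is part of MSUCB itself and is not reproduced here.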
Related papers
- Communication-Corruption Coupling and Verification in Cooperative Multi-Objective Bandits [2.1772197319352498]
We study cooperative multi-armed bandits with vector-valued rewards under adversarial corruption and limited verification. We show that a fixed environment-side budget $$ can translate into an effective corruption level ranging from $$ to $N$. We further establish information-theoretic limits, including an unavoidable additive $()$ penalty and a high-corruption regime $=(NT)$ where sublinear regret is impossible without clean information.
arXiv Detail & Related papers (2026-01-17T06:13:52Z) - Cascading Bandits Robust to Adversarial Corruptions [15.150320217724571]
We study how to resist adversarial corruptions in cascading bandits. We show that both algorithms can achieve logarithmic regret when the algorithm is not under attack.
arXiv Detail & Related papers (2025-02-12T02:44:41Z) - Corruption-Robust Linear Bandits: Minimax Optimality and Gap-Dependent Misspecification [17.288347876319126]
In linear bandits, how can a learner effectively learn when facing corrupted rewards?
We compare two types of corruptions commonly considered: strong corruption, where the corruption level depends on the action chosen by the learner, and weak corruption, where the corruption level does not depend on the action chosen by the learner.
For linear bandits, we fully characterize the gap between the minimax regret under strong and weak corruptions.
arXiv Detail & Related papers (2024-10-10T02:01:46Z) - Enhancing Infrared Small Target Detection Robustness with Bi-Level
Adversarial Framework [61.34862133870934]
We propose a bi-level adversarial framework to promote the robustness of detection in the presence of distinct corruptions.
Our scheme improves IoU by a remarkable 21.96% across a wide array of corruptions and by a notable 4.97% on the general benchmark.
arXiv Detail & Related papers (2023-09-03T06:35:07Z) - Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear
Contextual Bandits and Markov Decision Processes [59.61248760134937]
We propose an efficient algorithm to achieve a regret of $\tilde{O}(\sqrt{T}+\zeta)$.
The proposed algorithm relies on the recently developed uncertainty-weighted least-squares regression from linear contextual bandit.
We generalize our algorithm to the episodic MDP setting and are the first to achieve an additive dependence on the corruption level $\zeta$.
arXiv Detail & Related papers (2022-12-12T15:04:56Z) - Soft Diffusion: Score Matching for General Corruptions [84.26037497404195]
We propose a new objective called Soft Score Matching that provably learns the score function for any linear corruption process.
We show that our objective learns the gradient of the likelihood under suitable regularity conditions for the family of corruption processes.
Our method achieves state-of-the-art FID score $1.85$ on CelebA-64, outperforming all previous linear diffusion models.
arXiv Detail & Related papers (2022-09-12T17:45:03Z) - Nearly Optimal Algorithms for Linear Contextual Bandits with Adversarial
Corruptions [98.75618795470524]
We study the linear contextual bandit problem in the presence of adversarial corruption.
We propose a new algorithm based on the principle of optimism in the face of uncertainty.
arXiv Detail & Related papers (2022-05-13T17:58:58Z) - Linear Contextual Bandits with Adversarial Corruptions [91.38793800392108]
We study the linear contextual bandit problem in the presence of adversarial corruption.
We present a variance-aware algorithm that is adaptive to the level of adversarial contamination $C$.
arXiv Detail & Related papers (2021-10-25T02:53:24Z) - DeepMoM: Robust Deep Learning With Median-of-Means [0.3553493344868413]
We introduce an approach motivated by very recent insights into median-of-means and Le Cam's principle.
We show that the approach can be readily implemented, and we demonstrate that it performs very well in practice.
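The median-of-means idea underlying DeepMoM can be sketched as below. This is a minimal illustrative estimator only; DeepMoM's actual contribution is applying the idea, together with Le Cam's principle, inside neural-network training, which is not reproduced here. The block count is an arbitrary choice for illustration.

```python
import statistics

def median_of_means(samples, num_blocks=5):
    """Illustrative median-of-means location estimate.

    Split the samples into `num_blocks` interleaved blocks, average
    each block, and return the median of the block means. Outliers
    inflate at most the few block means they land in, and the final
    median step discards those inflated blocks.
    """
    blocks = [samples[i::num_blocks] for i in range(num_blocks)]
    means = [sum(b) / len(b) for b in blocks if b]
    return statistics.median(means)
```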
arXiv Detail & Related papers (2021-05-28T18:07:32Z) - On the effectiveness of adversarial training against common corruptions [29.596070201105277]
We show that adversarial training can serve as a strong baseline against common corruptions.
We show that our approach not only improves the $\ell_p$ adversarial training baseline but also yields cumulative gains with data augmentation methods.
arXiv Detail & Related papers (2021-03-03T11:04:09Z) - Improved Corruption Robust Algorithms for Episodic Reinforcement
Learning [43.279169081740726]
We study episodic reinforcement learning under unknown adversarial corruptions in both the rewards and the transition probabilities of the underlying system.
We propose new algorithms which, compared to the existing results, achieve strictly better regret bounds in terms of total corruptions.
Our results follow from a general algorithmic framework that combines corruption-robust policy elimination meta-algorithms, and plug-in reward-free exploration sub-algorithms.
arXiv Detail & Related papers (2021-02-13T07:04:23Z) - Corruption-robust exploration in episodic reinforcement learning [76.19192549843727]
We study multi-stage episodic reinforcement learning under adversarial corruptions in both the rewards and the transition probabilities of the underlying system.
Our framework yields efficient algorithms which attain near-optimal regret in the absence of corruptions.
Notably, our work provides the first sublinear regret guarantee that accommodates any deviation from purely i.i.d. transitions in the bandit-feedback model for episodic reinforcement learning.
arXiv Detail & Related papers (2019-11-20T03:49:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.