Safe Deployment for Counterfactual Learning to Rank with Exposure-Based
Risk Minimization
- URL: http://arxiv.org/abs/2305.01522v1
- Date: Wed, 26 Apr 2023 15:54:23 GMT
- Title: Safe Deployment for Counterfactual Learning to Rank with Exposure-Based
Risk Minimization
- Authors: Shashank Gupta, Harrie Oosterhuis and Maarten de Rijke
- Abstract summary: We introduce a novel risk-aware Counterfactual Learning To Rank method with theoretical guarantees for safe deployment.
Experimental results demonstrate that the proposed method is effective at avoiding initial periods of bad performance when little data is available.
- Score: 63.93275508300137
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Counterfactual learning to rank (CLTR) relies on exposure-based inverse
propensity scoring (IPS), an LTR-specific adaptation of IPS to correct for
position bias. While IPS can provide unbiased and consistent estimates, it
often suffers from high variance. Especially when little click data is
available, this variance can cause CLTR to learn sub-optimal ranking behavior.
Consequently, existing CLTR methods carry significant risks: naively deploying
their models can result in very negative user experiences. We
introduce a novel risk-aware CLTR method with theoretical guarantees for safe
deployment. We apply a novel exposure-based concept of risk regularization to
IPS estimation for LTR. Our risk regularization penalizes the mismatch between
the ranking behavior of a learned model and a given safe model. It thereby
ensures that learned ranking models stay close to a trusted model when there
is high uncertainty in IPS estimation, which greatly reduces the risks during
deployment. Our experimental results demonstrate that our proposed method is
effective at avoiding initial periods of bad performance when little data is
available, while also maintaining high performance at
convergence. For the CLTR field, our novel exposure-based risk minimization
method enables practitioners to adopt CLTR methods in a safer manner that
mitigates many of the risks attached to previous methods.
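To make the mechanism concrete, below is a minimal Python sketch of an exposure-based, risk-regularized IPS objective. The function names, the squared exposure-mismatch penalty, and the 1/sqrt(n) decay schedule are illustrative assumptions for exposition, not the paper's exact estimator or theoretical bound.

```python
import numpy as np

def ips_utility(clicks, logging_exposure, new_exposure, eps=1e-6):
    """Exposure-based IPS estimate of a new ranker's utility: each logged
    click is reweighted by the ratio of the exposure the new ranker gives
    the clicked document to its exposure under the logging policy."""
    weights = new_exposure / np.clip(logging_exposure, eps, None)
    return np.mean(clicks * weights)

def exposure_risk(new_exposure, safe_exposure):
    """Mismatch between the exposure the new model and the safe model
    give the same documents (the quantity being regularized)."""
    return np.sum((new_exposure - safe_exposure) ** 2)

def risk_aware_objective(clicks, logging_exposure, new_exposure,
                         safe_exposure, n_interactions, alpha=1.0):
    """IPS utility minus an exposure-mismatch penalty whose weight decays
    like 1/sqrt(n) (an assumed schedule for illustration)."""
    penalty_weight = alpha / np.sqrt(max(n_interactions, 1))
    return (ips_utility(clicks, logging_exposure, new_exposure)
            - penalty_weight * exposure_risk(new_exposure, safe_exposure))

# Toy usage: three logged clicks on three documents.
clicks = np.array([1.0, 0.0, 1.0])
logging_exposure = np.array([0.9, 0.5, 0.2])  # exposure under logging policy
new_exposure = np.array([0.8, 0.4, 0.6])      # exposure under candidate model
safe_exposure = np.array([0.9, 0.5, 0.2])     # exposure under trusted model
print(risk_aware_objective(clicks, logging_exposure, new_exposure,
                           safe_exposure, n_interactions=3))
```

The design point is the decaying penalty weight: with little data the objective is dominated by staying close to the safe model, and as interactions accumulate it reduces to plain IPS optimization.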
Related papers
- Proximal Ranking Policy Optimization for Practical Safety in Counterfactual Learning to Rank [64.44255178199846]
We propose a novel approach, proximal ranking policy optimization (PRPO), that provides safety in deployment without assumptions about user behavior.
PRPO removes incentives for learning ranking behavior that is too dissimilar to a safe ranking model.
Our experiments show that PRPO provides higher performance than the existing safe inverse propensity scoring approach.
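The name suggests a PPO-style clipping mechanism; the sketch below shows one plausible form such an objective could take, with all names, the exposure parameterization, and the clipping range assumed rather than taken from the paper.

```python
import numpy as np

def prpo_style_utility(new_exposure, safe_exposure, doc_utilities,
                       clip_lo=0.5, clip_hi=2.0):
    """Pessimistically clipped objective: once the new model's exposure
    strays outside [clip_lo, clip_hi] times the safe model's exposure,
    further movement earns no additional utility (utilities assumed
    nonnegative)."""
    ratio = new_exposure / np.clip(safe_exposure, 1e-6, None)
    clipped = np.clip(ratio, clip_lo, clip_hi)
    # The elementwise minimum removes the incentive to push the ratio
    # beyond the clipping range, the same trick PPO uses.
    return np.sum(np.minimum(ratio * doc_utilities, clipped * doc_utilities))
```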
arXiv Detail & Related papers (2024-09-15T22:22:27Z)
- Practical and Robust Safety Guarantees for Advanced Counterfactual Learning to Rank [64.44255178199846]
We generalize the existing safe CLTR approach to make it applicable to state-of-the-art doubly robust CLTR.
We also propose a novel approach, proximal ranking policy optimization (PRPO), that provides safety in deployment without assumptions about user behavior.
PRPO is the first method with unconditional safety in deployment that translates to robust safety for real-world applications.
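As a hedged illustration of the doubly robust idea in a ranking context (the generic DR form, not necessarily the paper's estimator):

```python
import numpy as np

def doubly_robust_utility(clicks, logging_exposure, click_model_preds,
                          new_exposure, eps=1e-6):
    """Doubly robust estimate: a regression-based direct estimate plus an
    IPS-weighted correction on the observed clicks. The estimate is
    unbiased if either the exposure model or the click regression is
    correct."""
    correction = (clicks - click_model_preds) / np.clip(logging_exposure,
                                                        eps, None)
    return np.mean(new_exposure * (click_model_preds + correction))
```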
arXiv Detail & Related papers (2024-07-29T12:23:59Z)
- Unveiling the Misuse Potential of Base Large Language Models via In-Context Learning [61.2224355547598]
Open-sourcing of large language models (LLMs) accelerates application development, innovation, and scientific progress.
Our investigation exposes a critical oversight in the prevailing belief that base LLMs, lacking instruction tuning, cannot effectively follow malicious instructions.
By deploying carefully designed demonstrations, our research demonstrates that base LLMs could effectively interpret and execute malicious instructions.
arXiv Detail & Related papers (2024-04-16T13:22:54Z)
- Reactive Model Correction: Mitigating Harm to Task-Relevant Features via Conditional Bias Suppression [12.44857030152608]
Deep Neural Networks are prone to learning and relying on spurious correlations in the training data, which, for high-risk applications, can have fatal consequences.
Various approaches that can be applied post hoc, without additional training, have been proposed to suppress model reliance on harmful features.
We propose a reactive approach conditioned on model-derived knowledge and eXplainable Artificial Intelligence (XAI) insights.
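A hypothetical sketch of what such a reactive, XAI-conditioned correction could look like; the thresholding rule and all names are assumptions, not the paper's procedure:

```python
import numpy as np

def reactive_correction(features, attributions, spurious_idx, tau=0.2):
    """Suppress a known spurious feature only for inputs where an XAI
    attribution indicates the model actually relies on it; inputs that do
    not trigger the condition are left untouched, protecting
    task-relevant features from blanket suppression."""
    corrected = features.copy()
    relies_on_spurious = np.abs(attributions[:, spurious_idx]) > tau
    corrected[relies_on_spurious, spurious_idx] = 0.0
    return corrected
```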
arXiv Detail & Related papers (2024-04-15T09:16:49Z)
- Inference-time Stochastic Ranking with Risk Control [19.20938164194589]
Learning to Rank methods are vital in online economies, affecting users and item providers.
We propose a novel method that performs ranking at inference time with guaranteed utility or fairness given pretrained scoring functions.
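One standard way to realize stochastic ranking from pretrained scores is Plackett-Luce sampling via the Gumbel trick; the sketch below shows that sampling step only and omits the paper's utility and fairness risk-control machinery.

```python
import numpy as np

def plackett_luce_sample(scores, rng):
    """Gumbel-max trick: adding i.i.d. Gumbel noise to the scores and
    sorting in descending order draws a full ranking from a
    Plackett-Luce distribution over the documents."""
    return np.argsort(-(scores + rng.gumbel(size=scores.shape)))

rng = np.random.default_rng(0)
scores = np.array([2.0, 1.0, 0.5, 0.1])   # output of a pretrained scorer
print(plackett_luce_sample(scores, rng))  # one sampled ranking
```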
arXiv Detail & Related papers (2023-06-12T15:44:58Z)
- Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with distributionally robust optimization (DRO) using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
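A minimal sketch of the general idea, assuming a learned adversary that parameterizes the likelihood ratio; the normalization and temperature are illustrative, not the paper's construction:

```python
import numpy as np

def parametric_dro_loss(losses, adversary_logits, temperature=1.0):
    """Reweight per-example losses with a parametric likelihood ratio
    (a normalized exponential of learned adversary logits), so training
    emphasizes the subpopulations the adversary deems hardest."""
    ratio = np.exp(adversary_logits / temperature)
    ratio = ratio / ratio.mean()   # keep the reweighting mean-one
    return np.mean(ratio * losses)
```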
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
- Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning [57.88785630755165]
Empirical risk minimization (ERM) is the workhorse of machine learning, but its model-agnostic guarantees can fail when we use adaptively collected data.
We study a generic importance sampling weighted ERM algorithm for using adaptively collected data to minimize the average of a loss function over a hypothesis class.
For policy learning, we provide rate-optimal regret guarantees that close an open gap in the existing literature whenever exploration decays to zero.
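A minimal sketch of the weighted-ERM estimator described above; the propensity names are assumptions, and the paper's stabilization details are omitted.

```python
import numpy as np

def importance_weighted_erm(losses, logging_propensities,
                            target_propensities, eps=1e-6):
    """Importance-sampling-weighted empirical risk: reweighting each
    adaptively collected example by target/logging propensities recovers
    an unbiased estimate of the risk under the target distribution."""
    weights = target_propensities / np.clip(logging_propensities, eps, None)
    return np.mean(weights * losses)
```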
arXiv Detail & Related papers (2021-06-03T09:50:13Z)
- Risk-Averse Offline Reinforcement Learning [46.383648750385575]
Training Reinforcement Learning (RL) agents in high-stakes applications might be prohibitive due to the risk associated with exploration.
We present the Offline Risk-Averse Actor-Critic (O-RAAC), a model-free RL algorithm that is able to learn risk-averse policies in a fully offline setting.
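O-RAAC optimizes a risk-averse criterion; a common choice in this literature is conditional value at risk (CVaR), sketched empirically below as an illustration, not the paper's full actor-critic.

```python
import numpy as np

def empirical_cvar(returns, alpha=0.1):
    """Conditional value at risk: the mean of the worst alpha-fraction of
    returns, a standard risk-averse criterion for offline RL policies."""
    cutoff = np.quantile(returns, alpha)
    return returns[returns <= cutoff].mean()
```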
arXiv Detail & Related papers (2021-02-10T10:27:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.