Sequential Counterfactual Risk Minimization
- URL: http://arxiv.org/abs/2302.12120v2
- Date: Thu, 25 May 2023 10:41:42 GMT
- Title: Sequential Counterfactual Risk Minimization
- Authors: Houssam Zenati, Eustache Diemert, Matthieu Martin, Julien Mairal,
Pierre Gaillard
- Abstract summary: Sequential Counterfactual Risk Minimization (SCRM) extends the Counterfactual Risk Minimization (CRM) framework for the logged bandit feedback problem to settings with repeated policy deployments.
The authors introduce a novel counterfactual estimator and identify conditions that improve the excess risk and regret rates of CRM.
- Score: 37.600857571957754
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Counterfactual Risk Minimization (CRM) is a framework for dealing with the
logged bandit feedback problem, where the goal is to improve a logging policy
using offline data. In this paper, we explore the case where it is possible to
deploy learned policies multiple times and acquire new data. We extend the CRM
principle and its theory to this scenario, which we call "Sequential
Counterfactual Risk Minimization (SCRM)." We introduce a novel counterfactual
estimator and identify conditions that can improve the performance of CRM in
terms of excess risk and regret rates, by using an analysis similar to restart
strategies in accelerated optimization methods. We also provide an empirical
evaluation of our method in both discrete and continuous action settings, and
demonstrate the benefits of multiple deployments of CRM.
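As a concrete illustration of the ingredients the abstract refers to, the sketch below estimates a target policy's risk from logged bandit feedback via clipped inverse propensity scoring (IPS), the standard CRM estimator, and wraps it in a deploy-and-refit loop in the spirit of SCRM's multiple deployments. This is a minimal sketch rather than the paper's exact estimator or algorithm; the function names and the `fit`/`deploy` callables are illustrative assumptions.

```python
# Minimal sketch of CRM-style counterfactual risk estimation and a
# sequential deployment loop; illustrative, not the paper's estimator.
import numpy as np

def clipped_ips_risk(costs, logging_probs, target_probs, clip=10.0):
    """Clipped IPS estimate of a target policy's risk (expected cost).

    costs:         costs observed for the logged actions (lower is better)
    logging_probs: probabilities the logging policy gave the logged actions
    target_probs:  probabilities the target policy gives the same actions
    clip:          cap on importance weights to control variance
    """
    weights = np.minimum(np.asarray(target_probs) / np.asarray(logging_probs), clip)
    return float(np.mean(weights * np.asarray(costs)))

def scrm_loop(initial_policy, fit, deploy, n_rounds=5):
    """Sequential deployments: refit on all logs gathered so far, redeploy.

    fit and deploy are hypothetical callables: deploy(policy) returns logged
    (context, action, propensity, cost) tuples; fit(logs) minimizes a
    counterfactual risk estimate such as clipped_ips_risk over a policy class.
    """
    policy, logs = initial_policy, []
    for _ in range(n_rounds):
        logs.extend(deploy(policy))  # interact with the environment and log
        policy = fit(logs)           # counterfactual risk minimization step
    return policy
```

For instance, if the logging policy chose each logged action with probability 0.5 and the target policy would choose it with probability 0.9, every importance weight is 1.8 (below the clip), so the risk estimate is simply 1.8 times the mean logged cost.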
Related papers
- Robust Risk-Sensitive Reinforcement Learning with Conditional Value-at-Risk [23.63388546004777]
We analyze the robustness of CVaR-based risk-sensitive RL under Robust Markov Decision Processes.
Motivated by the existence of decision-dependent uncertainty in real-world problems, we study problems with state-action-dependent ambiguity sets.
arXiv Detail & Related papers (2024-05-02T20:28:49Z)
- Provable Risk-Sensitive Distributional Reinforcement Learning with General Function Approximation [54.61816424792866]
We introduce a general framework on Risk-Sensitive Distributional Reinforcement Learning (RS-DisRL), with static Lipschitz Risk Measures (LRM) and general function approximation.
We design two innovative meta-algorithms: RS-DisRL-M, a model-based strategy for model-based function approximation, and RS-DisRL-V, a model-free approach for general value function approximation.
arXiv Detail & Related papers (2024-02-28T08:43:18Z)
- Frustratingly Easy Model Generalization by Dummy Risk Minimization [38.67678021055096]
Dummy Risk Minimization (DuRM) is a frustratingly easy and general technique for improving the generalization of empirical risk minimization (ERM).
We show that DuRM consistently improves performance across all tasks in an almost-free-lunch manner.
arXiv Detail & Related papers (2023-08-04T12:43:54Z)
- Provably Efficient Iterated CVaR Reinforcement Learning with Function Approximation and Human Feedback [57.6775169085215]
Risk-sensitive reinforcement learning aims to optimize policies that balance the expected reward and risk.
We present a novel framework that employs an Iterated Conditional Value-at-Risk (CVaR) objective under both linear and general function approximations.
We propose provably sample-efficient algorithms for this Iterated CVaR RL and provide rigorous theoretical analysis.
arXiv Detail & Related papers (2023-07-06T08:14:54Z)
- Safe Deployment for Counterfactual Learning to Rank with Exposure-Based Risk Minimization [63.93275508300137]
We introduce a novel risk-aware Counterfactual Learning To Rank method with theoretical guarantees for safe deployment.
Our experimental results demonstrate the efficacy of our proposed method, which is effective at avoiding initial periods of bad performance when little data is available.
arXiv Detail & Related papers (2023-04-26T15:54:23Z)
- What Is Missing in IRM Training and Evaluation? Challenges and Solutions [41.56612265456626]
Invariant risk minimization (IRM) has received increasing attention as a way to acquire environment-agnostic data representations and predictions.
Recent works have found that the optimality of the originally proposed IRM optimization objective (IRMv1) may be compromised in practice.
We identify and resolve three practical limitations in IRM training and evaluation.
arXiv Detail & Related papers (2023-03-04T07:06:24Z)
- A State-Augmented Approach for Learning Optimal Resource Management Decisions in Wireless Networks [58.720142291102135]
We consider a radio resource management (RRM) problem in a multi-user wireless network.
The goal is to optimize a network-wide utility function subject to constraints on the ergodic average performance of users.
We propose a state-augmented parameterization for the RRM policy, where alongside the instantaneous network states, the RRM policy takes as input the set of dual variables corresponding to the constraints.
arXiv Detail & Related papers (2022-10-28T21:24:13Z)
- Efficient Risk-Averse Reinforcement Learning [79.61412643761034]
In risk-averse reinforcement learning (RL), the goal is to optimize some risk measure of the returns.
We prove that under certain conditions this inevitably leads to a local-optimum barrier, and propose a soft risk mechanism to bypass it.
We demonstrate improved risk aversion in maze navigation, autonomous driving, and resource allocation benchmarks.
arXiv Detail & Related papers (2022-05-10T19:40:52Z)
- Counterfactual Learning of Stochastic Policies with Continuous Actions: from Models to Offline Evaluation [41.21447375318793]
We introduce a modelling strategy based on a joint kernel embedding of contexts and actions.
We empirically show that the optimization aspect of counterfactual learning is important.
We propose an evaluation protocol for offline policies in real-world logged systems.
arXiv Detail & Related papers (2020-04-22T07:42:30Z)
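Several of the related papers above (the robust CVaR analysis, the Iterated CVaR framework, and the risk-averse RL work) build on Conditional Value-at-Risk. As a quick self-contained illustration, not drawn from any of these papers, the empirical CVaR of a cost sample at level alpha is the mean of the worst alpha-fraction of outcomes:

```python
# Illustrative empirical CVaR: mean of the worst alpha-fraction of costs.
import numpy as np

def cvar(costs, alpha=0.1):
    costs = np.sort(np.asarray(costs, dtype=float))[::-1]  # worst costs first
    k = max(1, int(np.ceil(alpha * len(costs))))           # size of worst tail
    return costs[:k].mean()

# Example: for costs 1..100 and alpha = 0.05, cvar averages the 5 worst
# costs: (100 + 99 + 98 + 97 + 96) / 5 = 98.0
print(cvar(np.arange(1, 101), alpha=0.05))  # 98.0
```

Risk-sensitive methods such as Iterated CVaR then optimize a policy against this tail expectation instead of the plain mean.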