Improving Offline Contextual Bandits with Distributional Robustness
- URL: http://arxiv.org/abs/2011.06835v1
- Date: Fri, 13 Nov 2020 09:52:16 GMT
- Title: Improving Offline Contextual Bandits with Distributional Robustness
- Authors: Otmane Sakhi, Louis Faury, Flavian Vasile
- Abstract summary: We introduce a convex reformulation of the Counterfactual Risk Minimization principle.
Our approach is compatible with convex programs, and can therefore be readily adapted to the large data regime.
We present preliminary empirical results supporting the effectiveness of our approach.
- Score: 10.310819665706294
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper extends the Distributionally Robust Optimization (DRO) approach
for offline contextual bandits. Specifically, we leverage this framework to
introduce a convex reformulation of the Counterfactual Risk Minimization
principle. Besides relying on convex programs, our approach is compatible with
stochastic optimization, and can therefore be readily adapted tothe large data
regime. Our approach relies on the construction of asymptotic confidence
intervals for offline contextual bandits through the DRO framework. By
leveraging known asymptotic results of robust estimators, we also show how to
automatically calibrate such confidence intervals, which in turn removes the
burden of hyper-parameter selection for policy optimization. We present
preliminary empirical results supporting the effectiveness of our approach.
Related papers
- Distributionally and Adversarially Robust Logistic Regression via Intersecting Wasserstein Balls [8.720733751119994]
Adversarially robust optimization (ARO) has become the de facto standard for training models to defend against adversarial attacks during testing.
Despite their robustness, these models often suffer from severe overfitting.
We propose two approaches to replace the empirical distribution in training with: (i) a worst-case distribution within an ambiguity set; or (ii) a mixture of the empirical distribution with one derived from an auxiliary dataset.
arXiv Detail & Related papers (2024-07-18T15:59:37Z) - Complexity-Aware Deep Symbolic Regression with Robust Risk-Seeking Policy Gradients [20.941908494137806]
This paper proposes a novel deep symbolic regression approach to enhance the robustness and interpretability of data-driven mathematical expression discovery.
Despite the success of the state-of-the-art method, DSR, it is built on recurrent neural networks, purely guided by data fitness.
We use transformers in conjunction with breadth-first-search to improve the learning performance.
arXiv Detail & Related papers (2024-06-10T19:29:10Z) - Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
arXiv Detail & Related papers (2023-11-08T00:10:21Z) - PARL: A Unified Framework for Policy Alignment in Reinforcement Learning from Human Feedback [106.63518036538163]
We present a novel unified bilevel optimization-based framework, textsfPARL, formulated to address the recently highlighted critical issue of policy alignment in reinforcement learning.
Our framework addressed these concerns by explicitly parameterizing the distribution of the upper alignment objective (reward design) by the lower optimal variable.
Our empirical results substantiate that the proposed textsfPARL can address the alignment concerns in RL by showing significant improvements.
arXiv Detail & Related papers (2023-08-03T18:03:44Z) - Provable Offline Preference-Based Reinforcement Learning [95.00042541409901]
We investigate the problem of offline Preference-based Reinforcement Learning (PbRL) with human feedback.
We consider the general reward setting where the reward can be defined over the whole trajectory.
We introduce a new single-policy concentrability coefficient, which can be upper bounded by the per-trajectory concentrability.
arXiv Detail & Related papers (2023-05-24T07:11:26Z) - Outlier-Robust Sparse Estimation via Non-Convex Optimization [73.18654719887205]
We explore the connection between high-dimensional statistics and non-robust optimization in the presence of sparsity constraints.
We develop novel and simple optimization formulations for these problems.
As a corollary, we obtain that any first-order method that efficiently converges to station yields an efficient algorithm for these tasks.
arXiv Detail & Related papers (2021-09-23T17:38:24Z) - Federated Distributionally Robust Optimization for Phase Configuration
of RISs [106.4688072667105]
We study the problem of robust reconfigurable intelligent surface (RIS)-aided downlink communication over heterogeneous RIS types in a supervised learning setting.
By modeling downlink communication over heterogeneous RIS designs as different workers that learn how to optimize phase configurations in a distributed manner, we solve this distributed learning problem.
Our proposed algorithm requires fewer communication rounds to achieve the same worst-case distribution test accuracy compared to competitive baselines.
arXiv Detail & Related papers (2021-08-20T07:07:45Z) - RAPTOR: End-to-end Risk-Aware MDP Planning and Policy Learning by
Backpropagation [12.600828753197204]
We introduce Risk-Aware Planning using PyTorch (RAP), a novel framework for risk-sensitive planning through end-to-end optimization of the entropic utility objective.
We evaluate and compare these two forms of RAPTOR on three highly do-mains, including nonlinear navigation, HVAC control, and linear reservoir control.
arXiv Detail & Related papers (2021-06-14T09:27:19Z) - Risk-Averse Bayes-Adaptive Reinforcement Learning [3.5289688061934963]
We pose the problem of optimising the conditional value at risk (CVaR) of the total return in Bayes-adaptive Markov decision processes (MDPs)
We show that a policy optimising CVaR in this setting is risk-averse to both the parametric uncertainty due to the prior distribution over MDPs, and the internal uncertainty due to the inherentity of MDPs.
Our experiments demonstrate that our approach significantly outperforms baseline approaches for this problem.
arXiv Detail & Related papers (2021-02-10T22:34:33Z) - Online and Distribution-Free Robustness: Regression and Contextual
Bandits with Huber Contamination [29.85468294601847]
We revisit two classic high-dimensional online learning problems, namely linear regression and contextual bandits.
We show that our algorithms succeed where conventional methods fail.
arXiv Detail & Related papers (2020-10-08T17:59:05Z) - Distributional Robustness and Regularization in Reinforcement Learning [62.23012916708608]
We introduce a new regularizer for empirical value functions and show that it lower bounds the Wasserstein distributionally robust value function.
It suggests using regularization as a practical tool for dealing with $textitexternal uncertainty$ in reinforcement learning.
arXiv Detail & Related papers (2020-03-05T19:56:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.