Uncertainty Sets for Distributionally Robust Bandits Using Structural Equation Models
- URL: http://arxiv.org/abs/2508.02812v1
- Date: Mon, 04 Aug 2025 18:29:29 GMT
- Title: Uncertainty Sets for Distributionally Robust Bandits Using Structural Equation Models
- Authors: Katherine Avery, Chinmay Pendse, David Jensen
- Abstract summary: Current methods for distributionally robust evaluation and learning create overly conservative evaluations and policies. We propose a practical bandit evaluation and learning algorithm that tailors the uncertainty set to specific problems.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Distributionally robust evaluation estimates the worst-case expected return over an uncertainty set of possible covariate and reward distributions, and distributionally robust learning finds a policy that maximizes that worst-case return across that uncertainty set. Unfortunately, current methods for distributionally robust evaluation and learning create overly conservative evaluations and policies. In this work, we propose a practical bandit evaluation and learning algorithm that tailors the uncertainty set to specific problems using mathematical programs constrained by structural equation models. Further, we show how conditional independence testing can be used to detect shifted variables for modeling. We find that the structural equation model (SEM) approach gives more accurate evaluations and learns lower-variance policies than traditional approaches, particularly for large shifts. Further, the SEM approach learns an optimal policy, assuming the model is sufficiently well-specified.
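Below is a minimal, self-contained sketch of the worst-case idea described in the abstract: evaluate a fixed policy under several candidate covariate distributions and report the minimum. It is an illustration only; the candidate samplers, policy, and reward function are hypothetical toys, and the paper's actual method replaces the hand-picked candidate set with an uncertainty set defined by mathematical programs constrained by a structural equation model.

```python
# Sketch of distributionally robust policy evaluation for a contextual
# bandit. Illustrates the worst-case-over-an-uncertainty-set idea only;
# it is NOT the authors' SEM-constrained program. All names and the
# candidate-set construction are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

def expected_return(policy, contexts, reward_fn):
    """Monte Carlo estimate of E[R] under one covariate distribution."""
    actions = policy(contexts)
    return reward_fn(contexts, actions).mean()

def robust_value(policy, candidate_samplers, reward_fn, n=5_000):
    """Worst-case expected return over a finite uncertainty set of
    covariate distributions, each represented by a sampler."""
    values = [expected_return(policy, sampler(n), reward_fn)
              for sampler in candidate_samplers]
    return min(values)  # distributionally robust evaluation

# --- toy problem (hypothetical) -------------------------------------
def reward_fn(x, a):
    # reward depends on a single covariate and the chosen arm (0 or 1)
    return np.where(a == 1, x, 1.0 - x)

def policy(x):
    # threshold policy on the covariate
    return (x > 0.5).astype(int)

# uncertainty set: mean-shifted versions of the nominal covariate law
candidates = [lambda n, m=m: np.clip(rng.normal(m, 0.2, n), 0.0, 1.0)
              for m in (0.4, 0.5, 0.6)]

print("worst-case value:", robust_value(policy, candidates, reward_fn))
```

The conditional-independence-testing step mentioned in the abstract, which detects which variables have actually shifted and therefore need to be modeled, is omitted from this sketch.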
Related papers
- A Robust Model-Based Approach for Continuous-Time Policy Evaluation with Unknown Lévy Process Dynamics [1.0923877073891446]
This paper develops a model-based framework for continuous-time policy evaluation. It incorporates both Brownian and Lévy noise to model dynamics influenced by rare and extreme events.
arXiv Detail & Related papers (2025-04-02T08:37:14Z) - Rethinking Robustness in Machine Learning: A Posterior Agreement Approach [45.284633306624634]
Posterior Agreement (PA) theory of model validation provides a principled framework for robustness evaluation. We show that the PA metric provides a sensible and consistent analysis of the vulnerabilities in learning algorithms, even in the presence of few observations.
arXiv Detail & Related papers (2025-03-20T16:03:39Z) - Optimal Baseline Corrections for Off-Policy Contextual Bandits [61.740094604552475]
We aim to learn decision policies that optimize an unbiased offline estimate of an online reward metric.
We propose a single framework built on their equivalence in learning scenarios.
Our framework enables us to characterize the variance-optimal unbiased estimator and provide a closed-form solution for it.
arXiv Detail & Related papers (2024-05-09T12:52:22Z) - Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC) that can be applied for either risk-seeking or risk-averse policy optimization.
arXiv Detail & Related papers (2023-12-07T15:55:58Z) - Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
arXiv Detail & Related papers (2023-11-08T00:10:21Z) - Value-Distributional Model-Based Reinforcement Learning [59.758009422067]
Quantifying uncertainty about a policy's long-term performance is important to solve sequential decision-making tasks.
We study the problem from a model-based Bayesian reinforcement learning perspective.
We propose Epistemic Quantile-Regression (EQR), a model-based algorithm that learns a value distribution function.
arXiv Detail & Related papers (2023-08-12T14:59:19Z) - Optimal Learning via Moderate Deviations Theory [3.9725974446759342]
We develop a systematic construction of highly accurate confidence intervals by using a moderate deviation principle-based approach. It is shown that the proposed confidence intervals are statistically optimal in the sense that they satisfy criteria regarding exponential accuracy, minimality, consistency, mischaracterization probability, and eventual uniformly most accurate (UMA) property.
arXiv Detail & Related papers (2023-05-23T19:57:57Z) - Risk-Sensitive Reinforcement Learning with Exponential Criteria [0.0]
We provide a definition of robust reinforcement learning policies and formulate a risk-sensitive reinforcement learning problem to approximate them. We introduce a novel online Actor-Critic algorithm based on solving a multiplicative Bellman equation using approximation updates. The implementation, performance, and robustness properties of the proposed methods are evaluated in simulated experiments.
arXiv Detail & Related papers (2022-12-18T04:44:38Z) - The Implicit Delta Method [61.36121543728134]
In this paper, we propose an alternative, the implicit delta method, which works by infinitesimally regularizing the training loss of uncertainty.
We show that the change in the evaluation due to regularization is consistent for the variance of the evaluation estimator, even when the infinitesimal change is approximated by a finite difference.
arXiv Detail & Related papers (2022-11-11T19:34:17Z) - Sample Complexity of Robust Reinforcement Learning with a Generative Model [0.0]
We propose a model-based reinforcement learning (RL) algorithm for learning an $\epsilon$-optimal robust policy.
We consider three different forms of uncertainty sets, characterized by the total variation distance, chi-square divergence, and KL divergence (a generic sketch of such divergence-based sets appears after this list).
In addition to the sample complexity results, we also present a formal analytical argument on the benefit of using robust policies.
arXiv Detail & Related papers (2021-12-02T18:55:51Z) - COMBO: Conservative Offline Model-Based Policy Optimization [120.55713363569845]
Uncertainty estimation with complex models, such as deep neural networks, can be difficult and unreliable.
We develop a new model-based offline RL algorithm, COMBO, that regularizes the value function on out-of-support state-actions.
We find that COMBO consistently performs as well as or better than prior offline model-free and model-based methods.
arXiv Detail & Related papers (2021-02-16T18:50:32Z) - Distributional Robustness and Regularization in Reinforcement Learning [62.23012916708608]
We introduce a new regularizer for empirical value functions and show that it lower bounds the Wasserstein distributionally robust value function.
It suggests using regularization as a practical tool for dealing with external uncertainty in reinforcement learning.
arXiv Detail & Related papers (2020-03-05T19:56:23Z)
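As a generic illustration of the divergence-based uncertainty sets mentioned in the "Sample Complexity of Robust Reinforcement Learning with a Generative Model" entry above, the sketch below computes the total variation distance, chi-square divergence, and KL divergence between two discrete distributions; an uncertainty set is then the collection of distributions within a chosen radius of a nominal distribution under one of these divergences. This is not code from that paper; the example distributions are hypothetical.

```python
# Divergences commonly used to define uncertainty sets around a nominal
# distribution p0. Generic illustration only, not code from the paper.
import numpy as np

def total_variation(p, q):
    # TV(p, q) = (1/2) * sum_i |p_i - q_i|
    return 0.5 * np.abs(p - q).sum()

def chi_square(p, q):
    # chi^2(p || q) = sum_i (p_i - q_i)^2 / q_i, assuming q_i > 0
    return ((p - q) ** 2 / q).sum()

def kl(p, q):
    # KL(p || q) = sum_i p_i * log(p_i / q_i), with 0 * log 0 = 0
    mask = p > 0
    return (p[mask] * np.log(p[mask] / q[mask])).sum()

# An uncertainty set of radius rho around p0 is, e.g., {p : KL(p || p0) <= rho},
# and a robust value is the minimum expected reward over that set.
p0 = np.array([0.5, 0.3, 0.2])
p = np.array([0.4, 0.4, 0.2])
print(total_variation(p, p0), chi_square(p, p0), kl(p, p0))
```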
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.