Robust Constrained Reinforcement Learning
- URL: http://arxiv.org/abs/2209.06866v1
- Date: Wed, 14 Sep 2022 18:29:02 GMT
- Title: Robust Constrained Reinforcement Learning
- Authors: Yue Wang, Fei Miao, Shaofeng Zou
- Abstract summary: Constrained reinforcement learning aims to maximize the expected reward subject to constraints on utilities/costs.
We propose a framework of robust constrained reinforcement learning under model uncertainty.
The goal is to guarantee that constraints on utilities/costs are satisfied for all MDPs in the uncertainty set, and to maximize the worst-case reward performance over the uncertainty set.
- Score: 21.316736188238806
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Constrained reinforcement learning aims to maximize the expected reward
subject to constraints on utilities/costs. However, the training environment may
not be the same as the test one, due to, e.g., modeling error, adversarial attack,
or non-stationarity, resulting in severe performance degradation and, more
importantly, constraint violation. We propose a framework of robust constrained
reinforcement learning under model uncertainty, where the MDP is not fixed but
lies in some uncertainty set. The goal is to guarantee that constraints on
utilities/costs are satisfied for all MDPs in the uncertainty set, and to
maximize the worst-case reward performance over the uncertainty set. We design
a robust primal-dual approach, and further develop theoretical guarantees on
its convergence, complexity and robust feasibility. We then investigate a
concrete example of the $\delta$-contamination uncertainty set, design an online
and model-free algorithm, and theoretically characterize its sample complexity.
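As an illustration of the $\delta$-contamination set mentioned in the abstract, the following is a minimal sketch of robust policy evaluation, not the paper's online model-free primal-dual algorithm. It assumes a known nominal kernel and uses the standard fact that, when the uncertainty set is {(1 - delta) p + delta q : q any distribution over states}, the worst-case expected next-state value equals (1 - delta) times the nominal expectation plus delta times the minimum value over states. The function name, array shapes, and the toy MDP are all hypothetical choices made for this example.

```python
import numpy as np


def robust_policy_evaluation(P, r, pi, gamma=0.9, delta=0.1,
                             tol=1e-10, max_iters=10_000):
    """Worst-case value of policy `pi` under delta-contamination.

    P  : (S, A, S) nominal transition probabilities (assumed known here).
    r  : (S, A) per-step reward (or utility) signal.
    pi : (S, A) action probabilities for each state.
    """
    v = np.zeros(P.shape[0])
    for _ in range(max_iters):
        nominal = P @ v  # (S, A): expected next value under the nominal kernel
        # Adversary moves a delta fraction of probability mass to the worst state.
        worst_case = (1.0 - delta) * nominal + delta * v.min()
        q = r + gamma * worst_case
        v_new = (pi * q).sum(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new
    return v


# Hypothetical two-state, two-action MDP used only for illustration.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
r = np.array([[1.0, 0.0],
              [0.0, 2.0]])
pi = np.full((2, 2), 0.5)  # uniform random policy

nominal_value = robust_policy_evaluation(P, r, pi, delta=0.0)
robust_value = robust_policy_evaluation(P, r, pi, delta=0.2)
```

Under these assumptions, the same routine could be applied to a utility signal with a lower-bound constraint to check whether the constraint holds for every kernel in the uncertainty set; a primal-dual method would then adjust the policy and a Lagrange multiplier based on these two worst-case quantities.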
Related papers
- Heterogeneous Uncertainty-Guided Composed Image Retrieval with Fine-Grained Probabilistic Learning [49.28548464288051]
Composed Image Retrieval (CIR) enables image search by combining a reference image with modification text.
Intrinsic noise in CIR triplets incurs intrinsic uncertainty and threatens the model's robustness.
This paper introduces a Heterogeneous Uncertainty-Guided (HUG) paradigm to overcome these limitations.
arXiv Detail & Related papers (2026-01-16T16:05:49Z) - Rectified Robust Policy Optimization for Model-Uncertain Constrained Reinforcement Learning without Strong Duality [53.525547349715595]
We propose a novel primal-only algorithm called Rectified Robust Policy Optimization (RRPO).
RRPO operates directly on the primal problem without relying on dual formulations.
We show convergence to an approximately optimal feasible policy with complexity matching the best-known lower bound.
arXiv Detail & Related papers (2025-08-24T16:59:38Z) - COIN: Uncertainty-Guarding Selective Question Answering for Foundation Models with Provable Risk Guarantees [51.5976496056012]
COIN is an uncertainty-guarding selection framework that calibrates statistically valid thresholds to filter a single generated answer per question.
COIN estimates the empirical error rate on a calibration set and applies confidence interval methods to establish a high-probability upper bound on the true error rate.
We demonstrate COIN's robustness in risk control, strong test-time power in retaining admissible answers, and predictive efficiency under limited calibration data.
arXiv Detail & Related papers (2025-06-25T07:04:49Z) - Conformal Mixed-Integer Constraint Learning with Feasibility Guarantees [0.3058340744328236]
Conformal Mixed-Integer Constraint Learning provides probabilistic feasibility guarantees for data-driven constraints in optimization problems.
We show that C-MICL consistently achieves target rates, maintains competitive objective performance, and significantly reduces computational cost compared to existing methods.
arXiv Detail & Related papers (2025-06-04T03:26:31Z) - Enforcing Hard Linear Constraints in Deep Learning Models with Decision Rules [8.098452803458253]
This paper proposes a model-agnostic framework for enforcing input-dependent linear equality and inequality constraints on neural network outputs.
The architecture combines a task network trained for prediction accuracy with a safe network trained using decision rules from the runtime and robust optimization to ensure feasibility across the entire input space.
arXiv Detail & Related papers (2025-05-20T03:09:44Z) - SConU: Selective Conformal Uncertainty in Large Language Models [59.25881667640868]
We propose a novel approach termed Selective Conformal Uncertainty (SConU).
We develop two conformal p-values that are instrumental in determining whether a given sample deviates from the uncertainty distribution of the calibration set at a specific manageable risk level.
Our approach not only facilitates rigorous management of miscoverage rates across both single-domain and interdisciplinary contexts, but also enhances the efficiency of predictions.
arXiv Detail & Related papers (2025-04-19T03:01:45Z) - From Data to Uncertainty Sets: a Machine Learning Approach [5.877778007271621]
We leverage robust optimization to protect a constraint against the uncertainty of a machine learning model's output.
We derive strong guarantees on the probability of violation.
On synthetic computational experiments, our method requires uncertainty sets with radii up to one order of magnitude smaller than those of other approaches.
arXiv Detail & Related papers (2025-03-04T01:30:28Z) - Uncertainty separation via ensemble quantile regression [23.667247644930708]
This paper introduces a novel and scalable framework for uncertainty estimation and separation.
Our framework is scalable to large datasets and demonstrates superior performance on synthetic benchmarks.
arXiv Detail & Related papers (2024-12-18T11:15:32Z) - Know Where You're Uncertain When Planning with Multimodal Foundation Models: A Formal Framework [54.40508478482667]
We present a comprehensive framework to disentangle, quantify, and mitigate uncertainty in perception and plan generation.
We propose methods tailored to the unique properties of perception and decision-making.
We show that our uncertainty disentanglement framework reduces variability by up to 40% and enhances task success rates by 5% compared to baselines.
arXiv Detail & Related papers (2024-11-03T17:32:00Z) - Ctrl-U: Robust Conditional Image Generation via Uncertainty-aware Reward Modeling [18.93897922183304]
We focus on the task of conditional image generation, where an image is synthesized according to user instructions.
We propose an uncertainty-aware reward modeling, called Ctrl-U, designed to reduce the adverse effects of imprecise feedback from the reward model.
arXiv Detail & Related papers (2024-10-15T03:43:51Z) - End-to-End Conformal Calibration for Optimization Under Uncertainty [32.844953018302874]
This paper develops an end-to-end framework to learn the uncertainty estimates for conditional optimization.
In addition, we propose to represent arbitrary convex uncertainty sets with partially convex neural networks.
Our approach consistently improves upon two-stage-then-optimize.
arXiv Detail & Related papers (2024-09-30T17:38:27Z) - Automatically Adaptive Conformal Risk Control [49.95190019041905]
We propose a methodology for achieving approximate conditional control of statistical risks by adapting to the difficulty of test samples.
Our framework goes beyond traditional conditional risk control based on user-provided conditioning events to the algorithmic, data-driven determination of appropriate function classes for conditioning.
arXiv Detail & Related papers (2024-06-25T08:29:32Z) - Resilient Constrained Reinforcement Learning [87.4374430686956]
We study a class of constrained reinforcement learning (RL) problems in which multiple constraint specifications are not identified before study.
It is challenging to identify appropriate constraint specifications due to the undefined trade-off between the reward training objective and the constraint satisfaction.
We propose a new constrained RL approach that searches for policy and constraint specifications together.
arXiv Detail & Related papers (2023-12-28T18:28:23Z) - Model-Based Uncertainty in Value Functions [89.31922008981735]
We focus on characterizing the variance over values induced by a distribution over MDPs.
Previous work upper bounds the posterior variance over values by solving a so-called uncertainty Bellman equation.
We propose a new uncertainty Bellman equation whose solution converges to the true posterior variance over values.
arXiv Detail & Related papers (2023-02-24T09:18:27Z) - When Demonstrations Meet Generative World Models: A Maximum Likelihood Framework for Offline Inverse Reinforcement Learning [62.00672284480755]
This paper aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent.
Accurate models of expertise in executing a task have applications in safety-sensitive settings such as clinical decision making and autonomous driving.
arXiv Detail & Related papers (2023-02-15T04:14:20Z) - Risk-Averse Model Uncertainty for Distributionally Robust Safe Reinforcement Learning [3.9821399546174825]
We introduce a deep reinforcement learning framework for safe decision making in uncertain environments.
We provide robustness guarantees for this framework by showing it is equivalent to a specific class of distributionally robust safe reinforcement learning problems.
In experiments on continuous control tasks with safety constraints, we demonstrate that our framework produces robust performance and safety at deployment time across a range of perturbed test environments.
arXiv Detail & Related papers (2023-01-30T00:37:06Z) - Distributionally Robust Model-Based Offline Reinforcement Learning with Near-Optimal Sample Complexity [39.886149789339335]
Offline reinforcement learning aims to learn to perform decision making from history data without active exploration.
Due to uncertainties and variabilities of the environment, it is critical to learn a robust policy that performs well even when the deployed environment deviates from the nominal one used to collect the history dataset.
We consider a distributionally robust formulation of offline RL, focusing on robust Markov decision processes with an uncertainty set specified by the Kullback-Leibler divergence in both finite-horizon and infinite-horizon settings.
arXiv Detail & Related papers (2022-08-11T11:55:31Z) - Recursive Constraints to Prevent Instability in Constrained Reinforcement Learning [16.019477271828745]
We consider the challenge of finding a deterministic policy for a Markov decision process.
This class of problem is known to be hard, but the combined requirements of determinism and uniform optimality can create learning instability.
We present a suitable constrained reinforcement learning algorithm that prevents learning instability.
arXiv Detail & Related papers (2022-01-20T02:33:24Z) - Distributional Robustness and Regularization in Reinforcement Learning [62.23012916708608]
We introduce a new regularizer for empirical value functions and show that it lower bounds the Wasserstein distributionally robust value function.
It suggests using regularization as a practical tool for dealing with $\textit{external uncertainty}$ in reinforcement learning.
arXiv Detail & Related papers (2020-03-05T19:56:23Z)