Related papers: PAC Learnability of Scenario Decision-Making Algorithms: Necessary and Sufficient Conditions

PAC Learnability of Scenario Decision-Making Algorithms: Necessary and Sufficient Conditions

URL: http://arxiv.org/abs/2501.08887v1
Date: Wed, 15 Jan 2025 15:57:13 GMT
Title: PAC Learnability of Scenario Decision-Making Algorithms: Necessary and Sufficient Conditions
Authors: Guillaume O. Berger, Raphaël M. Jungers,
Abstract summary: PAC is the ability to make a decision that has an arbitrarily low risk of violating an unknown safety constraint.<n>Sufficient conditions for scenario decision-making algorithms to be PAC are available in the literature.<n>We derive a necessary condition for scenario decision-making algorithms to be PAC, inspired by the VC dimension and the so-called no-free-lunch theorem.
Score: 0.7673339435080445
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We study the PAC property of scenario decision-making algorithms, that is, the ability to make a decision that has an arbitrarily low risk of violating an unknown safety constraint, provided sufficiently many realizations (called scenarios) of the safety constraint are sampled. Sufficient conditions for scenario decision-making algorithms to be PAC are available in the literature, such as finiteness of the VC dimension of its associated classifier and existence of a compression scheme. We study the question of whether these sufficient conditions are also necessary. We show with counterexamples that this is not the case in general. This contrasts with binary classification learning, for which the analogous conditions are sufficient and necessary. Popular scenario decision-making algorithms, such as scenario optimization, enjoy additional properties, such as stability and consistency. We show that even under these additional assumptions the above conclusions hold. Finally, we derive a necessary condition for scenario decision-making algorithms to be PAC, inspired by the VC dimension and the so-called no-free-lunch theorem.

Related papers

Convergence and Sample Complexity of First-Order Methods for Agnostic Reinforcement Learning [66.4260157478436]
We study reinforcement learning in the policy learning setting.<n>The goal is to find a policy whose performance is competitive with the best policy in a given class of interest.
arXiv Detail & Related papers (2025-07-06T14:40:05Z)
Constrained Online Decision-Making: A Unified Framework [14.465944215100746]
We investigate a general formulation of sequential decision-making with stage-wise feasibility constraints.<n>We propose a unified algorithmic framework that captures many existing constrained learning problems.<n>Our result offers a principled foundation for constrained sequential decision-making in both theory and practice.
arXiv Detail & Related papers (2025-05-11T19:22:04Z)
Improved Compression Bounds for Scenario Decision Making [0.7673339435080445]
We show how to make a decision in an uncertain environment by drawing samples of the uncertainty and making a decision based on the samples, called "scenarios"<n>Probability guarantees take the form of a bound on the probability of sampling a set of scenarios that will lead to a decision whose risk of failure is above a given maximum tolerance.<n>We propose new bounds that improve upon the existing ones without requiring stronger assumptions on the problem.
arXiv Detail & Related papers (2025-01-15T15:53:34Z)
Flipping-based Policy for Chance-Constrained Markov Decision Processes [9.404184937255694]
This paper proposes a textitflipping-based policy for Chance-Constrained Markov Decision Processes ( CCMDPs) The flipping-based policy selects the next action by tossing a potentially distorted coin between two action candidates. We demonstrate that the flipping-based policy can improve the performance of the existing safe RL algorithms under the same limits of safety constraints.
arXiv Detail & Related papers (2024-10-09T02:00:39Z)
Algorithms for learning value-aligned policies considering admissibility relaxation [1.8336820954218835]
The emerging field of emphvalue awareness engineering claims that software agents and systems should be value-aware. In this paper, we present two algorithms, $epsilontext-ADQL$ for strategies based on local alignment and its extension $epsilontext-CADQL$ for a sequence of decisions. We have validated their efficiency in a water distribution problem in a drought scenario.
arXiv Detail & Related papers (2024-06-07T11:10:07Z)
Learning Adversarial MDPs with Stochastic Hard Constraints [37.24692425018]
We study online learning in constrained Markov decision processes (CMDPs) with adversarial losses and hard constraints. Our work is the first to study CMDPs involving both adversarial losses and hard constraints.
arXiv Detail & Related papers (2024-03-06T12:49:08Z)
Resilient Constrained Reinforcement Learning [87.4374430686956]
We study a class of constrained reinforcement learning (RL) problems in which multiple constraint specifications are not identified before study. It is challenging to identify appropriate constraint specifications due to the undefined trade-off between the reward training objective and the constraint satisfaction. We propose a new constrained RL approach that searches for policy and constraint specifications together.
arXiv Detail & Related papers (2023-12-28T18:28:23Z)
Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences. Our method is especially suitable for problems with well-specified likelihoods. We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
arXiv Detail & Related papers (2023-11-08T00:10:21Z)
Offline Minimax Soft-Q-learning Under Realizability and Partial Coverage [100.8180383245813]
We propose value-based algorithms for offline reinforcement learning (RL) We show an analogous result for vanilla Q-functions under a soft margin condition. Our algorithms' loss functions arise from casting the estimation problems as nonlinear convex optimization problems and Lagrangifying.
arXiv Detail & Related papers (2023-02-05T14:22:41Z)
Fully Stochastic Trust-Region Sequential Quadratic Programming for Equality-Constrained Optimization Problems [62.83783246648714]
We propose a sequential quadratic programming algorithm (TR-StoSQP) to solve nonlinear optimization problems with objectives and deterministic equality constraints. The algorithm adaptively selects the trust-region radius and, compared to the existing line-search StoSQP schemes, allows us to utilize indefinite Hessian matrices.
arXiv Detail & Related papers (2022-11-29T05:52:17Z)
Safe Exploration Incurs Nearly No Additional Sample Complexity for Reward-free RL [43.672794342894946]
Reward-free reinforcement learning (RF-RL) relies on random action-taking to explore the unknown environment without any reward feedback information. It remains unclear how such safe exploration requirement would affect the corresponding sample complexity in order to achieve the desired optimality of the obtained policy in planning. We propose a unified Safe reWard-frEe ExploraTion (SWEET) framework, and develop algorithms coined Tabular-SWEET and Low-rank-SWEET, respectively.
arXiv Detail & Related papers (2022-06-28T15:00:45Z)
Stochastic Conservative Contextual Linear Bandits [8.684768561839146]
We study the problem of safe real-time decision making under uncertainty. We formulate a conservative contextual bandit formulation for real-time decision making.
arXiv Detail & Related papers (2022-03-29T14:50:50Z)
Pointwise Feasibility of Gaussian Process-based Safety-Critical Control under Model Uncertainty [77.18483084440182]
Control Barrier Functions (CBFs) and Control Lyapunov Functions (CLFs) are popular tools for enforcing safety and stability of a controlled system, respectively. We present a Gaussian Process (GP)-based approach to tackle the problem of model uncertainty in safety-critical controllers that use CBFs and CLFs.
arXiv Detail & Related papers (2021-06-13T23:08:49Z)
Risk-Constrained Thompson Sampling for CVaR Bandits [82.47796318548306]
We consider a popular risk measure in quantitative finance known as the Conditional Value at Risk (CVaR) We explore the performance of a Thompson Sampling-based algorithm CVaR-TS under this risk measure.
arXiv Detail & Related papers (2020-11-16T15:53:22Z)

This list is automatically generated from the titles and abstracts of the papers in this site.