Online Learning with Unknown Constraints
- URL: http://arxiv.org/abs/2403.04033v1
- Date: Wed, 6 Mar 2024 20:23:59 GMT
- Title: Online Learning with Unknown Constraints
- Authors: Karthik Sridharan and Seung Won Wilson Yoo
- Abstract summary: We consider the problem of online learning where the sequence of actions played by the learner must adhere to an unknown safety constraint at every round.
The goal is to minimize regret with respect to the best safe action in hindsight while simultaneously satisfying the safety constraint with high probability on each round.
- Score: 10.263431543520452
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider the problem of online learning where the sequence of actions
played by the learner must adhere to an unknown safety constraint at every
round. The goal is to minimize regret with respect to the best safe action in
hindsight while simultaneously satisfying the safety constraint with high
probability on each round. We provide a general meta-algorithm that leverages
an online regression oracle to estimate the unknown safety constraint, and
converts the predictions of an online learning oracle to predictions that
adhere to the unknown safety constraint. On the theoretical side, our
algorithm's regret can be bounded by the regret of the online regression and
online learning oracles, the eluder dimension of the model class containing the
unknown safety constraint, and a novel complexity measure that captures the
difficulty of safe learning. We complement our result with an asymptotic lower
bound that shows that the aforementioned complexity measure is necessary. When
the constraints are linear, we instantiate our result to provide a concrete
algorithm with $\sqrt{T}$ regret using a scaling transformation that balances
optimistic exploration with pessimistic constraint satisfaction.
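For intuition, here is a minimal Python sketch of the linear-constraint instantiation described in the abstract, assuming an unknown constraint of the form $\langle \theta^\star, x\rangle \le c$ with $c > 0$ (so the zero action is safe); the oracle interfaces (`learner_oracle`, `regression_oracle`, `environment`) are hypothetical stand-ins for illustration, not the paper's exact algorithm.

```python
import numpy as np

# Hypothetical sketch of the linear-constraint case: the true constraint is
# <theta_star, x> <= c with c > 0, so the zero action is safe. The oracle
# interfaces below are assumed for illustration, not taken from the paper.

def scale_to_safe(x_proposed, theta_hat, width, c):
    """Shrink the proposed action toward zero until the pessimistic
    (upper-confidence) estimate of the constraint value is at most c."""
    pessimistic = theta_hat @ x_proposed + width * np.linalg.norm(x_proposed)
    if pessimistic <= c:
        return x_proposed                      # already safe under the pessimistic estimate
    return (c / pessimistic) * x_proposed      # scaling transformation

def safe_round(t, learner_oracle, regression_oracle, environment, c):
    x_prop = learner_oracle.propose(t)                 # optimistic, possibly unsafe proposal
    theta_hat, width = regression_oracle.estimate(t)   # constraint estimate + confidence width
    x_t = scale_to_safe(x_prop, theta_hat, width, c)   # pessimistic correction
    loss, constraint_obs = environment.play(x_t)       # loss and noisy constraint feedback
    learner_oracle.update(t, x_t, loss)
    regression_oracle.update(t, x_t, constraint_obs)
    return x_t
```

The scaling step is where optimistic exploration (the learning oracle's proposal) meets pessimistic constraint satisfaction (the upper-confidence shrinkage).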
Related papers
- LEARN: An Invex Loss for Outlier Oblivious Robust Online Optimization [56.67706781191521]
An adversary can introduce outliers by corrupting loss functions in an arbitrary number of rounds $k$, unknown to the learner.
We present a robust online optimization framework for this setting.
arXiv Detail & Related papers (2024-08-12T17:08:31Z) - State-Wise Safe Reinforcement Learning With Pixel Observations [12.338614299403305]
We propose a novel pixel-observation safe RL algorithm that efficiently encodes state-wise safety constraints with unknown hazard regions.
As a joint learning framework, our approach begins by constructing a latent dynamics model with low-dimensional latent spaces derived from pixel observations.
We then build and learn a latent barrier-like function on top of the latent dynamics and conduct policy optimization simultaneously, thereby improving both safety and the total expected return.
arXiv Detail & Related papers (2023-11-03T20:32:30Z) - High-Probability Risk Bounds via Sequential Predictors [20.741036493022442]
We show that online-to-batch conversions applied to general online learning algorithms can bypass the limitations of regret-based guarantees.
We obtain nearly optimal high-probability risk bounds for several classical statistical estimation problems.
Our analysis relies on the fact that many online learning algorithms are improper, as they are not restricted to use predictors from a given reference class.
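As background for the entry above, a minimal sketch of the classical online-to-batch conversion: run an online learner over the sample and return the average of the predictors it produces along the way (the paper's high-probability analysis refines this basic recipe). The `online_learner` interface is assumed here for illustration.

```python
# Background sketch of the classical online-to-batch conversion: run an online
# learner over the i.i.d. sample and return the average of the predictors it
# produced along the way. The `online_learner` interface is assumed here.

def online_to_batch(online_learner, samples):
    predictors = []
    for x, y in samples:
        predictors.append(online_learner.current_predictor())  # predictor before seeing (x, y)
        online_learner.update(x, y)
    # Averaged (possibly improper) predictor returned to the statistical learner.
    return lambda z: sum(f(z) for f in predictors) / len(predictors)
```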
arXiv Detail & Related papers (2023-08-15T06:19:31Z) - Optimal Learners for Realizable Regression: PAC Learning and Online Learning [52.37726841759983]
We aim to characterize the statistical complexity of realizable regression both in the PAC learning setting and the online learning setting.
We first introduce a minimax instance optimal learner for realizable regression and propose a novel dimension that both qualitatively and quantitatively characterizes which classes of real-valued predictors are learnable.
In the context of online learning, we provide a dimension that characterizes the minimax instance-optimal cumulative loss up to a constant factor and design an optimal online learner for realizable regression.
arXiv Detail & Related papers (2023-07-07T21:39:25Z) - Oracle-Efficient Smoothed Online Learning for Piecewise Continuous Decision Making [73.48977854003697]
This work introduces a new notion of complexity, the generalized bracketing numbers, which marries constraints on the adversary to the size of the space.
We then instantiate our bounds in several problems of interest, including online prediction and planning of piecewise continuous functions.
arXiv Detail & Related papers (2023-02-10T18:45:52Z) - Log Barriers for Safe Black-box Optimization with Application to Safe
Reinforcement Learning [72.97229770329214]
We introduce a general approach for seeking solutions to high-dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach, called LBSGD, is based on applying a logarithmic barrier approximation with a carefully chosen step size.
We demonstrate the effectiveness of our approach on minimizing constraint violations in policy optimization tasks in safe reinforcement learning.
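As a rough illustration of the log-barrier idea (a generic sketch, not the exact LBSGD update or step-size rule from that paper), one gradient step on the barrier-smoothed objective might look as follows.

```python
import numpy as np

# Generic log-barrier gradient step for min f(x) s.t. g_i(x) <= 0:
# descend B(x) = f(x) - eta * sum_i log(-g_i(x)) while keeping the iterate
# strictly feasible. This is a sketch, not the paper's exact LBSGD rule.

def log_barrier_step(x, grad_f, g_vals, grad_g, eta=0.1, gamma=0.5, base_lr=0.1):
    """x: current iterate; grad_f: gradient of the objective at x;
    g_vals: constraint values g_i(x) (strictly negative in the interior);
    grad_g: matrix whose rows are the constraint gradients at x."""
    barrier_grad = grad_f + eta * (grad_g.T @ (1.0 / -g_vals))
    # Cap the step so the iterate moves at most a gamma fraction of the smallest slack.
    slack = np.min(-g_vals)
    step = min(base_lr, gamma * slack / (np.linalg.norm(barrier_grad) + 1e-12))
    return x - step * barrier_grad
```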
arXiv Detail & Related papers (2022-07-21T11:14:47Z) - Active Learning with Safety Constraints [25.258564629480063]
We investigate the complexity of learning the best safe decision in interactive environments.
We propose an adaptive experimental design-based algorithm, which we show efficiently trades off between the difficulty of showing an arm is unsafe vs suboptimal.
arXiv Detail & Related papers (2022-06-22T15:45:38Z) - Safe Online Bid Optimization with Return-On-Investment and Budget
Constraints subject to Uncertainty [87.81197574939355]
We study the nature of both the optimization and learning problems.
We provide an algorithm, namely GCB, guaranteeing sublinear regret at the cost of a potentially linear number of constraint violations.
More interestingly, we provide an algorithm, namely GCB_safe($\psi$,$\phi$), guaranteeing both sublinear pseudo-regret and safety w.h.p. at the cost of accepting tolerances $\psi$ and $\phi$.
arXiv Detail & Related papers (2022-01-18T17:24:20Z) - Safe Online Convex Optimization with Unknown Linear Safety Constraints [0.0]
We study the problem of safe online convex optimization, where the action at each time step must satisfy a set of linear safety constraints.
The parameters that specify the linear safety constraints are unknown to the algorithm.
We show that, under the assumption of the availability of a safe baseline action, the SO-PGD algorithm achieves a regret of $O(T^{2/3})$.
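To illustrate how a known safe baseline can be exploited when the linear constraints $Ax \le b$ are unknown (a generic construction, not necessarily the SO-PGD update itself): mix the candidate action with the baseline until a conservative, slack-inflated estimate of the constraints is satisfied.

```python
import numpy as np

# Generic illustration of using a known safe baseline x_base when A x <= b is
# unknown: mix a candidate action with the baseline until a conservative
# (slack-inflated) estimate of the constraints holds. Not the SO-PGD update itself.

def mix_with_baseline(x_candidate, x_base, A_hat, b, slack):
    """Return (1 - lam) * x_base + lam * x_candidate for the largest tried lam
    in (0, 1] satisfying the conservative check A_hat x + slack <= b."""
    lam = 1.0
    for _ in range(30):                      # geometric backtracking on the mixing weight
        x = (1 - lam) * x_base + lam * x_candidate
        if np.all(A_hat @ x + slack <= b):
            return x
        lam *= 0.5
    return x_base                            # fall back to the known safe action
```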
arXiv Detail & Related papers (2021-11-14T19:49:19Z) - Model-Free Online Learning in Unknown Sequential Decision Making
Problems and Games [114.90723492840499]
In large two-player zero-sum imperfect-information games, modern extensions of counterfactual regret minimization (CFR) are currently the practical state of the art for computing a Nash equilibrium.
We formalize an online learning setting in which the strategy space is not known to the agent.
We give an efficient algorithm that achieves $O(T^{3/4})$ regret with high probability for that setting, even when the agent faces an adversarial environment.
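For context on the CFR family mentioned above, the following sketches regret matching, the standard per-decision-point update that CFR builds on; it is textbook background rather than that paper's algorithm for unknown strategy spaces.

```python
import numpy as np

# Background: regret matching, the local no-regret update underlying CFR.
# Standard material, not the paper's algorithm for unknown strategy spaces.

def regret_matching_strategy(cumulative_regret):
    """Play each action with probability proportional to its positive regret."""
    positive = np.maximum(cumulative_regret, 0.0)
    total = positive.sum()
    if total > 0.0:
        return positive / total
    return np.full_like(cumulative_regret, 1.0 / len(cumulative_regret))

def accumulate_regret(cumulative_regret, action_utilities, strategy):
    """Add each action's instantaneous regret against the strategy just played."""
    expected_utility = strategy @ action_utilities
    return cumulative_regret + (action_utilities - expected_utility)
```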
arXiv Detail & Related papers (2021-03-08T04:03:24Z)