Multi-point Feedback of Bandit Convex Optimization with Hard Constraints
- URL: http://arxiv.org/abs/2310.10946v1
- Date: Tue, 17 Oct 2023 02:43:22 GMT
- Title: Multi-point Feedback of Bandit Convex Optimization with Hard Constraints
- Authors: Yasunari Hikima
- Abstract summary: We study bandit convex optimization with constraints, where the learner aims to generate a sequence of decisions under partial information of loss functions.
We adopt the cumulative \textit{hard} constraint violation as the metric of constraint violation.
Our algorithm attains $O(d^2T^{\max\{c,1-c\}})$ regret bounds and $O(d^2T^{1-\frac{c}{2}})$ cumulative hard constraint violation bounds for convex loss functions and time-varying constraints.
- Score: 1.8130068086063336
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper studies bandit convex optimization with constraints, where the
learner aims to generate a sequence of decisions under partial information of
the loss functions such that both the cumulative loss and the cumulative
constraint violation are kept small. We adopt the
cumulative \textit{hard} constraint violation as the metric of constraint
violation, which is defined by $\sum_{t=1}^{T} \max\{g_t(\boldsymbol{x}_t),
0\}$. Owing to the maximum operator, a strictly feasible solution cannot cancel
out the effect of earlier violated constraints, unlike the conventional metric
known as the \textit{long-term} constraint violation. We present a penalty-based
proximal gradient descent method that attains a sub-linear growth of both
regret and cumulative hard constraint violation, in which the gradient is
estimated with a two-point function evaluation. Precisely, our algorithm
attains $O(d^2T^{\max\{c,1-c\}})$ regret bounds and $O(d^2T^{1-\frac{c}{2}})$
cumulative hard constraint violation bounds for convex loss functions and
time-varying constraints, where $d$ is the dimensionality of the feasible
region and $c\in[\frac{1}{2}, 1)$ is a user-determined parameter. We also
extend the result for the case where the loss functions are strongly convex and
show that both regret and constraint violation bounds can be further reduced.
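The two ingredients named in the abstract can be sketched in a few lines: the classical two-point (symmetric-difference) gradient estimator along a random unit direction, and the cumulative hard constraint violation metric $\sum_{t=1}^{T}\max\{g_t(\boldsymbol{x}_t), 0\}$. This is an illustrative sketch, not the paper's exact algorithm; the names `f`, `g_values`, and the smoothing radius `delta` are assumptions for the example.

```python
import numpy as np

def two_point_gradient(f, x, delta=1e-3, rng=None):
    """Estimate grad f(x) from two evaluations f(x + delta*u) and
    f(x - delta*u) along a random unit direction u, where d = len(x).
    The estimate is unbiased for the smoothed loss in expectation."""
    rng = rng or np.random.default_rng(0)
    d = len(x)
    u = rng.normal(size=d)
    u /= np.linalg.norm(u)  # uniform random direction on the unit sphere
    return d * (f(x + delta * u) - f(x - delta * u)) / (2 * delta) * u

def hard_violation(g_values):
    """Cumulative hard constraint violation: sum_t max{g_t(x_t), 0}.
    A strictly feasible round (g_t(x_t) < 0) contributes zero and so
    cannot cancel earlier violations, unlike the long-term metric
    sum_t g_t(x_t)."""
    return sum(max(g, 0.0) for g in g_values)
```

For example, with per-round constraint values `[-1.0, 0.5, -0.3, 2.0]`, the hard violation is `0.5 + 2.0 = 2.5`, whereas the long-term metric would sum to `1.2`; the feasible rounds offset nothing under the hard metric.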
Related papers
- Fixed-Budget Differentially Private Best Arm Identification [62.36929749450298]
We study best arm identification (BAI) in linear bandits in the fixed-budget regime under differential privacy constraints.
We derive a minimax lower bound on the error probability, and demonstrate that the lower and the upper bounds decay exponentially in $T$.
arXiv Detail & Related papers (2024-01-17T09:23:25Z) - Distributed Online Convex Optimization with Adversarial Constraints:
Reduced Cumulative Constraint Violation Bounds under Slater's Condition [29.809415829907525]
This paper considers distributed online convex optimization with adversarial constraints.
Agents collaborate to minimize network regret and cumulative constraint violation.
To the best of our knowledge, this paper is the first to achieve reduced (network) cumulative constraint violation bounds for (distributed) online convex optimization with adversarial constraints.
arXiv Detail & Related papers (2023-05-31T19:39:15Z) - Projection-Free Online Convex Optimization with Stochastic Constraints [0.0]
We develop projection-free algorithms for online convex optimization with constraints.
We deduce sublinear regret and constraint violation bounds for various settings.
We prove that the constraint violation can be reduced to have the same growth as the regret.
arXiv Detail & Related papers (2023-05-02T11:27:34Z) - On Dynamic Regret and Constraint Violations in Constrained Online Convex
Optimization [17.412117389855222]
We propose an algorithm that follows projected gradient descent over a suitably chosen set around the current action.
We show that both the dynamic regret and the constraint violation are order-wise bounded by the \textit{path-length}, the sum of the distances between consecutive optimal actions.
arXiv Detail & Related papers (2023-01-24T04:22:13Z) - A Unifying Framework for Online Optimization with Long-Term Constraints [62.35194099438855]
We study online learning problems in which a decision maker has to take a sequence of decisions subject to $m$ long-term constraints.
The goal is to maximize their total reward, while at the same time achieving small cumulative violation across the $T$ rounds.
We present the first best-of-both-worlds type algorithm for this general class of problems, with no-regret guarantees both in the case in which rewards and constraints are selected according to an unknown model, and in the case in which they are selected at each round by an adversary.
arXiv Detail & Related papers (2022-09-15T16:59:19Z) - Adaptivity and Non-stationarity: Problem-dependent Dynamic Regret for Online Convex Optimization [70.4342220499858]
We introduce novel online algorithms that can exploit smoothness and replace the dependence on $T$ in dynamic regret with problem-dependent quantities.
Our results are adaptive to the intrinsic difficulty of the problem, since the bounds are tighter than existing results for easy problems and safeguard the same rate in the worst case.
arXiv Detail & Related papers (2021-12-29T02:42:59Z) - Faster Algorithm and Sharper Analysis for Constrained Markov Decision
Process [56.55075925645864]
The problem of the constrained Markov decision process (CMDP) is investigated, where an agent aims to maximize the expected accumulated discounted reward subject to multiple constraints.
A new utilities-dual convex approach is proposed with novel integration of three ingredients: regularized policy, dual regularizer, and Nesterov's gradient descent dual.
This is the first demonstration that nonconcave CMDP problems can attain the $\mathcal{O}(1/\epsilon)$ lower bound for optimization subject to convex constraints.
arXiv Detail & Related papers (2021-10-20T02:57:21Z) - Regret and Cumulative Constraint Violation Analysis for Online Convex
Optimization with Long Term Constraints [24.97580261894342]
This paper considers online convex optimization with long term constraints, where constraints can be violated in intermediate rounds, but need to be satisfied in the long run.
A novel algorithm is first proposed and it achieves an $\mathcal{O}(T^{\max\{c,1-c\}})$ bound for static regret and an $\mathcal{O}(T^{(1-c)/2})$ bound for cumulative constraint violation.
arXiv Detail & Related papers (2021-06-09T15:18:06Z) - Regret and Cumulative Constraint Violation Analysis for Distributed
Online Constrained Convex Optimization [24.97580261894342]
This paper considers the distributed online convex optimization problem with time-varying constraints over a network of agents.
Two algorithms with full-information and bandit feedback are proposed.
arXiv Detail & Related papers (2021-05-01T18:28:53Z) - On Lower Bounds for Standard and Robust Gaussian Process Bandit
Optimization [55.937424268654645]
We consider algorithm-independent lower bounds for the problem of black-box optimization of functions having a bounded norm.
We provide a novel proof technique for deriving lower bounds on the regret, with benefits including simplicity, versatility, and an improved dependence on the error probability.
arXiv Detail & Related papers (2020-08-20T03:48:14Z) - Dynamic Regret of Convex and Smooth Functions [93.71361250701075]
We investigate online convex optimization in non-stationary environments.
We choose the dynamic regret as the performance measure.
We show that it is possible to further enhance the dynamic regret by exploiting the smoothness condition.
arXiv Detail & Related papers (2020-07-07T14:10:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.