Dynamic Regret Analysis of Safe Distributed Online Optimization for
Convex and Non-convex Problems
- URL: http://arxiv.org/abs/2302.12320v1
- Date: Thu, 23 Feb 2023 20:34:01 GMT
- Title: Dynamic Regret Analysis of Safe Distributed Online Optimization for
Convex and Non-convex Problems
- Authors: Ting-Jui Chang, Sapana Chaudhary, Dileep Kalathil, Shahin Shahrampour
- Abstract summary: This paper addresses safe distributed online optimization over an unknown set of linear safety constraints.
All agents collaboratively estimate the constraint parameters to build estimated feasible sets, ensuring safe action selection during the optimization phase.
- Score: 14.1081872409308
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper addresses safe distributed online optimization over an unknown set
of linear safety constraints. A network of agents aims at jointly minimizing a
global, time-varying function, which is only partially observable to each
individual agent. Therefore, agents must engage in local communications to
generate a safe sequence of actions competitive with the best minimizer
sequence in hindsight, and the gap between the two sequences is quantified via
dynamic regret. We propose distributed safe online gradient descent
(D-Safe-OGD) with an exploration phase, where all agents estimate the
constraint parameters collaboratively to build estimated feasible sets,
ensuring the action selection safety during the optimization phase. We prove
that for convex functions, D-Safe-OGD achieves a dynamic regret bound of
$O(T^{2/3} \sqrt{\log T} + T^{1/3}C_T^*)$, where $C_T^*$ denotes the
path-length of the best minimizer sequence. We further prove a dynamic regret
bound of $O(T^{2/3} \sqrt{\log T} + T^{2/3}C_T^*)$ for certain non-convex
problems, which establishes the first dynamic regret bound for a safe
distributed algorithm in the non-convex setting.
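To make the two-phase structure concrete, here is a minimal Python sketch of a D-Safe-OGD-style loop. It is only an illustration under simplifying assumptions, not the authors' exact algorithm: the safe baseline at the origin, the known right-hand side `b_env`, the fixed shrinkage `margin`, the step size `eta`, and the single least-squares fit standing in for the agents' collaborative estimation are all assumptions made here for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def approx_project(x, A_hat, b_hat, iters=200, lr=0.05):
    """Approximate projection of x onto {y : A_hat y <= b_hat} via
    subgradient descent on the total constraint violation (illustrative)."""
    y = x.copy()
    for _ in range(iters):
        viol = A_hat @ y - b_hat
        active = viol > 0
        if not active.any():
            break
        y -= lr * A_hat[active].sum(axis=0)
    return y

def d_safe_ogd(local_grads, weights, A_env, b_env, T, T_explore, d, eta=0.1, margin=0.05):
    """local_grads[i](t, x): gradient of agent i's local loss at round t.
    weights: doubly stochastic mixing matrix of the communication graph."""
    n = weights.shape[0]

    # Exploration phase: play small perturbations around a known safe baseline
    # (the origin here) and fit the unknown constraint matrix from noisy
    # constraint observations by least squares.
    actions, observations = [], []
    for _ in range(T_explore):
        a = 0.01 * rng.standard_normal(d)
        actions.append(a)
        observations.append(A_env @ a + 1e-3 * rng.standard_normal(len(b_env)))
    A_hat = np.linalg.lstsq(np.array(actions), np.array(observations), rcond=None)[0].T
    b_hat = b_env - margin  # shrink the estimated set so selected actions stay safe

    # Optimization phase: consensus averaging followed by a projected online
    # gradient step on the estimated feasible set.
    X = np.zeros((n, d))
    for t in range(T_explore, T):
        X = weights @ X  # local communication with neighbors
        for i in range(n):
            g = local_grads[i](t, X[i])
            X[i] = approx_project(X[i] - eta * g, A_hat, b_hat)
    return X
```

The consensus step uses a doubly stochastic mixing matrix, the standard model of local communication in distributed online optimization; the exploration length `T_explore` (on the order of $T^{2/3}$ rounds in such analyses) is what typically drives the $T^{2/3}$ term in the regret bounds above.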
Related papers
- Building a Foundational Guardrail for General Agentic Systems via Synthetic Data [76.18834864749606]
Since LLM agents can plan multi-step tasks, intervening at the planning stage, before any action is executed, is often the safest way to prevent harm. Existing guardrails mostly operate post-execution, which is difficult to scale and leaves little room for controllable supervision at the plan level. We introduce AuraGen, a controllable engine that synthesizes benign trajectories, injects category-labeled risks of varying difficulty, and filters outputs via an automated reward model.
arXiv Detail & Related papers (2025-10-10T18:42:32Z) - Multi-Agent Stage-wise Conservative Linear Bandits [2.2557806157585834]
We study the linear bandit problem in a multi-agent networked setting. Agents must satisfy stage-wise conservative constraints. We propose MA-SCLUCB, an episodic algorithm alternating between action selection and consensus-building phases.
arXiv Detail & Related papers (2025-10-01T07:29:18Z) - Constrained Linear Thompson Sampling [39.724313550777715]
We study the safe linear bandit problem, where an agent sequentially selects actions from a convex domain to maximize an unknown objective.
Existing approaches rely on optimism-based methods with frequentist confidence bounds, often leading to computationally expensive action selection routines.
We propose COnstrained Linear Thompson Sampling (COLTS), a sampling-based framework that efficiently balances regret minimization and constraint satisfaction.
arXiv Detail & Related papers (2025-03-03T20:44:58Z) - Adaptive SGD with Polyak stepsize and Line-search: Robust Convergence
and Variance Reduction [26.9632099249085]
We propose two new variants of SPS and SLS, called AdaSPS and AdaSLS, which guarantee convergence in non-interpolation settings.
We equip AdaSPS and AdaSLS with a novel variance reduction technique and obtain algorithms that require $\widetilde{\mathcal{O}}(n + 1/\epsilon)$ gradient evaluations.
arXiv Detail & Related papers (2023-08-11T10:17:29Z) - Universal Online Learning with Gradient Variations: A Multi-layer Online Ensemble Approach [57.92727189589498]
We propose an online convex optimization approach with two different levels of adaptivity.
We obtain $\mathcal{O}(\log V_T)$, $\mathcal{O}(d \log V_T)$ and $\hat{\mathcal{O}}(\sqrt{V_T})$ regret bounds for strongly convex, exp-concave and convex loss functions.
arXiv Detail & Related papers (2023-07-17T09:55:35Z) - Meta-Learning Adversarial Bandit Algorithms [55.72892209124227]
We study online meta-learning with bandit feedback.
We learn to tune a generalization of online mirror descent (OMD) with self-concordant barrier regularizers.
arXiv Detail & Related papers (2023-07-05T13:52:10Z) - Safe Linear Bandits over Unknown Polytopes [39.177982674455784]
The safe linear bandit problem (SLB) is an online approach to linear programming with unknown objective and unknown roundwise constraints.
We study the tradeoffs between efficacy and smooth safety costs of SLBs over polytopes.
arXiv Detail & Related papers (2022-09-27T21:13:32Z) - Log Barriers for Safe Black-box Optimization with Application to Safe
Reinforcement Learning [72.97229770329214]
We introduce a general approach for solving high-dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach, called LBSGD, is based on applying a logarithmic barrier approximation with a carefully chosen step size (an illustrative log-barrier step is sketched after this list).
We demonstrate the effectiveness of our approach on minimizing constraint violations in policy tasks in safe reinforcement learning.
arXiv Detail & Related papers (2022-07-21T11:14:47Z) - Decentralized Safe Multi-agent Stochastic Optimal Control using Deep
FBSDEs and ADMM [16.312625634442092]
We propose a novel safe and scalable decentralized solution for multi-agent control in the presence of disturbances.
Decentralization is achieved by augmenting each agent's optimization variables with copy variables for its neighbors.
To enable safe consensus solutions, we incorporate an ADMM-based approach.
arXiv Detail & Related papers (2022-02-22T03:57:23Z) - Adaptivity and Non-stationarity: Problem-dependent Dynamic Regret for Online Convex Optimization [70.4342220499858]
We introduce novel online algorithms that can exploit smoothness and replace the dependence on $T$ in dynamic regret with problem-dependent quantities.
Our results are adaptive to the intrinsic difficulty of the problem, since the bounds are tighter than existing results for easy problems and safeguard the same rate in the worst case.
arXiv Detail & Related papers (2021-12-29T02:42:59Z) - Safe Online Convex Optimization with Unknown Linear Safety Constraints [0.0]
We study the problem of safe online convex optimization, where the action at each time step must satisfy a set of linear safety constraints.
The parameters that specify the linear safety constraints are unknown to the algorithm.
We show that, assuming the availability of a safe baseline action, the SO-PGD algorithm achieves a regret of $O(T^{2/3})$.
arXiv Detail & Related papers (2021-11-14T19:49:19Z) - Faster Algorithm and Sharper Analysis for Constrained Markov Decision
Process [56.55075925645864]
The problem of the constrained Markov decision process (CMDP) is investigated, where an agent aims to maximize the expected accumulated discounted reward subject to multiple constraints.
A new primal-dual convex approach is proposed with a novel integration of three ingredients: a regularized policy optimizer, a dual variable regularizer, and Nesterov's accelerated gradient descent on the dual.
This is the first demonstration that non-concave CMDP problems can attain the complexity lower bound of $\mathcal{O}(1/\epsilon)$ for optimization subject to convex constraints.
arXiv Detail & Related papers (2021-10-20T02:57:21Z) - Provably Efficient Safe Exploration via Primal-Dual Policy Optimization [105.7510838453122]
We study the Safe Reinforcement Learning (SRL) problem using the Constrained Markov Decision Process (CMDP) formulation.
We present a provably efficient online policy optimization algorithm for CMDP with safe exploration in the function approximation setting.
arXiv Detail & Related papers (2020-03-01T17:47:03Z)
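As referenced in the Log Barriers entry above, a log-barrier method handles constraints $g_i(x) \le 0$ by descending on $B(x) = f(x) - \eta \sum_i \log(-g_i(x))$, which blows up at the constraint boundary and therefore keeps strictly feasible iterates inside the safe set. The following is a hedged, self-contained sketch of a single such step; the damping rule below is a simple heuristic stand-in for LBSGD's carefully chosen step size, and all names and constants are illustrative assumptions.

```python
import numpy as np

def log_barrier_step(x, grad_f, g_list, grad_g_list, eta=0.1, base_step=0.01):
    """One descent step on B(x) = f(x) - eta * sum_i log(-g_i(x)),
    which is finite only while every g_i(x) < 0 (strict feasibility)."""
    g_vals = np.array([g(x) for g in g_list])
    assert np.all(g_vals < 0), "the iterate must be strictly feasible"
    barrier_grad = grad_f(x) + eta * sum(
        dg(x) / (-gv) for dg, gv in zip(grad_g_list, g_vals))
    # Crude damping so the next iterate stays strictly inside the constraints;
    # LBSGD's adaptive step-size rule would replace this heuristic.
    step = base_step / (1.0 + np.linalg.norm(barrier_grad))
    return x - step * barrier_grad

# Tiny usage example: minimize ||x - c||^2 subject to x[0] <= 0.5.
c = np.array([1.0, 0.0])
x = np.array([0.0, 0.0])
for _ in range(500):
    x = log_barrier_step(
        x,
        grad_f=lambda z: 2.0 * (z - c),
        g_list=[lambda z: z[0] - 0.5],
        grad_g_list=[lambda z: np.array([1.0, 0.0])],
    )
print(x)  # moves toward the constrained optimum while staying strictly feasible
```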