Related papers: Distributed Online Bandit Nonconvex Optimization with One-Point Residual Feedback via Dynamic Regret

Distributed Online Bandit Nonconvex Optimization with One-Point Residual Feedback via Dynamic Regret

URL: http://arxiv.org/abs/2409.15680v1
Date: Tue, 24 Sep 2024 02:37:33 GMT
Title: Distributed Online Bandit Nonconvex Optimization with One-Point Residual Feedback via Dynamic Regret
Authors: Youqing Hua, Shuai Liu, Yiguang Hong, Karl Henrik Johansson, Guangchen Wang,
Abstract summary: This paper considers the distributed online bandit optimization problem with non loss functions over a time-varying digraph. Players select an adversary and then the adversary assigns an arbitrary non-linear loss function to this player. The expected regret of our algorithms is comparable to existing algorithms that use two-point deviation.
Score: 10.700891331004799
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper considers the distributed online bandit optimization problem with nonconvex loss functions over a time-varying digraph. This problem can be viewed as a repeated game between a group of online players and an adversary. At each round, each player selects a decision from the constraint set, and then the adversary assigns an arbitrary, possibly nonconvex, loss function to this player. Only the loss value at the current round, rather than the entire loss function or any other information (e.g. gradient), is privately revealed to the player. Players aim to minimize a sequence of global loss functions, which are the sum of local losses. We observe that traditional multi-point bandit algorithms are unsuitable for online optimization, where the data for the loss function are not all a priori, while the one-point bandit algorithms suffer from poor regret guarantees. To address these issues, we propose a novel one-point residual feedback distributed online algorithm. This algorithm estimates the gradient using residuals from two points, effectively reducing the regret bound while maintaining $\mathcal{O}(1)$ sampling complexity per iteration. We employ a rigorous metric, dynamic regret, to evaluate the algorithm's performance. By appropriately selecting the step size and smoothing parameters, we demonstrate that the expected dynamic regret of our algorithm is comparable to existing algorithms that use two-point feedback, provided the deviation in the objective function sequence and the path length of the minimization grows sublinearly. Finally, we validate the effectiveness of the proposed algorithm through numerical simulations.

Related papers

Second Order Methods for Bandit Optimization and Control [34.51425758864638]
We show that our algorithm achieves optimal (in terms of terms of convex functions that we call $kappa$-2020) regret bounds for a large class of convex functions. We also investigate the adaptation of our second-order bandit algorithm to online convex optimization with memory.
arXiv Detail & Related papers (2024-02-14T04:03:38Z)
On Dynamic Regret and Constraint Violations in Constrained Online Convex Optimization [17.412117389855222]
We propose an algorithm that follows projected gradient descent over a suitably chosen set around the current action. We show that both the dynamic regret and the constraint violation is order-wise bounded by the it path-length, the sum of the distances between the consecutive optimal actions.
arXiv Detail & Related papers (2023-01-24T04:22:13Z)
Implicit Parameter-free Online Learning with Truncated Linear Models [51.71216912089413]
parameter-free algorithms are online learning algorithms that do not require setting learning rates. We propose new parameter-free algorithms that can take advantage of truncated linear models through a new update that has an "implicit" flavor. Based on a novel decomposition of the regret, the new update is efficient, requires only one gradient at each step, never overshoots the minimum of the truncated model, and retains the favorable parameter-free properties.
arXiv Detail & Related papers (2022-03-19T13:39:49Z)
No-Regret Learning in Time-Varying Zero-Sum Games [99.86860277006318]
Learning from repeated play in a fixed zero-sum game is a classic problem in game theory and online learning. We develop a single parameter-free algorithm that simultaneously enjoys favorable guarantees under three performance measures. Our algorithm is based on a two-layer structure with a meta-algorithm learning over a group of black-box base-learners satisfying a certain property.
arXiv Detail & Related papers (2022-01-30T06:10:04Z)
Efficient and Optimal Algorithms for Contextual Dueling Bandits under Realizability [59.81339109121384]
We study the $K$ contextual dueling bandit problem, a sequential decision making setting in which the learner uses contextual information to make two decisions, but only observes emphpreference-based feedback suggesting that one decision was better than the other. We provide a new algorithm that achieves the optimal regret rate for a new notion of best response regret, which is a strictly stronger performance measure than those considered in prior works.
arXiv Detail & Related papers (2021-11-24T07:14:57Z)
Optimal Rates for Random Order Online Optimization [60.011653053877126]
We study the citetgarber 2020online, where the loss functions may be chosen by an adversary, but are then presented online in a uniformly random order. We show that citetgarber 2020online algorithms achieve the optimal bounds and significantly improve their stability.
arXiv Detail & Related papers (2021-06-29T09:48:46Z)
Online Markov Decision Processes with Aggregate Bandit Feedback [74.85532145498742]
We study a novel variant of online finite-horizon Markov Decision Processes with adversarially changing loss functions and initially unknown dynamics. In each episode, the learner suffers the loss accumulated along the trajectory realized by the policy chosen for the episode, and observes aggregate bandit feedback. Our main result is a computationally efficient algorithm with $O(sqrtK)$ regret for this setting, where $K$ is the number of episodes.
arXiv Detail & Related papers (2021-01-31T16:49:07Z)
Persistent Reductions in Regularized Loss Minimization for Variable Selection [3.3504365823045035]
We show that for a broad class of loss functions it is possible to efficiently identify a subset which are guaranteed to have zero coefficients. We employ an existing ray algorithm for extreme ray identification and make our guarantee algorithm applicable in ultra-high dimensional problems.
arXiv Detail & Related papers (2020-11-30T04:59:44Z)
Exploiting Higher Order Smoothness in Derivative-free Optimization and Continuous Bandits [99.70167985955352]
We study the problem of zero-order optimization of a strongly convex function. We consider a randomized approximation of the projected gradient descent algorithm. Our results imply that the zero-order algorithm is nearly optimal in terms of sample complexity and the problem parameters.
arXiv Detail & Related papers (2020-06-14T10:42:23Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.