Related papers: An Efficient Algorithm for Cooperative Semi-Bandits

An Efficient Algorithm for Cooperative Semi-Bandits

URL: http://arxiv.org/abs/2010.01818v2
Date: Tue, 9 Feb 2021 08:55:30 GMT
Title: An Efficient Algorithm for Cooperative Semi-Bandits
Authors: Riccardo Della Vecchia (BIDSA), Tommaso Cesari (TSE)
Abstract summary: We introduce Coop-FTPL, a cooperative version of the well-known Follow The Perturbed Leader algorithm. We show that the expected regret of our algorithm after T time steps is of order QT log(k)(k$alpha$ 1 /Q + m), where Q is the total activation probability mass.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We consider the problem of asynchronous online combinatorial optimization on a network of communicating agents. At each time step, some of the agents are stochastically activated, requested to make a prediction, and the system pays the corresponding loss. Then, neighbors of active agents receive semi-bandit feedback and exchange some succinct local information. The goal is to minimize the network regret, defined as the difference between the cumulative loss of the predictions of active agents and that of the best action in hindsight, selected from a combinatorial decision set. The main challenge in such a context is to control the computational complexity of the resulting algorithm while retaining minimax optimal regret guarantees. We introduce Coop-FTPL, a cooperative version of the well-known Follow The Perturbed Leader algorithm, that implements a new loss estimation procedure generalizing the Geometric Resampling of Neu and Bart{\'o}k [2013] to our setting. Assuming that the elements of the decision set are k-dimensional binary vectors with at most m non-zero entries and $\alpha$ 1 is the independence number of the network, we show that the expected regret of our algorithm after T time steps is of order Q mkT log(k)(k$\alpha$ 1 /Q + m), where Q is the total activation probability mass. Furthermore, we prove that this is only $\sqrt$ k log k-away from the best achievable rate and that Coop-FTPL has a state-of-the-art T 3/2 worst-case computational complexity.

Related papers

The Iterative Chainlet Partitioning Algorithm for the Traveling Salesman Problem with Drone and Neural Acceleration [19.41080715658528]
Iterative Chainlet Partitioning (ICP) algorithm and its neural acceleration for solving the Traveling Salesman Problem with Drone (TSP-D) ICP outperforms existing algorithms in both solution quality and computational time.
arXiv Detail & Related papers (2025-04-21T14:51:15Z)
Decentralized Multi-Task Online Convex Optimization Under Random Link Failures [5.513958040574729]
We develop a robust decentralized saddle-point algorithm against random link failures with heterogeneous probabilities. We extend our algorithm and analysis to the two-point bandit feedback scenario.
arXiv Detail & Related papers (2024-01-04T00:57:33Z)
Cooperative Thresholded Lasso for Sparse Linear Bandit [6.52540785559241]
We present a novel approach to address the multi-agent sparse contextual linear bandit problem. It is first algorithm that tackles row-wise distributed data in sparse linear bandits. It is widely applicable to high-dimensional multi-agent problems where efficient feature extraction is critical for minimizing regret.
arXiv Detail & Related papers (2023-05-30T16:05:44Z)
Optimal Clustering with Bandit Feedback [57.672609011609886]
This paper considers the problem of online clustering with bandit feedback. It includes a novel stopping rule for sequential testing that circumvents the need to solve any NP-hard weighted clustering problem as its subroutines. We show through extensive simulations on synthetic and real-world datasets that BOC's performance matches the lower boundally, and significantly outperforms a non-adaptive baseline algorithm.
arXiv Detail & Related papers (2022-02-09T06:05:05Z)
Lower Bounds and Optimal Algorithms for Smooth and Strongly Convex Decentralized Optimization Over Time-Varying Networks [79.16773494166644]
We consider the task of minimizing the sum of smooth and strongly convex functions stored in a decentralized manner across the nodes of a communication network. We design two optimal algorithms that attain these lower bounds. We corroborate the theoretical efficiency of these algorithms by performing an experimental comparison with existing state-of-the-art methods.
arXiv Detail & Related papers (2021-06-08T15:54:44Z)
Navigating to the Best Policy in Markov Decision Processes [68.8204255655161]
We investigate the active pure exploration problem in Markov Decision Processes. Agent sequentially selects actions and, from the resulting system trajectory, aims at the best as fast as possible.
arXiv Detail & Related papers (2021-06-05T09:16:28Z)
Improved Branch and Bound for Neural Network Verification via Lagrangian Decomposition [161.09660864941603]
We improve the scalability of Branch and Bound (BaB) algorithms for formally proving input-output properties of neural networks. We present a novel activation-based branching strategy and a BaB framework, named Branch and Dual Network Bound (BaDNB) BaDNB outperforms previous complete verification systems by a large margin, cutting average verification times by factors up to 50 on adversarial properties.
arXiv Detail & Related papers (2021-04-14T09:22:42Z)
Decentralized Multi-Agent Linear Bandits with Safety Constraints [31.67685495996986]
We study decentralized linear bandits, where a network of $N$ agents acts cooperatively to solve a linear bandit-optimization problem. We propose DLUCB: a fully decentralized algorithm that minimizes the cumulative regret over the entire network. We show that our ideas extend naturally to the emerging, albeit more challenging, setting of safe bandits.
arXiv Detail & Related papers (2020-12-01T07:33:00Z)
Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision (MDPs) when the learner has access to a generative model. The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
Lagrangian Decomposition for Neural Network Verification [148.0448557991349]
A fundamental component of neural network verification is the computation of bounds on the values their outputs can take. We propose a novel approach based on Lagrangian Decomposition. We show that we obtain bounds comparable with off-the-shelf solvers in a fraction of their running time.
arXiv Detail & Related papers (2020-02-24T17:55:10Z)

This list is automatically generated from the titles and abstracts of the papers in this site.