An Efficient Algorithm for Cooperative Semi-Bandits
- URL: http://arxiv.org/abs/2010.01818v2
- Date: Tue, 9 Feb 2021 08:55:30 GMT
- Title: An Efficient Algorithm for Cooperative Semi-Bandits
- Authors: Riccardo Della Vecchia (BIDSA), Tommaso Cesari (TSE)
- Abstract summary: We introduce Coop-FTPL, a cooperative version of the well-known Follow The Perturbed Leader algorithm.
We show that the expected regret of our algorithm after T time steps is of order QT log(k)(k$alpha$ 1 /Q + m), where Q is the total activation probability mass.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of asynchronous online combinatorial optimization on
a network of communicating agents. At each time step, some of the agents are
stochastically activated, requested to make a prediction, and the system pays
the corresponding loss. Then, neighbors of active agents receive semi-bandit
feedback and exchange some succinct local information. The goal is to minimize
the network regret, defined as the difference between the cumulative loss of
the predictions of active agents and that of the best action in hindsight,
selected from a combinatorial decision set. The main challenge in such a
context is to control the computational complexity of the resulting algorithm
while retaining minimax optimal regret guarantees. We introduce Coop-FTPL, a
cooperative version of the well-known Follow The Perturbed Leader algorithm,
that implements a new loss estimation procedure generalizing the Geometric
Resampling of Neu and Bart{\'o}k [2013] to our setting. Assuming that the
elements of the decision set are k-dimensional binary vectors with at most m
non-zero entries and $\alpha$ 1 is the independence number of the network, we
show that the expected regret of our algorithm after T time steps is of order Q
mkT log(k)(k$\alpha$ 1 /Q + m), where Q is the total activation probability
mass. Furthermore, we prove that this is only $\sqrt$ k log k-away from the
best achievable rate and that Coop-FTPL has a state-of-the-art T 3/2 worst-case
computational complexity.
Related papers
- Decentralized Multi-Task Online Convex Optimization Under Random Link
Failures [5.513958040574729]
We develop a robust decentralized saddle-point algorithm against random link failures with heterogeneous probabilities.
We extend our algorithm and analysis to the two-point bandit feedback scenario.
arXiv Detail & Related papers (2024-01-04T00:57:33Z) - Cooperative Thresholded Lasso for Sparse Linear Bandit [6.52540785559241]
We present a novel approach to address the multi-agent sparse contextual linear bandit problem.
It is first algorithm that tackles row-wise distributed data in sparse linear bandits.
It is widely applicable to high-dimensional multi-agent problems where efficient feature extraction is critical for minimizing regret.
arXiv Detail & Related papers (2023-05-30T16:05:44Z) - Optimal Clustering with Bandit Feedback [57.672609011609886]
This paper considers the problem of online clustering with bandit feedback.
It includes a novel stopping rule for sequential testing that circumvents the need to solve any NP-hard weighted clustering problem as its subroutines.
We show through extensive simulations on synthetic and real-world datasets that BOC's performance matches the lower boundally, and significantly outperforms a non-adaptive baseline algorithm.
arXiv Detail & Related papers (2022-02-09T06:05:05Z) - Lower Bounds and Optimal Algorithms for Smooth and Strongly Convex
Decentralized Optimization Over Time-Varying Networks [79.16773494166644]
We consider the task of minimizing the sum of smooth and strongly convex functions stored in a decentralized manner across the nodes of a communication network.
We design two optimal algorithms that attain these lower bounds.
We corroborate the theoretical efficiency of these algorithms by performing an experimental comparison with existing state-of-the-art methods.
arXiv Detail & Related papers (2021-06-08T15:54:44Z) - Navigating to the Best Policy in Markov Decision Processes [68.8204255655161]
We investigate the active pure exploration problem in Markov Decision Processes.
Agent sequentially selects actions and, from the resulting system trajectory, aims at the best as fast as possible.
arXiv Detail & Related papers (2021-06-05T09:16:28Z) - Improved Branch and Bound for Neural Network Verification via Lagrangian
Decomposition [161.09660864941603]
We improve the scalability of Branch and Bound (BaB) algorithms for formally proving input-output properties of neural networks.
We present a novel activation-based branching strategy and a BaB framework, named Branch and Dual Network Bound (BaDNB)
BaDNB outperforms previous complete verification systems by a large margin, cutting average verification times by factors up to 50 on adversarial properties.
arXiv Detail & Related papers (2021-04-14T09:22:42Z) - Decentralized Multi-Agent Linear Bandits with Safety Constraints [31.67685495996986]
We study decentralized linear bandits, where a network of $N$ agents acts cooperatively to solve a linear bandit-optimization problem.
We propose DLUCB: a fully decentralized algorithm that minimizes the cumulative regret over the entire network.
We show that our ideas extend naturally to the emerging, albeit more challenging, setting of safe bandits.
arXiv Detail & Related papers (2020-12-01T07:33:00Z) - Adaptive Sampling for Best Policy Identification in Markov Decision
Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z) - Lagrangian Decomposition for Neural Network Verification [148.0448557991349]
A fundamental component of neural network verification is the computation of bounds on the values their outputs can take.
We propose a novel approach based on Lagrangian Decomposition.
We show that we obtain bounds comparable with off-the-shelf solvers in a fraction of their running time.
arXiv Detail & Related papers (2020-02-24T17:55:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.