Online Learning in Budget-Constrained Dynamic Colonel Blotto Games
- URL: http://arxiv.org/abs/2103.12833v4
- Date: Mon, 8 May 2023 19:45:41 GMT
- Title: Online Learning in Budget-Constrained Dynamic Colonel Blotto Games
- Authors: Vincent Leon, S. Rasoul Etesami
- Abstract summary: We study the strategic allocation of limited resources using a Colonel Blotto game (CBG) under a dynamic setting.
We devise an efficient algorithm that combines a special combinatorial bandit algorithm for the path planning problem with a bandits-with-knapsacks algorithm to cope with the budget constraint.
- Score: 2.132096006921048
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we study the strategic allocation of limited resources using a Colonel Blotto game (CBG) under a dynamic setting and analyze the problem using an online learning approach. In this model, one of the players is a learner who has a limited number of troops to allocate over a finite time horizon, and the other player is an adversary. In each round, the learner plays a one-shot Colonel Blotto game with the adversary and strategically determines the allocation of troops among the battlefields based on past observations. The adversary chooses its allocation action randomly from some fixed distribution that is unknown to the learner. The learner's objective is to minimize its regret, which is the difference between the cumulative reward of the best mixed strategy and the cumulative reward realized by following a learning algorithm, while not violating the budget constraint. Learning in the dynamic CBG is analyzed under the framework of combinatorial bandits and bandits with knapsacks. We first convert the budget-constrained dynamic CBG to a path planning problem on a directed graph. We then devise an efficient algorithm that combines a special combinatorial bandit algorithm for the path planning problem with a bandits-with-knapsacks algorithm to cope with the budget constraint. The theoretical analysis shows that the learner's regret is bounded by a term sublinear in the time horizon and polynomial in the other parameters. Finally, we justify our theoretical results by carrying out simulations for various scenarios.
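To make the reduction concrete, the Python sketch below (all names hypothetical) enumerates the learner's per-round allocations, each of which corresponds to a source-to-sink path in a layered directed graph whose node (i, j) means "j troops committed after battlefield i", and runs an EXP3-style weight update over them with a hard stop once the troop budget is exhausted. This is only an illustrative stand-in under these assumptions, not the paper's algorithm: the actual method uses an efficient combinatorial (path) bandit plus a bandits-with-knapsacks layer rather than explicit enumeration.

```python
import itertools
import math
import random

def allocation_paths(num_battlefields, troops_per_round):
    """All ways to split `troops_per_round` troops over the battlefields.
    Each allocation is one source-to-sink path in the layered DAG: the
    edge (i, j) -> (i + 1, j + a) encodes sending a troops to field i+1."""
    return [a for a in itertools.product(range(troops_per_round + 1),
                                         repeat=num_battlefields)
            if sum(a) == troops_per_round]

def budgeted_exp3(num_battlefields, troops_per_round, total_budget,
                  horizon, reward_fn, eta=0.1):
    """EXP3 over enumerated allocations with a hard budget stop.
    `reward_fn(alloc)` must return the realized reward in [0, 1]."""
    arms = allocation_paths(num_battlefields, troops_per_round)
    weights = [1.0] * len(arms)
    remaining, total_reward = total_budget, 0.0
    for _ in range(horizon):
        if remaining < troops_per_round:
            break  # knapsack constraint binds: stop before overspending
        z = sum(weights)
        probs = [w / z for w in weights]
        idx = random.choices(range(len(arms)), weights=probs)[0]
        reward = reward_fn(arms[idx])
        remaining -= troops_per_round
        total_reward += reward
        # importance-weighted update on the played arm only; a production
        # version would also rescale weights to avoid overflow
        weights[idx] *= math.exp(eta * reward / probs[idx])
    return total_reward

# Example: 3 battlefields, 5 troops per round, budget covers 10 of 20 rounds.
adversary = lambda: [random.randint(0, 2) for _ in range(3)]
def reward_fn(alloc):  # fraction of battlefields won this round
    return sum(a > b for a, b in zip(alloc, adversary())) / 3
print(budgeted_exp3(3, 5, 50, 20, reward_fn))
```

Note that the number of allocations grows combinatorially in the battlefields and troops, which is exactly why the paper works over the layered graph instead of enumerating arms explicitly.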
Related papers
- Optimal cross-learning for contextual bandits with unknown context distributions [28.087360479901978]
We consider the problem of designing contextual bandit algorithms in the "cross-learning" setting of Balseiro et al.
We provide an efficient algorithm with a nearly tight (up to logarithmic factors) regret bound of $\widetilde{O}(\sqrt{TK})$, independent of the number of contexts.
At the core of our algorithm is a novel technique for coordinating the execution of an algorithm over multiple epochs.
arXiv Detail & Related papers (2024-01-03T18:02:13Z)
- A unified stochastic approximation framework for learning in games [82.74514886461257]
We develop a flexible stochastic approximation framework for analyzing the long-run behavior of learning in games (both continuous and finite).
The proposed analysis template incorporates a wide array of popular learning algorithms, including gradient-based methods, exponential/multiplicative weights for learning in finite games, optimistic and bandit variants of the above, etc.
arXiv Detail & Related papers (2022-06-08T14:30:38Z)
- Distributed Task Management in Fog Computing: A Socially Concave Bandit Game [7.708904950194129]
Fog computing leverages the task offloading capabilities at the network's edge to improve efficiency and enable swift responses to application demands.
We formulate the distributed task allocation problem as a social-concave game with bandit feedback.
We develop two no-regret online decision-making strategies.
arXiv Detail & Related papers (2022-03-28T08:26:14Z)
- No-Regret Learning in Time-Varying Zero-Sum Games [99.86860277006318]
Learning from repeated play in a fixed zero-sum game is a classic problem in game theory and online learning.
We develop a single parameter-free algorithm that simultaneously enjoys favorable guarantees under three performance measures.
Our algorithm is based on a two-layer structure with a meta-algorithm learning over a group of black-box base-learners satisfying a certain property.
arXiv Detail & Related papers (2022-01-30T06:10:04Z)
- Method for making multi-attribute decisions in wargames by combining intuitionistic fuzzy numbers with reinforcement learning [18.04026817707759]
The article proposes an algorithm that combines multi-attribute decision-making and reinforcement learning methods.
It addresses the agent's low win rate against specific rules and its inability to converge quickly during intelligent wargame training.
This is the first algorithm design in this field to combine multi-attribute decision making with reinforcement learning for intelligent wargaming.
arXiv Detail & Related papers (2021-09-06T10:45:52Z)
- Online Multiobjective Minimax Optimization and Applications [14.699969822572308]
We introduce a simple but general online learning framework in which an adaptive adversary introduces a new game in every round.
The learner's goal is to play so as to minimize the maximum coordinate of the cumulative vector-valued loss.
We give a simple algorithm that can compete with the setting in which the adversary must announce their action first.
This includes no regret learning: we can recover optimal algorithms and bounds for minimizing external regret, internal regret, adaptive regret, multigroup regret, subsequence regret, and a notion of regret in the sleeping experts setting.
arXiv Detail & Related papers (2021-08-09T06:52:08Z)
- Online Markov Decision Processes with Aggregate Bandit Feedback [74.85532145498742]
We study a novel variant of online finite-horizon Markov Decision Processes with adversarially changing loss functions and initially unknown dynamics.
In each episode, the learner suffers the loss accumulated along the trajectory realized by the policy chosen for the episode, and observes aggregate bandit feedback.
Our main result is a computationally efficient algorithm with $O(\sqrt{K})$ regret for this setting, where $K$ is the number of episodes.
arXiv Detail & Related papers (2021-01-31T16:49:07Z)
- Upper Confidence Bounds for Combining Stochastic Bandits [52.10197476419621]
We provide a simple method to combine bandit algorithms.
Our approach is based on a "meta-UCB" procedure that treats each of $N$ individual bandit algorithms as arms in a higher-level $N$-armed bandit problem; a minimal sketch of this two-level construction appears after this list.
arXiv Detail & Related papers (2020-12-24T05:36:29Z)
- Learning to Play Sequential Games versus Unknown Opponents [93.8672371143881]
We consider a repeated sequential game between a learner, who plays first, and an opponent who responds to the chosen action.
We propose a novel algorithm for the learner when playing against an adversarial sequence of opponents.
Our results include regret guarantees for the algorithm that depend on the regularity of the opponent's response.
arXiv Detail & Related papers (2020-07-10T09:33:05Z)
- Efficient exploration of zero-sum stochastic games [83.28949556413717]
We investigate the increasingly important and common game-solving setting where we do not have an explicit description of the game but only oracle access to it through gameplay.
During a limited-duration learning phase, the algorithm can control the actions of both players in order to try to learn the game and how to play it well.
Our motivation is to quickly learn strategies that have low exploitability in situations where evaluating the payoffs of a queried strategy profile is costly.
arXiv Detail & Related papers (2020-02-24T20:30:38Z)
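The "meta-UCB" idea from Upper Confidence Bounds for Combining Stochastic Bandits, treating base bandit algorithms as arms of a higher-level bandit, is simple enough to sketch. The following is only an illustrative skeleton under assumed interfaces (each base algorithm exposes hypothetical select/update hooks); the paper's actual procedure includes corrections beyond plain UCB1.

```python
import math

class MetaUCB:
    """UCB1 master over N base bandit algorithms treated as arms.
    Assumed (hypothetical) base interface: .select() -> action and
    .update(action, reward)."""

    def __init__(self, base_algorithms):
        self.base = base_algorithms
        self.counts = [0] * len(base_algorithms)
        self.means = [0.0] * len(base_algorithms)
        self.t = 0

    def choose_base(self):
        """Pick which base algorithm acts this round via UCB1."""
        self.t += 1
        for i, c in enumerate(self.counts):
            if c == 0:
                return i  # play each base algorithm once first
        return max(range(len(self.base)),
                   key=lambda i: self.means[i]
                   + math.sqrt(2.0 * math.log(self.t) / self.counts[i]))

    def step(self, environment):
        """One round: delegate to a base learner, feed the reward back
        to it, and record it in the meta-level UCB statistics."""
        i = self.choose_base()
        action = self.base[i].select()
        reward = environment(action)  # environment maps action -> [0, 1]
        self.base[i].update(action, reward)
        self.counts[i] += 1
        self.means[i] += (reward - self.means[i]) / self.counts[i]
        return reward
```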