The Combinatorial Multi-Bandit Problem and its Application to Energy Management
- URL: http://arxiv.org/abs/2010.16269v3
- Date: Wed, 4 Nov 2020 11:07:55 GMT
- Title: The Combinatorial Multi-Bandit Problem and its Application to Energy Management
- Authors: Tobias Jacobs, Mischa Schmidt, Sébastien Nicolas, Anett Schülke
- Abstract summary: We study a Combinatorial Multi-Bandit Problem motivated by applications in energy systems management.
For our energy management application we propose a range of algorithms that combine exploration principles for multi-arm bandits with mathematical programming.
- Score: 2.236663830879273
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study a Combinatorial Multi-Bandit Problem motivated by applications in
energy systems management. Given multiple probabilistic multi-arm bandits with
unknown outcome distributions, the task is to optimize the value of a
combinatorial objective function mapping the vector of individual bandit
outcomes to a single scalar reward. Unlike in single-bandit problems with
multi-dimensional action space, the outcomes of the individual bandits are
observable in our setting and the objective function is known. Guided by the
hypothesis that individual observability enables better trade-offs between
exploration and exploitation, we generalize the lower regret bound for single
bandits, showing that indeed for multiple bandits it admits parallelized
exploration. For our energy management application we propose a range of
algorithms that combine exploration principles for multi-arm bandits with
mathematical programming. In an experimental study we demonstrate the
effectiveness of our approach to learn action assignments for 150 bandits, each
having 24 actions, within a horizon of 365 episodes.
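To make the setting concrete, below is a minimal Python sketch of the optimism-plus-optimization pattern the abstract describes. It is an illustrative assumption, not the authors' implementation: per-(bandit, action) UCB estimates are built from the observable individual outcomes, and the known objective is then maximized over the optimistic outcome vector. The toy linear objective makes that maximization decompose into per-bandit argmax; the paper's algorithms instead use mathematical programming to handle general objectives. The problem sizes match the experimental study reported above.

```python
# A minimal sketch (assumptions: all names, the Gaussian noise model, and
# the toy linear objective are illustrative; this is NOT the authors' code).
import numpy as np

rng = np.random.default_rng(0)

# Problem sizes taken from the paper's experimental study.
N_BANDITS, N_ACTIONS, HORIZON = 150, 24, 365

# Unknown ground-truth mean outcome of each (bandit, action) pair.
true_means = rng.uniform(0.0, 1.0, size=(N_BANDITS, N_ACTIONS))

# Known objective mapping the vector of individual outcomes to one reward.
# A weighted sum keeps the optimization step trivial; the paper targets
# general combinatorial objectives via mathematical programming.
weights = rng.uniform(0.5, 1.5, size=N_BANDITS)

def objective(outcomes: np.ndarray) -> float:
    return float(weights @ outcomes)

sums = np.zeros((N_BANDITS, N_ACTIONS))    # summed observed outcomes
counts = np.zeros((N_BANDITS, N_ACTIONS))  # pull counts

for t in range(1, HORIZON + 1):
    # Optimistic (UCB1-style) estimate per (bandit, action) pair.
    means = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    bonus = np.sqrt(2.0 * np.log(t) / np.maximum(counts, 1.0))
    ucb = np.where(counts > 0, means + bonus, np.inf)  # force initial tries

    # Optimization step: pick one action per bandit to maximize the known
    # objective over the optimistic outcome vector. For a monotone separable
    # objective this decomposes into per-bandit argmax; a general objective
    # would require an actual solver (e.g., a MILP) here.
    actions = ucb.argmax(axis=1)

    # Key feature of the setting: individual bandit outcomes are observable.
    idx = np.arange(N_BANDITS)
    outcomes = rng.normal(true_means[idx, actions], 0.1)
    reward = objective(outcomes)  # scalar episode reward

    sums[idx, actions] += outcomes
    counts[idx, actions] += 1.0
```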
Related papers
- QuACK: A Multipurpose Queuing Algorithm for Cooperative $k$-Armed Bandits [5.530212768657544]
We study the cooperative $k$-armed bandit problem, where a network of $m$ agents collaborates to find the optimal action.
We provide a black-box reduction that allows us to extend any single-agent bandit algorithm to the multi-agent setting.
arXiv Detail & Related papers (2024-10-31T12:20:36Z)
- Neural Dueling Bandits [58.90189511247936]
We use a neural network to estimate the reward function using preference feedback for the previously selected arms.
We then extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution.
arXiv Detail & Related papers (2024-07-24T09:23:22Z)
- Multi-Player Approaches for Dueling Bandits [58.442742345319225]
We show that the direct use of a Follow Your Leader black-box approach matches the lower bound for this setting.
We also analyze a message-passing fully distributed approach with a novel Condorcet-winner recommendation protocol.
arXiv Detail & Related papers (2024-05-25T10:25:48Z)
- Multi-task Representation Learning for Pure Exploration in Bilinear Bandits [13.773838574776338]
We study multi-task representation learning for the problem of pure exploration in bilinear bandits.
In bilinear bandits, an action takes the form of a pair of arms from two different entity types.
arXiv Detail & Related papers (2023-11-01T06:30:45Z)
- Energy Regularized RNNs for Solving Non-Stationary Bandit Problems [97.72614340294547]
We present an energy term that prevents the neural network from becoming too confident in support of a certain action.
We demonstrate that our method is at least as effective as methods suggested to solve the sub-problem of Rotting Bandits.
arXiv Detail & Related papers (2023-03-12T03:32:43Z)
- Bandit Social Learning: Exploration under Myopic Behavior [58.75758600464338]
We study social learning dynamics motivated by reviews on online platforms.
Agents collectively follow a simple multi-armed bandit protocol, but each agent acts myopically, without regard for exploration.
We derive stark learning failures for any such behavior, and provide matching positive results.
arXiv Detail & Related papers (2023-02-15T01:57:57Z)
- Bandit approach to conflict-free multi-agent Q-learning in view of photonic implementation [0.0]
Previous studies have used quantum interference of photons to solve the competitive multi-armed bandit problem.
This study extends the conventional approach to a more general multi-agent reinforcement learning setting.
A successful photonic reinforcement learning scheme requires both a photonic system that contributes to the quality of learning and a suitable algorithm.
arXiv Detail & Related papers (2022-12-20T00:27:29Z)
- Metadata-based Multi-Task Bandits with Bayesian Hierarchical Models [7.458639397686894]
How to explore efficiently is a central problem in multi-armed bandits.
We introduce the metadata-based multi-task bandit problem.
We propose to capture task relations through the lens of Bayesian hierarchical models.
arXiv Detail & Related papers (2021-08-13T22:45:05Z)
- Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit Feedback [51.21673420940346]
Combinatorial bandits generalize multi-armed bandits, where the agent chooses sets of arms and observes a noisy reward for each arm contained in the chosen set.
We focus on the pure-exploration problem of identifying the best arm with fixed confidence, as well as a more general setting, where the structure of the answer set differs from the one of the action set.
Based on a projection-free online learning algorithm for finite polytopes, it is the first computationally efficient algorithm which is asymptotically optimal and has competitive empirical performance.
arXiv Detail & Related papers (2021-01-21T10:35:09Z)
- Never Give Up: Learning Directed Exploration Strategies [63.19616370038824]
We propose a reinforcement learning agent to solve hard exploration games by learning a range of directed exploratory policies.
We construct an episodic memory-based intrinsic reward using k-nearest neighbors over the agent's recent experience to train the directed exploratory policies (a minimal sketch of this bonus appears after this list).
A self-supervised inverse dynamics model is used to train the embeddings of the nearest neighbour lookup, biasing the novelty signal towards what the agent can control.
arXiv Detail & Related papers (2020-02-14T13:57:22Z)
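As referenced in the Never Give Up entry above, here is a minimal sketch of an episodic k-nearest-neighbor novelty bonus. It is an assumption-laden illustration, not the paper's implementation: the function and parameter names are invented, and the inverse-kernel form is one common way to turn neighbor distances into a bonus.

```python
# A minimal sketch of the episodic k-NN novelty bonus described in the
# "Never Give Up" entry (assumptions: function/parameter names and the
# inverse-kernel form are illustrative, not the paper's exact recipe).
import numpy as np

def episodic_intrinsic_reward(embedding: np.ndarray,
                              episodic_memory: list,
                              k: int = 10,
                              eps: float = 1e-3) -> float:
    """Larger when `embedding` is far from its k nearest episodic neighbours.

    In the paper, the embedding comes from a self-supervised inverse-dynamics
    encoder, which biases novelty toward features the agent can control.
    """
    if not episodic_memory:
        return 1.0  # everything is novel while the episodic memory is empty
    dists = np.linalg.norm(np.stack(episodic_memory) - embedding, axis=1)
    knn_sq = np.sort(dists)[:k] ** 2
    # Inverse kernel: close neighbours contribute large similarity,
    # so a well-visited state earns a small bonus.
    similarity = np.sum(1.0 / (knn_sq + eps))
    return float(1.0 / np.sqrt(similarity))

# Usage: append each step's embedding to `episodic_memory` and clear the
# memory at the start of every episode.
```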
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.