The Combinatorial Multi-Bandit Problem and its Application to Energy Management
- URL: http://arxiv.org/abs/2010.16269v3
- Date: Wed, 4 Nov 2020 11:07:55 GMT
- Title: The Combinatorial Multi-Bandit Problem and its Application to Energy Management
- Authors: Tobias Jacobs, Mischa Schmidt, Sébastien Nicolas, Anett Schülke
- Abstract summary: We study a Combinatorial Multi-Bandit Problem motivated by applications in energy systems management.
For our energy management application we propose a range of algorithms that combine exploration principles for multi-arm bandits with mathematical programming.
- Score: 2.236663830879273
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study a Combinatorial Multi-Bandit Problem motivated by applications in
energy systems management. Given multiple probabilistic multi-arm bandits with
unknown outcome distributions, the task is to optimize the value of a
combinatorial objective function mapping the vector of individual bandit
outcomes to a single scalar reward. Unlike in single-bandit problems with
multi-dimensional action space, the outcomes of the individual bandits are
observable in our setting and the objective function is known. Guided by the
hypothesis that individual observability enables better trade-offs between
exploration and exploitation, we generalize the lower regret bound for single
bandits, showing that indeed for multiple bandits it admits parallelized
exploration. For our energy management application we propose a range of
algorithms that combine exploration principles for multi-arm bandits with
mathematical programming. In an experimental study we demonstrate the
effectiveness of our approach to learn action assignments for 150 bandits, each
having 24 actions, within a horizon of 365 episodes.
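To make the setting concrete, below is a minimal Python sketch of the optimism-plus-optimization pattern the abstract describes. It is an illustrative assumption, not the authors' implementation: per-(bandit, action) UCB estimates are built from the observable individual outcomes, and the known objective is then maximized over the optimistic outcome vector. The toy linear objective makes that maximization decompose into per-bandit argmax; the paper's algorithms instead use mathematical programming to handle general objectives. The problem sizes match the experimental study reported above.

```python
# A minimal sketch (assumptions: all names, the Gaussian noise model, and
# the toy linear objective are illustrative; this is NOT the authors' code).
import numpy as np

rng = np.random.default_rng(0)

# Problem sizes taken from the paper's experimental study.
N_BANDITS, N_ACTIONS, HORIZON = 150, 24, 365

# Unknown ground-truth mean outcome of each (bandit, action) pair.
true_means = rng.uniform(0.0, 1.0, size=(N_BANDITS, N_ACTIONS))

# Known objective mapping the vector of individual outcomes to one reward.
# A weighted sum keeps the optimization step trivial; the paper targets
# general combinatorial objectives via mathematical programming.
weights = rng.uniform(0.5, 1.5, size=N_BANDITS)

def objective(outcomes: np.ndarray) -> float:
    return float(weights @ outcomes)

sums = np.zeros((N_BANDITS, N_ACTIONS))    # summed observed outcomes
counts = np.zeros((N_BANDITS, N_ACTIONS))  # pull counts

for t in range(1, HORIZON + 1):
    # Optimistic (UCB1-style) estimate per (bandit, action) pair.
    means = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    bonus = np.sqrt(2.0 * np.log(t) / np.maximum(counts, 1.0))
    ucb = np.where(counts > 0, means + bonus, np.inf)  # force initial tries

    # Optimization step: pick one action per bandit to maximize the known
    # objective over the optimistic outcome vector. For a monotone separable
    # objective this decomposes into per-bandit argmax; a general objective
    # would require an actual solver (e.g., a MILP) here.
    actions = ucb.argmax(axis=1)

    # Key feature of the setting: individual bandit outcomes are observable.
    idx = np.arange(N_BANDITS)
    outcomes = rng.normal(true_means[idx, actions], 0.1)
    reward = objective(outcomes)  # scalar episode reward

    sums[idx, actions] += outcomes
    counts[idx, actions] += 1.0
```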
Related papers
- QuACK: A Multipurpose Queuing Algorithm for Cooperative $k$-Armed Bandits [5.530212768657544]
We study the cooperative $k$-armed bandit problem, where a network of $m$ agents collaborates to find the optimal action.
We provide a black-box reduction that allows us to extend any single-agent bandit algorithm to the multi-agent setting.
arXiv Detail & Related papers (2024-10-31T12:20:36Z)
- Neural Dueling Bandits [58.90189511247936]
We use a neural network to estimate the reward function using preference feedback for the previously selected arms.
We then extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution.
arXiv Detail & Related papers (2024-07-24T09:23:22Z)
- Multi-Player Approaches for Dueling Bandits [58.442742345319225]
We show that the direct use of a Follow Your Leader black-box approach matches the lower bound for this setting.
We also analyze a message-passing fully distributed approach with a novel Condorcet-winner recommendation protocol.
arXiv Detail & Related papers (2024-05-25T10:25:48Z)
- Multi-task Representation Learning for Pure Exploration in Bilinear Bandits [13.773838574776338]
We study multi-task representation learning for the problem of pure exploration in bilinear bandits.
In bilinear bandits, an action takes the form of a pair of arms from two different entity types.
arXiv Detail & Related papers (2023-11-01T06:30:45Z)
- Energy Regularized RNNs for Solving Non-Stationary Bandit Problems [97.72614340294547]
We present an energy term that prevents the neural network from becoming too confident in support of a certain action.
We demonstrate that our method is at least as effective as methods suggested to solve the sub-problem of Rotting Bandits.
arXiv Detail & Related papers (2023-03-12T03:32:43Z)
- Bandit Social Learning: Exploration under Myopic Behavior [58.75758600464338]
We study social learning dynamics motivated by reviews on online platforms.
Agents collectively follow a simple multi-armed bandit protocol, but each agent acts myopically, without regard for exploration.
We derive stark learning failures for any such behavior, and provide matching positive results.
arXiv Detail & Related papers (2023-02-15T01:57:57Z)
- Bandit approach to conflict-free multi-agent Q-learning in view of photonic implementation [0.0]
Previous studies have used quantum interference of photons to solve the competitive multi-armed bandit problem.
This study extends the conventional approach to a more general multi-agent reinforcement learning setting.
A successful photonic reinforcement learning scheme requires both a photonic system that contributes to the quality of learning and a suitable algorithm.
arXiv Detail & Related papers (2022-12-20T00:27:29Z)
- Metadata-based Multi-Task Bandits with Bayesian Hierarchical Models [7.458639397686894]
How to explore efficiently is a central problem in multi-armed bandits.
We introduce the metadata-based multi-task bandit problem.
We propose to capture task relations through the lens of Bayesian hierarchical models.
arXiv Detail & Related papers (2021-08-13T22:45:05Z)
- Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit Feedback [51.21673420940346]
Combinatorial bandits generalize multi-armed bandits, where the agent chooses sets of arms and observes a noisy reward for each arm contained in the chosen set.
We focus on the pure-exploration problem of identifying the best arm with fixed confidence, as well as a more general setting, where the structure of the answer set differs from the one of the action set.
Based on a projection-free online learning algorithm for finite polytopes, it is the first computationally efficient algorithm which is asymptotically optimal and has competitive empirical performance.
arXiv Detail & Related papers (2021-01-21T10:35:09Z)
- Never Give Up: Learning Directed Exploration Strategies [63.19616370038824]
We propose a reinforcement learning agent to solve hard exploration games by learning a range of directed exploratory policies.
We construct an episodic memory-based intrinsic reward using k-nearest neighbors over the agent's recent experience to train the directed exploratory policies (a minimal sketch of this bonus appears after this list).
A self-supervised inverse dynamics model is used to train the embeddings of the nearest neighbour lookup, biasing the novelty signal towards what the agent can control.
arXiv Detail & Related papers (2020-02-14T13:57:22Z)
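As referenced in the Never Give Up entry above, here is a minimal sketch of an episodic k-nearest-neighbor novelty bonus. It is an assumption-laden illustration, not the paper's implementation: the function and parameter names are invented, and the inverse-kernel form is one common way to turn neighbor distances into a bonus.

```python
# A minimal sketch of the episodic k-NN novelty bonus described in the
# "Never Give Up" entry (assumptions: function/parameter names and the
# inverse-kernel form are illustrative, not the paper's exact recipe).
import numpy as np

def episodic_intrinsic_reward(embedding: np.ndarray,
                              episodic_memory: list,
                              k: int = 10,
                              eps: float = 1e-3) -> float:
    """Larger when `embedding` is far from its k nearest episodic neighbours.

    In the paper, the embedding comes from a self-supervised inverse-dynamics
    encoder, which biases novelty toward features the agent can control.
    """
    if not episodic_memory:
        return 1.0  # everything is novel while the episodic memory is empty
    dists = np.linalg.norm(np.stack(episodic_memory) - embedding, axis=1)
    knn_sq = np.sort(dists)[:k] ** 2
    # Inverse kernel: close neighbours contribute large similarity,
    # so a well-visited state earns a small bonus.
    similarity = np.sum(1.0 / (knn_sq + eps))
    return float(1.0 / np.sqrt(similarity))

# Usage: append each step's embedding to `episodic_memory` and clear the
# memory at the start of every episode.
```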
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.