Related papers: Genetic multi-armed bandits: a reinforcement learning approach for discrete optimization via simulation

Genetic multi-armed bandits: a reinforcement learning approach for discrete optimization via simulation

URL: http://arxiv.org/abs/2302.07695v1
Date: Wed, 15 Feb 2023 14:46:19 GMT
Title: Genetic multi-armed bandits: a reinforcement learning approach for discrete optimization via simulation
Authors: Deniz Preil, Michael Krapp
Abstract summary: This paper proposes a new algorithm, that combines the reinforcement learning domain of multi-armed bandits and random search strategies to solve discrete optimization problems via simulation. Our aim is to combine the property of multi-armed bandits with the ability of genetic algorithms to handle high-dimensional solution spaces.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper proposes a new algorithm, referred to as GMAB, that combines concepts from the reinforcement learning domain of multi-armed bandits and random search strategies from the domain of genetic algorithms to solve discrete stochastic optimization problems via simulation. In particular, the focus is on noisy large-scale problems, which often involve a multitude of dimensions as well as multiple local optima. Our aim is to combine the property of multi-armed bandits to cope with volatile simulation observations with the ability of genetic algorithms to handle high-dimensional solution spaces accompanied by an enormous number of feasible solutions. For this purpose, a multi-armed bandit framework serves as a foundation, where each observed simulation is incorporated into the memory of GMAB. Based on this memory, genetic operators guide the search, as they provide powerful tools for exploration as well as exploitation. The empirical results demonstrate that GMAB achieves superior performance compared to benchmark algorithms from the literature in a large variety of test problems. In all experiments, GMAB required considerably fewer simulations to achieve similar or (far) better solutions than those generated by existing methods. At the same time, GMAB's overhead with regard to the required runtime is extremely small due to the suggested tree-based implementation of its memory. Furthermore, we prove its convergence to the set of global optima as the simulation effort goes to infinity.

Related papers

LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning [56.273799410256075]
The framework combines Monte Carlo Tree Search (MCTS) with iterative Self-Refine to optimize the reasoning path. The framework has been tested on general and advanced benchmarks, showing superior performance in terms of search efficiency and problem-solving capability.
arXiv Detail & Related papers (2024-10-03T18:12:29Z)
Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in it any round based on the observed empirical gaps between the rewards of various arms. In particular, we present sred, a truly adaptive algorithm that can reject arms in it any round based on the observed empirical gaps between the rewards of various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z)
A unified consensus-based parallel ADMM algorithm for high-dimensional regression with combined regularizations [3.280169909938912]
parallel alternating multipliers (ADMM) is widely recognized for its effectiveness in handling large-scale distributed datasets. The proposed algorithms serve to demonstrate the reliability, stability, and scalability of a financial example.
arXiv Detail & Related papers (2023-11-21T03:30:38Z)
An Optimal Algorithm for the Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit [65.268245109828]
We study the real-valued pure exploration problem in the multi-armed bandit (R-CPE-MAB) Existing methods in the R-CPE-MAB can be seen as a special case of the so-called transductive linear bandits. We propose an algorithm named the gap-based exploration (CombGapE) algorithm, whose sample complexity matches the lower bound.
arXiv Detail & Related papers (2023-06-15T15:37:31Z)
Simulation-guided Beam Search for Neural Combinatorial Optimization [13.072343634530883]
We propose simulation-guided beam search (SGBS) for neural optimization problems. We hybridize SGBS with efficient active search (EAS), where SGBS enhances the quality of solutions backpropagated in EAS. We evaluate our methods on well-known CO benchmarks and show that SGBS significantly improves the quality of the solutions found under reasonable assumptions.
arXiv Detail & Related papers (2022-07-13T13:34:35Z)
Optimal Gradient-based Algorithms for Non-concave Bandit Optimization [76.57464214864756]
This work considers a large family of bandit problems where the unknown underlying reward function is non-concave. Our algorithms are based on a unified zeroth-order optimization paradigm that applies in great generality. We show that the standard optimistic algorithms are sub-optimal by dimension factors.
arXiv Detail & Related papers (2021-07-09T16:04:24Z)
Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems. Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for MFC. We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z)
A bi-level encoding scheme for the clustered shortest-path tree problem in multifactorial optimization [1.471992435706872]
The Clustered Shortest-Path Tree Problem (CluSPT) plays an important role in various types of optimization problems in real-life. Recently, some Multifactorial Evolutionary Algorithm (MFEA) have been introduced to deal with the CluSPT. This paper describes a MFEA-based approach to solve the CluSPT.
arXiv Detail & Related papers (2021-02-12T13:36:07Z)
Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision (MDPs) when the learner has access to a generative model. The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
Fast and stable MAP-Elites in noisy domains using deep grids [1.827510863075184]
Deep-Grid MAP-Elites is a variant of the MAP-Elites algorithm that uses an archive of similar previously encountered solutions to approximate the performance of a solution. We show that this simple approach is significantly more resilient to noise on the behavioural descriptors, while achieving competitive performances in terms of fitness optimisation.
arXiv Detail & Related papers (2020-06-25T08:47:23Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.