Diverse Policies Converge in Reward-free Markov Decision Processes
- URL: http://arxiv.org/abs/2308.11924v1
- Date: Wed, 23 Aug 2023 05:17:51 GMT
- Title: Diverse Policies Converge in Reward-free Markov Decision Processes
- Authors: Fanqi Lin, Shiyu Huang, Weiwei Tu
- Abstract summary: We provide a unified diversity reinforcement learning framework and investigate the convergence of training diverse policies.
Under such a framework, we also propose a provably efficient diversity reinforcement learning algorithm.
- Score: 19.42193141047252
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning has achieved great success in many decision-making
tasks, and traditional reinforcement learning algorithms are mainly designed
for obtaining a single optimal solution. However, recent works show the
importance of developing diverse policies, which makes it an emerging research
topic. Despite the variety of diversity reinforcement learning algorithms that
have emerged, none of them theoretically answer the question of how the
algorithm converges and how efficient the algorithm is. In this paper, we
provide a unified diversity reinforcement learning framework and investigate
the convergence of training diverse policies. Under such a framework, we also
propose a provably efficient diversity reinforcement learning algorithm.
Finally, we verify the effectiveness of our method through numerical
experiments.
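The abstract describes training a population of diverse policies and studying the convergence of that training. As an illustrative, hypothetical sketch only (not the paper's algorithm, which the abstract does not specify), one can represent each policy abstractly by its state-visitation distribution and run projected gradient ascent on a pairwise-distance diversity objective over the probability simplex; a plateauing diversity score is then a simple convergence signal. The objective, population size, and update rule below are all assumptions for illustration.

```python
import numpy as np

def diversity_score(visitations):
    """Sum of pairwise squared L2 distances between state-visitation vectors."""
    total = 0.0
    for i in range(len(visitations)):
        for j in range(i + 1, len(visitations)):
            total += float(np.sum((visitations[i] - visitations[j]) ** 2))
    return total

def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex (Duchi et al.)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def train_diverse(K=3, S=5, steps=200, lr=0.05, seed=0):
    """Ascend the pairwise diversity objective for K policies over S states."""
    rng = np.random.default_rng(seed)
    pop = [project_to_simplex(rng.random(S)) for _ in range(K)]
    history = []
    for _ in range(steps):
        for i in range(K):
            # gradient of sum_j ||d_i - d_j||^2 w.r.t. d_i is 2 * sum_j (d_i - d_j)
            grad = 2.0 * sum(pop[i] - pop[j] for j in range(K) if j != i)
            pop[i] = project_to_simplex(pop[i] + lr * grad)
        history.append(diversity_score(pop))
    return pop, history
```

In this toy setting the policies repel one another until they settle on distinct vertices of the simplex, after which the diversity score stops changing; that saturation is the kind of convergence behaviour the paper studies formally.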
Related papers
- Evaluating Ensemble Methods for News Recommender Systems [50.90330146667386]
This paper demonstrates how ensemble methods can be used to combine many diverse state-of-the-art algorithms to achieve superior results on the Microsoft News dataset (MIND).
Our findings demonstrate that a combination of NRS algorithms can outperform individual algorithms, provided that the base learners are sufficiently diverse.
arXiv Detail & Related papers (2024-06-23T13:40:50Z)
- Deep Reinforcement Learning for Dynamic Algorithm Selection: A Proof-of-Principle Study on Differential Evolution [27.607740475924448]
We propose a deep reinforcement learning-based dynamic algorithm selection framework to accomplish this task.
We employ a sophisticated deep neural network model to infer the optimal action, ensuring informed algorithm selections.
As a proof-of-principle study, we apply this framework to a group of Differential Evolution algorithms.
arXiv Detail & Related papers (2024-03-04T15:40:28Z)
- Exploring Novel Quality Diversity Methods For Generalization in Reinforcement Learning [0.0]
The Reinforcement Learning field is strong on achievements and weak on reapplication.
This paper asks whether the method of training networks improves their generalization.
arXiv Detail & Related papers (2023-03-26T00:23:29Z)
- Lexicographic Multi-Objective Reinforcement Learning [65.90380946224869]
We present a family of both action-value and policy gradient algorithms that can be used to solve such problems.
We show how our algorithms can be used to impose safety constraints on the behaviour of an agent, and compare their performance in this context with that of other constrained reinforcement learning algorithms.
arXiv Detail & Related papers (2022-12-28T10:22:36Z)
- A Unified Algorithm Framework for Unsupervised Discovery of Skills based on Determinantal Point Process [53.86223883060367]
We show that diversity and coverage in unsupervised option discovery can indeed be unified under the same mathematical framework.
Our proposed algorithm, ODPP, has undergone extensive evaluation on challenging tasks created with Mujoco and Atari.
arXiv Detail & Related papers (2022-12-01T01:40:03Z)
- DGPO: Discovering Multiple Strategies with Diversity-Guided Policy Optimization [34.40615558867965]
We propose an on-policy algorithm that discovers multiple strategies for solving a given task.
Unlike prior work, it achieves this with a shared policy network trained over a single run.
Experimental results show that our method efficiently discovers diverse strategies in a wide variety of reinforcement learning tasks.
arXiv Detail & Related papers (2022-07-12T15:57:55Z)
- Meta Navigator: Search for a Good Adaptation Policy for Few-shot Learning [113.05118113697111]
Few-shot learning aims to adapt knowledge learned from previous tasks to novel tasks with only a limited amount of labeled data.
Research literature on few-shot learning exhibits great diversity, while different algorithms often excel at different few-shot learning scenarios.
We present Meta Navigator, a framework that attempts to solve the limitation in few-shot learning by seeking a higher-level strategy.
arXiv Detail & Related papers (2021-09-13T07:20:01Z)
- Algorithm Selection on a Meta Level [58.720142291102135]
We introduce the problem of meta algorithm selection, which essentially asks for the best way to combine a given set of algorithm selectors.
We present a general methodological framework for meta algorithm selection as well as several concrete learning methods as instantiations of this framework.
arXiv Detail & Related papers (2021-07-20T11:23:21Z)
- Discovering Diverse Solutions in Deep Reinforcement Learning [84.45686627019408]
Reinforcement learning algorithms are typically limited to learning a single solution of a specified task.
We propose an RL method that can learn infinitely many solutions by training a policy conditioned on a continuous or discrete low-dimensional latent variable.
arXiv Detail & Related papers (2021-03-12T04:54:31Z)
- Safe Learning and Optimization Techniques: Towards a Survey of the State of the Art [3.6954802719347413]
Safe learning and optimization deals with learning and optimization problems that avoid, as much as possible, the evaluation of non-safe input points.
A comprehensive survey of safe reinforcement learning algorithms was published in 2015, but related works in active learning and in optimization were not considered.
This paper reviews those algorithms from a number of domains including reinforcement learning, Gaussian process regression and classification, evolutionary algorithms, and active learning.
arXiv Detail & Related papers (2021-01-23T13:58:09Z)
- SEERL: Sample Efficient Ensemble Reinforcement Learning [20.983016439055188]
We present a novel training and model selection framework for model-free reinforcement learning algorithms.
We show that learning and selecting an adequately diverse set of policies is required for a good ensemble.
Our framework is substantially sample efficient, computationally inexpensive and is seen to outperform state-of-the-art (SOTA) scores in Atari 2600 and Mujoco.
arXiv Detail & Related papers (2020-01-15T10:12:00Z)
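The SEERL summary above notes that learning and selecting an adequately diverse set of policies is required for a good ensemble. A minimal, hypothetical sketch of one such selection heuristic, greedy farthest-point (max-min) selection over per-policy behavior embeddings; this is an illustration, not SEERL's actual procedure, and the embedding input is an assumption:

```python
import numpy as np

def select_diverse_subset(embeddings, k):
    """Greedily pick k mutually distant policies from a candidate pool.

    embeddings: (N, D) array, one behavior embedding per candidate policy.
    Returns the indices of the selected subset.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    # start from the policy farthest from the pool's mean embedding
    centre = embeddings.mean(axis=0)
    chosen = [int(np.argmax(np.linalg.norm(embeddings - centre, axis=1)))]
    while len(chosen) < k:
        # for each candidate, distance to its nearest already-chosen policy
        dists = np.min(
            [np.linalg.norm(embeddings - embeddings[c], axis=1) for c in chosen],
            axis=0,
        )
        dists[chosen] = -np.inf  # never re-pick a chosen policy
        chosen.append(int(np.argmax(dists)))
    return chosen
```

For example, given four candidates where two behave almost identically, selecting three with this heuristic keeps at most one of the near-duplicates, which is the kind of redundancy-avoidance an ensemble benefits from.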
This list is automatically generated from the titles and abstracts of the papers in this site.