Continuous Control for Searching and Planning with a Learned Model
- URL: http://arxiv.org/abs/2006.07430v2
- Date: Mon, 22 Jun 2020 03:32:50 GMT
- Title: Continuous Control for Searching and Planning with a Learned Model
- Authors: Xuxi Yang, Werner Duvaud, Peng Wei
- Abstract summary: Decision-making agents with planning capabilities have achieved huge success in challenging domains such as Chess, Shogi, and Go.
Researchers proposed the MuZero algorithm, which learns a dynamics model through interactions with the environment.
We show that the proposed algorithm outperforms soft actor-critic (SAC), a state-of-the-art model-free deep reinforcement learning algorithm.
- Score: 5.196149362684628
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Decision-making agents with planning capabilities have achieved huge success
in challenging domains such as Chess, Shogi, and Go. In an effort to generalize
this planning ability to more general tasks, where the environment dynamics
are not available to the agent, researchers proposed the MuZero algorithm,
which learns a dynamics model through interactions with the environment. In
this paper, we provide a method, together with the necessary theoretical
results, to extend the MuZero algorithm to more general environments with
continuous action spaces. Through numerical results on two relatively
low-dimensional MuJoCo environments, we show that the proposed algorithm
outperforms the soft actor-critic (SAC) algorithm, a state-of-the-art
model-free deep reinforcement learning algorithm.
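
The key obstacle here is that MuZero's tree search assumes a finite action set. One common way to picture the continuous-action extension is to sample a small set of candidate actions from a learned Gaussian policy at each node, so the search tree stays finite. The sketch below illustrates that recipe; every name in it (`nets`, `Node`, the toy stand-in networks) is a hypothetical placeholder for intuition, not the paper's implementation.

```python
import numpy as np
from types import SimpleNamespace

# Stand-in "learned" networks so the sketch runs end to end; a real agent
# would use trained dynamics/policy/value networks over latent states.
DIM, ACT = 4, 2
nets = SimpleNamespace(
    policy=lambda h: (np.zeros(ACT), np.ones(ACT)),               # Gaussian (mu, sigma)
    dynamics=lambda h, a: (np.tanh(h + a.sum()), float(a.sum())), # (next latent, reward)
    value=lambda h: float(h.mean()),
)

class Node:
    def __init__(self, latent):
        self.latent = latent
        self.children = {}           # sampled action (tuple) -> child Node
        self.N, self.W = {}, {}      # per-action visit counts and value sums

def sample_actions(policy, latent, k=8):
    # The action space is continuous, so the tree is kept finite by
    # sampling k candidate actions from the learned policy at each node.
    mu, sigma = policy(latent)
    return [tuple(np.random.normal(mu, sigma)) for _ in range(k)]

def ucb_select(node, c=1.25):
    total = sum(node.N.values()) + 1
    def score(a):
        q = node.W[a] / node.N[a] if node.N[a] else 0.0
        return q + c * np.sqrt(np.log(total) / (node.N[a] + 1))
    return max(node.children, key=score)

def simulate(root, nets, gamma=0.997):
    """One search simulation: descend by UCB, expand the first leaf,
    bootstrap with the value net, and back the return up the path."""
    node, path, rewards = root, [], []
    while node.children:                                   # selection
        a = ucb_select(node)
        _, r = nets.dynamics(node.latent, np.array(a))
        path.append((node, a))
        rewards.append(r)
        node = node.children[a]
    for a in sample_actions(nets.policy, node.latent):     # expansion
        h2, _ = nets.dynamics(node.latent, np.array(a))
        node.children[a] = Node(h2)
        node.N[a], node.W[a] = 0, 0.0
    g = nets.value(node.latent)                            # bootstrap at leaf
    for (n, a), r in zip(reversed(path), reversed(rewards)):
        g = r + gamma * g                                  # backup
        n.N[a] += 1
        n.W[a] += g

root = Node(np.zeros(DIM))
for _ in range(100):
    simulate(root, nets)
best_action = max(root.N, key=root.N.get)  # act with the most-visited sample
```

One intuition consistent with the paper's focus on relatively low-dimensional MuJoCo environments: with only a few action dimensions, a handful of sampled candidates per node is often enough for the search to concentrate visits on good regions.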
Related papers
- Discovering General Reinforcement Learning Algorithms with Adversarial
Environment Design [54.39859618450935]
We show that it is possible to meta-learn update rules, with the hope of discovering algorithms that can perform well on a wide range of RL tasks.
Despite impressive initial results from algorithms such as Learned Policy Gradient (LPG), there remains a gap when these algorithms are applied to unseen environments.
In this work, we examine how characteristics of the meta-training distribution impact the performance of these algorithms.
arXiv Detail & Related papers (2023-10-04T12:52:56Z)
- AI planning in the imagination: High-level planning on learned abstract
search spaces [68.75684174531962]
We propose a new method, called PiZero, that gives an agent the ability to plan in an abstract search space that the agent learns during training.
We evaluate our method on multiple domains, including the traveling salesman problem, Sokoban, 2048, the facility location problem, and Pacman.
arXiv Detail & Related papers (2023-08-16T22:47:16Z)
- Optimistic Active Exploration of Dynamical Systems [52.91573056896633]
We develop an algorithm for active exploration called OPAX.
We show how OPAX can be reduced to an optimal control problem that can be solved at each episode.
Our experiments show that OPAX is not only theoretically sound but also performs well for zero-shot planning on novel downstream tasks.
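
The summary leaves the "optimal control problem" abstract. A common way to instantiate active exploration of this flavor is to treat the disagreement of a dynamics-model ensemble as an intrinsic reward and plan against it each episode; the toy random-shooting sketch below illustrates that general recipe under hypothetical interfaces, not OPAX's actual formulation or guarantees.

```python
import numpy as np

def exploration_action(ensemble, s0, horizon=20, n_candidates=256, act_dim=2):
    """Score random action sequences by how much the dynamics ensemble
    disagrees along the imagined rollout, and return the first action of
    the most informative sequence. `ensemble` is a hypothetical list of
    learned models f(s, a) -> s'; their disagreement stands in for
    epistemic uncertainty about the true dynamics."""
    best_first, best_score = None, -np.inf
    for _ in range(n_candidates):
        actions = np.random.uniform(-1.0, 1.0, size=(horizon, act_dim))
        s, score = s0, 0.0
        for a in actions:
            preds = np.stack([f(s, a) for f in ensemble])
            score += preds.std(axis=0).sum()  # intrinsic "reward": disagreement
            s = preds.mean(axis=0)            # roll the mean model forward
        if score > best_score:
            best_first, best_score = actions[0], score
    return best_first

# Toy usage with a fake two-model ensemble over a 3-D state:
ensemble = [lambda s, a: s + 0.1 * a.sum(), lambda s, a: s + 0.12 * a.sum()]
a0 = exploration_action(ensemble, np.zeros(3))
```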
arXiv Detail & Related papers (2023-06-21T16:26:59Z)
- Online Submodular Coordination with Bounded Tracking Regret: Theory,
Algorithm, and Applications to Multi-Robot Coordination [15.588080817106563]
We are motivated by future autonomy that involves multiple robots coordinating in dynamic, unstructured, and adversarial environments.
We introduce the first submodular coordination algorithm with bounded tracking regret, i.e., with bounded suboptimality with respect to the optimal time-varying actions that know the future a priori.
Our algorithm generalizes the seminal Sequential Greedy algorithm by Fisher et al. to unpredictable environments, leveraging submodularity and algorithms for the problem of tracking the best expert.
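
Since the entry builds on Fisher et al.'s Sequential Greedy, it helps to recall that baseline: agents commit to actions one at a time, each maximizing the marginal gain of a shared monotone submodular objective given earlier commitments, with a 1/2-of-optimal guarantee. The sketch below uses hypothetical interfaces (`actions_of`, `f`), and the coverage objective in the usage example is a standard toy, not from the paper.

```python
def sequential_greedy(agents, actions_of, f):
    """Sequential Greedy for monotone submodular maximization: each agent,
    in a fixed order, picks the action with the largest marginal gain of
    f given the partial assignment chosen so far."""
    chosen = {}
    for i in agents:
        base = f(chosen)
        chosen[i] = max(actions_of(i), key=lambda a: f({**chosen, i: a}) - base)
    return chosen

# Toy usage: maximize sensor coverage, a classic submodular objective.
footprints = {("r1", "left"): {1, 2}, ("r1", "right"): {3},
              ("r2", "left"): {2, 3}, ("r2", "right"): {4, 5}}

def coverage(assignment):
    covered = set()
    for robot, action in assignment.items():
        covered |= footprints[(robot, action)]
    return len(covered)

plan = sequential_greedy(["r1", "r2"], lambda i: ["left", "right"], coverage)
# -> {"r1": "left", "r2": "right"}: r1 covers {1, 2}, then r2 adds {4, 5}
```

The paper's contribution, per its summary, is to carry this style of guarantee into time-varying, unpredictable settings via bounded tracking regret.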
arXiv Detail & Related papers (2022-09-26T05:31:34Z)
- Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289]
We present a new policy gradient algorithm for TMDPs, obtained by a simple extension of the proximal policy optimization (PPO) algorithm.
We demonstrate it on a real-world multi-objective navigation problem with an arbitrary ordering of objectives, both in simulation and on a real robot.
arXiv Detail & Related papers (2022-09-15T07:22:58Z)
- Learning Cooperation and Online Planning Through Simulation and Graph
Convolutional Network [5.505634045241288]
We introduce a simulation-based online planning algorithm, which we call SiCLOP, for multi-agent cooperative environments.
Specifically, SiCLOP tailors Monte Carlo Tree Search (MCTS) and uses Coordination Graphs (CG) and Graph Convolutional Networks (GCN) to learn cooperation.
It also improves scalability through effective pruning of the action space.
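
SiCLOP's details are beyond this summary, but the Coordination Graph ingredient can be pictured as follows: joint-action quality decomposes into pairwise payoffs on graph edges, so agents can be fixed one at a time against already-committed neighbors instead of enumerating the exponential joint action space. The sketch below is a generic illustration with hypothetical interfaces, not SiCLOP's algorithm.

```python
def greedy_joint_action(agents, actions, edges, q_pair):
    """Choose a joint action over a coordination graph: the joint value is
    assumed to decompose as a sum of pairwise payoffs q_pair(i, j, ai, aj)
    over graph edges, so each agent greedily best-responds to neighbors
    that have already committed. The first agent sees no committed
    neighbors, and ties are broken by action order."""
    joint = {}
    for i in agents:
        def gain(ai):
            return sum(q_pair(i, j, ai, joint[j]) for j in joint
                       if (i, j) in edges or (j, i) in edges)
        joint[i] = max(actions[i], key=gain)
    return joint

# Toy usage: two robots should pick different lanes (payoff 1 if distinct).
edges = {("a", "b")}
q = lambda i, j, ai, aj: 1.0 if ai != aj else 0.0
print(greedy_joint_action(["a", "b"], {"a": [0, 1], "b": [0, 1]},
                          edges, q))  # e.g. {"a": 0, "b": 1}
```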
arXiv Detail & Related papers (2021-10-16T05:54:32Z)
- Planning for Novelty: Width-Based Algorithms for Common Problems in
Control, Planning and Reinforcement Learning [6.053629733936546]
Width-based algorithms search for solutions through a general definition of state novelty.
These algorithms have been shown to result in state-of-the-art performance in classical planning.
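
A concrete instance of searching through state novelty is the width-1 case of the Iterated Width family: breadth-first search that expands a state only if it makes some feature atom true for the first time. The sketch below assumes hypothetical problem interfaces (`succ`, `features`, `is_goal`); it shows the pruning rule itself, not any specific system from the paper.

```python
from collections import deque

def iw1_search(s0, succ, features, is_goal):
    """Breadth-first search with width-1 novelty pruning: keep a state only
    if at least one of its feature atoms has never been seen before."""
    seen = set()                         # feature atoms observed so far
    frontier = deque([(s0, [])])
    while frontier:
        s, plan = frontier.popleft()
        novel = [f for f in features(s) if f not in seen]
        if not novel:
            continue                     # pruned: nothing new about this state
        seen.update(novel)
        if is_goal(s):
            return plan                  # sequence of actions reaching the goal
        for action, s2 in succ(s):
            frontier.append((s2, plan + [action]))
    return None                          # no solution within width 1
```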
arXiv Detail & Related papers (2021-06-09T07:46:19Z)
- Generative Actor-Critic: An Off-policy Algorithm Using the Push-forward
Model [24.030426634281643]
In continuous control tasks, the widely used Gaussian policies result in ineffective exploration of the environment.
We propose a density-free off-policy algorithm, Generative Actor-Critic, which uses a push-forward model to increase the expressiveness of policies.
We show that push-forward policies possess desirable features, such as multi-modality, which can markedly improve exploration efficiency and algorithm performance.
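
To make "push-forward model" concrete: instead of outputting Gaussian parameters, the policy pushes simple source noise through a network, so sampling an action is a single forward pass and the induced distribution can be multi-modal with no tractable density (hence "density-free"). The PyTorch sketch below is a minimal illustration; the sizes and architecture are assumptions, not the paper's network.

```python
import torch
import torch.nn as nn

class PushForwardPolicy(nn.Module):
    """A stochastic policy defined as the push-forward of Gaussian noise
    through a network: action = net(state, z), z ~ N(0, I). No action
    density is ever computed, only samples."""
    def __init__(self, state_dim, action_dim, noise_dim=8, hidden=128):
        super().__init__()
        self.noise_dim = noise_dim
        self.net = nn.Sequential(
            nn.Linear(state_dim + noise_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),  # bounded actions
        )

    def forward(self, state):
        z = torch.randn(state.shape[0], self.noise_dim)  # source noise
        return self.net(torch.cat([state, z], dim=-1))   # pushed-forward action

# Repeated calls on the same state draw different, possibly multi-modal, actions.
policy = PushForwardPolicy(state_dim=6, action_dim=2)
s = torch.zeros(1, 6)
samples = torch.stack([policy(s) for _ in range(5)])
```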
arXiv Detail & Related papers (2021-05-08T16:29:20Z)
- Model-free Representation Learning and Exploration in Low-rank MDPs [64.72023662543363]
We present the first model-free representation learning algorithms for low-rank MDPs.
The key algorithmic contribution is a new minimax representation learning objective.
The result can accommodate general function approximation and thus scale to complex environments.
arXiv Detail & Related papers (2021-02-14T00:06:54Z)
- Decentralized MCTS via Learned Teammate Models [89.24858306636816]
We present a trainable online decentralized planning algorithm based on decentralized Monte Carlo Tree Search.
We show that deep learning and convolutional neural networks can be employed to produce accurate policy approximators.
arXiv Detail & Related papers (2020-03-19T13:10:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.