Autonomous sPOMDP Environment Modeling With Partial Model Exploitation
- URL: http://arxiv.org/abs/2012.12203v1
- Date: Tue, 22 Dec 2020 17:48:32 GMT
- Title: Autonomous sPOMDP Environment Modeling With Partial Model Exploitation
- Authors: Andrew Wilhelm, Aaron Wilhelm, Garrett Fosdick
- Abstract summary: We present a novel state space exploration algorithm by extending the original surprise-based partially-observable Markov Decision Process (sPOMDP) model.
We show the proposed model significantly increases the efficiency and scalability of the original sPOMDP learning techniques, with a 31-63% gain in training speed.
Our results pave the way for extending sPOMDP solutions to a broader set of environments.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A state space representation of an environment is a classic and yet powerful
tool used by many autonomous robotic systems for efficient and often optimal
solution planning. However, designing these representations with high
performance is laborious and costly, necessitating an effective and versatile
tool for autonomous generation of state spaces for autonomous robots. We
present a novel state space exploration algorithm by extending the original
surprise-based partially-observable Markov Decision Processes (sPOMDP), and
demonstrate its effective long-term exploration planning performance in various
environments. Through extensive simulation experiments, we show the proposed
model significantly increases efficiency and scalability of the original sPOMDP
learning techniques, yielding a 31-63% gain in training speed while
improving robustness in environments with less deterministic transitions. Our
results pave the way for extending sPOMDP solutions to a broader set of
environments.
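The surprise-based exploration idea behind sPOMDP can be illustrated with a small toy sketch. This is a hypothetical illustration, not the paper's algorithm: the agent keeps Dirichlet-smoothed observation counts per state-action pair and greedily picks the action whose predictive outcome distribution has the highest entropy, i.e. the highest expected surprise. All names here (`SurpriseExplorer`, `record`, `choose`) are assumptions for the sketch.

```python
import math
from collections import defaultdict

class SurpriseExplorer:
    """Toy surprise-driven explorer (illustrative, not the paper's sPOMDP
    algorithm). Tracks observation counts per (state, action) and prefers
    the action whose predictive outcome distribution is most uncertain."""

    def __init__(self):
        # counts[(state, action)][observation] -> visit count
        self.counts = defaultdict(lambda: defaultdict(int))

    def record(self, state, action, observation):
        self.counts[(state, action)][observation] += 1

    def predictive(self, state, action, outcomes=("a", "b"), prior=1.0):
        # Dirichlet-smoothed predictive distribution over outcomes.
        c = self.counts[(state, action)]
        total = sum(c[o] + prior for o in outcomes)
        return {o: (c[o] + prior) / total for o in outcomes}

    def expected_surprise(self, state, action, outcomes=("a", "b")):
        # Shannon entropy (nats) of the predictive distribution:
        # the expected value of -log p(observation).
        p = self.predictive(state, action, outcomes)
        return -sum(p[o] * math.log(p[o]) for o in outcomes)

    def choose(self, state, actions, outcomes=("a", "b")):
        # Greedy exploration: act where the model is least certain.
        return max(actions,
                   key=lambda a: self.expected_surprise(state, a, outcomes))
```

Under this sketch, an action whose outcomes have so far been deterministic is sampled less than one whose outcomes remain unpredictable, which is the intuition behind directing exploration by surprise.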
Related papers
- Research on Autonomous Robots Navigation based on Reinforcement Learning [13.559881645869632]
We use the Deep Q Network (DQN) and Proximal Policy Optimization (PPO) models to optimize the path planning and decision-making process.
We have verified the effectiveness and robustness of these models in various complex scenarios.
arXiv Detail & Related papers (2024-07-02T00:44:06Z)
- Machine Learning Insides OptVerse AI Solver: Design Principles and Applications [74.67495900436728]
We present a comprehensive study on the integration of machine learning (ML) techniques into Huawei Cloud's OptVerse AI solver.
We showcase our methods for generating complex SAT and MILP instances utilizing generative models that mirror the multifaceted structures of real-world problems.
We detail the incorporation of state-of-the-art parameter tuning algorithms which markedly elevate solver performance.
arXiv Detail & Related papers (2024-01-11T15:02:15Z)
- DREAM: Decentralized Reinforcement Learning for Exploration and Efficient Energy Management in Multi-Robot Systems [14.266876062352424]
Resource-constrained robots often suffer from energy inefficiencies, underutilized computational abilities due to inadequate task allocation, and a lack of robustness in dynamic environments.
This paper introduces DREAM - Decentralized Reinforcement Learning for Exploration and Efficient Energy Management in Multi-Robot Systems.
arXiv Detail & Related papers (2023-09-29T17:43:41Z)
- Distributionally Robust Model-based Reinforcement Learning with Large State Spaces [55.14361269378122]
Three major challenges in reinforcement learning are complex dynamical systems with large state spaces, costly data acquisition processes, and the deviation of real-world dynamics from the training environment at deployment.
We study distributionally robust Markov decision processes with continuous state spaces under the widely used Kullback-Leibler, chi-square, and total variation uncertainty sets.
We propose a model-based approach that utilizes Gaussian Processes and the maximum variance reduction algorithm to efficiently learn multi-output nominal transition dynamics.
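The Kullback-Leibler uncertainty set mentioned above admits a well-known one-dimensional dual: the worst-case expectation over the ball {Q : KL(Q||P) <= delta} equals sup over beta > 0 of -beta*log E_P[exp(-v/beta)] - beta*delta. The sketch below evaluates that dual by grid search over beta; it is a generic illustration of the KL-robust expectation, not the paper's GP-based method, and the function name is an assumption.

```python
import math

def kl_robust_value(p, v, delta, betas=None):
    """Worst-case expected value of outcomes v under probabilities p, over
    the ambiguity set {Q : KL(Q || P) <= delta}, via the standard dual
    sup_{beta>0} -beta * log E_P[exp(-v/beta)] - beta * delta,
    approximated by a grid search over the temperature beta."""
    if betas is None:
        betas = [10 ** (k / 10) for k in range(-30, 41)]  # 1e-3 .. 1e4
    best = -math.inf
    for b in betas:
        # Log-sum-exp stabilization of log E_P[exp(-v/b)].
        m = max(-x / b for x in v)
        lse = m + math.log(sum(pi * math.exp(-x / b - m)
                               for pi, x in zip(p, v)))
        best = max(best, -b * lse - b * delta)
    return best
```

With delta = 0 the dual recovers the nominal expectation (as beta grows), while delta > 0 tilts the expectation toward the worst outcomes, which is the conservatism that distributionally robust MDPs exploit.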
arXiv Detail & Related papers (2023-09-05T13:42:11Z)
- AI planning in the imagination: High-level planning on learned abstract search spaces [68.75684174531962]
We propose a new method, called PiZero, that gives an agent the ability to plan in an abstract search space that the agent learns during training.
We evaluate our method on multiple domains, including the traveling salesman problem, Sokoban, 2048, the facility location problem, and Pacman.
arXiv Detail & Related papers (2023-08-16T22:47:16Z)
- Obstacle Avoidance for Robotic Manipulator in Joint Space via Improved Proximal Policy Optimization [6.067589886362815]
In this paper, we train a deep neural network via an improved Proximal Policy Optimization (PPO) algorithm to map from task space to joint space for a 6-DoF manipulator.
Since training such a task in real-robot is time-consuming and strenuous, we develop a simulation environment to train the model.
Experimental results showed that using our method, the robot was capable of tracking a single target or reaching multiple targets in unstructured environments.
arXiv Detail & Related papers (2022-10-03T10:21:57Z)
- Multitask Adaptation by Retrospective Exploration with Learned World Models [77.34726150561087]
We propose a meta-learned addressing model called RAMa that provides training samples for the MBRL agent taken from task-agnostic storage.
The model is trained to maximize the expected agent's performance by selecting promising trajectories solving prior tasks from the storage.
arXiv Detail & Related papers (2021-10-25T20:02:57Z)
- Learning Space Partitions for Path Planning [54.475949279050596]
PlaLaM outperforms existing path planning methods in 2D navigation tasks, especially in the presence of difficult-to-escape local optima.
These gains transfer to highly multimodal real-world tasks, where we outperform strong baselines in compiler phase ordering by up to 245% and in molecular design by up to 0.4 on properties measured on a 0-1 scale.
arXiv Detail & Related papers (2021-06-19T18:06:11Z)
- Scalable Multi-Robot System for Non-myopic Spatial Sampling [9.37678298330157]
This paper presents a scalable distributed multi-robot planning algorithm for non-uniform sampling of spatial fields.
We analyze the effect of communication between multiple robots, acting independently, on the overall sampling performance of the team.
arXiv Detail & Related papers (2021-05-20T20:30:10Z)
- Contextual Latent-Movements Off-Policy Optimization for Robotic Manipulation Skills [41.140532647789456]
We propose a novel view on handling the demonstrated trajectories for acquiring low-dimensional, non-linear latent dynamics.
We introduce a new contextual off-policy RL algorithm, named LAtent-Movements Policy Optimization (LAMPO)
LAMPO provides sample-efficient policies against common approaches in literature.
arXiv Detail & Related papers (2020-10-26T17:53:30Z)
- Localized active learning of Gaussian process state space models [63.97366815968177]
A globally accurate model is not required to achieve good performance in many common control applications.
We propose an active learning strategy for Gaussian process state space models that aims to obtain an accurate model on a bounded subset of the state-action space.
By employing model predictive control, the proposed technique integrates information collected during exploration and adaptively improves its exploration strategy.
arXiv Detail & Related papers (2020-05-04T05:35:02Z)
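The active-learning idea in the Gaussian process entries above, querying where the model is least certain, can be sketched in one dimension. This is a generic illustration under an assumed RBF kernel, not the authors' implementation; the helper names are hypothetical, and the tiny Gaussian-elimination solver stands in for a proper linear-algebra library.

```python
import math

def rbf(x1, x2, ls=0.5):
    # Squared-exponential (RBF) kernel with length scale ls.
    return math.exp(-((x1 - x2) ** 2) / (2 * ls ** 2))

def solve(A, b):
    # Gaussian elimination with partial pivoting for small systems.
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c]
                              for c in range(r + 1, n))) / M[r][r]
    return x

def posterior_variance(x_star, X, noise=1e-3):
    # GP predictive variance at x_star given training inputs X:
    # k(x*,x*) - k_*^T (K + noise I)^{-1} k_*.
    K = [[rbf(a, b2) + (noise if i == j else 0.0)
          for j, b2 in enumerate(X)] for i, a in enumerate(X)]
    k_star = [rbf(x_star, a) for a in X]
    v = solve(K, k_star)
    return rbf(x_star, x_star) - sum(k * vi for k, vi in zip(k_star, v))

def next_query(candidates, X):
    # Maximum-variance selection: query where the GP is least certain.
    return max(candidates, key=lambda x: posterior_variance(x, X))
```

Points far from the training inputs retain close to the prior variance and are queried first; points near existing data have nearly zero predictive variance and are skipped, which is the core of variance-directed exploration.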
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.