Memory-based Deep Reinforcement Learning for POMDP
- URL: http://arxiv.org/abs/2102.12344v2
- Date: Thu, 25 Feb 2021 03:15:10 GMT
- Title: Memory-based Deep Reinforcement Learning for POMDP
- Authors: Lingheng Meng, Rob Gorbet, Dana Kulić
- Abstract summary: We propose the Long-Short-Term-Memory-based Twin Delayed Deep Deterministic Policy Gradient (LSTM-TD3), which introduces a memory component to TD3.
Our results demonstrate the significant advantages of the memory component in addressing Partially Observable MDPs (POMDPs).
- Score: 7.137228786549488
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: A promising characteristic of Deep Reinforcement Learning (DRL) is its
capability to learn an optimal policy in an end-to-end manner without relying
on feature engineering. However, most approaches assume a fully observable
state space, i.e. a fully observable Markov Decision Process (MDP). In
real-world robotics, this assumption is impractical because of sensor issues
such as limited sensor capacity and sensor noise, and because it is often
unknown whether the observation design is complete. These scenarios lead to a
Partially Observable MDP (POMDP) and require special treatment. In this paper,
we propose the Long-Short-Term-Memory-based Twin Delayed Deep Deterministic
Policy Gradient (LSTM-TD3), which introduces a memory component to TD3, and
compare its performance with other DRL algorithms in both MDPs and POMDPs. Our
results demonstrate the significant advantages of the memory component in
addressing POMDPs, including the ability to handle missing and noisy
observation data.
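The architectural idea is compact enough to sketch. Below is a minimal, illustrative PyTorch actor in the spirit of LSTM-TD3: an LSTM summarizes past observation-action pairs into a memory feature that is combined with the current observation before the action head. Module names and layer sizes here are assumptions, not the authors' reference implementation.

```python
import torch
import torch.nn as nn

class LSTMTD3Actor(nn.Module):
    """Illustrative memory-based actor: an LSTM over past
    (observation, action) pairs supplies the context that a plain
    TD3 actor lacks under partial observability."""

    def __init__(self, obs_dim, act_dim, hidden=128, act_limit=1.0):
        super().__init__()
        # Memory branch: summarizes the history of past obs-action pairs.
        self.memory = nn.LSTM(obs_dim + act_dim, hidden, batch_first=True)
        # Current-feature branch: encodes the present observation.
        self.current = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        # Integration head: combines memory and current features into an action.
        self.head = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),
        )
        self.act_limit = act_limit

    def forward(self, hist_obs, hist_act, obs):
        # hist_obs: (B, T, obs_dim), hist_act: (B, T, act_dim), obs: (B, obs_dim)
        hist = torch.cat([hist_obs, hist_act], dim=-1)
        _, (h_n, _) = self.memory(hist)   # final hidden state as memory summary
        mem_feat = h_n[-1]                # (B, hidden)
        cur_feat = self.current(obs)      # (B, hidden)
        return self.act_limit * self.head(torch.cat([mem_feat, cur_feat], dim=-1))
```

At update time the replay buffer must also return a fixed-length history window for each sampled transition, so that `hist_obs` and `hist_act` are available alongside the usual (s, a, r, s') tuple.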
Related papers
- Tractable Offline Learning of Regular Decision Processes [50.11277112628193]
This work studies offline Reinforcement Learning (RL) in a class of non-Markovian environments called Regular Decision Processes (RDPs).
In RDPs, the unknown dependency of future observations and rewards on past interactions can be captured by a finite-state automaton.
Many algorithms first reconstruct this unknown dependency using automata learning techniques.
arXiv Detail & Related papers (2024-09-04T14:26:58Z)
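As a hedged illustration of the "reconstruct the dependency from data" step mentioned above (not the paper's algorithm), one can treat a bounded suffix of the interaction history as a surrogate automaton state and estimate next-observation statistics per state:

```python
from collections import Counter, defaultdict

def fit_suffix_model(episodes, k=2):
    """Toy stand-in for automata learning: use the last k
    (action, observation) pairs as a surrogate automaton state and
    count which observation follows each (state, action)."""
    counts = defaultdict(Counter)
    for ep in episodes:  # ep: list of (action, observation) pairs
        for t in range(len(ep) - 1):
            state = tuple(ep[max(0, t - k + 1): t + 1])  # bounded history suffix
            next_action, next_obs = ep[t + 1]
            counts[(state, next_action)][next_obs] += 1
    # Normalize counts into conditional distributions P(obs | state, action).
    return {
        key: {obs: n / sum(ctr.values()) for obs, n in ctr.items()}
        for key, ctr in counts.items()
    }
```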
- Depth-discriminative Metric Learning for Monocular 3D Object Detection [14.554132525651868]
We introduce a novel metric learning scheme that encourages the model to extract depth-discriminative features regardless of the visual attributes.
Our method consistently improves the performance of various baselines by 23.51% and 5.78% on average.
arXiv Detail & Related papers (2024-01-02T07:34:09Z)
- R^3: On-device Real-Time Deep Reinforcement Learning for Autonomous Robotics [9.2327813168753]
This paper presents R3, a holistic solution for managing timing, memory, and algorithm performance in on-device real-time DRL training.
R3 employs (i) a deadline-driven feedback loop with dynamic batch sizing for optimizing timing, (ii) efficient memory management to reduce memory footprint and allow larger replay buffer sizes, and (iii) a runtime coordinator guided by runtime analysis and a runtime profiler for adjusting memory resource reservations.
arXiv Detail & Related papers (2023-08-29T05:48:28Z)
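R3's deadline-driven feedback loop with dynamic batch sizing can be pictured with a simple proportional controller: time the last training step and rescale the next batch to stay within the per-cycle deadline. This is an illustrative sketch only; the function names and the proportional rule are assumptions, not R3's actual controller.

```python
import time

def deadline_driven_batches(train_step, deadline_s, batch=64,
                            min_batch=8, max_batch=512):
    """Adjust the batch size so each training step fits a deadline."""
    while True:
        start = time.monotonic()
        train_step(batch)
        elapsed = time.monotonic() - start
        # Proportional feedback: grow the batch when under the deadline,
        # shrink it when over, keeping step time near the time budget.
        batch = int(batch * deadline_s / max(elapsed, 1e-6))
        batch = max(min_batch, min(max_batch, batch))
        yield batch, elapsed
```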
- Provably Efficient Algorithm for Nonstationary Low-Rank MDPs [48.92657638730582]
We make the first effort to investigate nonstationary RL under episodic low-rank MDPs, where both transition kernels and rewards may vary over time.
We propose a parameter-dependent policy optimization algorithm called PORTAL, and further improve PORTAL to its parameter-free version, Ada-PORTAL.
For both algorithms, we provide upper bounds on the average dynamic suboptimality gap, which show that as long as the nonstationarity is not significantly large, PORTAL and Ada-PORTAL are sample-efficient and can achieve an arbitrarily small average dynamic suboptimality gap with polynomial sample complexity.
arXiv Detail & Related papers (2023-08-10T09:52:44Z)
- Reinforcement Learning with a Terminator [80.34572413850186]
We learn the parameters of the TerMDP and leverage the structure of the estimation problem to provide state-wise confidence bounds.
We use these to construct a provably-efficient algorithm, which accounts for termination, and bound its regret.
arXiv Detail & Related papers (2022-05-30T18:40:28Z)
- Deep Deterministic Uncertainty for Semantic Segmentation [97.89295891304394]
We extend Deep Deterministic Uncertainty (DDU) to semantic segmentation.
We show that DDU improves upon MC Dropout and Deep Ensembles while being significantly faster to compute.
arXiv Detail & Related papers (2021-10-29T20:45:58Z)
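A sketch of the feature-density idea behind deterministic uncertainty methods such as DDU, written from memory and hedged accordingly: fit one Gaussian per class to penultimate-layer features, then score epistemic uncertainty by negative log feature density. The helper names are ours.

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_feature_gaussians(features, labels):
    """Fit one Gaussian per class to penultimate-layer features."""
    return {
        c: multivariate_normal(
            mean=features[labels == c].mean(axis=0),
            cov=np.cov(features[labels == c], rowvar=False)
                + 1e-6 * np.eye(features.shape[1]),  # jitter for stability
        )
        for c in np.unique(labels)
    }

def epistemic_score(gaussians, feat):
    """Low feature-space density => high epistemic uncertainty."""
    density = sum(g.pdf(feat) for g in gaussians.values())
    return -np.log(density + 1e-12)
```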
- EPMF: Efficient Perception-aware Multi-sensor Fusion for 3D Semantic Segmentation [62.210091681352914]
We study multi-sensor fusion for 3D semantic segmentation in applications such as autonomous driving and robotics.
In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF).
We propose a two-stream network to extract features from the two modalities separately; the extracted features are fused by effective residual-based fusion modules.
arXiv Detail & Related papers (2021-06-21T10:47:26Z)
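The residual-based fusion module is easy to picture in code. A hedged PyTorch sketch with our own layer sizes and names: each modality keeps its own stream, and a learned correction computed from both streams is added back as a residual.

```python
import torch
import torch.nn as nn

class ResidualFusion(nn.Module):
    """Fuse camera and LiDAR feature maps: the fused result is the
    primary stream plus a learned residual from both streams."""

    def __init__(self, channels):
        super().__init__()
        self.residual = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, lidar_feat, cam_feat):
        # Residual correction computed from both modalities,
        # added back onto the LiDAR stream.
        return lidar_feat + self.residual(torch.cat([lidar_feat, cam_feat], dim=1))
```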
- Modular Deep Reinforcement Learning for Continuous Motion Planning with Temporal Logic [59.94347858883343]
This paper investigates the motion planning of autonomous dynamical systems modeled by Markov decision processes (MDPs).
The novelty is to design an embedded product MDP (EP-MDP) between a limit-deterministic generalized Büchi automaton (LDGBA) and the MDP.
The proposed LDGBA-based reward shaping and discounting schemes for model-free reinforcement learning (RL) depend only on the EP-MDP states.
arXiv Detail & Related papers (2021-02-24T01:11:25Z)
- End-to-End Egospheric Spatial Memory [32.42361470456194]
We propose a parameter-free module, Egospheric Spatial Memory (ESM), which encodes the memory in an ego-sphere around the agent.
ESM can be trained end-to-end via either imitation or reinforcement learning.
We show applications to semantic segmentation on the ScanNet dataset, where ESM naturally combines image-level and map-level inference modalities.
arXiv Detail & Related papers (2021-02-15T18:59:07Z)
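The defining property of an egocentric memory is that stored content is re-indexed whenever the agent moves. ESM itself uses an ego-sphere with projective geometry; as a deliberately simplified illustration, here is a 2D egocentric grid memory shifted by the agent's translation each step:

```python
import torch

def shift_ego_memory(memory, dxy):
    """Shift an egocentric feature grid so it stays agent-centered.

    memory: (C, H, W) feature grid expressed in the agent's frame.
    dxy:    (dx, dy) agent translation in grid cells since last step.
    """
    dx, dy = dxy
    # Moving the agent by (dx, dy) moves the world by (-dx, -dy)
    # in the agent's frame; torch.roll re-indexes the grid cells.
    shifted = torch.roll(memory, shifts=(-dy, -dx), dims=(1, 2))
    # Cells that rolled in from the far edge hold stale data; zero them.
    if dx > 0:
        shifted[:, :, -dx:] = 0
    elif dx < 0:
        shifted[:, :, :-dx] = 0
    if dy > 0:
        shifted[:, -dy:, :] = 0
    elif dy < 0:
        shifted[:, :-dy, :] = 0
    return shifted
```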
- DeepAveragers: Offline Reinforcement Learning by Solving Derived Non-Parametric MDPs [47.73837217824527]
We study an approach to offline reinforcement learning (RL) based on optimally solving finitely-represented MDPs derived from a static dataset of experience.
Our main contribution is to introduce the Deep Averagers with Costs MDP (DAC-MDP) and to investigate its solutions for offline RL.
arXiv Detail & Related papers (2020-10-18T00:11:45Z)
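The derived-MDP idea can be sketched in a few lines: answer each (state, action) query by averaging the k nearest logged transitions for that action, minus a distance-based cost so weakly supported transitions look unattractive to a planner. The function names and the cost form below are assumptions, not the paper's exact construction:

```python
import numpy as np

def build_dac_mdp(states, actions, rewards, next_ids, k=5, cost=1.0):
    """Derive a finite MDP-like step function from logged transitions.

    For a query state and action, average the k nearest logged
    transitions with that action, subtracting a distance-based cost
    so weakly supported transitions look unattractive to the planner.
    """
    def step(query, action):
        mask = actions == action
        d = np.linalg.norm(states[mask] - query, axis=1)
        nearest = np.argsort(d)[:k]
        idx = np.flatnonzero(mask)[nearest]
        # Average reward with a penalty growing in neighbor distance.
        r = rewards[idx].mean() - cost * d[nearest].mean()
        return r, next_ids[idx]  # candidate successor states (uniform)
    return step
```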
- Sample-Efficient Reinforcement Learning of Undercomplete POMDPs [91.40308354344505]
This work shows that these hardness barriers do not preclude efficient reinforcement learning for rich and interesting subclasses of Partially Observable Markov Decision Processes (POMDPs).
We present a sample-efficient algorithm, OOM-UCB, for episodic finite undercomplete POMDPs, where the number of observations is larger than the number of latent states and where exploration is essential for learning, thus distinguishing our results from prior works.
arXiv Detail & Related papers (2020-06-22T17:58:54Z)