Neural Episodic Control with State Abstraction
- URL: http://arxiv.org/abs/2301.11490v1
- Date: Fri, 27 Jan 2023 01:55:05 GMT
- Title: Neural Episodic Control with State Abstraction
- Authors: Zhuo Li, Derui Zhu, Yujing Hu, Xiaofei Xie, Lei Ma, Yan Zheng, Yan
Song, Yingfeng Chen, Jianjun Zhao
- Abstract summary: Existing Deep Reinforcement Learning (DRL) algorithms suffer from sample inefficiency.
This work introduces Neural Episodic Control with State Abstraction (NECSA)
We evaluate our approach on MuJoCo and Atari tasks in the OpenAI Gym domains.
- Score: 38.95199070504417
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing Deep Reinforcement Learning (DRL) algorithms suffer from sample
inefficiency. Generally, episodic control-based approaches are solutions that
leverage highly-rewarded past experiences to improve sample efficiency of DRL
algorithms. However, previous episodic control-based approaches fail to utilize
the latent information in historical behaviors (e.g., state transitions,
topological similarities) and lack scalability during DRL training. This
work introduces Neural Episodic Control with State Abstraction (NECSA), a
simple but effective state-abstraction-based episodic control method comprising
a more comprehensive episodic memory, a novel state evaluation, and a multi-step
state analysis. We evaluate our approach on MuJoCo and Atari tasks in the OpenAI Gym
domains. The experimental results indicate that NECSA achieves higher sample
efficiency than the state-of-the-art episodic control-based approaches. Our
data and code are available at the project website:
https://sites.google.com/view/drl-necsa.
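The abstract names the ingredients (an episodic memory over abstract states, a state evaluation, a multi-step analysis) without spelling out the mechanism. Below is a minimal sketch of the first ingredient, assuming a grid-based discretization of the state space; the class name, `grid_size`, and `bonus_scale` are illustrative choices, not NECSA's actual interface.

```python
import numpy as np
from collections import defaultdict

class AbstractEpisodicMemory:
    """Sketch of a grid-based state-abstraction episodic memory.

    Continuous states are mapped to coarse grid cells (abstract states);
    each cell keeps a running mean of the episodic returns observed from
    it, which is later reused as a shaping bonus when the cell is revisited.
    """

    def __init__(self, grid_size=0.1, bonus_scale=0.1):
        self.grid_size = grid_size      # illustrative discretization width
        self.bonus_scale = bonus_scale  # illustrative shaping coefficient
        self.count = defaultdict(int)
        self.mean_return = defaultdict(float)

    def abstract(self, state):
        # Map a continuous state to a discrete grid cell (the abstract state).
        return tuple(np.floor(np.asarray(state) / self.grid_size).astype(int))

    def update(self, trajectory, episode_return):
        # End-of-episode update: fold the return into each visited cell.
        for state in trajectory:
            key = self.abstract(state)
            self.count[key] += 1
            delta = episode_return - self.mean_return[key]
            self.mean_return[key] += delta / self.count[key]

    def bonus(self, state):
        # Shaping bonus: revisiting historically high-return cells pays off.
        return self.bonus_scale * self.mean_return.get(self.abstract(state), 0.0)
```

In such a scheme, the shaped reward `r + memory.bonus(next_state)` could be fed to any off-the-shelf DRL learner, which matches the abstract's claim of scalability during training.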
Related papers
- Episodic Reinforcement Learning with Expanded State-reward Space [1.479675621064679]
We introduce an efficient EC-based DRL framework with an expanded state-reward space, where both the expanded states used as input and the expanded rewards used in training contain historical and current information.
Our method simultaneously achieves full utilization of the retrieved information and better evaluation of state values via a Temporal Difference (TD) loss.
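The snippet leaves the exact expansion unspecified; a minimal sketch of the idea, assuming the retrieved historical information is a scalar episodic value estimate and `beta` is an illustrative mixing weight:

```python
import numpy as np

def expand_state(obs, retrieved_value):
    # Expanded state: current observation augmented with historical
    # information retrieved from episodic memory (here a scalar value
    # estimate; the paper's retrieval features may differ).
    return np.concatenate([np.asarray(obs), [retrieved_value]])

def expand_reward(env_reward, episodic_return, beta=0.5):
    # Expanded reward: a mixture of the current reward and the
    # historical (episodic) return, trained under an ordinary TD loss.
    return (1.0 - beta) * env_reward + beta * episodic_return
```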
arXiv Detail & Related papers (2024-01-19T06:14:36Z)
- Try with Simpler -- An Evaluation of Improved Principal Component Analysis in Log-based Anomaly Detection [18.328245109223964]
Deep learning (DL) has spurred interest in enhancing log-based anomaly detection.
Traditional machine learning and data mining techniques are less data-dependent and more efficient but less effective than DL.
We optimize the unsupervised PCA (Principal Component Analysis), a traditional technique, by incorporating lightweight semantic-based log representation.
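As a rough illustration of the optimized-PCA pipeline, the sketch below uses TF-IDF as a stand-in for the paper's lightweight semantic log representation and scores log sequences by their reconstruction error in the principal subspace; the function name and the 95% variance threshold are our assumptions, not the authors' settings.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer

def pca_anomaly_scores(log_sequences, n_components=0.95):
    """Score log sequences by PCA reconstruction error.

    Sequences poorly captured by the principal subspace (high squared
    prediction error) are candidates for anomalies.
    """
    X = TfidfVectorizer().fit_transform(log_sequences).toarray()
    pca = PCA(n_components=n_components).fit(X)
    X_hat = pca.inverse_transform(pca.transform(X))
    return np.sum((X - X_hat) ** 2, axis=1)  # SPE / Q-statistic per sequence
```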
arXiv Detail & Related papers (2023-08-24T07:22:29Z)
- Efficient Reinforcement Learning with Impaired Observability: Learning to Act with Delayed and Missing State Observations [92.25604137490168]
This paper presents a theoretical investigation of efficient reinforcement learning in control systems with impaired observability.
We present algorithms and establish near-optimal regret upper and lower bounds of the form $\tilde{\mathcal{O}}(\sqrt{\mathrm{poly}(H)\,SAK})$ for RL in the delayed and missing observation settings.
arXiv Detail & Related papers (2023-06-02T02:46:39Z)
- Efficient Deep Reinforcement Learning Requires Regulating Overfitting [91.88004732618381]
We show that high temporal-difference (TD) error on the validation set of transitions is the main culprit that severely affects the performance of deep RL algorithms.
We show that a simple online model selection method that targets the validation TD error is effective across state-based DMC and Gym tasks.
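A minimal sketch of selecting among candidate agents by validation TD error, in the spirit of the method described above; the candidates' `q` / `target_q` attributes and the discrete-action assumption are ours, not the paper's API.

```python
import numpy as np

def validation_td_error(q_fn, target_q_fn, batch, gamma=0.99):
    # TD error on a held-out set of transitions (s, a, r, s', done);
    # q_fn and target_q_fn return an (N, num_actions) value matrix.
    s, a, r, s2, done = batch
    target = r + gamma * (1.0 - done) * np.max(target_q_fn(s2), axis=1)
    pred = q_fn(s)[np.arange(len(a)), a]
    return float(np.mean((pred - target) ** 2))

def select_model(candidates, held_out_batch):
    # Online model selection: keep the candidate whose validation
    # TD error is lowest (a stand-in for the paper's procedure).
    return min(candidates,
               key=lambda c: validation_td_error(c.q, c.target_q, held_out_batch))
```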
arXiv Detail & Related papers (2023-04-20T17:11:05Z)
- Value-Consistent Representation Learning for Data-Efficient Reinforcement Learning [105.70602423944148]
We propose a novel method, called value-consistent representation learning (VCR), to learn representations that are directly related to decision-making.
Instead of aligning this imagined state with a real state returned by the environment, VCR applies a $Q$-value head on both states and obtains two distributions of action values.
Our method achieves new state-of-the-art performance among search-free RL algorithms.
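A hedged PyTorch sketch of the value-consistency objective as described above: rather than matching the imagined state to the real one directly, a shared Q head is applied to both and the resulting action-value distributions are compared. The module names (`encoder`, `dynamics`, `q_head`) and the KL-based comparison are illustrative choices.

```python
import torch
import torch.nn.functional as F

def vcr_loss(encoder, dynamics, q_head, obs, action, next_obs):
    z_imagined = dynamics(encoder(obs), action)  # model-predicted next latent
    z_real = encoder(next_obs).detach()          # latent of the real next state
    q_imagined = q_head(z_imagined)              # action values from imagination
    q_real = q_head(z_real)                      # action values from reality
    # Align the two action-value distributions instead of the states.
    return F.kl_div(F.log_softmax(q_imagined, dim=-1),
                    F.softmax(q_real, dim=-1), reduction="batchmean")
```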
arXiv Detail & Related papers (2022-06-25T03:02:25Z)
- Minimizing Control for Credit Assignment with Strong Feedback [65.59995261310529]
Current methods for gradient-based credit assignment in deep neural networks need infinitesimally small feedback signals.
We combine strong feedback influences on neural activity with gradient-based learning and show that this naturally leads to a novel view on neural network optimization.
We show that the use of strong feedback in DFC allows learning forward and feedback connections simultaneously, using a learning rule fully local in space and time.
arXiv Detail & Related papers (2022-04-14T22:06:21Z)
- Improved Exploring Starts by Kernel Density Estimation-Based State-Space Coverage Acceleration in Reinforcement Learning [0.0]
Reinforcement learning (RL) is a popular research topic in control engineering.
RL controllers are trained in direct interaction with the controlled system, rendering them data-driven and performance-oriented solutions.
DESSCA is a kernel density estimation-based state-space coverage acceleration method for improving exploring starts.
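A minimal sketch of the KDE-based coverage idea, assuming the environment allows resets to arbitrary candidate initial states; the user-supplied `sample_candidates` function, the bandwidth, and the candidate count are hypothetical details, not the paper's configuration.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

def dessca_style_reset(visited_states, sample_candidates,
                       bandwidth=0.2, n_candidates=64):
    """Pick the next exploring start as the candidate initial state with
    the lowest estimated density under a KDE fitted to the states seen so
    far, steering resets toward poorly covered regions of the state space.
    """
    kde = KernelDensity(bandwidth=bandwidth).fit(np.asarray(visited_states))
    candidates = np.asarray(sample_candidates(n_candidates))
    log_density = kde.score_samples(candidates)  # log-density per candidate
    return candidates[int(np.argmin(log_density))]
```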
arXiv Detail & Related papers (2021-05-19T08:36:26Z)
- DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
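DEALIO's data efficiency comes from a model-based RL backbone not shown here; the sketch below only illustrates the adversarial IfO component common to this line of work: a discriminator over (s, s') state transitions, never actions, whose output yields an imitation reward. Layer sizes and names are illustrative.

```python
import torch
import torch.nn as nn

class TransitionDiscriminator(nn.Module):
    """Discriminator for imitation from observation: it sees only
    (state, next_state) pairs, never the demonstrator's actions."""

    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, s, s_next):
        return self.net(torch.cat([s, s_next], dim=-1))  # real/fake logit

def imitation_reward(disc, s, s_next):
    # GAIfO-style reward: transitions the discriminator mistakes for
    # demonstrator data receive higher reward (-log(1 - D)).
    with torch.no_grad():
        return -torch.log(torch.sigmoid(-disc(s, s_next)) + 1e-8)
```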
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.