A Neuromorphic Architecture for Reinforcement Learning from Real-Valued
Observations
- URL: http://arxiv.org/abs/2307.02947v2
- Date: Tue, 8 Aug 2023 10:59:45 GMT
- Title: A Neuromorphic Architecture for Reinforcement Learning from Real-Valued
Observations
- Authors: Sergio F. Chevtchenko, Yeshwanth Bethi, Teresa B. Ludermir, Saeed
Afshar
- Abstract summary: Reinforcement Learning (RL) provides a powerful framework for decision-making in complex environments.
This paper presents a novel Spiking Neural Network (SNN) architecture for solving RL problems with real-valued observations.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Reinforcement Learning (RL) provides a powerful framework for decision-making
in complex environments. However, implementing RL in hardware-efficient and
bio-inspired ways remains a challenge. This paper presents a novel Spiking
Neural Network (SNN) architecture for solving RL problems with real-valued
observations. The proposed model incorporates multi-layered event-based
clustering, with the addition of Temporal Difference (TD)-error modulation and
eligibility traces, building upon prior work. An ablation study confirms the
significant impact of these components on the proposed model's performance. A
tabular actor-critic algorithm with eligibility traces and a state-of-the-art
Proximal Policy Optimization (PPO) algorithm are used as benchmarks. Our
network consistently outperforms the tabular approach and successfully
discovers stable control policies on classic RL environments: mountain car,
cart-pole, and acrobot. The proposed model offers an appealing trade-off in
terms of computational and hardware implementation requirements. The model
requires neither an external memory buffer nor a global error gradient
computation; synaptic updates occur online, driven by local learning rules and
a broadcast TD-error signal. Thus, this work contributes to the development of
more hardware-efficient RL solutions.
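To make the learning scheme concrete, here is a minimal sketch of the three-factor update style the abstract describes: per-synapse eligibility traces built from local pre- and post-synaptic activity, all modulated by a single broadcast TD-error scalar, with no replay buffer and no global gradient. The linear actor-critic structure, softmax action selection, and all constants are illustrative assumptions, not the paper's exact spiking architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_act = 64, 3          # encoded observation size, number of actions
gamma, lam = 0.99, 0.9       # discount factor and trace decay
alpha_w, alpha_v = 1e-2, 1e-1

w = rng.normal(0.0, 0.1, (n_act, n_in))  # actor synapses
v = np.zeros(n_in)                       # linear critic weights
e_w = np.zeros_like(w)                   # per-synapse actor eligibility traces
e_v = np.zeros_like(v)                   # critic eligibility traces

def act(x):
    """Softmax action selection, a stand-in for spike-based winner-take-all."""
    logits = w @ x
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return rng.choice(n_act, p=p), p

def update(x, a, p, reward, x_next, done):
    """One online synaptic update: local traces times one broadcast TD error."""
    global w, v, e_w, e_v
    post = np.eye(n_act)[a] - p                  # local post-synaptic factor
    e_w = gamma * lam * e_w + np.outer(post, x)  # decay, then add local activity
    e_v = gamma * lam * e_v + x
    td = reward + (0.0 if done else gamma * (v @ x_next)) - (v @ x)
    w += alpha_w * td * e_w                      # three-factor rule
    v += alpha_v * td * e_v
```

Because every quantity in `update` is either local to one synapse or the single scalar `td`, this pattern maps naturally onto neuromorphic hardware: per-synapse state plus one global modulatory signal.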
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing shifts data analysis to the edge of the network, but existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
- REBEL: Reinforcement Learning via Regressing Relative Rewards [59.68420022466047]
We propose REBEL, a minimalist RL algorithm for the era of generative models.
Theoretically, we prove that fundamental RL algorithms like Natural Policy Gradient can be seen as variants of REBEL.
We find that REBEL provides a unified approach to language modeling and image generation with stronger or similar performance as PPO and DPO.
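As a reading aid, here is a hedged sketch of the regression at the heart of REBEL as the summary states it: fit the policy so that the scaled relative log-probability ratio of a response pair matches the pair's relative rewards. The function name, argument layout, and eta are assumptions; see the paper for the exact objective.

```python
def rebel_pair_loss(logp_new_a, logp_old_a, logp_new_b, logp_old_b,
                    reward_a, reward_b, eta=1.0):
    """Least-squares loss on one response pair (a, b) for the same prompt.

    logp_new_* carry gradients w.r.t. the current policy; logp_old_* are
    summed token log-probs under the previous policy (fixed scalars).
    Works on plain floats or autograd tensors alike.
    """
    relative_logratio = (logp_new_a - logp_old_a) - (logp_new_b - logp_old_b)
    relative_reward = reward_a - reward_b
    # regress relative rewards onto scaled relative log-ratios
    return (relative_logratio / eta - relative_reward) ** 2
```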
arXiv Detail & Related papers (2024-04-25T17:20:45Z)
- Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks.
We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level.
We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
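The summary's key phrase is optimizing LLMs at the token level with entropy regularization. The generic sketch below illustrates that granularity with a per-token policy-gradient term plus an entropy bonus; it is not ETPO's exact soft-Bellman decomposition, and `beta`, shapes, and names are assumptions.

```python
import torch
import torch.nn.functional as F

def token_level_loss(logits, actions, advantages, beta=0.01):
    """logits: [T, vocab] per-token logits from the LLM policy;
    actions: [T] sampled token ids (long tensor);
    advantages: [T] per-token credit (detached)."""
    logp = F.log_softmax(logits, dim=-1)
    chosen = logp.gather(-1, actions.unsqueeze(-1)).squeeze(-1)  # [T]
    entropy = -(logp.exp() * logp).sum(-1)                       # [T]
    # per-token policy-gradient term plus an entropy bonus
    return -(chosen * advantages + beta * entropy).mean()
```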
arXiv Detail & Related papers (2024-02-09T07:45:26Z)
- Learning a model is paramount for sample efficiency in reinforcement learning control of PDEs [5.488334211013093]
We show that learning an actuated model in parallel to training the RL agent significantly reduces the total amount of required data sampled from the real system.
We also show that iteratively updating the model is of major importance to avoid biases in the RL training.
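The summary describes a training-loop pattern rather than a specific network, so here is a hedged sketch of that pattern: collect real data, re-fit the actuated model, and train the agent mostly on model rollouts, repeating so model bias does not accumulate. All names (env, agent, model) are placeholders, not the paper's API.

```python
def train_with_parallel_model(env, agent, model,
                              outer_iters=50, real_steps=1000, model_steps=10000):
    data = []
    for _ in range(outer_iters):
        # 1) collect a small batch of real transitions with the current policy
        s = env.reset()
        for _ in range(real_steps):
            a = agent.act(s)
            s_next, r, done = env.step(a)
            data.append((s, a, r, s_next, done))
            s = env.reset() if done else s_next
        # 2) iteratively re-fit the model on all real data gathered so far,
        #    which the paper argues is crucial to avoid biasing RL training
        model.fit(data)
        # 3) train the agent on cheap model rollouts instead of the real system
        for _ in range(model_steps):
            agent.update(model.rollout(agent))
    return agent
```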
arXiv Detail & Related papers (2023-02-14T16:14:39Z)
- When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee in model-based RL (MBRL).
The derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically-varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
- Robust Reinforcement Learning using Offline Data [23.260211453437055]
We propose a robust reinforcement learning algorithm called Robust Fitted Q-Iteration (RFQI).
RFQI uses only an offline dataset to learn the optimal robust policy.
We prove that RFQI learns a near-optimal robust policy under standard assumptions.
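For orientation, below is plain fitted Q-iteration on an offline dataset, the template RFQI builds on. RFQI's contribution is to replace the Bellman target with a robust Bellman operator (a minimum over an uncertainty set of transition models), which this sketch elides; all names are assumptions.

```python
def fitted_q_iteration(dataset, fit_regressor, n_actions, gamma=0.99, iters=100):
    """dataset: (s, a, r, s_next, done) tuples from a fixed offline buffer.
    fit_regressor maps (inputs, targets) to a callable q(s, a) -> float."""
    q = lambda s, a: 0.0                         # initial Q estimate
    for _ in range(iters):
        inputs, targets = [], []
        for s, a, r, s_next, done in dataset:
            best_next = max(q(s_next, a2) for a2 in range(n_actions))
            inputs.append((s, a))
            targets.append(r + (0.0 if done else gamma * best_next))
        q = fit_regressor(inputs, targets)       # one supervised regression step
    return q
```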
arXiv Detail & Related papers (2022-08-10T03:47:45Z)
- Offline Reinforcement Learning with Causal Structured World Models [9.376353239574243]
We show that causal world-models can outperform plain world-models for offline RL.
We propose a practical algorithm, oFfline mOdel-based reinforcement learning with CaUsal Structure (FOCUS).
arXiv Detail & Related papers (2022-06-03T09:53:57Z)
- Reinforcement Learning as One Big Sequence Modeling Problem [84.84564880157149]
Reinforcement learning (RL) is typically concerned with estimating stationary policies or single-step models.
We view RL as a sequence modeling problem, with the goal being to predict a sequence of actions that leads to a sequence of high rewards.
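A hedged sketch of this sequence-modeling view: flatten each trajectory into one token stream so that a standard autoregressive model can be trained on it with next-token prediction. The discretization scheme and names are assumptions.

```python
def trajectory_to_tokens(trajectory, discretize):
    """trajectory: list of (state, action, reward) tuples; discretize maps a
    continuous scalar to an integer token id."""
    tokens = []
    for state, action, reward in trajectory:
        tokens += [discretize(x) for x in state]   # state dims as tokens
        tokens += [discretize(x) for x in action]  # action dims as tokens
        tokens.append(discretize(reward))          # reward as a token
    return tokens

# Training reduces to next-token prediction over these streams, e.g.
#   loss = cross_entropy(model(tokens[:-1]), tokens[1:])
# and control becomes searching for action tokens with high predicted reward.
```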
arXiv Detail & Related papers (2021-06-03T17:58:51Z)
- Learning Off-Policy with Online Planning [18.63424441772675]
We investigate Learning Off-Policy with Online Planning (LOOP), a novel instantiation of H-step lookahead with a learned model and a terminal value function.
We demonstrate the flexibility of LOOP in incorporating safety constraints during deployment on a set of navigation environments.
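Here is a minimal sketch of the H-step lookahead the summary names: score sampled action sequences by H modeled reward steps plus a learned terminal value, then execute the first action MPC-style. The random-shooting optimizer and all names are simplifying assumptions, not LOOP's exact optimizer.

```python
import numpy as np

def h_step_lookahead(s, model, value_fn, sample_action,
                     horizon=5, n_candidates=256, gamma=0.99):
    best_score, best_first_action = -np.inf, None
    for _ in range(n_candidates):
        plan = [sample_action() for _ in range(horizon)]
        s_t, score = s, 0.0
        for t, a in enumerate(plan):
            s_t, r = model.predict(s_t, a)           # learned dynamics + reward
            score += (gamma ** t) * r
        score += (gamma ** horizon) * value_fn(s_t)  # terminal value function
        if score > best_score:
            best_score, best_first_action = score, plan[0]
    return best_first_action  # execute first action, replan next step (MPC)
```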
arXiv Detail & Related papers (2020-08-23T16:18:44Z)
- Information Theoretic Model Predictive Q-Learning [64.74041985237105]
We present a novel theoretical connection between information-theoretic MPC and entropy-regularized RL.
We develop a Q-learning algorithm that can leverage biased models.
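The connection the summary mentions can be made concrete with an MPPI-style update, which performs a soft (entropy-regularized) maximization over sampled plans; a learned terminal Q-value can be folded into rollout_return, in the spirit of the paper's Q-learning variant. Shapes, sigma, and lambda_ are assumptions.

```python
import numpy as np

def mppi_update(u_nominal, rollout_return, n_samples=128, sigma=0.3, lambda_=1.0):
    """u_nominal: [H, act_dim] current plan; rollout_return(u) -> float,
    the modeled return of a plan (optionally plus a terminal Q-value)."""
    H, d = u_nominal.shape
    noise = np.random.normal(0.0, sigma, size=(n_samples, H, d))
    returns = np.array([rollout_return(u_nominal + eps) for eps in noise])
    # exponentiated-return weighting: a soft, entropy-regularized argmax
    weights = np.exp((returns - returns.max()) / lambda_)
    weights /= weights.sum()
    return u_nominal + np.einsum("n,nhd->hd", weights, noise)
```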
arXiv Detail & Related papers (2019-12-31T00:29:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.