Enhancing reinforcement learning by a finite reward response filter with
a case study in intelligent structural control
- URL: http://arxiv.org/abs/2010.15597v1
- Date: Sun, 25 Oct 2020 19:28:35 GMT
- Title: Enhancing reinforcement learning by a finite reward response filter with
a case study in intelligent structural control
- Authors: Hamid Radmard Rahmani, Carsten Koenke, Marco A. Wiering
- Abstract summary: In many reinforcement learning (RL) problems, it takes some time before an action taken by the agent reaches its maximum effect on the environment.
This paper introduces an enhanced Q-learning method in which, at the beginning of the learning phase, the agent takes a single action and builds a function that reflects the environment's response to it.
We have applied the developed method to a structural control problem in which the goal of the agent is to reduce the vibrations of a building subjected to earthquake excitations, with a specified action-effect delay.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In many reinforcement learning (RL) problems, it takes some time before an action
taken by the agent reaches its maximum effect on the environment, so the agent receives the
reward corresponding to that action only after a delay called the action-effect delay. Such
delays reduce the performance of the learning algorithm and increase the computational cost,
because the RL agent values immediate rewards more highly than the delayed future rewards
that are actually attributable to the taken action. This paper addresses this issue by
introducing an enhanced Q-learning method in which, at the beginning of the learning phase,
the agent takes a single action and builds a function that reflects the environment's
response to that action, called the reflexive $\gamma$-function. During the training phase,
the agent uses this reflexive $\gamma$-function to update the Q-values. We have applied the
developed method to a structural control problem in which the goal of the agent is to reduce
the vibrations of a building subjected to earthquake excitations, with a specified
action-effect delay. Seismic control is considered a complex task in structural engineering
because of the stochastic and unpredictable nature of earthquakes and the complex behavior
of the structure. Three scenarios are presented to study the effects of zero, medium, and
long action-effect delays, and the performance of the enhanced method is compared to that of
standard Q-learning. Both RL methods use a neural network to estimate the state-action value
function that is used to control the structure. The results show that the enhanced method
significantly outperforms the original method in all cases and also improves the stability
of the algorithm in dealing with action-effect delays.
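The abstract does not spell out the update rule, so the following is a minimal, hedged sketch of the idea rather than the authors' method: the reflexive $\gamma$-function is assumed here to be a finite, normalized filter measured once from the reward response to a single probe action, and it is then used to weight the next few rewards when forming the Q-learning target. The gym-style environment interface, the `null_action` placeholder, and the helper names are illustrative assumptions.

```python
import numpy as np

def build_reflexive_gamma(env, probe_action, horizon, null_action=0):
    """Take one probe action, then record how the reward responds over `horizon` steps.

    Assumes the classic OpenAI Gym 4-tuple step interface; `null_action` is a
    hypothetical "do nothing" action that lets the probe's effect play out.
    """
    env.reset()
    response = []
    _, reward, done, _ = env.step(probe_action)
    response.append(reward)
    for _ in range(horizon - 1):
        if done:
            break
        _, reward, done, _ = env.step(null_action)
        response.append(reward)
    g = np.abs(np.asarray(response, dtype=float))
    total = g.sum()
    # Normalize into a finite filter; fall back to a uniform filter if the response is flat.
    return g / total if total > 0 else np.full(horizon, 1.0 / horizon)

def q_target(future_rewards, next_q_max, g, discount=0.99):
    """Delay-aware target: weight the next len(g) rewards by the filter g
    instead of crediting only the immediate reward to the current action."""
    r = np.asarray(future_rewards[:len(g)], dtype=float)
    filtered_reward = float(np.dot(g[:len(r)], r))
    return filtered_reward + discount * next_q_max
```

In this reading, `build_reflexive_gamma` would be called once before training, and `q_target` stands in for the usual `r + gamma * max_a Q(s', a)` target inside a standard Q-learning or DQN loop; the exact formulation used in the paper may differ.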
Related papers
- TOP-ERL: Transformer-based Off-Policy Episodic Reinforcement Learning [27.93845816476777]
This work introduces Transformer-based Off-Policy Episodic Reinforcement Learning (TOP-ERL), a novel algorithm that enables off-policy updates in the episodic RL (ERL) framework.
arXiv Detail & Related papers (2024-10-12T13:55:26Z)
- Normalization and effective learning rates in reinforcement learning [52.59508428613934]
Normalization layers have recently experienced a renaissance in the deep reinforcement learning and continual learning literature.
We show that normalization brings with it a subtle but important side effect: an equivalence between growth in the norm of the network parameters and decay in the effective learning rate.
We propose to make the learning rate schedule explicit with a simple re-parameterization which we call Normalize-and-Project.
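A hedged sketch of the projection step described above (not the paper's reference code): for a weight matrix whose output is immediately normalized, the forward pass is invariant to the weight's scale, so growth of that scale only shrinks the effective step size; projecting the weights back to a fixed (e.g. initial) norm after each optimizer step keeps the explicit learning-rate schedule in control. The helper name and the `target_norms` bookkeeping are illustrative assumptions.

```python
import torch

@torch.no_grad()
def project_to_fixed_norm(model, target_norms):
    """Rescale each tracked weight matrix back to its recorded (e.g. initial) norm."""
    for name, param in model.named_parameters():
        if name in target_norms and param.dim() >= 2:
            param.mul_(target_norms[name] / (param.norm() + 1e-12))

# Typical use (sketch): record norms once at initialization,
#   target_norms = {n: p.norm().item() for n, p in model.named_parameters() if p.dim() >= 2}
# then call project_to_fixed_norm(model, target_norms) right after optimizer.step().
```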
arXiv Detail & Related papers (2024-07-01T20:58:01Z)
- A Neuromorphic Architecture for Reinforcement Learning from Real-Valued Observations [0.34410212782758043]
Reinforcement Learning (RL) provides a powerful framework for decision-making in complex environments.
This paper presents a novel Spiking Neural Network (SNN) architecture for solving RL problems with real-valued observations.
arXiv Detail & Related papers (2023-07-06T12:33:34Z)
- Structure-Enhanced DRL for Optimal Transmission Scheduling [43.801422320012286]
This paper focuses on the transmission scheduling problem of a remote estimation system.
We develop a structure-enhanced deep reinforcement learning framework for optimal scheduling of the system.
In particular, we propose a structure-enhanced action selection method, which tends to select actions that obey the policy structure.
arXiv Detail & Related papers (2022-12-24T10:18:38Z)
- Learning Dynamics and Generalization in Reinforcement Learning [59.530058000689884]
We show theoretically that temporal difference learning encourages agents to fit non-smooth components of the value function early in training.
We show that neural networks trained using temporal difference algorithms on dense reward tasks exhibit weaker generalization between states than randomly initialized networks and networks trained with policy gradient methods.
arXiv Detail & Related papers (2022-06-05T08:49:16Z)
- Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
arXiv Detail & Related papers (2022-06-01T23:26:51Z)
- Delayed Reinforcement Learning by Imitation [31.932677462399468]
We present DIDA, a novel algorithm that learns how to act in a delayed environment from undelayed demonstrations.
We show that DIDA obtains high performance with remarkable sample efficiency on a variety of tasks.
arXiv Detail & Related papers (2022-05-11T15:27:33Z) - Accelerated Policy Learning with Parallel Differentiable Simulation [59.665651562534755]
We present a differentiable simulator and a new policy learning algorithm (SHAC).
Our algorithm alleviates problems with local minima through a smooth critic function.
We show substantial improvements in sample efficiency and wall-clock time over state-of-the-art RL and differentiable simulation-based algorithms.
arXiv Detail & Related papers (2022-04-14T17:46:26Z)
- Imitating, Fast and Slow: Robust learning from demonstrations via decision-time planning [96.72185761508668]
Planning at Test-time (IMPLANT) is a new meta-algorithm for imitation learning.
We demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments.
arXiv Detail & Related papers (2022-04-07T17:16:52Z)
- RL-Controller: a reinforcement learning framework for active structural control [0.0]
We present a novel RL-based approach for designing active controllers by introducing RL-Controller, a flexible and scalable simulation environment.
We show that the proposed framework is easily trainable for a five-story benchmark building, with 65% reductions on average in inter-story drifts.
In a comparative study with the LQG active control method, we demonstrate that the proposed model-free algorithm learns more effective actuator forcing strategies.
arXiv Detail & Related papers (2021-03-13T04:42:13Z)
- Influence Functions in Deep Learning Are Fragile [52.31375893260445]
Influence functions approximate the effect of training samples on test-time predictions.
Influence estimates are fairly accurate for shallow networks.
Hessian regularization is important for obtaining high-quality influence estimates (the standard approximation and its damping term are recalled after this entry).
arXiv Detail & Related papers (2020-06-25T18:25:59Z)
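For context on the fragility results summarized above, the standard influence-function approximation (as popularized by Koh & Liang) that such analyses build on can be recalled as follows; the damping term $\lambda I$ plays the role of the Hessian regularization mentioned in the summary. This is the textbook form, not necessarily the cited paper's exact notation.

```latex
% Influence of a training point z on the loss at a test point z_test, around the
% trained parameters \hat{\theta}; H is the empirical Hessian of the training loss,
% damped by \lambda I for numerical stability.
\[
  \mathcal{I}(z, z_{\text{test}})
  = -\,\nabla_{\theta} L(z_{\text{test}}, \hat{\theta})^{\top}
      \bigl( H_{\hat{\theta}} + \lambda I \bigr)^{-1}
      \nabla_{\theta} L(z, \hat{\theta}),
  \qquad
  H_{\hat{\theta}} = \frac{1}{n} \sum_{i=1}^{n} \nabla_{\theta}^{2} L(z_i, \hat{\theta}).
\]
```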