Enhancing reinforcement learning by a finite reward response filter with
a case study in intelligent structural control
- URL: http://arxiv.org/abs/2010.15597v1
- Date: Sun, 25 Oct 2020 19:28:35 GMT
- Title: Enhancing reinforcement learning by a finite reward response filter with
a case study in intelligent structural control
- Authors: Hamid Radmard Rahmani, Carsten Koenke, Marco A. Wiering
- Abstract summary: In many reinforcement learning (RL) problems, it takes some time before an action taken by the agent reaches its maximum effect on the environment.
This paper introduces an enhanced Q-learning method in which, at the beginning of the learning phase, the agent takes a single action and builds a function that reflects the environment's response to it.
We have applied the developed method to a structural control problem in which the goal of the agent is to reduce the vibrations of a building subjected to earthquake excitations, with a specified action-effect delay.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In many reinforcement learning (RL) problems, it takes some time before an action
taken by the agent reaches its maximum effect on the environment, so the agent receives the
reward corresponding to that action only after a delay called the action-effect delay. Such
delays reduce the performance of the learning algorithm and increase the computational cost,
because the RL agent values immediate rewards more highly than the delayed future rewards
that are actually attributable to the taken action. This paper addresses this issue by
introducing an enhanced Q-learning method in which, at the beginning of the learning phase,
the agent takes a single action and builds a function that reflects the environment's
response to that action, called the reflexive $\gamma$-function. During the training phase,
the agent uses this reflexive $\gamma$-function to update the Q-values. We have applied the
developed method to a structural control problem in which the goal of the agent is to reduce
the vibrations of a building subjected to earthquake excitations, with a specified
action-effect delay. Seismic control is considered a complex task in structural engineering
because of the stochastic and unpredictable nature of earthquakes and the complex behavior
of the structure. Three scenarios are presented to study the effects of zero, medium, and
long action-effect delays, and the performance of the enhanced method is compared to that of
standard Q-learning. Both RL methods use a neural network to estimate the state-action value
function that is used to control the structure. The results show that the enhanced method
significantly outperforms the original method in all cases and also improves the stability
of the algorithm in dealing with action-effect delays.
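The abstract does not spell out the update rule, so the following is a minimal, hedged sketch of the idea rather than the authors' method: the reflexive $\gamma$-function is assumed here to be a finite, normalized filter measured once from the reward response to a single probe action, and it is then used to weight the next few rewards when forming the Q-learning target. The gym-style environment interface, the `null_action` placeholder, and the helper names are illustrative assumptions.

```python
import numpy as np

def build_reflexive_gamma(env, probe_action, horizon, null_action=0):
    """Take one probe action, then record how the reward responds over `horizon` steps.

    Assumes the classic OpenAI Gym 4-tuple step interface; `null_action` is a
    hypothetical "do nothing" action that lets the probe's effect play out.
    """
    env.reset()
    response = []
    _, reward, done, _ = env.step(probe_action)
    response.append(reward)
    for _ in range(horizon - 1):
        if done:
            break
        _, reward, done, _ = env.step(null_action)
        response.append(reward)
    g = np.abs(np.asarray(response, dtype=float))
    total = g.sum()
    # Normalize into a finite filter; fall back to a uniform filter if the response is flat.
    return g / total if total > 0 else np.full(horizon, 1.0 / horizon)

def q_target(future_rewards, next_q_max, g, discount=0.99):
    """Delay-aware target: weight the next len(g) rewards by the filter g
    instead of crediting only the immediate reward to the current action."""
    r = np.asarray(future_rewards[:len(g)], dtype=float)
    filtered_reward = float(np.dot(g[:len(r)], r))
    return filtered_reward + discount * next_q_max
```

In this reading, `build_reflexive_gamma` would be called once before training, and `q_target` stands in for the usual `r + gamma * max_a Q(s', a)` target inside a standard Q-learning or DQN loop; the exact formulation used in the paper may differ.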
Related papers
- TOP-ERL: Transformer-based Off-Policy Episodic Reinforcement Learning [27.93845816476777]
This work introduces Transformer-based Off-Policy Episodic Reinforcement Learning (TOP-ERL), a novel algorithm that enables off-policy updates in the episodic RL (ERL) framework.
arXiv Detail & Related papers (2024-10-12T13:55:26Z)
- Normalization and effective learning rates in reinforcement learning [52.59508428613934]
Normalization layers have recently experienced a renaissance in the deep reinforcement learning and continual learning literature.
We show that normalization brings with it a subtle but important side effect: an equivalence between growth in the norm of the network parameters and decay in the effective learning rate.
We propose to make the learning rate schedule explicit with a simple re-parameterization which we call Normalize-and-Project.
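A hedged sketch of the projection step described above (not the paper's reference code): for a weight matrix whose output is immediately normalized, the forward pass is invariant to the weight's scale, so growth of that scale only shrinks the effective step size; projecting the weights back to a fixed (e.g. initial) norm after each optimizer step keeps the explicit learning-rate schedule in control. The helper name and the `target_norms` bookkeeping are illustrative assumptions.

```python
import torch

@torch.no_grad()
def project_to_fixed_norm(model, target_norms):
    """Rescale each tracked weight matrix back to its recorded (e.g. initial) norm."""
    for name, param in model.named_parameters():
        if name in target_norms and param.dim() >= 2:
            param.mul_(target_norms[name] / (param.norm() + 1e-12))

# Typical use (sketch): record norms once at initialization,
#   target_norms = {n: p.norm().item() for n, p in model.named_parameters() if p.dim() >= 2}
# then call project_to_fixed_norm(model, target_norms) right after optimizer.step().
```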
arXiv Detail & Related papers (2024-07-01T20:58:01Z)
- A Neuromorphic Architecture for Reinforcement Learning from Real-Valued Observations [0.34410212782758043]
Reinforcement Learning (RL) provides a powerful framework for decision-making in complex environments.
This paper presents a novel Spiking Neural Network (SNN) architecture for solving RL problems with real-valued observations.
arXiv Detail & Related papers (2023-07-06T12:33:34Z)
- Structure-Enhanced DRL for Optimal Transmission Scheduling [43.801422320012286]
This paper focuses on the transmission scheduling problem of a remote estimation system.
We develop a structure-enhanced deep reinforcement learning framework for optimal scheduling of the system.
In particular, we propose a structure-enhanced action selection method, which tends to select actions that obey the policy structure.
arXiv Detail & Related papers (2022-12-24T10:18:38Z)
- Learning Dynamics and Generalization in Reinforcement Learning [59.530058000689884]
We show theoretically that temporal difference learning encourages agents to fit non-smooth components of the value function early in training.
We show that neural networks trained using temporal difference algorithms on dense reward tasks exhibit weaker generalization between states than randomly initialized networks and networks trained with policy gradient methods.
arXiv Detail & Related papers (2022-06-05T08:49:16Z)
- Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
arXiv Detail & Related papers (2022-06-01T23:26:51Z)
- Delayed Reinforcement Learning by Imitation [31.932677462399468]
We present DIDA, a novel algorithm that learns how to act in a delayed environment from undelayed demonstrations.
We show that DIDA obtains high performance with remarkable sample efficiency on a variety of tasks.
arXiv Detail & Related papers (2022-05-11T15:27:33Z) - Accelerated Policy Learning with Parallel Differentiable Simulation [59.665651562534755]
We present a differentiable simulator and a new policy learning algorithm (SHAC).
Our algorithm alleviates problems with local minima through a smooth critic function.
We show substantial improvements in sample efficiency and wall-clock time over state-of-the-art RL and differentiable simulation-based algorithms.
arXiv Detail & Related papers (2022-04-14T17:46:26Z)
- Imitating, Fast and Slow: Robust learning from demonstrations via decision-time planning [96.72185761508668]
Planning at Test-time (IMPLANT) is a new meta-algorithm for imitation learning.
We demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments.
arXiv Detail & Related papers (2022-04-07T17:16:52Z)
- RL-Controller: a reinforcement learning framework for active structural control [0.0]
We present a novel RL-based approach for designing active controllers by introducing RL-Controller, a flexible and scalable simulation environment.
We show that the proposed framework is easily trainable for a five-story benchmark building, with 65% reductions on average in inter-story drifts.
In a comparative study with the LQG active control method, we demonstrate that the proposed model-free algorithm learns more effective actuator forcing strategies.
arXiv Detail & Related papers (2021-03-13T04:42:13Z)
- Influence Functions in Deep Learning Are Fragile [52.31375893260445]
Influence functions approximate the effect of training samples on test-time predictions.
Influence estimates are fairly accurate for shallow networks.
Hessian regularization is important for obtaining high-quality influence estimates (the standard approximation and its damping term are recalled after this entry).
arXiv Detail & Related papers (2020-06-25T18:25:59Z)
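For context on the fragility results summarized above, the standard influence-function approximation (as popularized by Koh & Liang) that such analyses build on can be recalled as follows; the damping term $\lambda I$ plays the role of the Hessian regularization mentioned in the summary. This is the textbook form, not necessarily the cited paper's exact notation.

```latex
% Influence of a training point z on the loss at a test point z_test, around the
% trained parameters \hat{\theta}; H is the empirical Hessian of the training loss,
% damped by \lambda I for numerical stability.
\[
  \mathcal{I}(z, z_{\text{test}})
  = -\,\nabla_{\theta} L(z_{\text{test}}, \hat{\theta})^{\top}
      \bigl( H_{\hat{\theta}} + \lambda I \bigr)^{-1}
      \nabla_{\theta} L(z, \hat{\theta}),
  \qquad
  H_{\hat{\theta}} = \frac{1}{n} \sum_{i=1}^{n} \nabla_{\theta}^{2} L(z_i, \hat{\theta}).
\]
```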