Online Reinforcement Learning Control by Direct Heuristic Dynamic
Programming: from Time-Driven to Event-Driven
- URL: http://arxiv.org/abs/2006.08938v1
- Date: Tue, 16 Jun 2020 05:51:25 GMT
- Title: Online Reinforcement Learning Control by Direct Heuristic Dynamic
Programming: from Time-Driven to Event-Driven
- Authors: Qingtao Zhao, Jennie Si, Jian Sun
- Abstract summary: Time-driven learning refers to the machine learning method that updates parameters in a prediction model continuously as new data arrives.
It is desirable to prevent the time-driven dHDP from updating due to insignificant system events such as noise.
We show how the event-driven dHDP algorithm works in comparison to the original time-driven dHDP.
- Score: 80.94390916562179
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, time-driven learning refers to the machine learning method that
updates parameters in a prediction model continuously as new data arrives.
Among existing approximate dynamic programming (ADP) and reinforcement learning
(RL) algorithms, the direct heuristic dynamic programming (dHDP) has been shown
to be an effective tool, as demonstrated in solving several complex learning control
problems. It continuously updates the control policy and the critic as system
states continuously evolve. It is therefore desirable to prevent the
time-driven dHDP from updating due to insignificant system events such as noise.
Toward this goal, we propose a new event-driven dHDP. By constructing a
Lyapunov function candidate, we prove the uniform ultimate boundedness (UUB) of
the system states and of the weights in the critic and the control policy
networks. Consequently, we show that the approximate control and cost-to-go
function approach Bellman optimality within a finite bound. We also illustrate how
the event-driven dHDP algorithm works in comparison to the original time-driven
dHDP.
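
The following is a minimal, self-contained sketch (not the paper's formulation) of the time-driven versus event-driven contrast the abstract describes: the critic and policy are updated only when a simple norm-based triggering condition fires, and setting the threshold to zero recovers time-driven updates at every step. The toy plant, the linear-in-features critic and actor, the triggering rule, and all gains below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear plant x_{k+1} = A x_k + B u_k + noise (illustrative stand-in
# for the controlled system; the paper's benchmarks are more complex).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])

def step(x, u):
    return A @ x + B @ np.array([u]) + 0.01 * rng.standard_normal(2)

# Very small critic/actor, quadratic and linear in the state respectively.
# Real dHDP uses neural networks with TD-style critic and policy errors;
# the update bodies below are crude placeholders for those updates.
w_critic = np.zeros(2)   # cost-to-go approximation V(x) ~= w_critic . x^2
w_actor = np.zeros(2)    # policy u = w_actor . x

def update_networks(x, u, x_next, lr=1e-2, gamma=0.95):
    """One placeholder critic/actor correction for the transition (x, u, x_next)."""
    global w_critic, w_actor
    cost = x @ x + 0.1 * u * u                             # stage cost r(x, u)
    td = cost + gamma * (w_critic @ x_next**2) - w_critic @ x**2
    w_critic += lr * td * x**2                             # semi-gradient critic step
    w_actor -= lr * (w_critic @ x**2) * x                  # crude policy step

threshold = 0.05          # event-triggering threshold (assumed constant)
x = np.array([1.0, 0.0])
x_event = x.copy()        # state at the last triggering instant
updates = 0

for k in range(300):
    u = float(w_actor @ x_event)   # control computed from the last event state
    x_next = step(x, u)

    # Event-driven rule: learn only when the gap to the last event state is
    # significant.  Time-driven dHDP corresponds to threshold = 0, i.e. an
    # update at every sampling instant.
    if np.linalg.norm(x_next - x_event) > threshold:
        update_networks(x, u, x_next)
        x_event = x_next.copy()
        updates += 1
    x = x_next

print(f"event-driven updates: {updates} of 300 steps")
```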
Related papers
- Learning from Demonstration with Implicit Nonlinear Dynamics Models [16.26835655544884]
We develop a recurrent neural network layer that includes a fixed nonlinear dynamical system with tunable dynamical properties for modelling temporal dynamics.
We validate the efficacy of our neural network layer on the task of reproducing human handwriting motions using the LASA Human Handwriting dataset.
arXiv Detail & Related papers (2024-09-27T14:12:49Z)
- When to Sense and Control? A Time-adaptive Approach for Continuous-Time RL [37.58940726230092]
Reinforcement learning (RL) excels in optimizing policies for discrete-time Markov decision processes (MDPs).
We formalize an RL framework, Time-adaptive Control & Sensing (TaCoS), that tackles this challenge.
We demonstrate that state-of-the-art RL algorithms trained with TaCoS drastically reduce the amount of interaction relative to their discrete-time counterparts. (A toy sketch of choosing both an action and the time until the next interaction appears after this list.)
arXiv Detail & Related papers (2024-06-03T09:57:18Z)
- PID Control-Based Self-Healing to Improve the Robustness of Large Language Models [23.418411870842178]
Minor perturbations can significantly reduce the performance of well-trained language models.
We construct a computationally efficient self-healing process to correct undesired model behavior.
The proposed PID control-based self-healing is a low-cost framework that improves the robustness of pre-trained large language models. (A generic discrete PID update is sketched after this list.)
arXiv Detail & Related papers (2024-03-31T23:46:51Z)
- Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe to convert static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme. (A generic action-discretization sketch appears after this list.)
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
- Predictive Experience Replay for Continual Visual Control and Forecasting [62.06183102362871]
We present a new continual learning approach for visual dynamics modeling and explore its efficacy in visual control and forecasting.
We first propose the mixture world model that learns task-specific dynamics priors with a mixture of Gaussians, and then introduce a new training strategy to overcome catastrophic forgetting.
Our model remarkably outperforms the naive combinations of existing continual learning and visual RL algorithms on DeepMind Control and Meta-World benchmarks with continual visual control tasks.
arXiv Detail & Related papers (2023-03-12T05:08:03Z)
- Accelerated Reinforcement Learning for Temporal Logic Control Objectives [10.216293366496688]
This paper addresses the problem of learning control policies for mobile robots modeled as unknown Markov Decision Processes (MDPs).
We propose a novel accelerated model-based reinforcement learning (RL) algorithm for control objectives that is capable of learning control policies significantly faster than related approaches.
arXiv Detail & Related papers (2022-05-09T17:09:51Z)
- Learning Robust Policy against Disturbance in Transition Dynamics via State-Conservative Policy Optimization [63.75188254377202]
Deep reinforcement learning algorithms can perform poorly in real-world tasks due to the discrepancy between source and target environments.
We propose a novel model-free actor-critic algorithm to learn robust policies without modeling the disturbance in advance.
Experiments in several robot control tasks demonstrate that SCPO learns robust policies against the disturbance in transition dynamics.
arXiv Detail & Related papers (2021-12-20T13:13:05Z)
- Robust Value Iteration for Continuous Control Tasks [99.00362538261972]
When transferring a control policy from simulation to a physical system, the policy needs to be robust to variations in the dynamics to perform well.
We present Robust Fitted Value Iteration, which uses dynamic programming to compute the optimal value function on the compact state domain.
We show that robust value iteration is more robust than deep reinforcement learning algorithms and the non-robust version of the algorithm.
arXiv Detail & Related papers (2021-05-25T19:48:35Z)
- Learning to Continuously Optimize Wireless Resource in a Dynamic Environment: A Bilevel Optimization Perspective [52.497514255040514]
This work develops a new approach that enables data-driven methods to continuously learn and optimize resource allocation strategies in a dynamic environment.
We propose to build the notion of continual learning into wireless system design, so that the learning model can incrementally adapt to the new episodes.
Our design is based on a novel bilevel optimization formulation which ensures certain "fairness" across different data samples.
arXiv Detail & Related papers (2021-05-03T07:23:39Z)
- DDPNOpt: Differential Dynamic Programming Neural Optimizer [29.82841891919951]
We show that most widely-used training algorithms can be linked to Differential Dynamic Programming (DDP).
In this vein, we propose a new class of optimizer, DDPNOpt, for training feedforward and convolutional networks.
arXiv Detail & Related papers (2020-02-20T15:42:15Z)
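
For the time-adaptive entry ("When to Sense and Control?") above, the sketch below illustrates only the general idea of a policy that chooses both a control input and how long to hold it before the next sensing/actuation event; the toy plant, the hand-coded policy, and the interval bounds are illustrative assumptions, not the TaCoS method.

```python
import numpy as np

def plant(x: float, u: float, dt: float) -> float:
    """Toy first-order plant, integrated with one Euler step of length dt."""
    return x + dt * (-0.5 * x + u)

def policy(x: float) -> tuple[float, float]:
    """Return a control input and the time to hold it before sensing again.
    Here: a fixed feedback gain, and longer hold times when the state is
    already small (both are arbitrary stand-ins for a learned policy)."""
    u = -1.0 * x
    hold = float(np.clip(0.5 / (abs(x) + 1e-3), 0.05, 1.0))
    return u, hold

x, t, interactions = 2.0, 0.0, 0
while t < 10.0:
    u, hold = policy(x)          # one sensing/actuation event
    x = plant(x, u, hold)        # no sensing until the hold time elapses
    t += hold
    interactions += 1

print(f"{interactions} interactions over 10 s, final state {x:.3f}")
```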
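
For the PID control-based self-healing entry above, here is a generic textbook discrete-time PID update, shown only to illustrate the control law the title refers to; how the paper embeds such a controller into a pre-trained language model is not reproduced, and the gains and toy plant are arbitrary.

```python
from dataclasses import dataclass

@dataclass
class PID:
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def update(self, error: float, dt: float) -> float:
        """Return the corrective signal u_k for the current error e_k."""
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Example: drive a perturbed scalar quantity back toward a reference of 0.
pid = PID(kp=1.2, ki=0.1, kd=0.05)
x = 1.0                          # perturbed quantity
for _ in range(50):
    u = pid.update(error=0.0 - x, dt=0.1)
    x += 0.1 * u                 # simple first-order response to the correction
print(round(x, 3))
```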
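
For the action-quantized offline RL entry above, a minimal generic sketch of turning continuous actions from an offline dataset into a small discrete codebook with plain k-means; the paper's adaptive quantization scheme is more sophisticated, and the dataset, codebook size, and helper names here are illustrative assumptions.

```python
import numpy as np

def build_action_codebook(actions: np.ndarray, n_bins: int, n_iters: int = 50,
                          seed: int = 0) -> np.ndarray:
    """Cluster continuous actions into a small codebook (plain Lloyd's k-means)."""
    rng = np.random.default_rng(seed)
    centers = actions[rng.choice(len(actions), n_bins, replace=False)]
    for _ in range(n_iters):
        # Assign each action to its nearest code.
        d = np.linalg.norm(actions[:, None, :] - centers[None, :, :], axis=-1)
        assign = d.argmin(axis=1)
        # Move each code to the mean of its assigned actions.
        for j in range(n_bins):
            members = actions[assign == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers

def quantize(action: np.ndarray, codebook: np.ndarray) -> int:
    """Map a continuous action to the index of its nearest code."""
    return int(np.linalg.norm(codebook - action, axis=-1).argmin())

# Toy offline dataset of 2-D continuous actions.
rng = np.random.default_rng(1)
dataset_actions = rng.uniform(-1.0, 1.0, size=(1000, 2))
codebook = build_action_codebook(dataset_actions, n_bins=16)
print(quantize(np.array([0.3, -0.7]), codebook), codebook.shape)
```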
This list is automatically generated from the titles and abstracts of the papers on this site.