Deep Q-learning: a robust control approach
- URL: http://arxiv.org/abs/2201.08610v1
- Date: Fri, 21 Jan 2022 09:47:34 GMT
- Title: Deep Q-learning: a robust control approach
- Authors: Balázs Varga, Balázs Kulcsár, Morteza Haghir Chehreghani
- Abstract summary: We formulate an uncertain linear time-invariant model by means of the neural tangent kernel to describe learning.
We show the instability of learning and analyze the agent's behavior in the frequency domain.
Numerical simulations in different OpenAI Gym environments suggest that the $\mathcal{H}_\infty$-controlled learning performs slightly better than Double deep Q-learning.
- Score: 4.125187280299247
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we place deep Q-learning into a control-oriented perspective
and study its learning dynamics with well-established techniques from robust
control. We formulate an uncertain linear time-invariant model by means of the
neural tangent kernel to describe learning. We show the instability of learning
and analyze the agent's behavior in the frequency domain. Then, we ensure
convergence via robust controllers acting as dynamical rewards in the loss
function. We synthesize three controllers: state-feedback gain scheduling
$\mathcal{H}_2$, dynamic $\mathcal{H}_\infty$, and constant gain
$\mathcal{H}_\infty$ controllers. Setting up the learning agent with a
control-oriented tuning methodology is more transparent and has
well-established literature compared to the heuristics in reinforcement
learning. In addition, our approach uses neither a target network nor a
randomized replay memory. The role of the target network is taken over by the
control input, which also exploits the temporal dependency of samples (as
opposed to a randomized memory buffer). Numerical simulations in different
OpenAI Gym environments suggest that the $\mathcal{H}_\infty$-controlled
learning performs slightly better than Double deep Q-learning.
Related papers
- Autonomous Vehicle Controllers From End-to-End Differentiable Simulation [60.05963742334746]
We propose a differentiable simulator and design an analytic policy gradients (APG) approach to training AV controllers.
Our proposed framework brings the differentiable simulator into an end-to-end training loop, where gradients of environment dynamics serve as a useful prior to help the agent learn a more grounded policy.
We find significant improvements in performance and robustness to noise in the dynamics, as well as overall more intuitive human-like handling.
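As an illustration of the mechanism, here is a minimal sketch of analytic policy gradients on a toy differentiable "simulator": a scalar linear system with a linear state-feedback policy, where the gradient of the rolled-out cost is propagated through the dynamics by the chain rule. The system, policy, and learning rate are placeholders, not the paper's AV setting.

```python
# Toy differentiable simulator: x' = a*x + b*u, quadratic cost, policy u = theta*x.
def rollout_and_grad(theta, x0=1.0, a=0.9, b=0.5, horizon=50):
    x, dx = x0, 0.0              # state and its sensitivity dx/dtheta
    cost, grad = 0.0, 0.0
    for _ in range(horizon):
        u = theta * x
        du = x + theta * dx      # chain rule through the policy
        cost += x**2 + u**2
        grad += 2*x*dx + 2*u*du  # d(cost)/d(theta), accumulated analytically
        x, dx = a*x + b*u, a*dx + b*du  # differentiate through the dynamics
    return cost, grad

theta = 0.0
for _ in range(200):             # plain gradient descent on the unrolled cost
    cost, grad = rollout_and_grad(theta)
    theta -= 0.01 * grad
```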
arXiv Detail & Related papers (2024-09-12T11:50:06Z)
- Learning Environment Models with Continuous Stochastic Dynamics [0.0]
We aim to provide insight into the decisions faced by the agent by learning an automaton model of the environment's behavior under the agent's control.
In this work, we extend the capabilities of automata learning so that models can be learned for environments with complex, continuous dynamics.
We apply our automata learning framework on popular RL benchmarking environments in the OpenAI Gym, including LunarLander, CartPole, Mountain Car, and Acrobot.
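A minimal sketch of the underlying idea, under heavy simplification: abstract continuous observations into discrete states by binning, then estimate the automaton's stochastic transitions by counting. The binning scheme and episode format below are assumptions; the actual framework uses proper automata-learning algorithms rather than plain frequency counts.

```python
import numpy as np
from collections import defaultdict

def abstract_state(obs, bins):
    """Map a continuous observation to a discrete abstract state by binning."""
    return tuple(int(np.digitize(x, b)) for x, b in zip(obs, bins))

def learn_transition_map(episodes, bins):
    """Estimate automaton transition probabilities by counting."""
    counts = defaultdict(lambda: defaultdict(int))
    for episode in episodes:                # episode: [(obs, action, next_obs)]
        for obs, action, next_obs in episode:
            s, s2 = abstract_state(obs, bins), abstract_state(next_obs, bins)
            counts[(s, action)][s2] += 1
    model = {}
    for key, nxt in counts.items():
        total = sum(nxt.values())
        model[key] = {s2: c / total for s2, c in nxt.items()}
    return model
```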
arXiv Detail & Related papers (2023-06-29T12:47:28Z)
- Value function estimation using conditional diffusion models for control [62.27184818047923]
We propose a simple algorithm called Diffused Value Function (DVF).
It learns a joint multi-step model of the environment-robot interaction dynamics using a diffusion model.
We show how DVF can be used to efficiently capture the state visitation measure for multiple controllers.
arXiv Detail & Related papers (2023-06-09T18:40:55Z)
- CT-DQN: Control-Tutored Deep Reinforcement Learning [4.395396671038298]
Control-Tutored Deep Q-Networks (CT-DQN) is a Deep Reinforcement Learning algorithm that leverages a control tutor to reduce learning time.
We validate our approach on three scenarios from OpenAI Gym: the inverted pendulum, lunar lander, and car racing.
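A minimal sketch of control-tutored action selection: with some probability, defer to a hand-designed feedback controller (the "tutor") instead of the greedy Q-network action. The names (`q_values`, `tutor_action`) and the fixed switching probability are placeholders; the actual CT-DQN switching rule may differ.

```python
import random

def select_action(state, q_values, tutor_action, tutor_prob=0.3, eps=0.05):
    qs = q_values(state)                  # list of Q-values, one per action
    if random.random() < eps:             # standard epsilon-greedy exploration
        return random.randrange(len(qs))
    if random.random() < tutor_prob:      # defer to the control tutor
        return tutor_action(state)
    return max(range(len(qs)), key=qs.__getitem__)  # greedy on Q otherwise
```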
arXiv Detail & Related papers (2022-12-02T17:59:43Z)
- Improving the Performance of Robust Control through Event-Triggered Learning [74.57758188038375]
We propose an event-triggered learning algorithm that decides when to learn in the face of uncertainty in the LQR problem.
We demonstrate improved performance over a robust controller baseline in a numerical example.
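The core idea admits a very small sketch: monitor the model's one-step prediction error online and trigger (re)learning only when an error statistic crosses a bound. The statistic, window, and threshold below are illustrative placeholders; the paper derives a principled trigger for the LQR setting.

```python
import numpy as np

def should_learn(pred_errors, window=100, threshold=0.5):
    """Fire a learning event when recent model error exceeds a bound."""
    recent = np.asarray(pred_errors[-window:])
    return float(np.mean(recent**2)) > threshold
```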
arXiv Detail & Related papers (2022-07-28T17:36:37Z)
- Offline Reinforcement Learning at Multiple Frequencies [62.08749079914275]
We study how well offline reinforcement learning algorithms can accommodate data with a mixture of frequencies during training.
We present a simple yet effective solution that enforces consistency in the rate of $Q$-value updates to stabilize learning.
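One natural reading of rate-consistent $Q$-value updates is to discount each transition by its own time step, so that targets computed from data collected at different control frequencies agree on a common time scale. The sketch below is an assumption-laden illustration, not the paper's exact mechanism.

```python
def td_target(reward, q_next, dt, gamma=0.99):
    """Discount by gamma**dt (dt in units of a base step) so transitions
    collected at different control frequencies yield consistent Q-targets."""
    return reward + (gamma ** dt) * q_next
```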
arXiv Detail & Related papers (2022-07-26T17:54:49Z)
- Finite-time System Identification and Adaptive Control in Autoregressive Exogenous Systems [79.67879934935661]
We study the problem of system identification and adaptive control of unknown ARX systems.
We provide finite-time learning guarantees for the ARX systems under both open-loop and closed-loop data collection.
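For context, a minimal sketch of ARX identification by ordinary least squares, fitting y[t] = a1*y[t-1] + ... + an*y[t-n] + b1*u[t-1] + ... + bm*u[t-m] from equal-length input/output sequences. The finite-time guarantees and closed-loop analysis that are the paper's contribution are not captured here.

```python
import numpy as np

def fit_arx(y, u, n=2, m=2):
    """Least-squares fit of ARX coefficients from output y and input u."""
    start = max(n, m)
    rows, targets = [], []
    for t in range(start, len(y)):
        past_y = [y[t - i] for i in range(1, n + 1)]
        past_u = [u[t - i] for i in range(1, m + 1)]
        rows.append(past_y + past_u)
        targets.append(y[t])
    theta, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return theta[:n], theta[n:]   # AR coefficients, exogenous coefficients
```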
arXiv Detail & Related papers (2021-08-26T18:00:00Z)
- Online-Learning Deep Neuro-Adaptive Dynamic Inversion Controller for Model Free Control [1.3764085113103217]
A neuro-adaptive controller is implemented, featuring a deep neural network trained with a new weight-update law.
The controller is able to learn the nonlinear plant quickly and displays good performance in the tracking control problem.
arXiv Detail & Related papers (2021-07-21T22:46:03Z)
- Reinforcement Learning for Control of Valves [0.0]
This paper is a study of reinforcement learning (RL) as an optimal-control strategy for control of nonlinear valves.
It is evaluated against the PID (proportional-integral-derivative) strategy, using a unified framework.
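For reference, a minimal discrete-time PID controller of the kind such comparisons benchmark against; the gains here are illustrative, not tuned for any particular valve.

```python
class PID:
    """Textbook discrete-time PID controller."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral, self.prev_error = 0.0, 0.0

    def step(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```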
arXiv Detail & Related papers (2020-12-29T09:01:47Z)
- Logarithmic Regret Bound in Partially Observable Linear Dynamical Systems [91.43582419264763]
We study the problem of system identification and adaptive control in partially observable linear dynamical systems.
We present the first model estimation method with finite-time guarantees in both open and closed-loop system identification.
We show that AdaptOn is the first algorithm that achieves $\text{polylog}\left(T\right)$ regret in adaptive control of unknown partially observable linear dynamical systems.
arXiv Detail & Related papers (2020-03-25T06:00:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site makes no guarantees about the quality of the information presented and is not responsible for any consequences arising from its use.