Deep Q-learning: a robust control approach
- URL: http://arxiv.org/abs/2201.08610v1
- Date: Fri, 21 Jan 2022 09:47:34 GMT
- Title: Deep Q-learning: a robust control approach
- Authors: Balázs Varga, Balázs Kulcsár, Morteza Haghir Chehreghani
- Abstract summary: We formulate an uncertain linear time-invariant model by means of the neural tangent kernel to describe learning.
We show the instability of learning and analyze the agent's behavior in the frequency domain.
Numerical simulations in different OpenAI Gym environments suggest that the $\mathcal{H}_\infty$-controlled learning performs slightly better than Double deep Q-learning.
- Score: 4.125187280299247
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we place deep Q-learning into a control-oriented perspective
and study its learning dynamics with well-established techniques from robust
control. We formulate an uncertain linear time-invariant model by means of the
neural tangent kernel to describe learning. We show the instability of learning
and analyze the agent's behavior in the frequency domain. Then, we ensure
convergence via robust controllers acting as dynamical rewards in the loss
function. We synthesize three controllers: state-feedback gain scheduling
$\mathcal{H}_2$, dynamic $\mathcal{H}_\infty$, and constant gain
$\mathcal{H}_\infty$ controllers. Setting up the learning agent with a
control-oriented tuning methodology is more transparent and has
well-established literature compared to the heuristics in reinforcement
learning. In addition, our approach uses neither a target network nor a
randomized replay memory. The role of the target network is taken over by the
control input, which also exploits the temporal dependency of samples (as
opposed to a randomized memory buffer). Numerical simulations in different
OpenAI Gym environments suggest that the $\mathcal{H}_\infty$-controlled
learning performs slightly better than Double deep Q-learning.
Related papers
- Autonomous Vehicle Controllers From End-to-End Differentiable Simulation [60.05963742334746]
We propose a differentiable simulator and design an analytic policy gradients (APG) approach to training AV controllers.
Our proposed framework brings the differentiable simulator into an end-to-end training loop, where gradients of environment dynamics serve as a useful prior to help the agent learn a more grounded policy.
We find significant improvements in performance and robustness to noise in the dynamics, as well as overall more intuitive human-like handling.
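As an illustration of the mechanism, here is a minimal sketch of analytic policy gradients on a toy differentiable "simulator": a scalar linear system with a linear state-feedback policy, where the gradient of the rolled-out cost is propagated through the dynamics by the chain rule. The system, policy, and learning rate are placeholders, not the paper's AV setting.

```python
# Toy differentiable simulator: x' = a*x + b*u, quadratic cost, policy u = theta*x.
def rollout_and_grad(theta, x0=1.0, a=0.9, b=0.5, horizon=50):
    x, dx = x0, 0.0              # state and its sensitivity dx/dtheta
    cost, grad = 0.0, 0.0
    for _ in range(horizon):
        u = theta * x
        du = x + theta * dx      # chain rule through the policy
        cost += x**2 + u**2
        grad += 2*x*dx + 2*u*du  # d(cost)/d(theta), accumulated analytically
        x, dx = a*x + b*u, a*dx + b*du  # differentiate through the dynamics
    return cost, grad

theta = 0.0
for _ in range(200):             # plain gradient descent on the unrolled cost
    cost, grad = rollout_and_grad(theta)
    theta -= 0.01 * grad
```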
arXiv Detail & Related papers (2024-09-12T11:50:06Z)
- Learning Environment Models with Continuous Stochastic Dynamics [0.0]
We aim to provide insight into the decisions faced by the agent by learning an automaton model of the environment's behavior under the agent's control.
In this work, we extend the capabilities of automata learning so that models can be learned for environments with complex, continuous dynamics.
We apply our automata learning framework on popular RL benchmarking environments in the OpenAI Gym, including LunarLander, CartPole, Mountain Car, and Acrobot.
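A minimal sketch of the underlying idea, under heavy simplification: abstract continuous observations into discrete states by binning, then estimate the automaton's stochastic transitions by counting. The binning scheme and episode format below are assumptions; the actual framework uses proper automata-learning algorithms rather than plain frequency counts.

```python
import numpy as np
from collections import defaultdict

def abstract_state(obs, bins):
    """Map a continuous observation to a discrete abstract state by binning."""
    return tuple(int(np.digitize(x, b)) for x, b in zip(obs, bins))

def learn_transition_map(episodes, bins):
    """Estimate automaton transition probabilities by counting."""
    counts = defaultdict(lambda: defaultdict(int))
    for episode in episodes:                # episode: [(obs, action, next_obs)]
        for obs, action, next_obs in episode:
            s, s2 = abstract_state(obs, bins), abstract_state(next_obs, bins)
            counts[(s, action)][s2] += 1
    model = {}
    for key, nxt in counts.items():
        total = sum(nxt.values())
        model[key] = {s2: c / total for s2, c in nxt.items()}
    return model
```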
arXiv Detail & Related papers (2023-06-29T12:47:28Z)
- Value function estimation using conditional diffusion models for control [62.27184818047923]
We propose a simple algorithm called Diffused Value Function (DVF).
It learns a joint multi-step model of the environment-robot interaction dynamics using a diffusion model.
We show how DVF can be used to efficiently capture the state visitation measure for multiple controllers.
arXiv Detail & Related papers (2023-06-09T18:40:55Z)
- CT-DQN: Control-Tutored Deep Reinforcement Learning [4.395396671038298]
Control-Tutored Deep Q-Networks (CT-DQN) is a Deep Reinforcement Learning algorithm that leverages a control tutor to reduce learning time.
We validate our approach on three scenarios from OpenAI Gym: the inverted pendulum, lunar lander, and car racing.
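A minimal sketch of control-tutored action selection: with some probability, defer to a hand-designed feedback controller (the "tutor") instead of the greedy Q-network action. The names (`q_values`, `tutor_action`) and the fixed switching probability are placeholders; the actual CT-DQN switching rule may differ.

```python
import random

def select_action(state, q_values, tutor_action, tutor_prob=0.3, eps=0.05):
    qs = q_values(state)                  # list of Q-values, one per action
    if random.random() < eps:             # standard epsilon-greedy exploration
        return random.randrange(len(qs))
    if random.random() < tutor_prob:      # defer to the control tutor
        return tutor_action(state)
    return max(range(len(qs)), key=qs.__getitem__)  # greedy on Q otherwise
```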
arXiv Detail & Related papers (2022-12-02T17:59:43Z)
- Improving the Performance of Robust Control through Event-Triggered Learning [74.57758188038375]
We propose an event-triggered learning algorithm that decides when to learn in the face of uncertainty in the LQR problem.
We demonstrate improved performance over a robust controller baseline in a numerical example.
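The core idea admits a very small sketch: monitor the model's one-step prediction error online and trigger (re)learning only when an error statistic crosses a bound. The statistic, window, and threshold below are illustrative placeholders; the paper derives a principled trigger for the LQR setting.

```python
import numpy as np

def should_learn(pred_errors, window=100, threshold=0.5):
    """Fire a learning event when recent model error exceeds a bound."""
    recent = np.asarray(pred_errors[-window:])
    return float(np.mean(recent**2)) > threshold
```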
arXiv Detail & Related papers (2022-07-28T17:36:37Z)
- Offline Reinforcement Learning at Multiple Frequencies [62.08749079914275]
We study how well offline reinforcement learning algorithms can accommodate data with a mixture of frequencies during training.
We present a simple yet effective solution that enforces consistency in the rate of $Q$-value updates to stabilize learning.
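One natural reading of rate-consistent $Q$-value updates is to discount each transition by its own time step, so that targets computed from data collected at different control frequencies agree on a common time scale. The sketch below is an assumption-laden illustration, not the paper's exact mechanism.

```python
def td_target(reward, q_next, dt, gamma=0.99):
    """Discount by gamma**dt (dt in units of a base step) so transitions
    collected at different control frequencies yield consistent Q-targets."""
    return reward + (gamma ** dt) * q_next
```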
arXiv Detail & Related papers (2022-07-26T17:54:49Z)
- Finite-time System Identification and Adaptive Control in Autoregressive Exogenous Systems [79.67879934935661]
We study the problem of system identification and adaptive control of unknown ARX systems.
We provide finite-time learning guarantees for the ARX systems under both open-loop and closed-loop data collection.
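For context, a minimal sketch of ARX identification by ordinary least squares, fitting y[t] = a1*y[t-1] + ... + an*y[t-n] + b1*u[t-1] + ... + bm*u[t-m] from equal-length input/output sequences. The finite-time guarantees and closed-loop analysis that are the paper's contribution are not captured here.

```python
import numpy as np

def fit_arx(y, u, n=2, m=2):
    """Least-squares fit of ARX coefficients from output y and input u."""
    start = max(n, m)
    rows, targets = [], []
    for t in range(start, len(y)):
        past_y = [y[t - i] for i in range(1, n + 1)]
        past_u = [u[t - i] for i in range(1, m + 1)]
        rows.append(past_y + past_u)
        targets.append(y[t])
    theta, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return theta[:n], theta[n:]   # AR coefficients, exogenous coefficients
```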
arXiv Detail & Related papers (2021-08-26T18:00:00Z)
- Online-Learning Deep Neuro-Adaptive Dynamic Inversion Controller for Model Free Control [1.3764085113103217]
A neuro-adaptive controller is implemented, featuring a deep neural network trained with a new weight-update law.
The controller is able to learn the nonlinear plant quickly and displays good performance in the tracking control problem.
arXiv Detail & Related papers (2021-07-21T22:46:03Z)
- Reinforcement Learning for Control of Valves [0.0]
This paper is a study of reinforcement learning (RL) as an optimal-control strategy for control of nonlinear valves.
It is evaluated against the PID (proportional-integral-derivative) strategy, using a unified framework.
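For reference, a minimal discrete-time PID controller of the kind such comparisons benchmark against; the gains here are illustrative, not tuned for any particular valve.

```python
class PID:
    """Textbook discrete-time PID controller."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral, self.prev_error = 0.0, 0.0

    def step(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```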
arXiv Detail & Related papers (2020-12-29T09:01:47Z)
- Logarithmic Regret Bound in Partially Observable Linear Dynamical Systems [91.43582419264763]
We study the problem of system identification and adaptive control in partially observable linear dynamical systems.
We present the first model estimation method with finite-time guarantees in both open and closed-loop system identification.
We show that AdaptOn is the first algorithm that achieves $\text{polylog}\left(T\right)$ regret in adaptive control of unknown partially observable linear dynamical systems.
arXiv Detail & Related papers (2020-03-25T06:00:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site makes no guarantees about the quality of the information presented and is not responsible for any consequences arising from its use.