CT-DQN: Control-Tutored Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2212.01343v1
- Date: Fri, 2 Dec 2022 17:59:43 GMT
- Title: CT-DQN: Control-Tutored Deep Reinforcement Learning
- Authors: Francesco De Lellis, Marco Coraggio, Giovanni Russo, Mirco Musolesi,
Mario di Bernardo
- Abstract summary: Control-Tutored Deep Q-Networks (CT-DQN) is a Deep Reinforcement Learning algorithm that leverages a control tutor to reduce learning time.
We validate our approach on three scenarios from OpenAI Gym: the inverted pendulum, lunar lander, and car racing.
- Score: 4.395396671038298
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: One of the major challenges in Deep Reinforcement Learning for control is the
need for extensive training to learn the policy. Motivated by this, we present
the design of the Control-Tutored Deep Q-Networks (CT-DQN) algorithm, a Deep
Reinforcement Learning algorithm that leverages a control tutor, i.e., an
exogenous control law, to reduce learning time. The tutor can be designed using
an approximate model of the system, without any assumption about the knowledge
of the system's dynamics. There is no expectation that it will be able to
achieve the control objective if used stand-alone. During learning, the tutor
occasionally suggests an action, thus partially guiding exploration. We
validate our approach on three scenarios from OpenAI Gym: the inverted
pendulum, lunar lander, and car racing. We demonstrate that CT-DQN is able to
achieve better or equivalent data efficiency with respect to the classic
function approximation solutions.
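The mechanism described in the abstract, an exogenous control law that occasionally injects its suggestion into exploration, can be pictured with a few lines of code. The following is a minimal sketch of tutor-guided epsilon-greedy action selection, not the authors' implementation: the `ControlTutor` class, the proportional pendulum law inside it, and the `tutor_prob` invocation rule are all assumptions made for the example.

```python
import random

import numpy as np


class ControlTutor:
    """Illustrative tutor for the inverted pendulum: a crude proportional
    law on the pole angle, designed from an approximate model. It suggests
    plausible torques but is not expected to solve the task on its own."""

    def __init__(self, gain=2.0, torques=(-2.0, 0.0, 2.0)):
        self.gain = gain
        self.torques = torques  # the discretised action set used by the DQN

    def suggest(self, state):
        # Gym's pendulum observation is (cos(theta), sin(theta), theta_dot).
        angle = np.arctan2(state[1], state[0])
        target = -self.gain * angle  # push back toward the upright position
        # Suggest the index of the discrete torque closest to the target.
        return int(np.argmin([abs(t - target) for t in self.torques]))


def select_action(q_values, state, tutor, epsilon=0.1, tutor_prob=0.2):
    """Epsilon-greedy action selection, partially guided by the tutor: on
    an exploratory step, the tutor's suggestion occasionally replaces the
    uniformly random action."""
    if random.random() < epsilon:  # exploratory step
        if random.random() < tutor_prob:  # occasionally ask the tutor
            return tutor.suggest(state)
        return random.randrange(len(tutor.torques))  # plain random action
    return int(np.argmax(q_values(state)))  # greedy step: exploit Q-network


# Toy usage with a dummy Q-network over the three discrete torques.
tutor = ControlTutor()
dummy_q = lambda s: np.zeros(len(tutor.torques))
print(select_action(dummy_q, np.array([1.0, 0.0, 0.0]), tutor))
```

Because the tutor only biases which exploratory action is taken, the Q-learning update itself is untouched; this matches the abstract's claim that the tutor partially guides exploration rather than replacing the learned policy.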
Related papers
- A Unified Framework for Neural Computation and Learning Over Time [56.44910327178975]
Hamiltonian Learning is a novel unified framework for learning with neural networks "over time".
It is based on differential equations that: (i) can be integrated without the need for external software solvers; (ii) generalize the well-established notion of gradient-based learning in feed-forward and recurrent networks; (iii) open up novel perspectives (a sketch of points (i) and (ii) follows this entry).
arXiv Detail & Related papers (2024-09-18T14:57:13Z)
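Points (i) and (ii) above have a standard minimal illustration: integrating the gradient-flow ODE w'(t) = -grad L(w) with explicit Euler needs no external solver, and one Euler step of size dt is exactly a gradient-descent update with learning rate dt. The sketch below shows only this generic reading; it is not the paper's Hamiltonian formulation, and the names are illustrative.

```python
import numpy as np


def gradient_flow(grad, w0, dt=0.1, steps=100):
    """Integrate the learning ODE w'(t) = -grad(w) with explicit Euler.

    No external ODE solver is needed, and each Euler step coincides with
    a gradient-descent update whose learning rate is the step size dt."""
    w = np.asarray(w0, dtype=float)
    for _ in range(steps):
        w = w - dt * grad(w)  # Euler step == gradient-descent update
    return w


# Toy quadratic loss L(w) = 0.5 * ||w||^2, whose gradient is w itself.
print(gradient_flow(lambda w: w, [1.0, -2.0]))  # converges toward the minimum at 0
```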
- Reinforcement Learning for Intensity Control: An Application to Choice-Based Network Revenue Management [8.08366903467967]
We adapt the reinforcement learning framework to intensity control using choice-based network revenue management.
We show that by utilizing the inherent discretization of the sample paths created by the jump points, one does not need to discretize the time horizon in advance (see the sketch after this entry).
arXiv Detail & Related papers (2024-06-08T05:27:01Z)
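The jump-point observation can be pictured with a toy simulation: because the controlled process changes only at jumps, learning updates can happen at the random jump epochs rather than on a pre-specified grid. The sketch below uses a plain Poisson process and a toy running-average update; it is a loose illustration of the idea, not the paper's intensity-control algorithm.

```python
import random


def jump_driven_updates(intensity=1.5, horizon=10.0, seed=0):
    """Update a toy estimate only at the jump times of a Poisson process:
    the sample path itself supplies the discretization, so no time grid
    has to be fixed in advance."""
    rng = random.Random(seed)
    t, estimate, n_updates = 0.0, 0.0, 0
    while True:
        t += rng.expovariate(intensity)  # exponential inter-jump time
        if t >= horizon:
            break
        reward = rng.uniform(0.0, 1.0)  # toy reward observed at the jump
        estimate += 0.1 * (reward - estimate)  # update at the jump epoch
        n_updates += 1
    return estimate, n_updates


print(jump_driven_updates())  # the number of updates equals the jump count
```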
- CUDC: A Curiosity-Driven Unsupervised Data Collection Method with Adaptive Temporal Distances for Offline Reinforcement Learning [62.58375643251612]
We propose a Curiosity-driven Unsupervised Data Collection (CUDC) method to expand feature space using adaptive temporal distances for task-agnostic data collection.
With this adaptive reachability mechanism in place, the feature representation can be diversified, and the agent can guide itself, driven by curiosity, to collect higher-quality data (a generic curiosity-bonus sketch follows this entry).
Empirically, CUDC surpasses existing unsupervised methods in efficiency and learning performance in various downstream offline RL tasks of the DeepMind Control Suite.
arXiv Detail & Related papers (2023-12-19T14:26:23Z)
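CUDC's adaptive temporal-distance mechanism does not fit in a short snippet, but the curiosity signal driving such data collection is commonly a prediction error, as in the generic sketch below; the `predictor` and the squared-error bonus are standard ingredients from the curiosity literature, not CUDC's exact estimator.

```python
import numpy as np


def curiosity_bonus(predictor, state, next_state):
    """Generic prediction-error curiosity: transitions the predictor gets
    wrong are treated as novel and earn a larger exploration bonus."""
    error = predictor(np.asarray(state)) - np.asarray(next_state)
    return float(np.sum(error ** 2))


# Toy predictor that assumes the state never changes; any movement is novel.
identity_predictor = lambda s: s
s, s_next = [0.0, 1.0], [0.3, 0.9]
print(curiosity_bonus(identity_predictor, s, s_next))  # 0.3**2 + 0.1**2 = 0.1
```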
- Tracking Control for a Spherical Pendulum via Curriculum Reinforcement Learning [27.73555826776087]
Reinforcement Learning (RL) allows learning non-trivial robot control laws purely from data.
In this paper, we pair a recent algorithm for automatically building curricula with RL on massively parallelized simulations.
We demonstrate the potential of curriculum RL to jointly learn state estimation and control for non-linear tracking tasks.
arXiv Detail & Related papers (2023-09-25T12:48:47Z)
- Reinforcement Learning-Based Control of CrazyFlie 2.X Quadrotor [0.0]
The project explores synergies between classical control algorithms such as PID and contemporary reinforcement learning algorithms.
The primary objective is to tune a PID controller using reinforcement learning strategies (see the sketch after this entry).
The secondary objective is to leverage these learnings to implement control for navigation by integrating with the Lighthouse positioning system.
arXiv Detail & Related papers (2023-06-06T18:29:10Z)
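A minimal picture of PID tuning via RL is a textbook PID controller whose three gains are the parameters the learner adjusts, with an episode-level return as the tuning signal. In the sketch below, a toy first-order plant stands in for the quadrotor dynamics; the class, the plant, and the cost are illustrative assumptions, not the project's actual setup.

```python
class PID:
    """Textbook discrete-time PID; kp/ki/kd are the knobs an RL tuner adjusts."""

    def __init__(self, kp, ki, kd, dt=0.01):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def control(self, error):
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def episode_return(gains, setpoint=1.0, steps=500, dt=0.01):
    """Reward signal for the tuner: negative accumulated tracking error on a
    toy first-order plant (a stand-in for the real quadrotor dynamics)."""
    kp, ki, kd = gains
    pid, y, cost = PID(kp, ki, kd, dt), 0.0, 0.0
    for _ in range(steps):
        u = pid.control(setpoint - y)
        y += dt * (-y + u)  # first-order lag plant: y' = -y + u
        cost += abs(setpoint - y) * dt
    return -cost  # an RL tuner would maximise this over (kp, ki, kd)


print(episode_return((2.0, 1.0, 0.1)))
```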
- Deep Q-learning: a robust control approach [4.125187280299247]
We formulate an uncertain linear time-invariant model by means of the neural tangent kernel to describe learning.
We show the instability of learning and analyze the agent's behavior in the frequency domain.
Numerical simulations in different OpenAI Gym environments suggest that $\mathcal{H}_\infty$-controlled learning performs slightly better than Double Deep Q-learning.
arXiv Detail & Related papers (2022-01-21T09:47:34Z)
- Reinforcement Learning for Control of Valves [0.0]
This paper is a study of reinforcement learning (RL) as an optimal-control strategy for control of nonlinear valves.
It is evaluated against the PID (proportional-integral-derivative) strategy, using a unified framework.
arXiv Detail & Related papers (2020-12-29T09:01:47Z)
- Learning Dexterous Manipulation from Suboptimal Experts [69.8017067648129]
Relative Entropy Q-Learning (REQ) is a simple policy iteration algorithm that combines ideas from successful offline and conventional RL algorithms.
We show how REQ is also effective for general off-policy RL, offline RL, and RL from demonstrations.
arXiv Detail & Related papers (2020-10-16T18:48:49Z)
- Anticipating the Long-Term Effect of Online Learning in Control [75.6527644813815]
AntLer is a design algorithm for learning-based control laws that anticipates learning.
We show that AntLer approximates an optimal solution arbitrarily accurately with probability one.
arXiv Detail & Related papers (2020-07-24T07:00:14Z)
- Online Reinforcement Learning Control by Direct Heuristic Dynamic Programming: from Time-Driven to Event-Driven [80.94390916562179]
Time-driven learning refers to machine learning methods that update the parameters of a prediction model continuously as new data arrive.
It is desirable to prevent time-driven dHDP from updating due to insignificant system events such as noise (see the sketch after this entry).
We show how the event-driven dHDP algorithm works in comparison to the original time-driven dHDP.
arXiv Detail & Related papers (2020-06-16T05:51:25Z)
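The time-driven versus event-driven distinction has a compact generic illustration: gate each update on whether the new information is significant, so noise alone does not trigger learning. The threshold rule below is an assumption made for the sketch, not dHDP's actual triggering condition.

```python
import random


def event_driven_updates(samples, threshold=0.2):
    """Update an estimate only when a sample deviates from it by more than
    `threshold` (an 'event'); small noise-driven fluctuations are ignored."""
    estimate, n_updates = 0.0, 0
    for x in samples:
        if abs(x - estimate) > threshold:  # significant event detected
            estimate += 0.5 * (x - estimate)
            n_updates += 1
    return estimate, n_updates


rng = random.Random(1)
noisy = [1.0 + rng.gauss(0.0, 0.05) for _ in range(100)]  # mostly noise
print(event_driven_updates(noisy))  # only a handful of updates in 100 samples
```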
- Information Theoretic Model Predictive Q-Learning [64.74041985237105]
We present a novel theoretical connection between information theoretic MPC and entropy regularized RL.
We develop a Q-learning algorithm that can leverage biased models.
arXiv Detail & Related papers (2019-12-31T00:29:22Z)