CT-DQN: Control-Tutored Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2212.01343v1
- Date: Fri, 2 Dec 2022 17:59:43 GMT
- Title: CT-DQN: Control-Tutored Deep Reinforcement Learning
- Authors: Francesco De Lellis, Marco Coraggio, Giovanni Russo, Mirco Musolesi,
Mario di Bernardo
- Abstract summary: Control-Tutored Deep Q-Networks (CT-DQN) is a Deep Reinforcement Learning algorithm that leverages a control tutor to reduce learning time.
We validate our approach on three scenarios from OpenAI Gym: the inverted pendulum, lunar lander, and car racing.
- Score: 4.395396671038298
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: One of the major challenges in Deep Reinforcement Learning for control is the
need for extensive training to learn the policy. Motivated by this, we present
the design of the Control-Tutored Deep Q-Networks (CT-DQN) algorithm, a Deep
Reinforcement Learning algorithm that leverages a control tutor, i.e., an
exogenous control law, to reduce learning time. The tutor can be designed using
an approximate model of the system, without any assumption about the knowledge
of the system's dynamics. There is no expectation that it will be able to
achieve the control objective if used stand-alone. During learning, the tutor
occasionally suggests an action, thus partially guiding exploration. We
validate our approach on three scenarios from OpenAI Gym: the inverted
pendulum, lunar lander, and car racing. We demonstrate that CT-DQN is able to
achieve better or equivalent data efficiency with respect to the classic
function approximation solutions.
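The mechanism described in the abstract, an exogenous control law that occasionally injects its suggestion into exploration, can be pictured with a few lines of code. The following is a minimal sketch of tutor-guided epsilon-greedy action selection, not the authors' implementation: the `ControlTutor` class, the proportional pendulum law inside it, and the `tutor_prob` invocation rule are all assumptions made for the example.

```python
import random

import numpy as np


class ControlTutor:
    """Illustrative tutor for the inverted pendulum: a crude proportional
    law on the pole angle, designed from an approximate model. It suggests
    plausible torques but is not expected to solve the task on its own."""

    def __init__(self, gain=2.0, torques=(-2.0, 0.0, 2.0)):
        self.gain = gain
        self.torques = torques  # the discretised action set used by the DQN

    def suggest(self, state):
        # Gym's pendulum observation is (cos(theta), sin(theta), theta_dot).
        angle = np.arctan2(state[1], state[0])
        target = -self.gain * angle  # push back toward the upright position
        # Suggest the index of the discrete torque closest to the target.
        return int(np.argmin([abs(t - target) for t in self.torques]))


def select_action(q_values, state, tutor, epsilon=0.1, tutor_prob=0.2):
    """Epsilon-greedy action selection, partially guided by the tutor: on
    an exploratory step, the tutor's suggestion occasionally replaces the
    uniformly random action."""
    if random.random() < epsilon:  # exploratory step
        if random.random() < tutor_prob:  # occasionally ask the tutor
            return tutor.suggest(state)
        return random.randrange(len(tutor.torques))  # plain random action
    return int(np.argmax(q_values(state)))  # greedy step: exploit Q-network


# Toy usage with a dummy Q-network over the three discrete torques.
tutor = ControlTutor()
dummy_q = lambda s: np.zeros(len(tutor.torques))
print(select_action(dummy_q, np.array([1.0, 0.0, 0.0]), tutor))
```

Because the tutor only biases which exploratory action is taken, the Q-learning update itself is untouched; this matches the abstract's claim that the tutor partially guides exploration rather than replacing the learned policy.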
Related papers
- A Unified Framework for Neural Computation and Learning Over Time [56.44910327178975]
Hamiltonian Learning is a novel unified framework for learning with neural networks "over time".
It is based on differential equations that: (i) can be integrated without the need for external software solvers; (ii) generalize the well-established notion of gradient-based learning in feed-forward and recurrent networks; (iii) open up novel perspectives (a sketch of points (i) and (ii) follows this entry).
arXiv Detail & Related papers (2024-09-18T14:57:13Z)
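Points (i) and (ii) above have a standard minimal illustration: integrating the gradient-flow ODE w'(t) = -grad L(w) with explicit Euler needs no external solver, and one Euler step of size dt is exactly a gradient-descent update with learning rate dt. The sketch below shows only this generic reading; it is not the paper's Hamiltonian formulation, and the names are illustrative.

```python
import numpy as np


def gradient_flow(grad, w0, dt=0.1, steps=100):
    """Integrate the learning ODE w'(t) = -grad(w) with explicit Euler.

    No external ODE solver is needed, and each Euler step coincides with
    a gradient-descent update whose learning rate is the step size dt."""
    w = np.asarray(w0, dtype=float)
    for _ in range(steps):
        w = w - dt * grad(w)  # Euler step == gradient-descent update
    return w


# Toy quadratic loss L(w) = 0.5 * ||w||^2, whose gradient is w itself.
print(gradient_flow(lambda w: w, [1.0, -2.0]))  # converges toward the minimum at 0
```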
- Reinforcement Learning for Intensity Control: An Application to Choice-Based Network Revenue Management [8.08366903467967]
We adapt the reinforcement learning framework to intensity control using choice-based network revenue management.
We show that by utilizing the inherent discretization of the sample paths created by the jump points, one does not need to discretize the time horizon in advance (see the sketch after this entry).
arXiv Detail & Related papers (2024-06-08T05:27:01Z)
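The jump-point observation can be pictured with a toy simulation: because the controlled process changes only at jumps, learning updates can happen at the random jump epochs rather than on a pre-specified grid. The sketch below uses a plain Poisson process and a toy running-average update; it is a loose illustration of the idea, not the paper's intensity-control algorithm.

```python
import random


def jump_driven_updates(intensity=1.5, horizon=10.0, seed=0):
    """Update a toy estimate only at the jump times of a Poisson process:
    the sample path itself supplies the discretization, so no time grid
    has to be fixed in advance."""
    rng = random.Random(seed)
    t, estimate, n_updates = 0.0, 0.0, 0
    while True:
        t += rng.expovariate(intensity)  # exponential inter-jump time
        if t >= horizon:
            break
        reward = rng.uniform(0.0, 1.0)  # toy reward observed at the jump
        estimate += 0.1 * (reward - estimate)  # update at the jump epoch
        n_updates += 1
    return estimate, n_updates


print(jump_driven_updates())  # the number of updates equals the jump count
```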
- CUDC: A Curiosity-Driven Unsupervised Data Collection Method with Adaptive Temporal Distances for Offline Reinforcement Learning [62.58375643251612]
We propose a Curiosity-driven Unsupervised Data Collection (CUDC) method to expand feature space using adaptive temporal distances for task-agnostic data collection.
With this adaptive reachability mechanism in place, the feature representation can be diversified, and the agent can guide itself, driven by curiosity, to collect higher-quality data (a generic curiosity-bonus sketch follows this entry).
Empirically, CUDC surpasses existing unsupervised methods in efficiency and learning performance in various downstream offline RL tasks of the DeepMind Control Suite.
arXiv Detail & Related papers (2023-12-19T14:26:23Z)
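CUDC's adaptive temporal-distance mechanism does not fit in a short snippet, but the curiosity signal driving such data collection is commonly a prediction error, as in the generic sketch below; the `predictor` and the squared-error bonus are standard ingredients from the curiosity literature, not CUDC's exact estimator.

```python
import numpy as np


def curiosity_bonus(predictor, state, next_state):
    """Generic prediction-error curiosity: transitions the predictor gets
    wrong are treated as novel and earn a larger exploration bonus."""
    error = predictor(np.asarray(state)) - np.asarray(next_state)
    return float(np.sum(error ** 2))


# Toy predictor that assumes the state never changes; any movement is novel.
identity_predictor = lambda s: s
s, s_next = [0.0, 1.0], [0.3, 0.9]
print(curiosity_bonus(identity_predictor, s, s_next))  # 0.3**2 + 0.1**2 = 0.1
```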
- Tracking Control for a Spherical Pendulum via Curriculum Reinforcement Learning [27.73555826776087]
Reinforcement Learning (RL) allows learning non-trivial robot control laws purely from data.
In this paper, we pair a recent algorithm for automatically building curricula with RL on massively parallelized simulations.
We demonstrate the potential of curriculum RL to jointly learn state estimation and control for non-linear tracking tasks.
arXiv Detail & Related papers (2023-09-25T12:48:47Z)
- Reinforcement Learning-Based Control of CrazyFlie 2.X Quadrotor [0.0]
The project explores synergies between classical control algorithms such as PID and contemporary reinforcement learning algorithms.
The primary objective is to tune a PID controller using reinforcement learning strategies (see the sketch after this entry).
The secondary objective is to leverage these learnings to implement control for navigation by integrating with the Lighthouse positioning system.
arXiv Detail & Related papers (2023-06-06T18:29:10Z)
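A minimal picture of PID tuning via RL is a textbook PID controller whose three gains are the parameters the learner adjusts, with an episode-level return as the tuning signal. In the sketch below, a toy first-order plant stands in for the quadrotor dynamics; the class, the plant, and the cost are illustrative assumptions, not the project's actual setup.

```python
class PID:
    """Textbook discrete-time PID; kp/ki/kd are the knobs an RL tuner adjusts."""

    def __init__(self, kp, ki, kd, dt=0.01):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def control(self, error):
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def episode_return(gains, setpoint=1.0, steps=500, dt=0.01):
    """Reward signal for the tuner: negative accumulated tracking error on a
    toy first-order plant (a stand-in for the real quadrotor dynamics)."""
    kp, ki, kd = gains
    pid, y, cost = PID(kp, ki, kd, dt), 0.0, 0.0
    for _ in range(steps):
        u = pid.control(setpoint - y)
        y += dt * (-y + u)  # first-order lag plant: y' = -y + u
        cost += abs(setpoint - y) * dt
    return -cost  # an RL tuner would maximise this over (kp, ki, kd)


print(episode_return((2.0, 1.0, 0.1)))
```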
- Deep Q-learning: a robust control approach [4.125187280299247]
We formulate an uncertain linear time-invariant model by means of the neural tangent kernel to describe learning.
We show the instability of learning and analyze the agent's behavior in the frequency domain.
Numerical simulations in different OpenAI Gym environments suggest that $\mathcal{H}_\infty$-controlled learning performs slightly better than Double Deep Q-learning.
arXiv Detail & Related papers (2022-01-21T09:47:34Z)
- Reinforcement Learning for Control of Valves [0.0]
This paper is a study of reinforcement learning (RL) as an optimal-control strategy for control of nonlinear valves.
It is evaluated against the PID (proportional-integral-derivative) strategy, using a unified framework.
arXiv Detail & Related papers (2020-12-29T09:01:47Z)
- Learning Dexterous Manipulation from Suboptimal Experts [69.8017067648129]
Relative Entropy Q-Learning (REQ) is a simple policy iteration algorithm that combines ideas from successful offline and conventional RL algorithms.
We show how REQ is also effective for general off-policy RL, offline RL, and RL from demonstrations.
arXiv Detail & Related papers (2020-10-16T18:48:49Z)
- Anticipating the Long-Term Effect of Online Learning in Control [75.6527644813815]
AntLer is a design algorithm for learning-based control laws that anticipates learning.
We show that AntLer approximates an optimal solution arbitrarily accurately with probability one.
arXiv Detail & Related papers (2020-07-24T07:00:14Z)
- Online Reinforcement Learning Control by Direct Heuristic Dynamic Programming: from Time-Driven to Event-Driven [80.94390916562179]
Time-driven learning refers to machine learning methods that update the parameters of a prediction model continuously as new data arrive.
It is desirable to prevent time-driven dHDP from updating due to insignificant system events such as noise (see the sketch after this entry).
We show how the event-driven dHDP algorithm works in comparison to the original time-driven dHDP.
arXiv Detail & Related papers (2020-06-16T05:51:25Z)
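The time-driven versus event-driven distinction has a compact generic illustration: gate each update on whether the new information is significant, so noise alone does not trigger learning. The threshold rule below is an assumption made for the sketch, not dHDP's actual triggering condition.

```python
import random


def event_driven_updates(samples, threshold=0.2):
    """Update an estimate only when a sample deviates from it by more than
    `threshold` (an 'event'); small noise-driven fluctuations are ignored."""
    estimate, n_updates = 0.0, 0
    for x in samples:
        if abs(x - estimate) > threshold:  # significant event detected
            estimate += 0.5 * (x - estimate)
            n_updates += 1
    return estimate, n_updates


rng = random.Random(1)
noisy = [1.0 + rng.gauss(0.0, 0.05) for _ in range(100)]  # mostly noise
print(event_driven_updates(noisy))  # only a handful of updates in 100 samples
```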
- Information Theoretic Model Predictive Q-Learning [64.74041985237105]
We present a novel theoretical connection between information theoretic MPC and entropy regularized RL.
We develop a Q-learning algorithm that can leverage biased models.
arXiv Detail & Related papers (2019-12-31T00:29:22Z)