Control-Tutored Reinforcement Learning: Towards the Integration of
Data-Driven and Model-Based Control
- URL: http://arxiv.org/abs/2112.06018v1
- Date: Sat, 11 Dec 2021 16:34:36 GMT
- Title: Control-Tutored Reinforcement Learning: Towards the Integration of
Data-Driven and Model-Based Control
- Authors: F. De Lellis, M. Coraggio, G. Russo, M. Musolesi, M. di Bernardo
- Abstract summary: We present an architecture in which a feedback controller, derived from an approximate model of the environment, assists the learning process to improve its data efficiency.
This architecture, which we term Control-Tutored Q-learning (CTQL), is presented in two alternative flavours.
The first defines the reward function so that a Boolean condition determines when the control tutor policy is adopted.
The second, termed probabilistic CTQL (pCTQL), instead calls the tutor with a given probability during learning.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We present an architecture in which a feedback controller derived from an
approximate model of the environment assists the learning process to enhance
its data efficiency. This architecture, which we term Control-Tutored
Q-learning (CTQL), is presented in two alternative flavours. The former is
based on defining the reward function so that a Boolean condition can be used
to determine when the control tutor policy is adopted, while the latter, termed
probabilistic CTQL (pCTQL), is instead based on executing calls to the tutor
with a certain probability during learning. Both approaches are validated, and
thoroughly benchmarked against Q-learning, by considering the stabilization of
an inverted pendulum as defined in OpenAI Gym as a representative problem.
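
For a concrete picture of the two flavours described in the abstract, the sketch below pairs tabular Q-learning with a generic control tutor. It is a minimal illustration under stated assumptions, not the paper's implementation: the environment interface, the state discretisation, the tutor's feedback law, and the switching rules (boolean_tutor_rule, probabilistic_tutor_rule, p_tutor) are placeholders introduced here for illustration.

```python
import random
from collections import defaultdict

import numpy as np

# Rough sketch only: environment, tutor feedback law, and switching thresholds
# are placeholders, not the construction used in the paper.


def make_q_table(n_actions):
    """Tabular action values, created lazily for each discretised state."""
    return defaultdict(lambda: np.zeros(n_actions))


def ctql_step(Q, state, env, tutor_policy, use_tutor,
              epsilon=0.1, alpha=0.1, gamma=0.99):
    """One Q-learning update in which the executed action may come from the tutor."""
    if use_tutor(state, Q):
        action = tutor_policy(state)        # feedback controller on the approximate model
    elif random.random() < epsilon:
        action = env.action_space.sample()  # ordinary epsilon-greedy exploration
    else:
        action = int(np.argmax(Q[state]))

    next_state, reward, done, *_ = env.step(action)
    td_target = reward + (0.0 if done else gamma * np.max(Q[next_state]))
    Q[state][action] += alpha * (td_target - Q[state][action])
    return next_state, done


# CTQL-like variant: a Boolean condition on the learned values decides when the
# tutor takes over (placeholder rule, not the paper's exact reward-based test).
def boolean_tutor_rule(state, Q, threshold=0.0):
    return float(np.max(Q[state])) <= threshold


# pCTQL-like variant: call the tutor with a fixed probability during learning.
def probabilistic_tutor_rule(state, Q, p_tutor=0.25):
    return random.random() < p_tutor
```

In the paper, both variants are benchmarked against plain Q-learning on the OpenAI Gym inverted pendulum; in a sketch like this, tutor_policy would be a stabilizing feedback law derived from the approximate pendulum model, and states would be discretised before indexing the table.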
Related papers
- How to discretize continuous state-action spaces in Q-learning: A symbolic control approach [0.0]
The paper presents a systematic analysis that highlights a major drawback of space discretization methods.
To address this challenge, the paper proposes a symbolic model that captures a behavioral relation between the abstraction and the original system.
This relation allows the controller synthesized on the abstraction to be applied seamlessly to the original system.
arXiv Detail & Related papers (2024-06-03T17:30:42Z) - Adaptive Retention & Correction for Continual Learning [114.5656325514408]
A common problem in continual learning is the classification layer's bias towards the most recent task.
We name our approach Adaptive Retention & Correction (ARC).
ARC achieves average performance increases of 2.7% and 2.6% on the CIFAR-100 and ImageNet-R datasets, respectively.
arXiv Detail & Related papers (2024-05-23T08:43:09Z) - OIL-AD: An Anomaly Detection Framework for Sequential Decision Sequences [16.828732283348817]
We propose an unsupervised method named Offline Imitation Learning based Anomaly Detection (OIL-AD).
OIL-AD detects anomalies in decision-making sequences using two extracted behaviour features: action optimality and sequential association.
Our experiments show that OIL-AD can achieve outstanding online anomaly detection performance with up to 34.8% improvement in F1 score over comparable baselines.
arXiv Detail & Related papers (2024-02-07T04:06:53Z) - Actively Learning Reinforcement Learning: A Stochastic Optimal Control Approach [3.453622106101339]
We propose a framework towards achieving two intertwined objectives: (i) equipping reinforcement learning with active exploration and deliberate information gathering, and (ii) overcoming the computational intractability of the optimal control law.
We approach both objectives by using reinforcement learning to compute the optimal control law.
Unlike a fixed exploration-exploitation balance, caution and probing are employed automatically by the controller in real time, even after the learning process has terminated.
arXiv Detail & Related papers (2023-09-18T18:05:35Z) - Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level
Stability and High-Level Behavior [51.60683890503293]
We propose a theoretical framework for studying behavior cloning of complex expert demonstrations using generative modeling.
We show that pure supervised cloning can generate trajectories matching the per-time step distribution of arbitrary expert trajectories.
arXiv Detail & Related papers (2023-07-27T04:27:26Z) - CT-DQN: Control-Tutored Deep Reinforcement Learning [4.395396671038298]
Control-Tutored Deep Q-Networks (CT-DQN) is a Deep Reinforcement Learning algorithm that leverages a control tutor to reduce learning time.
We validate our approach on three scenarios from OpenAI Gym: the inverted pendulum, lunar lander, and car racing.
arXiv Detail & Related papers (2022-12-02T17:59:43Z) - Actor-Critic based Improper Reinforcement Learning [61.430513757337486]
We consider an improper reinforcement learning setting where a learner is given $M$ base controllers for an unknown Markov decision process.
We propose two algorithms: (1) a Policy Gradient-based approach; and (2) an algorithm that can switch between a simple Actor-Critic scheme and a Natural Actor-Critic scheme.
arXiv Detail & Related papers (2022-07-19T05:55:02Z) - CONVIQT: Contrastive Video Quality Estimator [63.749184706461826]
Perceptual video quality assessment (VQA) is an integral component of many streaming and video sharing platforms.
Here we consider the problem of learning perceptually relevant video quality representations in a self-supervised manner.
Our results indicate that compelling representations with perceptual bearing can be obtained using self-supervised learning.
arXiv Detail & Related papers (2022-06-29T15:22:01Z) - Temporal-Difference Value Estimation via Uncertainty-Guided Soft Updates [110.92598350897192]
Q-learning has proven effective at learning a policy to perform control tasks.
However, estimation noise becomes a bias after the max operator in the policy improvement step (a minimal numerical illustration of this effect follows the list below).
We present Unbiased Soft Q-Learning (UQL), which extends the work of EQL from two-action, finite-state spaces to multi-action, infinite-state Markov Decision Processes.
arXiv Detail & Related papers (2021-10-28T00:07:19Z) - Control as Hybrid Inference [62.997667081978825]
We present an implementation of control as hybrid inference (CHI) which naturally mediates the balance between iterative and amortised inference.
We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines.
arXiv Detail & Related papers (2020-07-11T19:44:09Z)
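
The max-operator bias mentioned in the Unbiased Soft Q-Learning entry above is easy to reproduce numerically. The snippet below is a self-contained illustration, not taken from any of the listed papers: even when every true Q-value is zero, maximizing over noisy estimates has a strictly positive expected value.

```python
import numpy as np

# Illustrative only: all true Q-values are exactly 0, but each estimate carries
# zero-mean Gaussian noise. Taking the max over noisy estimates is positively
# biased, i.e. E[max_a Q_hat(s, a)] > max_a Q(s, a) = 0.
rng = np.random.default_rng(0)
n_actions, n_trials, noise_std = 10, 100_000, 1.0

noisy_q = rng.normal(loc=0.0, scale=noise_std, size=(n_trials, n_actions))
print("true max Q:         0.0")
print("mean of max Q_hat: ", noisy_q.max(axis=1).mean())  # roughly 1.5 for 10 actions
```

With 10 actions and unit-variance noise, the mean of the maximum comes out around 1.5 rather than 0, which is the kind of overestimation that soft or unbiased Q-learning variants aim to remove.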