Model-based adaptation for sample efficient transfer in reinforcement
learning control of parameter-varying systems
- URL: http://arxiv.org/abs/2305.12158v1
- Date: Sat, 20 May 2023 10:11:09 GMT
- Title: Model-based adaptation for sample efficient transfer in reinforcement
learning control of parameter-varying systems
- Authors: Ibrahim Ahmed and Marcos Quinones-Grueiro and Gautam Biswas
- Abstract summary: We leverage ideas from model-based control to address the sample efficiency problem of reinforcement learning algorithms.
We demonstrate that our approach is more sample-efficient than fine-tuning with reinforcement learning alone.
- Score: 1.8799681615947088
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we leverage ideas from model-based control to address the
sample efficiency problem of reinforcement learning (RL) algorithms.
Accelerating learning is an active area of RL research that is highly relevant
in the context of time-varying systems. Traditional transfer learning methods
use prior knowledge of the system behavior to devise a gradual or immediate
data-driven transformation of the control policy obtained through RL. Such a
transformation is usually computed by estimating the performance of previous
control policies from measurements recently collected on the system. However,
such retrospective measures are of debatable utility and offer no guarantee of
positive transfer in most cases. Instead, we propose a model-based
transformation such that, when actions from a control policy are applied to the
target system, positive transfer is achieved. The transformation can be used as
an initialization for the reinforcement learning process, which then converges
to a new optimum. We validate the performance of our approach on four benchmark
examples. We demonstrate that our approach is more sample-efficient than
fine-tuning with reinforcement learning alone, and that it achieves performance
comparable to linear quadratic regulators (LQR) and model predictive control
(MPC) in the three cases where an accurate linear model is known. When an
accurate model is not known, we empirically show that the proposed approach
still achieves positive transfer with jump-start improvement.
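To make the idea concrete, below is a minimal, hedged sketch of one way a model-based action transformation could be implemented when approximate linear models of the source and target systems are available. The names (transform_action, jump_start_policy, A_src, B_tgt, pi_src) are illustrative placeholders, and the least-squares action-matching rule is an assumption made for exposition, not necessarily the paper's exact transformation.

```python
# Hedged sketch (not the paper's exact algorithm): one plausible model-based
# action transformation for transferring a source policy to a target system.
# Assumes known (or estimated) linear models x' = A x + B u for both systems;
# all names (A_src, B_src, pi_src, ...) are illustrative placeholders.
import numpy as np

def transform_action(x, u_src, A_src, B_src, A_tgt, B_tgt):
    """Choose a target-system action whose one-step effect matches the
    source closed loop: A_tgt x + B_tgt u_tgt ~= A_src x + B_src u_src."""
    desired_next = A_src @ x + B_src @ u_src          # behaviour of the source closed loop
    u_tgt, *_ = np.linalg.lstsq(B_tgt, desired_next - A_tgt @ x, rcond=None)
    return u_tgt

def jump_start_policy(pi_src, A_src, B_src, A_tgt, B_tgt):
    """Wrap a pretrained source policy so it can initialize RL on the target."""
    def pi_tgt(x):
        return transform_action(x, pi_src(x), A_src, B_src, A_tgt, B_tgt)
    return pi_tgt

# Toy usage on a parameter-varying double integrator (illustrative only).
if __name__ == "__main__":
    dt = 0.1
    A_src = np.array([[1.0, dt], [0.0, 1.0]]); B_src = np.array([[0.0], [dt]])
    A_tgt = np.array([[1.0, dt], [0.0, 0.9]]); B_tgt = np.array([[0.0], [0.5 * dt]])
    pi_src = lambda x: np.atleast_1d(np.array([-1.0, -1.5]) @ x)  # some pretrained gain
    pi_tgt = jump_start_policy(pi_src, A_src, B_src, A_tgt, B_tgt)
    print(pi_tgt(np.array([1.0, 0.0])))
```

The wrapped policy can then serve as the jump-start initialization from which RL fine-tuning proceeds on the target system.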
Related papers
- Active Learning for Control-Oriented Identification of Nonlinear Systems [26.231260751633307]
We present the first finite sample analysis of an active learning algorithm suitable for a general class of nonlinear dynamics.
In certain settings, the excess control cost of our algorithm achieves the optimal rate, up to logarithmic factors.
We validate our approach in simulation, showcasing the advantage of active, control-oriented exploration for controlling nonlinear systems.
arXiv Detail & Related papers (2024-04-13T15:40:39Z)
- Statistically Efficient Variance Reduction with Double Policy Estimation for Off-Policy Evaluation in Sequence-Modeled Reinforcement Learning [53.97273491846883]
We propose DPE: an RL algorithm that blends offline sequence modeling and offline reinforcement learning with Double Policy Estimation.
We validate our method on multiple OpenAI Gym tasks with D4RL benchmarks.
arXiv Detail & Related papers (2023-08-28T20:46:07Z)
- Model Based Residual Policy Learning with Applications to Antenna Control [5.01069065110753]
Non-differentiable controllers and rule-based policies are widely used for controlling real systems such as telecommunication networks and robots.
Motivated by the antenna tilt control problem, we introduce Model-Based Residual Policy Learning (MBRPL), a practical reinforcement learning (RL) method.
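As a rough illustration of the residual-policy idea that MBRPL builds on, the sketch below adds a small learned correction on top of a fixed rule-based controller, so learning starts from the baseline's performance. All names (rule_based_controller, ResidualPolicy, scale) are hypothetical; this is not the paper's exact method.

```python
# Hedged sketch of generic residual policy learning (not MBRPL's exact method):
# the agent outputs a correction that is added to a fixed rule-based controller.
# Names are illustrative placeholders.
import numpy as np

def rule_based_controller(obs):
    # Placeholder for an existing non-differentiable controller
    # (e.g., a hand-tuned rule for antenna tilt).
    return np.clip(-0.5 * obs, -1.0, 1.0)

class ResidualPolicy:
    def __init__(self, obs_dim, act_dim, scale=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W = 0.01 * rng.standard_normal((act_dim, obs_dim))  # learned residual (trained by RL)
        self.scale = scale                                       # keeps corrections small early on

    def __call__(self, obs):
        base = rule_based_controller(obs)          # baseline action
        residual = self.scale * (self.W @ obs)     # learned correction
        return np.clip(base + residual, -1.0, 1.0)

policy = ResidualPolicy(obs_dim=4, act_dim=4)
print(policy(np.array([0.2, -0.1, 0.05, 0.3])))
```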
arXiv Detail & Related papers (2022-11-16T09:48:14Z)
- Model Predictive Control via On-Policy Imitation Learning [28.96122879515294]
We develop new sample complexity results and performance guarantees for data-driven Model Predictive Control.
Our algorithm uses the structure of constrained linear MPC, and our analysis uses the properties of the explicit MPC solution to theoretically bound the number of online MPC trajectories needed to achieve optimal performance.
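One generic way to imitate an MPC expert on-policy is a DAgger-style loop that labels the states visited by the learner with the expert's actions; the sketch below shows this on a toy linear system. It is only an illustration under simplifying assumptions (a linear learner policy fit by least squares, a placeholder mpc_expert), not the paper's algorithm or its constrained-MPC analysis.

```python
# Hedged sketch of on-policy imitation of an MPC expert (DAgger-style loop).
# All names are placeholders; the expert here is a stand-in, not a real MPC solver.
import numpy as np

def mpc_expert(x):
    # Stand-in for an expert that solves a constrained MPC problem at state x.
    return np.clip(-np.array([[1.0, 1.8]]) @ x, -1.0, 1.0)

def rollout(policy, x0, steps, A, B):
    xs, us = [], []
    x = x0
    for _ in range(steps):
        u = policy(x)
        xs.append(x); us.append(mpc_expert(x))   # label visited states with the expert
        x = A @ x + B @ u
    return np.array(xs), np.array(us)

# DAgger-style aggregation: fit a linear policy on states visited by the learner.
A = np.array([[1.0, 0.1], [0.0, 1.0]]); B = np.array([[0.0], [0.1]])
K = np.zeros((1, 2))                              # initial learner policy u = K x
data_x, data_u = [], []
for it in range(5):
    xs, us = rollout(lambda x: K @ x, np.array([1.0, 0.0]), 30, A, B)
    data_x.append(xs); data_u.append(us)
    X = np.vstack(data_x); U = np.vstack(data_u)  # aggregated dataset
    K = np.linalg.lstsq(X, U, rcond=None)[0].T    # least-squares fit of u ~= K x
print(K)
```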
arXiv Detail & Related papers (2022-10-17T16:06:06Z)
- When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL).
Our derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically-varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
- Backward Imitation and Forward Reinforcement Learning via Bi-directional Model Rollouts [11.4219428942199]
Traditional model-based reinforcement learning (RL) methods generate forward rollout traces using the learnt dynamics model.
In this paper, we propose the backward imitation and forward reinforcement learning (BIFRL) framework.
BIFRL empowers the agent to both reach and explore from high-value states more efficiently.
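A hedged sketch of the bi-directional rollout idea: roll a learned backward model from high-value states to synthesize traces that lead into them, in addition to ordinary forward rollouts. The backward_model and the toy value estimate below are placeholders; BIFRL's actual models and state-selection rules are not reproduced here.

```python
# Hedged sketch of backward rollouts from high-value states (not BIFRL's exact procedure).
import numpy as np

rng = np.random.default_rng(0)

def backward_model(next_state):
    # Stand-in for a learned model p(s, a | s'): predict a predecessor state/action.
    a = rng.uniform(-1.0, 1.0, size=1)
    s = next_state - 0.1 * np.concatenate([next_state[1:], a])  # crude inverse of a toy dynamic
    return s, a

def backward_rollout(high_value_state, horizon=5):
    """Generate a trace (s, a, s') ending at a high-value state."""
    trace, s_next = [], high_value_state
    for _ in range(horizon):
        s, a = backward_model(s_next)
        trace.append((s, a, s_next))
        s_next = s
    return list(reversed(trace))        # ordered forward in time

# Usage: pick states with the highest estimated value from a replay buffer
# and imitate the synthesized traces that reach them.
buffer_states = rng.standard_normal((100, 2))
values = -np.linalg.norm(buffer_states, axis=1)      # toy value estimate
top = buffer_states[np.argsort(values)[-5:]]
traces = [backward_rollout(s) for s in top]
print(len(traces), len(traces[0]))
```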
arXiv Detail & Related papers (2022-08-04T04:04:05Z)
- Learning to Reweight Imaginary Transitions for Model-Based Reinforcement Learning [58.66067369294337]
When the model is inaccurate or biased, imaginary trajectories may be deleterious for training the action-value and policy functions.
We adaptively reweight the imaginary transitions, so as to reduce the negative effects of poorly generated trajectories.
Our method outperforms state-of-the-art model-based and model-free RL algorithms on multiple tasks.
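One common, hedged way to realize such reweighting is to shrink the weight of imagined transitions on which an ensemble of dynamics models disagrees, as in the sketch below; the paper's learned weighting scheme may differ, and ensemble_predict / transition_weight are illustrative names.

```python
# Hedged sketch: down-weight unreliable imagined transitions using ensemble
# disagreement as an error proxy. Illustrative only, not the paper's method.
import numpy as np

def ensemble_predict(models, s, a):
    """Each model predicts the next state; return mean prediction and disagreement."""
    preds = np.stack([m(s, a) for m in models])          # (n_models, state_dim)
    return preds.mean(axis=0), preds.std(axis=0).mean()  # prediction, scalar disagreement

def transition_weight(disagreement, temperature=1.0):
    # Smoothly shrink the weight of transitions the ensemble disagrees on.
    return float(np.exp(-disagreement / temperature))

# Toy ensemble of slightly different linear models (placeholders).
models = [lambda s, a, k=k: s + 0.1 * a + 0.01 * k for k in range(5)]
s, a = np.array([0.5, -0.2]), np.array([1.0, 0.0])
s_pred, disagreement = ensemble_predict(models, s, a)
w = transition_weight(disagreement)
# The weight would scale this transition's contribution to the value/policy loss,
# e.g. loss = w * (q(s, a) - target(r, s_pred)) ** 2.
print(s_pred, disagreement, w)
```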
arXiv Detail & Related papers (2021-04-09T03:13:35Z)
- DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
- Control-Aware Representations for Model-based Reinforcement Learning [36.221391601609255]
A major challenge in modern reinforcement learning (RL) is efficient control of dynamical systems from high-dimensional sensory observations.
Learning controllable embedding (LCE) is a promising approach that addresses this challenge by embedding the observations into a lower-dimensional latent space.
Two important questions in this area are how to learn a representation that is amenable to the control problem at hand, and how to achieve an end-to-end framework for representation learning and control.
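A minimal sketch of the LCE objective structure, assuming a toy linear encoder and latent model: the encoder is trained so that encoded next observations are predictable from the current latent state and action. Names and shapes are illustrative; the paper's control-aware losses differ in detail.

```python
# Hedged sketch of a learning controllable embedding (LCE) style objective:
# encode high-dimensional observations into a low-dimensional latent state and
# require the latent dynamics to predict the encoded next observation.
import numpy as np

def encoder(obs, We):
    return np.tanh(We @ obs)                  # z = f(o): latent state

def latent_dynamics(z, a, Wz, Wa):
    return Wz @ z + Wa @ a                    # z' ~= g(z, a): toy linear latent model

def lce_loss(obs, act, next_obs, We, Wz, Wa):
    """Prediction loss that ties representation learning to control."""
    z, z_next = encoder(obs, We), encoder(next_obs, We)
    z_pred = latent_dynamics(z, act, Wz, Wa)
    return float(np.sum((z_next - z_pred) ** 2))

# Toy shapes (illustrative): 64-dim observations, 4-dim latent, 2-dim actions.
rng = np.random.default_rng(0)
We, Wz, Wa = rng.standard_normal((4, 64)), np.eye(4), rng.standard_normal((4, 2))
obs, act, next_obs = rng.standard_normal(64), rng.standard_normal(2), rng.standard_normal(64)
print(lce_loss(obs, act, next_obs, We, Wz, Wa))
# In practice the encoder and latent model would be neural networks trained by
# gradient descent, and a controller would be run in the learned latent space.
```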
arXiv Detail & Related papers (2020-06-24T01:00:32Z)
- Logarithmic Regret Bound in Partially Observable Linear Dynamical Systems [91.43582419264763]
We study the problem of system identification and adaptive control in partially observable linear dynamical systems.
We present the first model estimation method with finite-time guarantees in both open and closed-loop system identification.
We show that AdaptOn is the first algorithm that achieves $\text{polylog}\left(T\right)$ regret in adaptive control of unknown partially observable linear dynamical systems.
arXiv Detail & Related papers (2020-03-25T06:00:33Z)
- Information Theoretic Model Predictive Q-Learning [64.74041985237105]
We present a novel theoretical connection between information theoretic MPC and entropy regularized RL.
We develop a Q-learning algorithm that can leverage biased models.
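For intuition, the sketch below shows the MPPI-style exponentiated-cost update commonly associated with information theoretic MPC, together with the log-sum-exp "soft value" that mirrors an entropy-regularized backup; mppi_update and the toy cost are illustrative, not the paper's Q-learning algorithm.

```python
# Hedged sketch of an information-theoretic MPC (MPPI-style) update: sampled
# action sequences are weighted by the exponential of their negative cost.
import numpy as np

def mppi_update(costs, samples, temperature=1.0):
    """Exponentiated-cost averaging of sampled action sequences."""
    w = np.exp(-(costs - costs.min()) / temperature)     # subtract min for numerical stability
    w /= w.sum()
    u = (w[:, None, None] * samples).sum(axis=0)         # weighted mean action sequence
    soft_value = -temperature * np.log(np.mean(np.exp(-costs / temperature)))
    return u, soft_value                                  # soft_value mirrors an entropy-regularized backup

# Toy usage: 100 sampled sequences, horizon 10, 2-dim actions, quadratic cost.
rng = np.random.default_rng(0)
samples = rng.standard_normal((100, 10, 2))
costs = (samples ** 2).sum(axis=(1, 2))
u, v = mppi_update(costs, samples)
print(u.shape, v)
```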
arXiv Detail & Related papers (2019-12-31T00:29:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.