A Comparison of Model-Free and Model Predictive Control for Price
Responsive Water Heaters
- URL: http://arxiv.org/abs/2111.04689v1
- Date: Mon, 8 Nov 2021 18:06:43 GMT
- Title: A Comparison of Model-Free and Model Predictive Control for Price
Responsive Water Heaters
- Authors: David J. Biagioni, Xiangyu Zhang, Peter Graf, Devon Sigler, Wesley
Jones
- Abstract summary: We present a comparison of two model-free control algorithms, Evolution Strategies (ES) and Proximal Policy Optimization (PPO), with receding horizon model predictive control (MPC).
Four MPC variants are considered: a one-shot controller with perfect forecasting yielding optimal control; a limited-horizon controller with perfect forecasting; a mean forecasting-based controller; and a two-stage stochastic programming controller using historical scenarios.
We show that both ES and PPO learn good general-purpose policies that outperform the mean-forecast and two-stage stochastic MPC controllers in terms of average cost and are more than two orders of magnitude faster at computing actions.
- Score: 7.579687492224987
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a careful comparison of two model-free control algorithms,
Evolution Strategies (ES) and Proximal Policy Optimization (PPO), with receding
horizon model predictive control (MPC) for operating simulated, price
responsive water heaters. Four MPC variants are considered: a one-shot
controller with perfect forecasting yielding optimal control; a limited-horizon
controller with perfect forecasting; a mean forecasting-based controller; and a
two-stage stochastic programming controller using historical scenarios. In all
cases, the MPC models of water temperature and electricity price are exact;
only water demand is uncertain. For comparison, both ES and PPO learn neural
network-based policies by directly interacting with the simulated environment
under the same scenarios used by MPC. All methods are then evaluated on a
separate one-week continuation of the demand time series. We demonstrate that
optimal control for this problem is challenging, requiring a lookahead of more
than 8 hours for MPC with perfect forecasting to attain the minimum cost. Despite
this challenge, both ES and PPO learn good general-purpose policies that
outperform the mean-forecast and two-stage stochastic MPC controllers in terms of
average cost and are more than two orders of magnitude faster at computing
actions. We show that ES in particular can leverage parallelism to learn a
policy in under 90 seconds using 1150 CPU cores.
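To make the receding horizon setup concrete, the sketch below implements the basic MPC loop for a toy price-responsive water heater: at each step, a finite-horizon cost minimization is solved against a demand forecast, only the first heating action is applied, and the problem is re-solved at the next step. The one-state thermal model, parameter values, tariff, and demand process are illustrative assumptions, not the authors' simulation.

```python
"""Illustrative receding-horizon MPC loop for a price-responsive water
heater. The one-state thermal model, parameters, tariff, and demand
process below are assumptions for exposition, not the paper's setup."""
import numpy as np
from scipy.optimize import linprog

# Hypothetical plant parameters (assumed, not from the paper).
A, B, C = 4.0, 2.0, 0.02          # heating gain, draw loss, ambient leakage
T_AMB, T_MIN, T_MAX = 20.0, 50.0, 60.0
U_MAX = 1.0                       # maximum heater power (normalized)
H = 16                            # lookahead horizon, in control steps

def solve_horizon(T0, prices, demand_fc):
    """Minimize sum(price * u) over the horizon subject to the comfort
    band, as a small LP. Temperature is affine in u: T = M @ u + f."""
    decay = (1.0 - C) ** np.arange(H + 1)
    M, f = np.zeros((H, H)), np.zeros(H)
    for t in range(1, H + 1):
        k = np.arange(t)
        M[t - 1, k] = decay[t - 1 - k] * A
        f[t - 1] = decay[t] * T0 + np.sum(
            decay[t - 1 - k] * (C * T_AMB - B * demand_fc[k]))
    res = linprog(c=prices,
                  A_ub=np.vstack([M, -M]),
                  b_ub=np.concatenate([T_MAX - f, f - T_MIN]),
                  bounds=[(0.0, U_MAX)] * H)
    return res.x if res.success else np.full(H, U_MAX)   # fail-safe: heat

rng = np.random.default_rng(0)
prices = 1.0 + 0.5 * np.sin(np.arange(200) / 12.0)        # assumed tariff
T, cost = 55.0, 0.0
for step in range(100):
    demand_fc = np.full(H, 0.1)                           # mean forecast
    u = solve_horizon(T, prices[step:step + H], demand_fc)[0]
    d = max(0.0, rng.normal(0.1, 0.05))                   # realized hot-water draw
    T = (1.0 - C) * T + A * u - B * d + C * T_AMB         # true plant update
    cost += prices[step] * u
print(f"simulated electricity cost over 100 steps: {cost:.2f}")
```

Under this template, the variants compared in the paper differ mainly in what the optimizer sees: feeding in the realized demand series gives the perfect-forecast baselines, a mean forecast gives the controller sketched here, and averaging the cost over sampled historical scenarios with a shared first-stage action gives the two-stage stochastic version.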
Related papers
- Value-Based Deep RL Scales Predictably [100.21834069400023]
We show that value-based off-policy RL methods are predictable despite community lore regarding their pathological behavior.
We validate our approach using three algorithms (SAC, BRO, and PQL) on DeepMind Control, OpenAI Gym, and IsaacGym.
arXiv Detail & Related papers (2025-02-06T18:59:47Z) - Achieving $\widetilde{\mathcal{O}}(\sqrt{T})$ Regret in Average-Reward POMDPs with Known Observation Models [56.92178753201331]
We tackle average-reward infinite-horizon POMDPs with an unknown transition model.
We present a novel and simple estimator that overcomes this barrier.
arXiv Detail & Related papers (2025-01-30T22:29:41Z) - Which price to pay? Auto-tuning building MPC controller for optimal economic cost [7.400001848945602]
A model predictive control (MPC) controller is considered for temperature management in buildings.
We propose an efficient, performance-oriented method for tuning the building MPC controller, based on a state-of-the-art constrained Bayesian optimization algorithm.
The results indicate that, with an optimized simple MPC, the monthly electricity cost of a household can be reduced by up to 26.90% compared with a basic rule-based controller.
arXiv Detail & Related papers (2025-01-18T19:52:27Z) - Efficient Learning of POMDPs with Known Observation Model in Average-Reward Setting [56.92178753201331]
We propose the Observation-Aware Spectral (OAS) estimation technique, which enables the POMDP parameters to be learned from samples collected using a belief-based policy.
We show the consistency of the OAS procedure, and we prove a regret guarantee of order $\mathcal{O}(\sqrt{T \log(T)})$ for the proposed OAS-UCRL algorithm.
arXiv Detail & Related papers (2024-10-02T08:46:34Z) - Comparison of Model Predictive Control and Proximal Policy Optimization for a 1-DOF Helicopter System [0.7499722271664147]
This study conducts a comparative analysis of Model Predictive Control (MPC) and Proximal Policy Optimization (PPO), a Deep Reinforcement Learning (DRL) algorithm, applied to a Quanser Aero 2 system.
PPO excels in rise time and adaptability, making it a promising approach for applications requiring rapid response.
arXiv Detail & Related papers (2024-08-28T08:35:34Z) - Actor-Critic based Improper Reinforcement Learning [61.430513757337486]
We consider an improper reinforcement learning setting where a learner is given $M$ base controllers for an unknown Markov decision process.
We propose two algorithms: (1) a Policy Gradient-based approach; and (2) an algorithm that can switch between a simple Actor-Critic scheme and a Natural Actor-Critic scheme.
arXiv Detail & Related papers (2022-07-19T05:55:02Z) - A Hybrid Model for Forecasting Short-Term Electricity Demand [59.372588316558826]
Currently, the UK electricity market is guided by load (demand) forecasts published every thirty minutes by the regulator.
We present HYENA: a hybrid predictive model that combines feature engineering (selection of candidate predictor features), moving-window predictors, and LSTM encoder-decoders.
arXiv Detail & Related papers (2022-05-20T22:13:25Z) - Predictive Accuracy of a Hybrid Generalized Long Memory Model for Short
Term Electricity Price Forecasting [0.0]
This study investigates the predictive performance of a new hybrid model based on the generalized long-memory autoregressive model (k-factor GARMA).
The performance of the proposed model is evaluated using data from Nord Pool Electricity markets.
arXiv Detail & Related papers (2022-04-18T12:21:25Z) - Policy Search for Model Predictive Control with Application to Agile
Drone Flight [56.24908013905407]
We propose a policy-search-for-model-predictive-control framework.
Specifically, we formulate the MPC as a parameterized controller, where the hard-to-optimize decision variables are represented as high-level policies.
Experiments show that our controller achieves robust and real-time control performance in both simulation and the real world.
arXiv Detail & Related papers (2021-12-07T17:39:24Z) - Model-predictive control and reinforcement learning in multi-energy
system case studies [0.2810625954925815]
We evaluate on-policy and off-policy multi-objective reinforcement learning (RL) approaches against a linear model-predictive-control (LMPC) benchmark.
We show that a twin delayed deep deterministic policy gradient (TD3) RL agent offers potential to match and outperform the perfect-foresight LMPC benchmark (101.5%).
In a more complex MES configuration, the RL agent's performance is generally lower (94.6%), yet still better than that of the realistic LMPC (88.9%).
arXiv Detail & Related papers (2021-04-20T06:51:50Z) - Blending MPC & Value Function Approximation for Efficient Reinforcement
Learning [42.429730406277315]
Model-Predictive Control (MPC) is a powerful tool for controlling complex, real-world systems.
We present a framework for improving on MPC with model-free reinforcement learning (RL).
We show that this approach can achieve performance comparable to MPC with access to the true dynamics; a toy sketch of the blending idea appears after this list.
arXiv Detail & Related papers (2020-12-10T11:32:01Z)
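As a toy illustration of the blending idea in that last entry, the sketch below scores short MPC rollouts with a learned value function as terminal cost, so the model handles the near horizon and the critic covers the tail. The dynamics, costs, and "critic" here are stand-ins, not that paper's components.

```python
"""Toy sketch of blending short-horizon MPC with a learned terminal value
function (cf. the last entry above). The dynamics model, costs, and the
value function below are stand-ins, not that paper's implementation."""
import numpy as np

def step(x, u):                       # assumed known (or learned) dynamics
    return 0.9 * x + u

def stage_cost(x, u):
    return x ** 2 + 0.1 * u ** 2

def value_estimate(x):                # stand-in for a learned value function
    return 1.5 * x ** 2

def blended_mpc(x0, horizon=5, n_samples=256, seed=1):
    """Random-shooting MPC: score sampled action sequences by their
    short-horizon cost plus the learned value at the terminal state."""
    rng = np.random.default_rng(seed)
    best_u, best_cost = 0.0, np.inf
    for _ in range(n_samples):
        us = rng.uniform(-1.0, 1.0, horizon)
        x, cost = x0, 0.0
        for u in us:
            cost += stage_cost(x, u)
            x = step(x, u)
        cost += value_estimate(x)     # critic covers costs beyond the horizon
        if cost < best_cost:
            best_cost, best_u = cost, us[0]
    return best_u                     # receding horizon: apply only the first action

print(f"first action from state 2.0: {blended_mpc(2.0):.3f}")
```

In this scheme the horizon length trades off reliance on the dynamics model against reliance on the learned value function.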