Towards Optimal Pricing of Demand Response -- A Nonparametric
Constrained Policy Optimization Approach
- URL: http://arxiv.org/abs/2306.14047v1
- Date: Sat, 24 Jun 2023 20:07:51 GMT
- Title: Towards Optimal Pricing of Demand Response -- A Nonparametric
Constrained Policy Optimization Approach
- Authors: Jun Song and Chaoyue Zhao
- Abstract summary: Demand response (DR) has been demonstrated to be an effective method for reducing peak load and mitigating uncertainties on the supply and demand sides of the electricity market.
One critical question for DR research is how to appropriately adjust electricity prices in order to shift electrical load from peak to off-peak hours.
We propose an innovative nonparametric constrained policy optimization approach that improves optimality while ensuring stability of the policy update.
- Score: 2.345728642535161
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Demand response (DR) has been demonstrated to be an effective method for
reducing peak load and mitigating uncertainties on both the supply and demand
sides of the electricity market. One critical question for DR research is how
to appropriately adjust electricity prices in order to shift electrical load
from peak to off-peak hours. In recent years, reinforcement learning (RL) has
been used to address the price-based DR problem because it is a model-free
technique that does not necessitate the identification of models for end-use
customers. However, the majority of RL methods cannot guarantee the stability
and optimality of the learned pricing policy, which is undesirable in
safety-critical power systems and may result in high customer bills. In this
paper, we propose an innovative nonparametric constrained policy optimization
approach that improves optimality while ensuring stability of the policy
update, by removing the restrictive assumption on policy representation that
the majority of the RL literature adopts: the policy must be parameterized or
fall into a certain distribution class. We derive a closed-form expression of
optimal policy update for each iteration and develop an efficient on-policy
actor-critic algorithm to address the proposed constrained policy optimization
problem. The experiments on two DR cases show the superior performance of our
proposed nonparametric constrained policy optimization method compared with
state-of-the-art RL algorithms.
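The abstract refers to a closed-form expression for the optimal policy update at each iteration, obtained without restricting the policy to a parametric class. As a rough illustration of what a nonparametric, trust-region-style update of this kind can look like (not the paper's exact derivation), the sketch below solves a KL-regularized policy improvement step over a finite set of candidate prices in closed form; the price grid, advantage values, and temperature eta are hypothetical.

```python
import numpy as np

def nonparametric_policy_update(pi_old, advantages, eta):
    """Closed-form solution of the KL-regularized policy improvement step

        max_pi  E_{a~pi}[A(s, a)] - eta * KL(pi || pi_old)

    over a finite action set (e.g. a discretized grid of DR prices).
    The optimum is proportional to pi_old(a) * exp(A(s, a) / eta); no
    parametric policy class is assumed.
    """
    logits = np.log(pi_old) + advantages / eta
    logits -= logits.max()                 # numerical stability
    pi_new = np.exp(logits)
    return pi_new / pi_new.sum()

# Toy example: 5 candidate prices, advantages as a critic might estimate them.
pi_old = np.full(5, 0.2)                             # uniform old policy
advantages = np.array([-1.0, 0.5, 2.0, 0.1, -0.5])   # hypothetical critic output
print(nonparametric_policy_update(pi_old, advantages, eta=1.0))
```

In an on-policy actor-critic loop, pi_old would be the policy from the previous iteration and the advantages would come from the critic; the update itself requires no gradient step on policy parameters.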
Related papers
- Preferred-Action-Optimized Diffusion Policies for Offline Reinforcement Learning [19.533619091287676]
We propose a novel preferred-action-optimized diffusion policy for offline reinforcement learning.
In particular, an expressive conditional diffusion model is utilized to represent the diverse distribution of a behavior policy.
Experiments demonstrate that the proposed method provides competitive or superior performance compared to previous state-of-the-art offline RL methods.
arXiv Detail & Related papers (2024-05-29T03:19:59Z)
- Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer [52.09480867526656]
We identify the source of misalignment as a form of distributional shift and uncertainty in learning human preferences.
To mitigate overoptimization, we first propose a theoretical algorithm that chooses the best policy for an adversarially chosen reward model.
Using the equivalence between reward models and the corresponding optimal policy, the algorithm features a simple objective that combines a preference optimization loss and a supervised learning loss (a sketch of one such combined objective appears after this list).
arXiv Detail & Related papers (2024-05-26T05:38:50Z)
- Towards Efficient Exact Optimization of Language Model Alignment [93.39181634597877]
Direct preference optimization (DPO) was proposed to directly optimize the policy from preference data.
We show that DPO, derived from the optimal solution of this problem, leads in practice to a compromised mean-seeking approximation of the optimal solution.
We propose efficient exact optimization (EXO) of the alignment objective.
arXiv Detail & Related papers (2024-02-01T18:51:54Z)
- COptiDICE: Offline Constrained Reinforcement Learning via Stationary Distribution Correction Estimation [73.17078343706909]
We consider the offline constrained reinforcement learning (RL) problem, in which the agent aims to compute a policy that maximizes expected return while satisfying given cost constraints, learning only from a pre-collected dataset.
We present an offline constrained RL algorithm that optimizes the policy in the space of stationary distributions.
Our algorithm, COptiDICE, directly estimates the stationary distribution corrections of the optimal policy with respect to returns, while constraining the cost upper bound, with the goal of yielding a cost-conservative policy for actual constraint satisfaction.
arXiv Detail & Related papers (2022-04-19T15:55:47Z)
- OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation [59.469401906712555]
We present an offline reinforcement learning algorithm that prevents overestimation in a more principled way.
Our algorithm, OptiDICE, directly estimates the stationary distribution corrections of the optimal policy.
We show that OptiDICE performs competitively with the state-of-the-art methods.
arXiv Detail & Related papers (2021-06-21T00:43:30Z)
- CRPO: A New Approach for Safe Reinforcement Learning with Convergence Guarantee [61.176159046544946]
In safe reinforcement learning (SRL) problems, an agent explores the environment to maximize an expected total reward and avoids violation of certain constraints.
This is the first analysis of SRL algorithms with globally optimal policies.
arXiv Detail & Related papers (2020-11-11T16:05:14Z)
- Optimistic Distributionally Robust Policy Optimization [2.345728642535161]
Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO) are prone to converge to a sub-optimal solution as they limit policy representation to a particular parametric distribution class.
We develop an innovative Optimistic Distributionally Robust Policy Optimization (ODRO) algorithm to solve the trust region constrained optimization problem without parameterizing the policies.
Our algorithm improves on TRPO and PPO with higher sample efficiency and better final-policy performance while maintaining learning stability.
arXiv Detail & Related papers (2020-06-14T06:36:18Z)
- Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained proximal policy optimization (CPPO).
We show that guided constrained RL offers faster convergence close to the desired optimum, resulting in optimal, yet physically feasible, robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z)
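The entry on "Provably Mitigating Overoptimization in RLHF" above describes an objective that combines a preference optimization loss with a supervised learning loss. The following is a minimal sketch of one such combined objective, assuming a DPO-style preference term plus an SFT negative log-likelihood term on the preferred responses; the function name, the hyperparameters beta and sft_weight, and the toy log-probabilities are illustrative, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def regularized_preference_loss(policy_chosen_logps, policy_rejected_logps,
                                ref_chosen_logps, ref_rejected_logps,
                                beta=0.1, sft_weight=1.0):
    """DPO-style preference loss plus an SFT (negative log-likelihood) term
    on the chosen responses; beta and sft_weight are hypothetical
    hyperparameters."""
    # Implicit reward margins of the policy relative to the reference model.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    preference_loss = -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
    # Supervised term: maximize likelihood of the preferred responses.
    sft_loss = -policy_chosen_logps.mean()
    return preference_loss + sft_weight * sft_loss

# Toy usage with precomputed sequence log-probabilities for 3 preference pairs.
p_c = torch.tensor([-12.0, -9.5, -11.0])   # policy log p(chosen)
p_r = torch.tensor([-13.0, -9.0, -12.5])   # policy log p(rejected)
r_c = torch.tensor([-12.5, -10.0, -11.2])  # reference log p(chosen)
r_r = torch.tensor([-12.8, -9.4, -12.0])   # reference log p(rejected)
print(regularized_preference_loss(p_c, p_r, r_c, r_r))
```

Working with precomputed sequence log-probabilities keeps the sketch independent of any particular model or tokenizer.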