Online Learning for Incentive-Based Demand Response
- URL: http://arxiv.org/abs/2303.15617v1
- Date: Mon, 27 Mar 2023 22:08:05 GMT
- Title: Online Learning for Incentive-Based Demand Response
- Authors: Deepan Muthirayan and Pramod P. Khargonekar
- Abstract summary: We consider the problem of learning online to manage Demand Response (DR) resources.
We propose an online learning scheme that employs least-squares for estimation with a perturbation to the reward price.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we consider the problem of learning online to manage Demand
Response (DR) resources. A typical DR mechanism requires the DR manager to
assign a baseline to the participating consumer, where the baseline is an
estimate of the counterfactual consumption of the consumer had it not been
called to provide the DR service. A challenge in estimating the baseline is the
incentive the consumer has to inflate the baseline estimate. We consider the
problem of learning online to estimate the baseline and to optimize the
operating costs over a period of time under such incentives. We propose an
online learning scheme that employs least-squares for estimation with a
perturbation to the reward price (for the DR services or load curtailment) that
is designed to balance the exploration and exploitation trade-off that arises
with online learning. We show that our proposed scheme achieves a very low
regret of $\mathcal{O}\left((\log{T})^2\right)$ with respect to the optimal
operating cost over $T$ days of the DR program with full knowledge of the
baseline, and is individually rational for the consumers to participate. Our
scheme is significantly better than the averaging-type approach, which only
achieves $\mathcal{O}(T^{1/3})$ regret.
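To make the idea concrete, below is a minimal, hypothetical sketch of this kind of scheme: a baseline estimated by least squares from non-event days, plus a reward-price perturbation that decays over the horizon. The linear baseline model, feature vector, consumer-response rule, and decay schedule are illustrative assumptions, not the paper's exact algorithm.

```python
# A minimal sketch (not the paper's exact algorithm) of the idea in the abstract:
# estimate the consumer's baseline by least squares and add a decaying
# perturbation to the reward price to trade off exploration and exploitation.
# The linear baseline model, the feature vector, the consumer-response model,
# and the 1/sqrt(t) decay schedule are all illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

T = 200                                    # days in the DR program (assumption)
d = 3                                      # baseline feature dimension (assumption)
theta_true = np.array([2.0, -1.0, 0.5])    # unknown baseline model (assumption)
base_price = 1.0                           # nominal reward price (assumption)

X_hist, y_hist = [], []                    # (features, uncurtailed consumption) pairs
theta_hat = np.zeros(d)
total_payment = 0.0

for t in range(1, T + 1):
    x_t = rng.normal(size=d)                                  # today's features
    baseline_t = theta_true @ x_t + rng.normal(scale=0.1)     # true counterfactual

    # Least-squares baseline estimate from days where consumption was uncurtailed.
    if len(X_hist) >= d:
        theta_hat, *_ = np.linalg.lstsq(np.array(X_hist), np.array(y_hist), rcond=None)

    if rng.random() < 0.3:                          # a DR event is called (assumption)
        est_baseline = theta_hat @ x_t
        price_t = base_price + 1.0 / np.sqrt(t)     # perturbed reward price
        curtailment = max(0.0, 0.5 * price_t)       # stand-in consumer response
        observed = baseline_t - curtailment         # metered consumption today
        # Payment is for curtailment measured against the *estimated* baseline.
        total_payment += price_t * max(0.0, est_baseline - observed)
    else:
        # Non-event day: uncurtailed consumption is observed and recorded,
        # feeding the least-squares estimate on later days.
        X_hist.append(x_t)
        y_hist.append(baseline_t)

print("estimated baseline model:", np.round(theta_hat, 2))
print("total DR payments:", round(total_payment, 2))
```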
Related papers
- Efficient Preference-based Reinforcement Learning via Aligned Experience Estimation [37.36913210031282]
Preference-based reinforcement learning (PbRL) has shown impressive capabilities in training agents without reward engineering.
We propose SEER, an efficient PbRL method that integrates label smoothing and policy regularization techniques.
arXiv Detail & Related papers (2024-05-29T01:49:20Z) - Learning to Schedule Online Tasks with Bandit Feedback [7.671139712158846]
Online task scheduling serves an integral role for task-intensive applications in cloud computing and crowdsourcing.
We propose a double-optimistic learning based Robbins-Monro (DOL-RM) algorithm.
DOL-RM integrates a learning module that incorporates optimistic estimation for reward-to-cost ratio and a decision module.
arXiv Detail & Related papers (2024-02-26T10:11:28Z) - Efficient Methods for Non-stationary Online Learning [67.3300478545554]
We present efficient methods for optimizing dynamic regret and adaptive regret, which reduce the number of projections per round from $\mathcal{O}(\log T)$ to $1$.
Our technique hinges on the reduction mechanism developed in parameter-free online learning and requires non-trivial twists on non-stationary online methods.
arXiv Detail & Related papers (2023-09-16T07:30:12Z) - Learning to Incentivize Information Acquisition: Proper Scoring Rules Meet Principal-Agent Model [64.94131130042275]
We study the incentivized information acquisition problem, where a principal hires an agent to gather information on her behalf.
We design a provably sample efficient algorithm that tailors the UCB algorithm to our model.
Our algorithm features a delicate estimation procedure for the optimal profit of the principal, and a conservative correction scheme that ensures the desired actions of the agent are incentivized.
arXiv Detail & Related papers (2023-03-15T13:40:16Z) - TransPath: Learning Heuristics For Grid-Based Pathfinding via Transformers [64.88759709443819]
We suggest learning the instance-dependent proxies that are supposed to notably increase the efficiency of the search.
The first proxy we suggest learning is the correction factor, i.e., the ratio between the instance-independent cost-to-go estimate and the perfect one.
The second proxy is the path probability, which indicates how likely it is that a grid cell lies on the shortest path.
arXiv Detail & Related papers (2022-12-22T14:26:11Z) - Network Revenue Management with Demand Learning and Fair Resource-Consumption Balancing [16.37657820732206]
We study the price-based network revenue management (NRM) problem with both demand learning and fair resource-consumption balancing.
We propose a primal-dual-type online policy with the Upper-Confidence-Bound (UCB) demand learning method to maximize the regularized revenue.
Our algorithm achieves a worst-case regret of $\widetilde{O}(N^{5/2}\sqrt{T})$, where $N$ denotes the number of products and $T$ denotes the number of time periods.
arXiv Detail & Related papers (2022-07-22T15:55:49Z) - Smoothed Online Convex Optimization Based on Discounted-Normal-Predictor [68.17855675511602]
We investigate an online prediction strategy named Discounted-Normal-Predictor (Kapralov and Panigrahy, 2010) for smoothed online convex optimization (SOCO).
We show that the proposed algorithm can minimize the adaptive regret with switching cost in every interval.
arXiv Detail & Related papers (2022-05-02T08:48:22Z) - Offline Deep Reinforcement Learning for Dynamic Pricing of Consumer Credit [0.0]
We introduce a method for pricing consumer credit using recent advances in offline deep reinforcement learning.
This approach relies on a static dataset and requires no assumptions on the functional form of demand.
arXiv Detail & Related papers (2022-03-06T16:32:53Z) - Online Apprenticeship Learning [58.45089581278177]
In Apprenticeship Learning (AL), we are given a Markov Decision Process (MDP) without access to the cost function.
The goal is to find a policy that matches the expert's performance on some predefined set of cost functions.
We show that the OAL problem can be effectively solved by combining two mirror descent based no-regret algorithms.
arXiv Detail & Related papers (2021-02-13T12:57:51Z) - Model-Augmented Q-learning [112.86795579978802]
We propose a model-free RL (MFRL) framework augmented with components of model-based RL.
Specifically, we propose to estimate not only the $Q$-values but also both the transition and the reward with a shared network.
We show that the proposed scheme, called Model-augmented $Q$-learning (MQL), obtains a policy-invariant solution which is identical to the solution obtained by learning with true reward.
arXiv Detail & Related papers (2021-02-07T17:56:50Z) - Online Residential Demand Response via Contextual Multi-Armed Bandits [8.817815952311676]
One major challenge in residential demand response (DR) is to handle the unknown and uncertain customer behaviors.
Previous works use learning techniques to predict customer DR behaviors, while the influence of time-varying environmental factors is generally neglected.
In this paper, we consider the residential DR problem where the load service entity (LSE) aims to select an optimal subset of customers to maximize the expected load reduction with a financial budget.
An online learning and selection (OLS) algorithm based on Thompson sampling is proposed to solve it; a minimal, hypothetical sketch of this kind of budgeted selection appears after the list.
arXiv Detail & Related papers (2020-03-07T18:17:50Z)
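As flagged above, the following is a minimal, hypothetical sketch of budgeted customer selection via Thompson sampling in the spirit of the OLS summary: Gaussian posteriors over each customer's mean load reduction, greedy selection by sampled reduction per unit cost, and posterior updates from observed reductions. All of these modeling choices are assumptions, not the cited paper's algorithm.

```python
# A minimal, hypothetical sketch of Thompson-sampling-based customer selection
# under a financial budget. Gaussian posteriors, the cost model, and the greedy
# budgeted selection rule are illustrative assumptions, not the cited algorithm.
import numpy as np

rng = np.random.default_rng(1)

n_customers = 10
true_reduction = rng.uniform(0.5, 2.0, n_customers)   # unknown mean load reductions
cost = rng.uniform(0.5, 1.5, n_customers)             # per-customer incentive cost
budget = 4.0                                           # daily budget (assumption)
noise_var = 0.25                                       # observation noise variance

# Gaussian posterior over each customer's mean reduction (known noise variance).
post_mean = np.zeros(n_customers)
post_var = np.ones(n_customers)

for day in range(500):
    # 1) Thompson sampling: draw one plausible mean reduction per customer.
    sampled = rng.normal(post_mean, np.sqrt(post_var))

    # 2) Greedy budgeted selection by sampled reduction per unit cost.
    order = np.argsort(-sampled / cost)
    selected, spent = [], 0.0
    for i in order:
        if spent + cost[i] <= budget:
            selected.append(i)
            spent += cost[i]

    # 3) Observe noisy reductions and update each Gaussian posterior.
    for i in selected:
        obs = true_reduction[i] + rng.normal(scale=np.sqrt(noise_var))
        precision = 1.0 / post_var[i] + 1.0 / noise_var
        post_mean[i] = (post_mean[i] / post_var[i] + obs / noise_var) / precision
        post_var[i] = 1.0 / precision

print("posterior means:", np.round(post_mean, 2))
print("true reductions:", np.round(true_reduction, 2))
```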