Online Residential Demand Response via Contextual Multi-Armed Bandits
- URL: http://arxiv.org/abs/2003.03627v2
- Date: Sun, 17 May 2020 21:55:46 GMT
- Title: Online Residential Demand Response via Contextual Multi-Armed Bandits
- Authors: Xin Chen, Yutong Nie, Na Li
- Abstract summary: One major challenge in residential demand response (DR) is to handle the unknown and uncertain customer behaviors.
Previous works use learning techniques to predict customer DR behaviors, but the influence of time-varying environmental factors is generally neglected.
In this paper, we consider the residential DR problem where the load service entity (LSE) aims to select an optimal subset of customers to maximize the expected load reduction with a financial budget.
An online learning and selection (OLS) algorithm based on Thompson sampling is proposed to solve it.
- Score: 8.817815952311676
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Residential loads have great potential to enhance the efficiency and
reliability of electricity systems via demand response (DR) programs. One major
challenge in residential DR is to handle the unknown and uncertain customer
behaviors. Previous works use learning techniques to predict customer DR
behaviors, while the influence of time-varying environmental factors is
generally neglected, which may lead to inaccurate prediction and inefficient
load adjustment. In this paper, we consider the residential DR problem where
the load service entity (LSE) aims to select an optimal subset of customers to
maximize the expected load reduction with a financial budget. To learn the
uncertain customer behaviors under the environmental influence, we formulate
the residential DR as a contextual multi-armed bandit (MAB) problem, and the
online learning and selection (OLS) algorithm based on Thompson sampling is
proposed to solve it. This algorithm takes the contextual information into
consideration and is applicable to complicated DR settings. Numerical
simulations are performed to demonstrate the learning effectiveness of the
proposed algorithm.
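To make the setup concrete, below is a minimal, hypothetical sketch of Thompson sampling for budgeted customer selection. It is not the paper's OLS algorithm (which conditions on contextual environmental features): it uses simple Beta posteriors over each customer's response probability and a greedy knapsack fill of the budget, and every number (response rates, costs, budget) is illustrative.

```python
import random

random.seed(0)

# Hypothetical setup: true (unknown) probabilities that each customer
# responds to a DR signal, and the incentive cost of signalling them.
true_response = [0.8, 0.3, 0.6, 0.1, 0.5]
cost = [2.0, 1.0, 1.5, 0.5, 1.0]
budget = 3.0
load_reduction = [1.0] * 5  # expected kW reduction per responding customer

# Beta(1, 1) priors over each customer's response probability.
alpha = [1.0] * 5
beta = [1.0] * 5

total_reduction = 0.0
for t in range(2000):
    # Thompson sampling: draw a plausible response rate per customer.
    theta = [random.betavariate(alpha[i], beta[i]) for i in range(5)]
    # Greedy knapsack on sampled value-per-cost until the budget is spent.
    order = sorted(range(5),
                   key=lambda i: theta[i] * load_reduction[i] / cost[i],
                   reverse=True)
    chosen, spent = [], 0.0
    for i in order:
        if spent + cost[i] <= budget:
            chosen.append(i)
            spent += cost[i]
    # Observe Bernoulli responses and update the posteriors.
    for i in chosen:
        responded = random.random() < true_response[i]
        alpha[i] += responded
        beta[i] += 1 - responded
        total_reduction += responded * load_reduction[i]

# Posterior means concentrate near the true rates of frequently selected customers.
post_mean = [alpha[i] / (alpha[i] + beta[i]) for i in range(5)]
print([round(m, 2) for m in post_mean])
```

The contextual version in the paper would replace the fixed Beta posteriors with models whose parameters depend on observed environmental features (e.g. temperature or time of day), so the sampled response rates vary with the context at each round.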
Related papers
- Denoising Pre-Training and Customized Prompt Learning for Efficient Multi-Behavior Sequential Recommendation [69.60321475454843]
We propose DPCPL, the first pre-training and prompt-tuning paradigm tailored for Multi-Behavior Sequential Recommendation.
In the pre-training stage, we propose a novel Efficient Behavior Miner (EBM) to filter out the noise at multiple time scales.
Subsequently, we propose to tune the pre-trained model in a highly efficient manner with the proposed Customized Prompt Learning (CPL) module.
arXiv Detail & Related papers (2024-08-21T06:48:38Z)
- Offline Inverse RL: New Solution Concepts and Provably Efficient Algorithms [23.61332577985059]
Inverse reinforcement learning (IRL) aims to recover the reward function of an expert agent from demonstrations of behavior.
This paper introduces a novel notion of feasible reward set capturing the opportunities and limitations of the offline setting.
arXiv Detail & Related papers (2024-02-23T15:49:46Z)
- Sub-linear Regret in Adaptive Model Predictive Control [56.705978425244496]
We present STT-MPC (Self-Tuning Tube-based Model Predictive Control), an online oracle that combines the certainty-equivalence principle and polytopic tubes.
We analyze the regret of the algorithm, when compared to an algorithm initially aware of the system dynamics.
arXiv Detail & Related papers (2023-10-07T15:07:10Z)
- AdaRec: Adaptive Sequential Recommendation for Reinforcing Long-term User Engagement [25.18963930580529]
We introduce a novel paradigm called Adaptive Sequential Recommendation (AdaRec) to address this issue.
AdaRec proposes a new distance-based representation loss to extract latent information from users' interaction trajectories.
We conduct extensive empirical analyses in both simulator-based and live sequential recommendation tasks.
arXiv Detail & Related papers (2023-10-06T02:45:21Z)
- Online Learning for Incentive-Based Demand Response [0.0]
We consider the problem of learning online to manage Demand Response (DR) resources.
We propose an online learning scheme that employs least-squares for estimation with a perturbation to the reward price.
arXiv Detail & Related papers (2023-03-27T22:08:05Z)
- When Demonstrations Meet Generative World Models: A Maximum Likelihood Framework for Offline Inverse Reinforcement Learning [62.00672284480755]
This paper aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent.
Accurate models of expertise in executing a task have applications in safety-sensitive domains such as clinical decision making and autonomous driving.
arXiv Detail & Related papers (2023-02-15T04:14:20Z)
- Solving Multistage Stochastic Linear Programming via Regularized Linear Decision Rules: An Application to Hydrothermal Dispatch Planning [77.34726150561087]
We propose a novel regularization scheme for linear decision rules (LDR) based on AdaSO (the adaptive least absolute shrinkage and selection operator).
Experiments show that the threat of overfitting is non-negligible when using the classical non-regularized LDR to solve MSLP.
For the LHDP problem, our analysis highlights the following benefits of the proposed framework in comparison to the non-regularized benchmark.
arXiv Detail & Related papers (2021-10-07T02:36:14Z)
- Resource Planning for Hospitals Under Special Consideration of the COVID-19 Pandemic: Optimization and Sensitivity Analysis [87.31348761201716]
Crises like the COVID-19 pandemic pose a serious challenge to health-care institutions.
BaBSim.Hospital is a tool for capacity planning based on discrete event simulation.
We aim to investigate and optimize these parameters to improve BaBSim.Hospital.
arXiv Detail & Related papers (2021-05-16T12:38:35Z)
- Uncertainty-aware Remaining Useful Life predictor [57.74855412811814]
Remaining Useful Life (RUL) estimation is the problem of inferring how long a certain industrial asset can be expected to operate.
In this work, we consider Deep Gaussian Processes (DGPs) as possible solutions to the aforementioned limitations.
The performance of the algorithms is evaluated on the N-CMAPSS dataset from NASA for aircraft engines.
arXiv Detail & Related papers (2021-04-08T08:50:44Z)
- Online Learning and Distributed Control for Residential Demand Response [16.61679791774638]
This paper studies the automated control method for regulating air conditioner (AC) loads in incentive-based residential demand response (DR).
We formulate the AC control problem in a DR event as a multi-period transition optimization that integrates the indoor thermal dynamics and customer opt-out status.
We propose an online DR control algorithm to learn customer behaviors and make real-time AC control schemes.
arXiv Detail & Related papers (2020-10-11T03:52:30Z)
- Offline Learning for Planning: A Summary [0.0]
Training of autonomous agents often requires expensive and unsafe trial-and-error interactions with the environment.
Data sets containing recorded experiences of intelligent agents performing various tasks are accessible on the internet.
In this paper we adumbrate the ideas motivating the development of the state-of-the-art offline learning baselines.
arXiv Detail & Related papers (2020-10-05T11:41:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.