Reinforcement Learning and Tree Search Methods for the Unit Commitment
Problem
- URL: http://arxiv.org/abs/2212.06001v1
- Date: Mon, 12 Dec 2022 16:03:31 GMT
- Title: Reinforcement Learning and Tree Search Methods for the Unit Commitment
Problem
- Authors: Patrick de Mars
- Abstract summary: The unit commitment problem determines the operating schedules of generation units to meet demand.
Approaches which more rigorously account for uncertainty could yield large reductions in operating costs.
We develop guided tree search, a novel methodology combining model-free RL and model-based planning.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The unit commitment (UC) problem, which determines operating schedules of
generation units to meet demand, is a fundamental task in power systems
operation. Existing UC methods using mixed-integer programming are not
well-suited to highly stochastic systems. Approaches which more rigorously
account for uncertainty could yield large reductions in operating costs by
reducing spinning reserve requirements; operating power stations at higher
efficiencies; and integrating greater volumes of variable renewables. A
promising approach to solving the UC problem is reinforcement learning (RL), a
methodology for optimal decision-making which has been used to conquer
long-standing grand challenges in artificial intelligence. This thesis explores
the application of RL to the UC problem and addresses challenges including
robustness under uncertainty; generalisability across multiple problem
instances; and scaling to larger power systems than previously studied. To
tackle these issues, we develop guided tree search, a novel methodology
combining model-free RL and model-based planning. The UC problem is formalised
as a Markov decision process and we develop an open-source environment based on
real data from Great Britain's power system to train RL agents. In problems of
up to 100 generators, guided tree search is shown to be competitive with
deterministic UC methods, reducing operating costs by up to 1.4%. An advantage
of RL is that the framework can be easily extended to incorporate
considerations important to power systems operators such as robustness to
generator failure, wind curtailment or carbon prices. When generator outages
are considered, guided tree search saves over 2% in operating costs as
compared with methods using conventional N-x reserve criteria.
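The abstract formalises unit commitment as a Markov decision process over on/off commitment decisions. A minimal sketch of that framing is below; the class name, parameters, greedy dispatch rule, and lost-load penalty are all illustrative placeholders, not the thesis's actual open-source environment.

```python
import numpy as np

class UnitCommitmentEnv:
    """Toy UC environment: commit units each period, pay dispatch costs.

    All parameters are hypothetical; a real environment would also model
    start-up costs, minimum up/down times, and stochastic demand.
    """

    def __init__(self, min_output, max_output, marginal_cost, demand):
        self.min_output = np.asarray(min_output, dtype=float)
        self.max_output = np.asarray(max_output, dtype=float)
        self.marginal_cost = np.asarray(marginal_cost, dtype=float)
        self.demand = list(demand)  # forecast demand per period (MW)
        self.t = 0
        self.status = np.zeros(len(self.min_output), dtype=bool)

    def step(self, commitment):
        """commitment: boolean array, one on/off decision per generator."""
        self.status = np.asarray(commitment, dtype=bool)
        # Greedy economic dispatch: cheapest committed units serve demand.
        residual = self.demand[self.t]
        cost = 0.0
        for i in np.argsort(self.marginal_cost):
            if not self.status[i] or residual <= 0:
                continue
            output = min(self.max_output[i], max(self.min_output[i], residual))
            cost += output * self.marginal_cost[i]
            residual -= output
        if residual > 0:
            cost += residual * 1e4  # lost-load penalty (arbitrary value)
        self.t += 1
        done = self.t >= len(self.demand)
        # Reward is negative cost, so maximising reward minimises cost.
        return self.status.copy(), -cost, done
```

An RL agent interacts with such an environment by choosing the commitment vector each period; guided tree search would additionally use the environment as a model for look-ahead planning.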
Related papers
- Automated Heuristic Design for Unit Commitment Using Large Language Models [7.319412558420025]
The Unit Commitment (UC) problem is a classic challenge in the optimal scheduling of power systems.
This paper proposes a Function Space Search (FunSearch) method based on large language models.
Results show that FunSearch performs better in terms of sampling time, evaluation time, and total operating cost of the system.
arXiv Detail & Related papers (2025-06-14T13:16:53Z) - Reinforced Informativeness Optimization for Long-Form Retrieval-Augmented Generation [77.10390725623125]
Long-form question answering (LFQA) presents unique challenges for large language models.
RioRAG is a novel reinforcement learning framework that advances long-form RAG through reinforced informativeness optimization.
arXiv Detail & Related papers (2025-05-27T07:34:41Z) - R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning [87.30285670315334]
R1-Searcher is a novel two-stage outcome-based RL approach designed to enhance the search capabilities of Large Language Models.
Our framework relies exclusively on RL, without requiring process rewards or distillation for a cold start.
Our experiments demonstrate that our method significantly outperforms previous strong RAG methods, even when compared to the closed-source GPT-4o-mini.
arXiv Detail & Related papers (2025-03-07T17:14:44Z) - Boost, Disentangle, and Customize: A Robust System2-to-System1 Pipeline for Code Generation [58.799397354312596]
Large language models (LLMs) have demonstrated remarkable capabilities in various domains, particularly in System 1 tasks.
Recent research on System2-to-System1 methods has surged, exploring System 2 reasoning knowledge via inference-time computation.
In this paper, we focus on code generation, a representative System 2 task, and identify two primary challenges.
arXiv Detail & Related papers (2025-02-18T03:20:50Z) - Learning for Cross-Layer Resource Allocation in MEC-Aided Cell-Free Networks [71.30914500714262]
Cross-layer resource allocation over mobile edge computing (MEC)-aided cell-free networks can fully exploit transmission and computing resources to improve the data rate.
Joint subcarrier allocation and beamforming optimization are investigated for the MEC-aided cell-free network from the perspective of deep learning.
arXiv Detail & Related papers (2024-12-21T10:18:55Z) - Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization [50.485788083202124]
Reinforcement Learning (RL) plays a crucial role in aligning large language models with human preferences and improving their ability to perform complex tasks.
We introduce Direct Q-function Optimization (DQO), which formulates the response generation process as a Markov Decision Process (MDP) and utilizes the soft actor-critic (SAC) framework to optimize a Q-function directly parameterized by the language model.
Experimental results on two math problem-solving datasets, GSM8K and MATH, demonstrate that DQO outperforms previous methods, establishing it as a promising offline reinforcement learning approach for aligning language models.
arXiv Detail & Related papers (2024-10-11T23:29:20Z) - Graph Attention-based Deep Reinforcement Learning for solving the
Chinese Postman Problem with Load-dependent costs [2.1212179660694104]
This paper proposes a novel DRL framework to address the Chinese Postman Problem with load-dependent costs (CPP-LC).
We introduce an autoregressive model based on DRL, namely ArcDRL, consisting of an encoder and decoder to address the CPP-LC challenge effectively.
We also propose a new bio-inspired meta-heuristic solution based on an Evolutionary Algorithm (EA) for CPP-LC.
arXiv Detail & Related papers (2023-10-24T04:50:32Z) - Learning RL-Policies for Joint Beamforming Without Exploration: A Batch
Constrained Off-Policy Approach [1.0080317855851213]
We consider the problem of network parameter optimization for joint beamforming.
We show that a policy can be learned offline from previously collected data, without real-world exploration.
arXiv Detail & Related papers (2023-10-12T18:36:36Z) - Stochastic Capacitated Arc Routing Problem [0.0]
This paper deals with the Stochastic Capacitated Arc Routing Problem (SCARP), obtained by randomizing the quantities on the arcs of the CARP.
For real-life problems, it is important to create solutions that are insensitive to variations in the quantities to collect, since these quantities are random.
The results prove it is possible to obtain robust solutions without any significant enlargement of the solution cost.
arXiv Detail & Related papers (2022-11-23T06:39:17Z) - An Optimization Method-Assisted Ensemble Deep Reinforcement Learning
Algorithm to Solve Unit Commitment Problems [3.303380427144773]
Unit commitment is a fundamental problem in the day-ahead electricity market.
It is critical to solve UC problems efficiently.
Recent advances in artificial intelligence have demonstrated the capability of reinforcement learning to solve UC problems.
arXiv Detail & Related papers (2022-06-09T03:36:18Z) - Deep Reinforcement Learning Based Multidimensional Resource Management
for Energy Harvesting Cognitive NOMA Communications [64.1076645382049]
Combination of energy harvesting (EH), cognitive radio (CR), and non-orthogonal multiple access (NOMA) is a promising solution to improve energy efficiency.
In this paper, we study the spectrum, energy, and time resource management for deterministic-CR-NOMA IoT systems.
arXiv Detail & Related papers (2021-09-17T08:55:48Z) - Reducing the Deployment-Time Inference Control Costs of Deep
Reinforcement Learning Agents via an Asymmetric Architecture [6.824961837445515]
We propose an asymmetric architecture that reduces the overall inference costs via switching between a computationally expensive policy and an economic one.
Results show that our method is able to reduce the inference costs while retaining the agent's overall performance.
arXiv Detail & Related papers (2021-05-30T09:14:39Z) - Combining Pessimism with Optimism for Robust and Efficient Model-Based
Deep Reinforcement Learning [56.17667147101263]
In real-world tasks, reinforcement learning agents encounter situations that are not present during training time.
To ensure reliable performance, the RL agents need to exhibit robustness against worst-case situations.
We propose the Robust Hallucinated Upper-Confidence RL (RH-UCRL) algorithm to provably solve this problem.
arXiv Detail & Related papers (2021-03-18T16:50:17Z) - Resource Allocation via Model-Free Deep Learning in Free Space Optical
Communications [119.81868223344173]
The paper investigates the general problem of resource allocation for mitigating channel fading effects in Free Space Optical (FSO) communications.
Under this framework, we propose two algorithms that solve FSO resource allocation problems.
arXiv Detail & Related papers (2020-07-27T17:38:51Z) - Combining Deep Learning and Optimization for Security-Constrained
Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems.
Modeling of APR within the SCOPF problem results in complex large-scale mixed-integer programs.
This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z) - SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep
Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
arXiv Detail & Related papers (2020-07-09T17:08:44Z)
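The SUNRISE entry above names two concrete ingredients; its upper-confidence-bound action selection (b) can be sketched as follows. The ensemble here is a plain list of Q-functions and the exploration weight `lam` is a placeholder, not SUNRISE's actual hyperparameter or network architecture.

```python
import numpy as np

def ucb_action(q_ensemble, state, actions, lam=1.0):
    """Select the action maximising mean + lam * std of ensemble Q-values.

    q_ensemble: list of callables q(state, action) -> float, hypothetical
    stand-ins for trained Q-networks.
    """
    scores = []
    for a in actions:
        qs = np.array([q(state, a) for q in q_ensemble])
        # High std means high disagreement (epistemic uncertainty), which
        # the upper-confidence bonus rewards to drive exploration.
        scores.append(qs.mean() + lam * qs.std())
    return actions[int(np.argmax(scores))]
```

The same ensemble disagreement can serve double duty: SUNRISE's ingredient (a) down-weights Bellman targets where the Q-ensemble's uncertainty estimates are high.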
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.