Reinforcement Learning Constrained Beam Search for Parameter Optimization of Paper Drying Under Flexible Constraints
- URL: http://arxiv.org/abs/2501.12542v1
- Date: Tue, 21 Jan 2025 23:16:19 GMT
- Title: Reinforcement Learning Constrained Beam Search for Parameter Optimization of Paper Drying Under Flexible Constraints
- Authors: Siyuan Chen, Hanshen Yu, Jamal Yagoobi, Chenhui Shao
- Abstract summary: We propose Reinforcement Learning Constrained Beam Search (RLCBS) for inference-time refinement in optimization problems. Our results demonstrate that RLCBS outperforms NSGA-II under complex design constraints on drying module configurations at inference time.
 - Score: 7.014163329716659
 - License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing approaches to enforcing design constraints in Reinforcement Learning (RL) applications often rely on training-time penalties in the reward function or training/inference-time invalid action masking, but these methods either cannot be modified after training, or are limited in the types of constraints that can be implemented. To address this limitation, we propose Reinforcement Learning Constrained Beam Search (RLCBS) for inference-time refinement in combinatorial optimization problems. This method respects flexible, inference-time constraints that support exclusion of invalid actions and forced inclusion of desired actions, and employs beam search to maximize sequence probability for more sensible constraint incorporation. RLCBS is extensible to RL-based planning and optimization problems that do not require real-time solution, and we apply the method to optimize process parameters for a novel modular testbed for paper drying. An RL agent is trained to minimize energy consumption across varying machine speed levels by generating optimal dryer module and air supply temperature configurations. Our results demonstrate that RLCBS outperforms NSGA-II under complex design constraints on drying module configurations at inference time, while providing a 2.58-fold or higher speed improvement.
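For readers unfamiliar with the mechanics the abstract describes (beam expansion under action exclusion, with forced inclusion of desired actions), a minimal sketch follows. This is an illustrative reconstruction, not the authors' code: the scoring function, the action set, and checking forced actions only on complete sequences are simplifying assumptions.

```python
import math

def constrained_beam_search(step_log_probs, n_steps, actions,
                            excluded=frozenset(), forced=frozenset(),
                            beam_width=3):
    """Beam search over action sequences under flexible constraints.

    step_log_probs(seq, a) -> log-probability of taking action `a`
    after partial sequence `seq` (e.g. from a trained RL policy).
    `excluded` actions are never expanded; complete sequences missing
    any `forced` action are discarded.
    """
    beams = [((), 0.0)]  # (partial sequence, cumulative log-prob)
    for _ in range(n_steps):
        candidates = []
        for seq, score in beams:
            for a in actions:
                if a in excluded:  # inference-time exclusion
                    continue
                candidates.append((seq + (a,),
                                   score + step_log_probs(seq, a)))
        # keep the beam_width highest-probability partial sequences
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    # forced inclusion: keep only sequences containing all required actions
    valid = [b for b in beams if forced <= set(b[0])]
    return max(valid, key=lambda b: b[1]) if valid else None
```

With a wide enough beam, the search can surface a slightly less probable sequence that satisfies a forced-inclusion constraint, which a greedy decoder would miss.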
 
       
      
Related papers
- Safe Deep Reinforcement Learning for Resource Allocation with Peak Age of Information Violation Guarantees [10.177917426690701]
This paper presents a novel optimization theory-based safe deep reinforcement learning (DRL) framework for ultra-reliable Wireless Networked Control Systems (WNCSs). The framework minimizes power consumption under key constraints, including Peak Age of Information (PAoI) violation probability, transmit power, and schedulability in the finite blocklength regime. The proposed framework outperforms rule-based and other optimization theory based DRL benchmarks, achieving faster convergence, higher rewards, and greater stability.
arXiv  Detail & Related papers  (2025-07-11T14:57:37Z)
- Adaptive Budgeted Multi-Armed Bandits for IoT with Dynamic Resource Constraints [5.694070924765916]
Internet of Things systems increasingly operate in environments where devices must respond in real time while managing fluctuating resource constraints. We propose a novel Budgeted Multi-Armed Bandit framework tailored for IoT applications with dynamic operational limits. Our model introduces a decaying violation budget, which permits limited constraint violations early in the learning process and gradually enforces stricter compliance over time.
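The decaying violation budget can be illustrated with a toy epsilon-greedy bandit. The names, cost model, and decay schedule below are hypothetical placeholders, not the paper's algorithm.

```python
import random

def budgeted_bandit(arms, costs, horizon, initial_budget, decay=0.99,
                    epsilon=0.1, seed=0):
    """Epsilon-greedy bandit with a decaying constraint-violation budget.

    `arms[i]()` samples a reward; `costs[i]` is the (known) resource
    cost of arm i. Pulls that exceed the per-step cost limit are
    tolerated while `budget` covers them; the budget decays each round,
    so constraint compliance tightens over time.
    """
    rng = random.Random(seed)
    n = len(arms)
    counts, means = [0] * n, [0.0] * n
    budget, cost_limit = initial_budget, 1.0
    total = 0.0
    for _ in range(horizon):
        feasible = [i for i in range(n)
                    if costs[i] <= cost_limit or costs[i] <= budget]
        if not feasible:
            continue  # no arm satisfies the (now strict) constraint
        if rng.random() < epsilon:
            i = rng.choice(feasible)
        else:
            i = max(feasible, key=lambda j: means[j])
        if costs[i] > cost_limit:
            budget -= costs[i]  # spend part of the violation budget
        r = arms[i]()
        counts[i] += 1
        means[i] += (r - means[i]) / counts[i]
        total += r
        budget *= decay  # stricter compliance as learning proceeds
    return total, means
```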
arXiv  Detail & Related papers  (2025-05-05T13:33:39Z)
- Efficient Federated Split Learning for Large Language Models over Communication Networks [14.461758448289908]
Fine-tuning pre-trained large language models (LLMs) in a distributed manner poses significant challenges for resource-constrained edge devices.
We propose FedsLLM, a novel framework that integrates split federated learning with parameter-efficient fine-tuning techniques.
arXiv  Detail & Related papers  (2025-04-20T16:16:54Z)
- Training Deep Learning Models with Norm-Constrained LMOs [56.00317694850397]
We study optimization methods that leverage the linear minimization oracle (LMO) over a norm-ball.
We propose a new family of algorithms that uses the LMO to adapt to the geometry of the problem and, perhaps surprisingly, show that they can be applied to unconstrained problems.
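A minimal sketch of an LMO-based update over an L2 norm-ball, applied to an unconstrained problem as the abstract suggests: each step moves toward the oracle's extreme point. The step size, radius, and choice of norm are illustrative assumptions, not the paper's algorithm.

```python
import math

def lmo_l2(grad, radius):
    """Linear minimization oracle over an L2 ball:
    argmin_{||v|| <= radius} <grad, v> = -radius * grad / ||grad||."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0:
        return [0.0] * len(grad)
    return [-radius * g / norm for g in grad]

def lmo_descent(grad_fn, x, radius=1.0, lr=0.1, steps=100):
    """Unconstrained minimization using LMO directions: the update
    direction is the oracle's extreme point, so the geometry of the
    chosen norm (here L2, i.e. normalized steepest descent) shapes
    every step."""
    for _ in range(steps):
        d = lmo_l2(grad_fn(x), radius)
        x = [xi + lr * di for xi, di in zip(x, d)]
    return x
```

Swapping `lmo_l2` for an oracle over a different norm-ball (e.g. L-infinity, whose extreme point is the sign vector) changes the update geometry without touching the outer loop.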
arXiv  Detail & Related papers  (2025-02-11T13:10:34Z)
- Probabilistic Satisfaction of Temporal Logic Constraints in Reinforcement Learning via Adaptive Policy-Switching [0.0]
Constrained Reinforcement Learning (CRL) is a subset of machine learning that introduces constraints into the traditional reinforcement learning (RL) framework. We propose a novel framework that relies on switching between pure learning (reward) and constraint satisfaction.
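The switching idea, alternating between a reward-seeking policy and a constraint-satisfying one, can be sketched as follows; the interfaces (`env_step`, `constraint_ok`) are hypothetical placeholders, not the paper's formulation.

```python
def switching_control(env_step, reward_policy, safe_policy,
                      constraint_ok, horizon, state):
    """Adaptive policy switching: act greedily for reward while the
    constraint estimate holds, otherwise switch to a
    constraint-satisfying policy until the constraint is restored.

    env_step(state, action) -> (next_state, reward)
    constraint_ok(state)    -> True if the constraint currently holds
    """
    trajectory = []
    for _ in range(horizon):
        policy = reward_policy if constraint_ok(state) else safe_policy
        action = policy(state)
        state, reward = env_step(state, action)
        trajectory.append((state, action, reward))
    return trajectory
```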
arXiv  Detail & Related papers  (2024-10-10T15:19:45Z)
- Constrained Reinforcement Learning for Safe Heat Pump Control [24.6591923448048]
We propose a novel building simulator I4B which provides interfaces for different usages.
We apply a model-free constrained RL algorithm named constrained Soft Actor-Critic with Linear Smoothed Log Barrier function (CSAC-LB) to the heating optimization problem.
Benchmarking against baseline algorithms demonstrates CSAC-LB's efficiency in data exploration, constraint satisfaction and performance.
arXiv  Detail & Related papers  (2024-09-29T14:15:13Z)
- Resilient Constrained Reinforcement Learning [87.4374430686956]
We study a class of constrained reinforcement learning (RL) problems in which multiple constraint specifications are not identified before training.
It is challenging to identify appropriate constraint specifications due to the undefined trade-off between the reward training objective and the constraint satisfaction.
We propose a new constrained RL approach that searches for policy and constraint specifications together.
arXiv  Detail & Related papers  (2023-12-28T18:28:23Z)
- Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe for converting static behavior datasets into policies that can outperform the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
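The data-driven discretization idea can be sketched with a tiny codebook learner (a 1-D Lloyd iteration over dataset actions) plus a nearest-code quantizer. The initialization, dimensionality, and interface are illustrative assumptions, not the paper's method.

```python
def learn_codebook(actions, k, iters=20):
    """Fit k codebook entries to the continuous actions observed in a
    static dataset (simple 1-D Lloyd/k-means), so that discretization
    concentrates codes where the data actually lies."""
    # initialize centroids spread evenly across the observed range
    lo, hi = min(actions), max(actions)
    centroids = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for a in actions:
            j = min(range(k), key=lambda i: abs(a - centroids[i]))
            clusters[j].append(a)
        # move each centroid to the mean of its assigned actions
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

def quantize(a, centroids):
    """Map a continuous action to the index of its nearest code."""
    return min(range(len(centroids)), key=lambda i: abs(a - centroids[i]))
```

A discrete-action offline RL method can then operate on code indices while the environment still receives the (decoded) continuous centroid.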
arXiv  Detail & Related papers  (2023-10-18T06:07:10Z)
- Hybrid Reinforcement Learning for Optimizing Pump Sustainability in Real-World Water Distribution Networks [55.591662978280894]
This article addresses the pump-scheduling optimization problem to enhance real-time control of real-world water distribution networks (WDNs).
Our primary objectives are to adhere to physical operational constraints while reducing energy consumption and operational costs.
Traditional optimization techniques, such as evolution-based and genetic algorithms, often fall short due to their lack of convergence guarantees.
arXiv  Detail & Related papers  (2023-10-13T21:26:16Z)
- A Constraint Enforcement Deep Reinforcement Learning Framework for Optimal Energy Storage Systems Dispatch [0.0]
The optimal dispatch of energy storage systems (ESSs) presents formidable challenges due to fluctuations in dynamic prices, demand consumption, and renewable-based energy generation.
By exploiting the generalization capabilities of deep neural networks (DNNs), deep reinforcement learning (DRL) algorithms can learn good-quality control models that adaptively respond to the nature of distribution networks.
We propose a DRL framework that effectively handles continuous action spaces while strictly enforcing the environment's and action space's operational constraints during online operation.
arXiv  Detail & Related papers  (2023-07-26T17:12:04Z)
- Towards Deployment-Efficient Reinforcement Learning: Lower Bound and Optimality [141.89413461337324]
Deployment efficiency is an important criterion for many real-world applications of reinforcement learning (RL).
We propose a theoretical formulation for deployment-efficient RL (DE-RL) from an "optimization with constraints" perspective.
arXiv  Detail & Related papers  (2022-02-14T01:31:46Z)
- Combining Deep Learning and Optimization for Security-Constrained Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems.
Modeling of APR within the SCOPF problem results in complex large-scale mixed-integer programs.
This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv  Detail & Related papers  (2020-07-14T12:38:21Z)
- Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained proximal policy optimization (CPPO).
We show that guided constrained RL offers faster convergence close to the desired optimum resulting in an optimal, yet physically feasible, robotic control behavior without the need for precise reward function tuning.
arXiv  Detail & Related papers  (2020-02-22T10:15:53Z) 
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.