Reinforcement Learning Constrained Beam Search for Parameter Optimization of Paper Drying Under Flexible Constraints
- URL: http://arxiv.org/abs/2501.12542v1
- Date: Tue, 21 Jan 2025 23:16:19 GMT
- Title: Reinforcement Learning Constrained Beam Search for Parameter Optimization of Paper Drying Under Flexible Constraints
- Authors: Siyuan Chen, Hanshen Yu, Jamal Yagoobi, Chenhui Shao
- Abstract summary: We propose Reinforcement Learning Constrained Beam Search (RLCBS) for inference-time refinement in optimization problems. Our results demonstrate that RLCBS outperforms NSGA-II under complex design constraints on drying module configurations at inference time.
 - Score: 7.014163329716659
 - License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing approaches to enforcing design constraints in Reinforcement Learning (RL) applications often rely on training-time penalties in the reward function or training/inference-time invalid action masking, but these methods either cannot be modified after training, or are limited in the types of constraints that can be implemented. To address this limitation, we propose Reinforcement Learning Constrained Beam Search (RLCBS) for inference-time refinement in combinatorial optimization problems. This method respects flexible, inference-time constraints that support exclusion of invalid actions and forced inclusion of desired actions, and employs beam search to maximize sequence probability for more sensible constraint incorporation. RLCBS is extensible to RL-based planning and optimization problems that do not require real-time solution, and we apply the method to optimize process parameters for a novel modular testbed for paper drying. An RL agent is trained to minimize energy consumption across varying machine speed levels by generating optimal dryer module and air supply temperature configurations. Our results demonstrate that RLCBS outperforms NSGA-II under complex design constraints on drying module configurations at inference time, while providing a 2.58-fold or higher speed improvement.
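For readers unfamiliar with the mechanics the abstract describes (beam expansion under action exclusion, with forced inclusion of desired actions), a minimal sketch follows. This is an illustrative reconstruction, not the authors' code: the scoring function, the action set, and checking forced actions only on complete sequences are simplifying assumptions.

```python
import math

def constrained_beam_search(step_log_probs, n_steps, actions,
                            excluded=frozenset(), forced=frozenset(),
                            beam_width=3):
    """Beam search over action sequences under flexible constraints.

    step_log_probs(seq, a) -> log-probability of taking action `a`
    after partial sequence `seq` (e.g. from a trained RL policy).
    `excluded` actions are never expanded; complete sequences missing
    any `forced` action are discarded.
    """
    beams = [((), 0.0)]  # (partial sequence, cumulative log-prob)
    for _ in range(n_steps):
        candidates = []
        for seq, score in beams:
            for a in actions:
                if a in excluded:  # inference-time exclusion
                    continue
                candidates.append((seq + (a,),
                                   score + step_log_probs(seq, a)))
        # keep the beam_width highest-probability partial sequences
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    # forced inclusion: keep only sequences containing all required actions
    valid = [b for b in beams if forced <= set(b[0])]
    return max(valid, key=lambda b: b[1]) if valid else None
```

With a wide enough beam, the search can surface a slightly less probable sequence that satisfies a forced-inclusion constraint, which a greedy decoder would miss.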
 
       
      
Related papers
- Safe Deep Reinforcement Learning for Resource Allocation with Peak Age of Information Violation Guarantees [10.177917426690701]
This paper presents a novel optimization theory-based safe deep reinforcement learning (DRL) framework for ultra-reliable Wireless Networked Control Systems (WNCSs). The framework minimizes power consumption under key constraints, including Peak Age of Information (PAoI) violation probability, transmit power, and schedulability in the finite blocklength regime. The proposed framework outperforms rule-based and other optimization theory based DRL benchmarks, achieving faster convergence, higher rewards, and greater stability.
arXiv  Detail & Related papers  (2025-07-11T14:57:37Z)
- Adaptive Budgeted Multi-Armed Bandits for IoT with Dynamic Resource Constraints [5.694070924765916]
Internet of Things systems increasingly operate in environments where devices must respond in real time while managing fluctuating resource constraints. We propose a novel Budgeted Multi-Armed Bandit framework tailored for IoT applications with dynamic operational limits. Our model introduces a decaying violation budget, which permits limited constraint violations early in the learning process and gradually enforces stricter compliance over time.
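The decaying violation budget can be illustrated with a toy epsilon-greedy bandit. The names, cost model, and decay schedule below are hypothetical placeholders, not the paper's algorithm.

```python
import random

def budgeted_bandit(arms, costs, horizon, initial_budget, decay=0.99,
                    epsilon=0.1, seed=0):
    """Epsilon-greedy bandit with a decaying constraint-violation budget.

    `arms[i]()` samples a reward; `costs[i]` is the (known) resource
    cost of arm i. Pulls that exceed the per-step cost limit are
    tolerated while `budget` covers them; the budget decays each round,
    so constraint compliance tightens over time.
    """
    rng = random.Random(seed)
    n = len(arms)
    counts, means = [0] * n, [0.0] * n
    budget, cost_limit = initial_budget, 1.0
    total = 0.0
    for _ in range(horizon):
        feasible = [i for i in range(n)
                    if costs[i] <= cost_limit or costs[i] <= budget]
        if not feasible:
            continue  # no arm satisfies the (now strict) constraint
        if rng.random() < epsilon:
            i = rng.choice(feasible)
        else:
            i = max(feasible, key=lambda j: means[j])
        if costs[i] > cost_limit:
            budget -= costs[i]  # spend part of the violation budget
        r = arms[i]()
        counts[i] += 1
        means[i] += (r - means[i]) / counts[i]
        total += r
        budget *= decay  # stricter compliance as learning proceeds
    return total, means
```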
arXiv  Detail & Related papers  (2025-05-05T13:33:39Z)
- Efficient Federated Split Learning for Large Language Models over Communication Networks [14.461758448289908]
Fine-tuning pre-trained large language models (LLMs) in a distributed manner poses significant challenges for resource-constrained edge devices.
We propose FedsLLM, a novel framework that integrates split federated learning with parameter-efficient fine-tuning techniques.
arXiv  Detail & Related papers  (2025-04-20T16:16:54Z)
- Training Deep Learning Models with Norm-Constrained LMOs [56.00317694850397]
We study optimization methods that leverage the linear minimization oracle (LMO) over a norm-ball.
We propose a new family of algorithms that uses the LMO to adapt to the geometry of the problem and, perhaps surprisingly, show that they can be applied to unconstrained problems.
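A minimal sketch of an LMO-based update over an L2 norm-ball, applied to an unconstrained problem as the abstract suggests: each step moves toward the oracle's extreme point. The step size, radius, and choice of norm are illustrative assumptions, not the paper's algorithm.

```python
import math

def lmo_l2(grad, radius):
    """Linear minimization oracle over an L2 ball:
    argmin_{||v|| <= radius} <grad, v> = -radius * grad / ||grad||."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0:
        return [0.0] * len(grad)
    return [-radius * g / norm for g in grad]

def lmo_descent(grad_fn, x, radius=1.0, lr=0.1, steps=100):
    """Unconstrained minimization using LMO directions: the update
    direction is the oracle's extreme point, so the geometry of the
    chosen norm (here L2, i.e. normalized steepest descent) shapes
    every step."""
    for _ in range(steps):
        d = lmo_l2(grad_fn(x), radius)
        x = [xi + lr * di for xi, di in zip(x, d)]
    return x
```

Swapping `lmo_l2` for an oracle over a different norm-ball (e.g. L-infinity, whose extreme point is the sign vector) changes the update geometry without touching the outer loop.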
arXiv  Detail & Related papers  (2025-02-11T13:10:34Z)
- Probabilistic Satisfaction of Temporal Logic Constraints in Reinforcement Learning via Adaptive Policy-Switching [0.0]
Constrained Reinforcement Learning (CRL) is a subset of machine learning that introduces constraints into the traditional reinforcement learning (RL) framework. We propose a novel framework that relies on switching between pure learning (reward) and constraint satisfaction.
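The switching idea, alternating between a reward-seeking policy and a constraint-satisfying one, can be sketched as follows; the interfaces (`env_step`, `constraint_ok`) are hypothetical placeholders, not the paper's formulation.

```python
def switching_control(env_step, reward_policy, safe_policy,
                      constraint_ok, horizon, state):
    """Adaptive policy switching: act greedily for reward while the
    constraint estimate holds, otherwise switch to a
    constraint-satisfying policy until the constraint is restored.

    env_step(state, action) -> (next_state, reward)
    constraint_ok(state)    -> True if the constraint currently holds
    """
    trajectory = []
    for _ in range(horizon):
        policy = reward_policy if constraint_ok(state) else safe_policy
        action = policy(state)
        state, reward = env_step(state, action)
        trajectory.append((state, action, reward))
    return trajectory
```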
arXiv  Detail & Related papers  (2024-10-10T15:19:45Z)
- Constrained Reinforcement Learning for Safe Heat Pump Control [24.6591923448048]
We propose a novel building simulator I4B which provides interfaces for different usages.
We apply a model-free constrained RL algorithm named constrained Soft Actor-Critic with Linear Smoothed Log Barrier function (CSAC-LB) to the heating optimization problem.
Benchmarking against baseline algorithms demonstrates CSAC-LB's efficiency in data exploration, constraint satisfaction and performance.
arXiv  Detail & Related papers  (2024-09-29T14:15:13Z)
- Resilient Constrained Reinforcement Learning [87.4374430686956]
We study a class of constrained reinforcement learning (RL) problems in which multiple constraint specifications are not identified before training.
It is challenging to identify appropriate constraint specifications due to the undefined trade-off between the reward training objective and the constraint satisfaction.
We propose a new constrained RL approach that searches for policy and constraint specifications together.
arXiv  Detail & Related papers  (2023-12-28T18:28:23Z)
- Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe for converting static behavior datasets into policies that can outperform the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
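The data-driven discretization idea can be sketched with a tiny codebook learner (a 1-D Lloyd iteration over dataset actions) plus a nearest-code quantizer. The initialization, dimensionality, and interface are illustrative assumptions, not the paper's method.

```python
def learn_codebook(actions, k, iters=20):
    """Fit k codebook entries to the continuous actions observed in a
    static dataset (simple 1-D Lloyd/k-means), so that discretization
    concentrates codes where the data actually lies."""
    # initialize centroids spread evenly across the observed range
    lo, hi = min(actions), max(actions)
    centroids = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for a in actions:
            j = min(range(k), key=lambda i: abs(a - centroids[i]))
            clusters[j].append(a)
        # move each centroid to the mean of its assigned actions
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

def quantize(a, centroids):
    """Map a continuous action to the index of its nearest code."""
    return min(range(len(centroids)), key=lambda i: abs(a - centroids[i]))
```

A discrete-action offline RL method can then operate on code indices while the environment still receives the (decoded) continuous centroid.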
arXiv  Detail & Related papers  (2023-10-18T06:07:10Z)
- Hybrid Reinforcement Learning for Optimizing Pump Sustainability in Real-World Water Distribution Networks [55.591662978280894]
This article addresses the pump-scheduling optimization problem to enhance real-time control of real-world water distribution networks (WDNs).
Our primary objectives are to adhere to physical operational constraints while reducing energy consumption and operational costs.
Traditional optimization techniques, such as evolution-based and genetic algorithms, often fall short due to their lack of convergence guarantees.
arXiv  Detail & Related papers  (2023-10-13T21:26:16Z)
- A Constraint Enforcement Deep Reinforcement Learning Framework for Optimal Energy Storage Systems Dispatch [0.0]
The optimal dispatch of energy storage systems (ESSs) presents formidable challenges due to fluctuations in dynamic prices, demand consumption, and renewable-based energy generation.
By exploiting the generalization capabilities of deep neural networks (DNNs), deep reinforcement learning (DRL) algorithms can learn good-quality control models that adaptively respond to the nature of distribution networks.
We propose a DRL framework that effectively handles continuous action spaces while strictly enforcing the environment's and action space's operational constraints during online operation.
arXiv  Detail & Related papers  (2023-07-26T17:12:04Z)
- Towards Deployment-Efficient Reinforcement Learning: Lower Bound and Optimality [141.89413461337324]
Deployment efficiency is an important criterion for many real-world applications of reinforcement learning (RL).
We propose a theoretical formulation for deployment-efficient RL (DE-RL) from an "optimization with constraints" perspective.
arXiv  Detail & Related papers  (2022-02-14T01:31:46Z)
- Combining Deep Learning and Optimization for Security-Constrained Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems.
Modeling of APR within the SCOPF problem results in complex large-scale mixed-integer programs.
This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv  Detail & Related papers  (2020-07-14T12:38:21Z)
- Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained proximal policy optimization (CPPO).
We show that guided constrained RL offers faster convergence close to the desired optimum resulting in an optimal, yet physically feasible, robotic control behavior without the need for precise reward function tuning.
arXiv  Detail & Related papers  (2020-02-22T10:15:53Z) 
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.