Safe Deployment of Offline Reinforcement Learning via Input Convex Action Correction
- URL: http://arxiv.org/abs/2507.22640v1
- Date: Wed, 30 Jul 2025 12:58:02 GMT
- Title: Safe Deployment of Offline Reinforcement Learning via Input Convex Action Correction
- Authors: Alex Durkin, Jasper Stolte, Matthew Jones, Raghuraman Pitchumani, Bei Li, Christian Michler, Mehmet Mercangöz
- Abstract summary: Offline reinforcement learning (offline RL) offers a promising framework for developing control strategies in chemical process systems. This work investigates the application of offline RL to the safe and efficient control of an exothermic polymerisation continuous stirred-tank reactor.
- Score: 9.509828265491064
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Offline reinforcement learning (offline RL) offers a promising framework for developing control strategies in chemical process systems using historical data, without the risks or costs of online experimentation. This work investigates the application of offline RL to the safe and efficient control of an exothermic polymerisation continuous stirred-tank reactor. We introduce a Gymnasium-compatible simulation environment that captures the reactor's nonlinear dynamics, including reaction kinetics, energy balances, and operational constraints. The environment supports three industrially relevant scenarios: startup, grade change down, and grade change up. It also includes reproducible offline datasets generated from proportional-integral controllers with randomised tunings, providing a benchmark for evaluating offline RL algorithms in realistic process control tasks. We assess behaviour cloning and implicit Q-learning as baseline algorithms, highlighting the challenges offline agents face, including steady-state offsets and degraded performance near setpoints. To address these issues, we propose a novel deployment-time safety layer that performs gradient-based action correction using partially input convex neural networks (PICNNs) as learned cost models. The PICNN enables real-time, differentiable correction of policy actions by descending a convex, state-conditioned cost surface, without requiring retraining or environment interaction. Experimental results show that offline RL, particularly when combined with convex action correction, can outperform traditional control approaches and maintain stability across all scenarios. These findings demonstrate the feasibility of integrating offline RL with interpretable and safety-aware corrections for high-stakes chemical process control, and lay the groundwork for more reliable data-driven automation in industrial systems.
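
The safety layer's key property is that the learned cost is convex in the action for any fixed state, so a corrected action can be found at deployment time by a few steps of gradient descent. The sketch below is not the authors' implementation; it assumes PyTorch, and the layer sizes, step size, iteration count, and action bounds are illustrative. It shows a simplified PICNN-style cost network (convexity in the action enforced via non-negative weights and convex, non-decreasing activations) together with a projected gradient-descent correction step.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimplePICNN(nn.Module):
    """Learned cost c(s, a), constructed to be convex in a for every fixed state s."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.state_path = nn.Linear(state_dim, hidden)    # state path: unconstrained
        self.action_in = nn.Linear(action_dim, hidden)    # first action layer: unconstrained
        self.convex_hidden = nn.Linear(hidden, hidden)    # weights clamped >= 0 in forward
        self.convex_out = nn.Linear(hidden, 1)            # weights clamped >= 0 in forward

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        u = F.relu(self.state_path(state))
        # Convexity in `action`: affine first layer, then non-negative weights on the
        # z-path combined with convex, non-decreasing activations (softplus).
        z = F.softplus(self.action_in(action) + u)
        z = F.softplus(F.linear(z, self.convex_hidden.weight.clamp(min=0),
                                self.convex_hidden.bias) + u)
        return F.linear(z, self.convex_out.weight.clamp(min=0), self.convex_out.bias)

def correct_action(cost_model: SimplePICNN, state: torch.Tensor, action: torch.Tensor,
                   steps: int = 10, lr: float = 0.05,
                   low: float = -1.0, high: float = 1.0) -> torch.Tensor:
    """Deployment-time correction: descend the convex, state-conditioned cost surface
    starting from the policy's proposed action, then project into the action bounds."""
    a = action.clone().detach().requires_grad_(True)
    for _ in range(steps):
        cost = cost_model(state, a).sum()
        (grad,) = torch.autograd.grad(cost, a)
        with torch.no_grad():
            a -= lr * grad
            a.clamp_(low, high)   # keep the corrected action inside operational limits
    return a.detach()

# Example: correct a batch of policy actions before sending them to the plant
# (dimensions are illustrative).
# cost_model = SimplePICNN(state_dim=6, action_dim=2)
# safe_actions = correct_action(cost_model, states, policy_actions)
```

Because the surface being descended is convex in the action, the inner loop has no spurious local minima, and the correction runs at deployment time without retraining or environment interaction.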
Related papers
- Offline Robotic World Model: Learning Robotic Policies without a Physics Simulator [50.191655141020505]
Reinforcement Learning (RL) has demonstrated impressive capabilities in robotic control but remains challenging due to high sample complexity, safety concerns, and the sim-to-real gap. We introduce Offline Robotic World Model (RWM-O), a model-based approach that explicitly estimates uncertainty to improve policy learning without reliance on a physics simulator.
arXiv Detail & Related papers (2025-04-23T12:58:15Z) - CHEQ-ing the Box: Safe Variable Impedance Learning for Robotic Polishing [5.467140383171385]
This work presents an experimental demonstration of the hybrid RL algorithm CHEQ for robotic polishing with variable impedance. On hardware, CHEQ achieves effective polishing behaviour, requiring only eight hours of training and incurring just five failures. Results highlight the potential of adaptive hybrid RL for real-world, contact-rich tasks trained directly on hardware.
arXiv Detail & Related papers (2025-01-14T10:13:41Z) - CtRL-Sim: Reactive and Controllable Driving Agents with Offline Reinforcement Learning [38.63187494867502]
CtRL-Sim is a method that leverages return-conditioned offline reinforcement learning (RL) to efficiently generate reactive and controllable traffic agents.
We show that CtRL-Sim can generate realistic safety-critical scenarios while providing fine-grained control over agent behaviours.
arXiv Detail & Related papers (2024-03-29T02:10:19Z) - Integrating DeepRL with Robust Low-Level Control in Robotic Manipulators for Non-Repetitive Reaching Tasks [0.24578723416255746]
In robotics, contemporary strategies are learning-based, characterized by a complex black-box nature and a lack of interpretability.
We propose integrating a collision-free trajectory planner based on deep reinforcement learning (DRL) with a novel auto-tuning low-level control strategy.
arXiv Detail & Related papers (2024-02-04T15:54:03Z) - MOTO: Offline Pre-training to Online Fine-tuning for Model-based Robot Learning [52.101643259906915]
We study the problem of offline pre-training and online fine-tuning for reinforcement learning from high-dimensional observations.
Existing model-based offline RL methods are not suitable for offline-to-online fine-tuning in high-dimensional domains.
We propose an on-policy model-based method that can efficiently reuse prior data through model-based value expansion and policy regularization.
arXiv Detail & Related papers (2024-01-06T21:04:31Z) - Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe for converting static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
arXiv Detail & Related papers (2023-10-18T06:07:10Z) - Hybrid Reinforcement Learning for Optimizing Pump Sustainability in Real-World Water Distribution Networks [55.591662978280894]
This article addresses the pump-scheduling optimization problem to enhance real-time control of real-world water distribution networks (WDNs).
Our primary objectives are to adhere to physical operational constraints while reducing energy consumption and operational costs.
Traditional optimization techniques, such as evolution-based and genetic algorithms, often fall short due to their lack of convergence guarantees.
arXiv Detail & Related papers (2023-10-13T21:26:16Z) - Environment Transformer and Policy Optimization for Model-Based Offline Reinforcement Learning [25.684201757101267]
We propose an uncertainty-aware sequence modeling architecture called Environment Transformer.
Benefiting from the accurate modeling of the transition dynamics and reward function, Environment Transformer can be combined with arbitrary planning, dynamic programming, or policy optimization algorithms for offline RL.
arXiv Detail & Related papers (2023-03-07T11:26:09Z) - Adaptive Behavior Cloning Regularization for Stable Offline-to-Online Reinforcement Learning [80.25648265273155]
Offline reinforcement learning, by learning from a fixed dataset, makes it possible to learn agent behaviors without interacting with the environment.
During online fine-tuning, the performance of the pre-trained agent may collapse quickly due to the sudden distribution shift from offline to online data.
We propose to adaptively weigh the behavior cloning loss during online fine-tuning based on the agent's performance and training stability.
Experiments show that the proposed method yields state-of-the-art offline-to-online reinforcement learning performance on the popular D4RL benchmark.
arXiv Detail & Related papers (2022-10-25T09:08:26Z) - Logarithmic Regret Bound in Partially Observable Linear Dynamical Systems [91.43582419264763]
We study the problem of system identification and adaptive control in partially observable linear dynamical systems.
We present the first model estimation method with finite-time guarantees in both open and closed-loop system identification.
We show that AdaptOn is the first algorithm that achieves $\text{polylog}\left(T\right)$ regret in adaptive control of unknown partially observable linear dynamical systems.
arXiv Detail & Related papers (2020-03-25T06:00:33Z)