Self-Organizing Dual-Buffer Adaptive Clustering Experience Replay (SODASER) for Safe Reinforcement Learning in Optimal Control
- URL: http://arxiv.org/abs/2601.06540v1
- Date: Sat, 10 Jan 2026 11:43:15 GMT
- Title: Self-Organizing Dual-Buffer Adaptive Clustering Experience Replay (SODASER) for Safe Reinforcement Learning in Optimal Control
- Authors: Roya Khalili Amirabadi, Mohsen Jalaeian Farimani, Omid Solaymani Fard,
- Abstract summary: This paper proposes a novel reinforcement learning framework, named Self-Organizing Dual-Replay Adaptive Clustering Experience (SODACER)<n>SODACER is designed to achieve safe and scalable optimal control of nonlinear systems.<n>The proposed approach is validated on a nonlinear Human Papillomavirus (HPV) transmission model with multiple control inputs and safety constraints.
- Score: 2.8037951156321372
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper proposes a novel reinforcement learning framework, named Self-Organizing Dual-buffer Adaptive Clustering Experience Replay (SODACER), designed to achieve safe and scalable optimal control of nonlinear systems. The proposed SODACER mechanism consisting of a Fast-Buffer for rapid adaptation to recent experiences and a Slow-Buffer equipped with a self-organizing adaptive clustering mechanism to maintain diverse and non-redundant historical experiences. The adaptive clustering mechanism dynamically prunes redundant samples, optimizing memory efficiency while retaining critical environmental patterns. The approach integrates SODASER with Control Barrier Functions (CBFs) to guarantee safety by enforcing state and input constraints throughout the learning process. To enhance convergence and stability, the framework is combined with the Sophia optimizer, enabling adaptive second-order gradient updates. The proposed SODACER-Sophia's architecture ensures reliable, effective, and robust learning in dynamic, safety-critical environments, offering a generalizable solution for applications in robotics, healthcare, and large-scale system optimization. The proposed approach is validated on a nonlinear Human Papillomavirus (HPV) transmission model with multiple control inputs and safety constraints. Comparative evaluations against random and clustering-based experience replay methods demonstrate that SODACER achieves faster convergence, improved sample efficiency, and a superior bias-variance trade-off, while maintaining safe system trajectories, validated via the Friedman test.
Related papers
- Safe Reinforcement Learning via Recovery-based Shielding with Gaussian Process Dynamics Models [57.006252510102506]
Reinforcement learning (RL) is a powerful framework for optimal decision-making and control but often lacks provable guarantees for safety-critical applications.<n>We introduce a novel recovery-based shielding framework that enables safe RL with a provable safety lower bound for unknown and non-linear continuous dynamical systems.
arXiv Detail & Related papers (2026-02-12T22:03:35Z) - Not All Preferences Are Created Equal: Stability-Aware and Gradient-Efficient Alignment for Reasoning Models [52.48582333951919]
We propose a dynamic framework designed to enhance alignment reliability by maximizing the Signal-to-Noise Ratio of policy updates.<n>SAGE (Stability-Aware Gradient Efficiency) integrates a coarse-grained curriculum mechanism that refreshes candidate pools based on model competence.<n> Experiments on multiple mathematical reasoning benchmarks demonstrate that SAGE significantly accelerates convergence and outperforms static baselines.
arXiv Detail & Related papers (2026-02-01T12:56:10Z) - Adaptive Reinforcement Learning for Dynamic Configuration Allocation in Pre-Production Testing [4.370892281528124]
We introduce a novel reinforcement learning framework that recasts configuration allocation as a sequential decision-making problem.<n>Our method is the first to integrate Q-learning with a hybrid reward design that fuses simulated outcomes and real-time feedback.
arXiv Detail & Related papers (2025-10-02T05:12:28Z) - Rethinking Safety in LLM Fine-tuning: An Optimization Perspective [56.31306558218838]
We show that poor optimization choices, rather than inherent trade-offs, often cause safety problems, measured as harmful responses to adversarial prompts.<n>We propose a simple exponential moving average (EMA) momentum technique in parameter space that preserves safety performance.<n>Our experiments on the Llama families across multiple datasets demonstrate that safety problems can largely be avoided without specialized interventions.
arXiv Detail & Related papers (2025-08-17T23:46:36Z) - Optimal Parameter Adaptation for Safety-Critical Control via Safe Barrier Bayesian Optimization [27.36423499121502]
Control barrier function (CBF) method, a promising solution for safety-critical control, poses a new challenge of enhancing control performance.<n>We propose a novel framework combining the CBF method with Bayesian optimization (BO) to optimize the safe control performance.
arXiv Detail & Related papers (2025-03-25T04:56:17Z) - Learning-Enhanced Safeguard Control for High-Relative-Degree Systems: Robust Optimization under Disturbances and Faults [6.535600892275023]
This paper aims to enhance system performance with safety guarantee in reinforcement learning-based optimal control problems.<n>The concept of gradient similarity is proposed to quantify the relationship between the gradient of safety and the gradient of performance.<n> gradient manipulation and adaptive mechanisms are introduced in the safe RL framework to enhance the performance with a safety guarantee.
arXiv Detail & Related papers (2025-01-26T03:03:02Z) - FADAS: Towards Federated Adaptive Asynchronous Optimization [56.09666452175333]
Federated learning (FL) has emerged as a widely adopted training paradigm for privacy-preserving machine learning.
This paper introduces federated adaptive asynchronous optimization, named FADAS, a novel method that incorporates asynchronous updates into adaptive federated optimization with provable guarantees.
We rigorously establish the convergence rate of the proposed algorithms and empirical results demonstrate the superior performance of FADAS over other asynchronous FL baselines.
arXiv Detail & Related papers (2024-07-25T20:02:57Z) - Enhancing Security in Federated Learning through Adaptive
Consensus-Based Model Update Validation [2.28438857884398]
This paper introduces an advanced approach for fortifying Federated Learning (FL) systems against label-flipping attacks.
We propose a consensus-based verification process integrated with an adaptive thresholding mechanism.
Our results indicate a significant mitigation of label-flipping attacks, bolstering the FL system's resilience.
arXiv Detail & Related papers (2024-03-05T20:54:56Z) - Real-Time Adaptive Safety-Critical Control with Gaussian Processes in
High-Order Uncertain Models [14.790031018404942]
This paper presents an adaptive online learning framework for systems with uncertain parameters.
We first integrate a forgetting factor to refine a variational sparse GP algorithm.
In the second phase, we propose a safety filter based on high-order control barrier functions.
arXiv Detail & Related papers (2024-02-29T08:25:32Z) - A Policy Optimization Method Towards Optimal-time Stability [15.722871779526526]
We propose a policy optimization technique incorporating sampling-based Lyapunov stability.
Our approach enables the system's state to reach an equilibrium point within an optimal time.
arXiv Detail & Related papers (2023-01-02T04:19:56Z) - Recursively Feasible Probabilistic Safe Online Learning with Control Barrier Functions [60.26921219698514]
We introduce a model-uncertainty-aware reformulation of CBF-based safety-critical controllers.
We then present the pointwise feasibility conditions of the resulting safety controller.
We use these conditions to devise an event-triggered online data collection strategy.
arXiv Detail & Related papers (2022-08-23T05:02:09Z) - Joint Differentiable Optimization and Verification for Certified
Reinforcement Learning [91.93635157885055]
In model-based reinforcement learning for safety-critical control systems, it is important to formally certify system properties.
We propose a framework that jointly conducts reinforcement learning and formal verification.
arXiv Detail & Related papers (2022-01-28T16:53:56Z) - Closing the Closed-Loop Distribution Shift in Safe Imitation Learning [80.05727171757454]
We treat safe optimization-based control strategies as experts in an imitation learning problem.
We train a learned policy that can be cheaply evaluated at run-time and that provably satisfies the same safety guarantees as the expert.
arXiv Detail & Related papers (2021-02-18T05:11:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.