Active Disruption Avoidance and Trajectory Design for Tokamak Ramp-downs
with Neural Differential Equations and Reinforcement Learning
- URL: http://arxiv.org/abs/2402.09387v1
- Date: Wed, 14 Feb 2024 18:37:40 GMT
- Authors: Allen M. Wang, Oswin So, Charles Dawson, Darren T. Garnier, Cristina
Rea, and Chuchu Fan
- Abstract summary: We train a policy to safely ramp down the plasma current while avoiding limits on a number of quantities correlated with disruptions.
The trained policy is then transferred to a higher fidelity simulator, where it ramps down the plasma while avoiding user-specified disruptive limits.
As a library of trajectories is more interpretable and verifiable offline, we argue such an approach is a promising path for leveraging the capabilities of reinforcement learning.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The tokamak offers a promising path to fusion energy, but plasma disruptions
pose a major economic risk, motivating considerable advances in disruption
avoidance. This work develops a reinforcement learning approach to this problem
by training a policy to safely ramp down the plasma current while avoiding
limits on a number of quantities correlated with disruptions. The policy
training environment is a hybrid physics and machine learning model trained on
simulations of the SPARC primary reference discharge (PRD) ramp-down, an
upcoming burning plasma scenario, which we use as a testbed. To address physics
uncertainty and model inaccuracies, the simulation environment is massively
parallelized on GPU with randomized physics parameters during policy training.
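As a rough illustration of that training-environment recipe (a hybrid physics plus learned-residual dynamics model, vectorized over randomized physics parameters), the JAX sketch below advances thousands of perturbed toy simulations in one batched call. The state variables, the parameter names (tau_ip, li_drift), their ranges, and the network sizes are illustrative assumptions, not the paper's actual model.

```python
# Illustrative sketch only: a hybrid physics + neural-residual dynamics step,
# batched over randomized physics parameters with jax.vmap (domain
# randomization on GPU). States, parameters, and ranges are toy stand-ins.
import jax
import jax.numpy as jnp

def physics_step(state, action, params, dt=0.01):
    # Toy analytic core: plasma current Ip relaxes toward the commanded value
    # with a randomized time constant; internal inductance li drifts slowly.
    ip, li = state
    ip_next = ip + dt * (action - ip) / params["tau_ip"]
    li_next = li + dt * params["li_drift"]
    return jnp.array([ip_next, li_next])

def nn_residual(nn_params, state, action):
    # Small MLP correction, standing in for the learned part of the model.
    x = jnp.concatenate([state, jnp.atleast_1d(action)])
    h = jnp.tanh(nn_params["W1"] @ x + nn_params["b1"])
    return nn_params["W2"] @ h + nn_params["b2"]

def hybrid_step(state, action, params, nn_params):
    return physics_step(state, action, params) + nn_residual(nn_params, state, action)

key = jax.random.PRNGKey(0)
n_envs = 4096                       # thousands of perturbed copies per GPU call
k1, k2 = jax.random.split(key)
params = {                          # randomized physics parameters, one per copy
    "tau_ip":   jax.random.uniform(k1, (n_envs,), minval=0.5, maxval=2.0),
    "li_drift": jax.random.uniform(k2, (n_envs,), minval=-0.1, maxval=0.1),
}
states  = jnp.tile(jnp.array([8.7, 1.0]), (n_envs, 1))  # e.g. Ip [MA], li
actions = jnp.full((n_envs,), 8.0)                      # commanded Ip

nn_params = {"W1": jnp.zeros((16, 3)), "b1": jnp.zeros(16),
             "W2": jnp.zeros((2, 16)), "b2": jnp.zeros(2)}

step_all = jax.jit(jax.vmap(hybrid_step, in_axes=(0, 0, 0, None)))
next_states = step_all(states, actions, params, nn_params)  # (4096, 2)
```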
The trained policy is then transferred to a higher fidelity simulator, where
it ramps down the plasma while avoiding
user-specified disruptive limits. We also address the crucial issue of safety
criticality by demonstrating that a constraint-conditioned policy can be used
as a trajectory design assistant to design a library of feed-forward
trajectories to handle different physics conditions and user settings. As a
library of trajectories is more interpretable and verifiable offline, we argue
such an approach is a promising path for leveraging the capabilities of
reinforcement learning in the safety-critical context of burning plasma
tokamaks. Finally, we demonstrate how the training environment can be a useful
platform for other feed-forward optimization approaches by using an
evolutionary algorithm to optimize feed-forward trajectories that are robust
to physics uncertainty.
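The constraint-conditioned design assistant might look schematically like this: the policy takes the user-specified limits as part of its observation, so sweeping those limits and rolling the policy out offline yields a library of feed-forward trajectories that can be inspected and verified before use. This is a minimal sketch under assumed interfaces; the limit names, toy dynamics, and network shape are not from the paper.

```python
# Illustrative sketch only: a constraint-conditioned policy observes the
# user-specified limits, so one trained network can export a library of
# feed-forward trajectories, one per constraint setting, for offline review.
import jax
import jax.numpy as jnp

def policy(p, state, limits):
    # The active limits are part of the observation, so a single network
    # covers a whole family of constraint settings.
    obs = jnp.concatenate([state, limits])
    return jnp.tanh(p["W"] @ obs + p["b"])          # bounded action

def toy_step(state, action, dt=0.05):
    # Placeholder dynamics standing in for the simulator.
    ip, li = state
    return jnp.array([ip + dt * action[0], li + 0.01 * dt * action[0]])

def rollout(p, limits, init_state, n_steps=100):
    def body(state, _):
        action = policy(p, state, limits)
        return toy_step(state, action), action
    _, actions = jax.lax.scan(body, init_state, None, length=n_steps)
    return actions   # the feed-forward trajectory for this constraint setting

p = {"W": jnp.zeros((1, 4)), "b": jnp.zeros(1)}     # obs = state (2) + limits (2)
init = jnp.array([8.7, 1.0])

# Sweep two hypothetical limits (e.g. a density bound and a ramp-rate bound)
# and collect one interpretable, offline-verifiable trajectory per setting.
limit_grid = jnp.array([[a, b] for a in (0.3, 0.4, 0.5) for b in (1.0, 1.5)])
library = jax.vmap(lambda lim: rollout(p, lim, init))(limit_grid)
print(library.shape)                                # (6, 100, 1)
```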
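Likewise, the final point, evolutionary optimization of feed-forward trajectories that are robust to physics uncertainty, can be sketched with a simple elite-selection evolutionary strategy. The one-state simulator, the ramp-rate limit, and all constants below are assumptions for illustration.

```python
# Illustrative sketch only: robust feed-forward trajectory design with a
# simple evolutionary strategy. Each candidate current ramp is scored under
# many random physics draws; elites are averaged into the next mean.
import jax
import jax.numpy as jnp

DT = 0.05

def simulate(traj, tau_ip, ip0=8.7):
    # One-state toy simulator: Ip tracks the commanded ramp with a
    # randomized time constant tau_ip (an assumed uncertain parameter).
    def step(ip, ip_cmd):
        ip_next = ip + DT * (ip_cmd - ip) / tau_ip
        return ip_next, ip_next
    _, ips = jax.lax.scan(step, jnp.asarray(ip0, dtype=jnp.float32), traj)
    return ips

def cost(traj, tau_ip):
    ips = simulate(traj, tau_ip)
    # Penalize exceeding a (made-up) ramp-rate limit of 1 MA/s, and reward
    # ending at low current.
    rate_violation = jnp.maximum(-jnp.diff(ips) / DT - 1.0, 0.0).sum()
    return ips[-1] + 10.0 * rate_violation

def robust_cost(traj, taus):
    # Average the cost over randomized physics draws.
    return jax.vmap(cost, in_axes=(None, 0))(traj, taus).mean()

key = jax.random.PRNGKey(1)
n_gens, pop, n_steps, n_elite = 50, 64, 40, 8
mean  = jnp.linspace(8.7, 0.5, n_steps)   # initial guess: linear ramp-down
sigma = 0.5

score_pop = jax.jit(jax.vmap(robust_cost, in_axes=(0, None)))
for _ in range(n_gens):
    key, k_pop, k_phys = jax.random.split(key, 3)
    cands = mean + sigma * jax.random.normal(k_pop, (pop, n_steps))
    taus  = jax.random.uniform(k_phys, (128,), minval=0.5, maxval=2.0)
    elites = cands[jnp.argsort(score_pop(cands, taus))[:n_elite]]
    mean, sigma = elites.mean(axis=0), sigma * 0.97

print(mean[:5])   # optimized feed-forward Ip commands
```

Averaging the cost over many physics draws is what pushes the optimizer toward ramps that remain feasible across the randomized tau_ip values, mirroring the robustness-to-uncertainty goal stated in the abstract.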
Related papers
- Physics Enhanced Residual Policy Learning (PERPL) for safety cruising in mixed traffic platooning under actuator and communication delay (arXiv, 2024-09-23)
  Linear control models have gained extensive application in vehicle control due to their simplicity, ease of use, and support for stability analysis. Reinforcement learning (RL) models, on the other hand, offer adaptability but suffer from a lack of interpretability and generalization. This paper aims to develop a family of RL-based controllers enhanced by physics-informed policies.
- ReGentS: Real-World Safety-Critical Driving Scenario Generation Made Stable (arXiv, 2024-09-12)
  Machine-learning-based autonomous driving systems often face challenges with safety-critical scenarios that are rare in real-world data. This work explores generating safety-critical driving scenarios by modifying complex real-world regular scenarios through trajectory optimization. The approach addresses unrealistic diverging trajectories and unavoidable collision scenarios that are not useful for training a robust planner.
- Residual Physics Learning and System Identification for Sim-to-real Transfer of Policies on Buoyancy Assisted Legged Robots (arXiv, 2023-03-16)
  In this work, we demonstrate robust sim-to-real transfer of control policies on the BALLU robots via system identification. Rather than relying on standard supervised learning formulations, we utilize deep reinforcement learning to train an external force policy. We analyze the improved simulation fidelity by comparing the simulated trajectories against the real-world ones.
- Environment Transformer and Policy Optimization for Model-Based Offline Reinforcement Learning (arXiv, 2023-03-07)
  We propose an uncertainty-aware sequence modeling architecture called Environment Transformer. Benefiting from accurate modeling of the transition dynamics and reward function, Environment Transformer can be combined with arbitrary planning, dynamic programming, or policy optimization algorithms for offline RL.
- Policy Learning for Robust Markov Decision Process with a Mismatched Generative Model (arXiv, 2022-03-13)
  In high-stakes scenarios like medical treatment and auto-piloting, it is risky or even infeasible to collect online experimental data to train the agent. We consider policy learning for Robust Markov Decision Processes (RMDPs), where the agent seeks a policy that is robust to unexpected perturbations of the environment. Our goal is to identify a near-optimal robust policy for the perturbed testing environment, which introduces additional technical difficulties.
- Learning Robust Policy against Disturbance in Transition Dynamics via State-Conservative Policy Optimization (arXiv, 2021-12-20)
  Deep reinforcement learning algorithms can perform poorly in real-world tasks due to the discrepancy between source and target environments. We propose State-Conservative Policy Optimization (SCPO), a novel model-free actor-critic algorithm that learns robust policies without modeling the disturbance in advance. Experiments in several robot control tasks demonstrate that SCPO learns policies robust to disturbances in the transition dynamics.
- Nonprehensile Riemannian Motion Predictive Control (arXiv, 2021-11-15)
  We introduce a novel Real-to-Sim reward analysis technique to reliably predict the outcome of possible actions on a real robotic platform, and produce a closed-loop controller that reactively pushes objects in a continuous action space. We observe that RMPC is robust in cluttered as well as occluded environments and outperforms the baselines.
- Congestion-aware Multi-agent Trajectory Prediction for Collision Avoidance (arXiv, 2021-03-26)
  We propose to learn congestion patterns explicitly and devise a novel "Sense-Learn-Reason-Predict" framework. By decomposing the learning phases into two stages, a "student" can learn contextual cues from a "teacher" while generating collision-free trajectories. In experiments, we demonstrate that the proposed model generates collision-free trajectory predictions on a synthetic dataset.
- Deep Reinforcement Learning amidst Lifelong Non-Stationarity (arXiv, 2020-06-18)
  We show that an off-policy RL algorithm can reason about and tackle lifelong non-stationarity. Our method leverages latent variable models to learn a representation of the environment from current and past experiences. We also introduce several simulation environments that exhibit lifelong non-stationarity, and empirically find that our approach substantially outperforms approaches that do not reason about environment shift.
- Robust Reinforcement Learning with Wasserstein Constraint (arXiv, 2020-06-01)
  We show the existence of optimal robust policies, provide a sensitivity analysis for the perturbations, and then design a novel robust learning algorithm. The effectiveness of the proposed algorithm is verified in the Cart-Pole environment.
- Multiplicative Controller Fusion: Leveraging Algorithmic Priors for Sample-efficient Reinforcement Learning and Safe Sim-To-Real Transfer (arXiv, 2020-03-11)
  We present a novel approach to model-free reinforcement learning that can leverage existing sub-optimal solutions. During training, our gated fusion approach enables the prior to guide the initial stages of exploration. We show the efficacy of our Multiplicative Controller Fusion approach on the task of robot navigation.