Plasma Shape Control via Zero-shot Generative Reinforcement Learning
- URL: http://arxiv.org/abs/2510.17531v1
- Date: Mon, 20 Oct 2025 13:34:51 GMT
- Title: Plasma Shape Control via Zero-shot Generative Reinforcement Learning
- Authors: Niannian Wu, Rongpeng Li, Zongyu Yang, Yong Xiao, Ning Wei, Yihang Chen, Bo Li, Zhifeng Zhao, Wulyu Zhong,
- Abstract summary: We develop a novel framework for developing a versatile, zero-shot control policy from a large-scale offline dataset of PID-controlled discharges.<n>The resulting foundation policy can be deployed for diverse trajectory tracking tasks in a zero-shot manner without any task-specific fine-tuning.
- Score: 17.3934551430283
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Traditional PID controllers have limited adaptability for plasma shape control, and task-specific reinforcement learning (RL) methods suffer from limited generalization and the need for repetitive retraining. To overcome these challenges, this paper proposes a novel framework for developing a versatile, zero-shot control policy from a large-scale offline dataset of historical PID-controlled discharges. Our approach synergistically combines Generative Adversarial Imitation Learning (GAIL) with Hilbert space representation learning to achieve dual objectives: mimicking the stable operational style of the PID data and constructing a geometrically structured latent space for efficient, goal-directed control. The resulting foundation policy can be deployed for diverse trajectory tracking tasks in a zero-shot manner without any task-specific fine-tuning. Evaluations on the HL-3 tokamak simulator demonstrate that the policy excels at precisely and stably tracking reference trajectories for key shape parameters across a range of plasma scenarios. This work presents a viable pathway toward developing highly flexible and data-efficient intelligent control systems for future fusion reactors.
Related papers
- Push Smarter, Not Harder: Hierarchical RL-Diffusion Policy for Efficient Nonprehensile Manipulation [8.7216199131049]
HeRD is a hierarchical reinforcement learning-diffusion policy that decomposes pushing tasks into two levels: high-level goal selection and low-level trajectory generation.<n>We employ a high-level reinforcement learning agent to select intermediate spatial goals, and a low-level goal-conditioned diffusion model to generate feasible, efficient trajectories to reach them.<n>Our results suggest that hierarchical control with generative low-level planning is a promising direction for scalable, goal-directed nonprehensile manipulation.
arXiv Detail & Related papers (2025-12-10T21:40:22Z) - TD-JEPA: Latent-predictive Representations for Zero-Shot Reinforcement Learning [63.73629127832652]
We introduce TD-JEPA, which leverages TD-based latent-predictive representations into unsupervised RL.<n> TD-JEPA trains explicit state and task encoders, a policy-conditioned multi-step predictor, and a set of parameterized policies directly in latent space.<n> Empirically, TD-JEPA matches or outperforms state-of-the-art baselines on locomotion, navigation, and manipulation tasks across 13 datasets.
arXiv Detail & Related papers (2025-10-01T10:21:18Z) - Double Check My Desired Return: Transformer with Target Alignment for Offline Reinforcement Learning [64.6334337560557]
Reinforcement learning via supervised learning (RvS) frames offline RL as a sequence modeling task.<n>Decision Transformer (DT) struggles to reliably align the actual achieved returns with specified target returns.<n>We propose Doctor, a novel approach that Double Checks the Transformer with target alignment for Offline RL.
arXiv Detail & Related papers (2025-08-22T14:30:53Z) - Zero-Shot Whole-Body Humanoid Control via Behavioral Foundation Models [71.34520793462069]
Unsupervised reinforcement learning (RL) aims at pre-training agents that can solve a wide range of downstream tasks in complex environments.<n>We introduce a novel algorithm regularizing unsupervised RL towards imitating trajectories from unlabeled behavior datasets.<n>We demonstrate the effectiveness of this new approach in a challenging humanoid control problem.
arXiv Detail & Related papers (2025-04-15T10:41:11Z) - Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models [61.145371212636505]
Reinforcement learning (RL) learns policies through trial and error, and optimal control, which plans actions using a learned or known dynamics model.<n>We systematically analyze the performance of different RL and control-based methods under datasets of varying quality.<n>Our results show that model-free RL excels when abundant, high-quality data is available, while model-based planning excels in generalization to novel environment layouts, trajectory stitching, and data-efficiency.
arXiv Detail & Related papers (2025-02-20T18:39:41Z) - Foundation Policies with Hilbert Representations [54.44869979017766]
We propose an unsupervised framework to pre-train generalist policies from unlabeled offline data.
Our key insight is to learn a structured representation that preserves the temporal structure of the underlying environment.
Our experiments show that our unsupervised policies can solve goal-conditioned and general RL tasks in a zero-shot fashion.
arXiv Detail & Related papers (2024-02-23T19:09:10Z) - Integrating DeepRL with Robust Low-Level Control in Robotic Manipulators for Non-Repetitive Reaching Tasks [0.24578723416255746]
In robotics, contemporary strategies are learning-based, characterized by a complex black-box nature and a lack of interpretability.
We propose integrating a collision-free trajectory planner based on deep reinforcement learning (DRL) with a novel auto-tuning low-level control strategy.
arXiv Detail & Related papers (2024-02-04T15:54:03Z) - Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765]
We investigate the challenge of parametrizing policies for reinforcement learning in high-dimensional continuous action spaces.
We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories.
We present a practical model-based RL method, which leverages the multimodal policy parameterization and learned world model.
arXiv Detail & Related papers (2023-07-20T09:05:46Z) - Goal-Conditioned Predictive Coding for Offline Reinforcement Learning [24.300131097275298]
We investigate whether sequence modeling has the ability to condense trajectories into useful representations that enhance policy learning.
We introduce Goal-Conditioned Predictive Coding, a sequence modeling objective that yields powerful trajectory representations and leads to performant policies.
arXiv Detail & Related papers (2023-07-07T06:12:14Z) - RL + Model-based Control: Using On-demand Optimal Control to Learn Versatile Legged Locomotion [16.800984476447624]
This paper presents a control framework that combines model-based optimal control and reinforcement learning.
We validate the robustness and controllability of the framework through a series of experiments.
Our framework effortlessly supports the training of control policies for robots with diverse dimensions.
arXiv Detail & Related papers (2023-05-29T01:33:55Z) - In-Distribution Barrier Functions: Self-Supervised Policy Filters that
Avoid Out-of-Distribution States [84.24300005271185]
We propose a control filter that wraps any reference policy and effectively encourages the system to stay in-distribution with respect to offline-collected safe demonstrations.
Our method is effective for two different visuomotor control tasks in simulation environments, including both top-down and egocentric view settings.
arXiv Detail & Related papers (2023-01-27T22:28:19Z) - Derivative-Free Policy Optimization for Risk-Sensitive and Robust
Control Design: Implicit Regularization and Sample Complexity [15.940861063732608]
Direct policy search serves as one of the workhorses in modern reinforcement learning (RL)
We investigate the convergence theory of policy robustness (PG) methods for the linear risk-sensitive and robust controller.
One feature of our algorithms is that during the learning phase, a certain level complexity/risk-sensitivity controller is preserved.
arXiv Detail & Related papers (2021-01-04T16:00:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.