Combining Reinforcement Learning and Tensor Networks, with an Application to Dynamical Large Deviations
- URL: http://arxiv.org/abs/2209.14089v2
- Date: Fri, 5 Apr 2024 08:44:22 GMT
- Title: Combining Reinforcement Learning and Tensor Networks, with an Application to Dynamical Large Deviations
- Authors: Edward Gillman, Dominic C. Rose, Juan P. Garrahan
- Abstract summary: We present a framework to integrate tensor network (TN) methods with reinforcement learning (RL). We consider the RL actor-critic method, a model-free approach for solving RL problems, and introduce TNs as the approximators for its policy and value functions.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a framework to integrate tensor network (TN) methods with reinforcement learning (RL) for solving dynamical optimisation tasks. We consider the RL actor-critic method, a model-free approach for solving RL problems, and introduce TNs as the approximators for its policy and value functions. Our "actor-critic with tensor networks" (ACTeN) method is especially well suited to problems with large and factorisable state and action spaces. As an illustration of the applicability of ACTeN we solve the exponentially hard task of sampling rare trajectories in two paradigmatic stochastic models, the East model of glasses and the asymmetric simple exclusion process (ASEP), the latter being particularly challenging to other methods due to the absence of detailed balance. With substantial potential for further integration with the vast array of existing RL methods, the approach introduced here is promising both for applications in physics and to multi-agent RL problems more generally.
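To make the construction concrete, below is a minimal sketch (not the authors' code) of an actor-critic pair parameterised by matrix product states (MPS): a value MPS contracts a binary configuration to a scalar, while a policy MPS carries one extra open leg whose contraction yields action logits. The chain length, bond dimension, flip-one-site action space, and all function names are illustrative assumptions rather than details taken from the paper.

```python
# Illustrative sketch of the ACTeN idea (assumed parameterisation, not the
# authors' implementation): policy and value functions as matrix product states.
import numpy as np

rng = np.random.default_rng(0)
L, D, A = 8, 4, 8  # chain length, bond dimension, number of actions (flip one of L sites)

def random_mps(n_sites, bond, phys=2, out=None):
    """Random MPS cores (left, phys, right); the first core optionally carries
    an extra 'output' leg of size `out`, used here for the policy's action logits."""
    cores = []
    for i in range(n_sites):
        dl = 1 if i == 0 else bond
        dr = 1 if i == n_sites - 1 else bond
        shape = (dl, phys, dr) if (out is None or i > 0) else (out, dl, phys, dr)
        cores.append(rng.normal(scale=0.5, size=shape))
    return cores

def contract(cores, bits):
    """Contract the MPS along the chain, selecting each core's physical index by the site's bit."""
    env = cores[0][..., bits[0], :] if cores[0].ndim == 3 else cores[0][:, :, bits[0], :]
    for core, b in zip(cores[1:], bits[1:]):
        env = env @ core[:, b, :]
    return np.squeeze(env)  # scalar (critic) or length-A vector (actor logits)

value_mps = random_mps(L, D)           # critic: configuration -> scalar value estimate
policy_mps = random_mps(L, D, out=A)   # actor: configuration -> logits over A actions

def act(state):
    logits = contract(policy_mps, state)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return rng.choice(A, p=probs), probs

state = rng.integers(0, 2, size=L)     # e.g. an East-model spin configuration
action, probs = act(state)
print("V(s) =", float(contract(value_mps, state)), "| sampled action:", action)
```

In a full actor-critic loop these cores would be updated with the usual policy-gradient and temporal-difference losses; the sketch only illustrates how a single TN format can serve as both function approximators, mirroring the factorisable structure of the state space.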
Related papers
- Deep RL Dual Sourcing Inventory Management with Supply and Capacity Risk Awareness [4.583289433858458]
We study how to efficiently apply reinforcement learning (RL) to solve large-scale optimization problems by leveraging intervention models. We demonstrate our approach on a challenging real-world application, the multi-sourcing multi-period inventory management problem in supply chain optimization.
arXiv Detail & Related papers (2025-07-19T02:44:45Z) - Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs [51.21041884010009]
Ring-lite is a Mixture-of-Experts (MoE)-based large language model optimized via reinforcement learning (RL). Our approach matches the performance of state-of-the-art (SOTA) small-scale reasoning models on challenging benchmarks.
arXiv Detail & Related papers (2025-06-17T17:12:34Z) - Latent Guided Sampling for Combinatorial Optimization [3.636090511738153]
Recent combinatorial optimization methods leverage deep learning to learn solution strategies, trained via supervised or reinforcement learning (RL). While promising, these approaches often rely on task-specific augmentations, perform poorly on out-of-distribution instances, and lack robust inference mechanisms. In this work, we propose LGS-Net, a novel latent space model that conditions on problem instances, and introduce an efficient neural inference method, Latent Guided Sampling (LGS).
arXiv Detail & Related papers (2025-06-04T08:02:59Z) - MTL-KD: Multi-Task Learning Via Knowledge Distillation for Generalizable Neural Vehicle Routing Solver [9.61561012521585]
This work introduces a novel multi-task learning method driven by knowledge distillation (MTL-KD). The proposed MTL-KD method transfers policy knowledge from multiple distinct RL-based single-task models to a single heavy decoder model, enabling label-free training and effectively improving the model's generalization ability across diverse tasks. Experimental results on 6 seen and 10 unseen VRP variants with up to 1000 nodes indicate that our proposed method consistently achieves superior performance on both uniform and real-world benchmarks.
arXiv Detail & Related papers (2025-06-03T14:35:36Z) - Improving Generalization of Neural Combinatorial Optimization for Vehicle Routing Problems via Test-Time Projection Learning [3.0711362702464684]
We introduce a novel learning framework driven by Large Language Models (LLMs). Unlike prevailing techniques that necessitate joint training with the neural network, our approach operates exclusively during the inference phase. Our method enables a backbone model (trained on 100-node instances) to achieve superior performance on large-scale Traveling Salesman Problem (TSP) and Capacitated Vehicle Routing Problem (CVRP) instances of up to 100K nodes from diverse distributions.
arXiv Detail & Related papers (2025-06-03T03:15:22Z) - Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning [62.984693936073974]
Value-based reinforcement learning can learn effective policies for a wide range of multi-turn problems.
Current value-based RL methods have proven particularly challenging to scale to the setting of large language models.
We propose a novel offline RL algorithm that addresses these drawbacks, casting Q-learning as a modified supervised fine-tuning problem.
arXiv Detail & Related papers (2024-11-07T21:36:52Z) - Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion [53.33473557562837]
Solving multi-objective optimization problems for large deep neural networks is a challenging task due to the complexity of the loss landscape and the expensive computational cost.
We propose a practical and scalable approach to solve this problem via mixture of experts (MoE) based model fusion.
By ensembling the weights of specialized single-task models, the MoE module can effectively capture the trade-offs between multiple objectives.
arXiv Detail & Related papers (2024-06-14T07:16:18Z) - Stochastic Q-learning for Large Discrete Action Spaces [79.1700188160944]
In complex environments with discrete action spaces, effective decision-making is critical in reinforcement learning (RL).
We present value-based RL approaches which, as opposed to optimizing over the entire set of $n$ actions, only consider a variable set of actions, possibly as small as $\mathcal{O}(\log(n))$.
The presented value-based RL methods include, among others, Q-learning, StochDQN, StochDDQN, all of which integrate this approach for both value-function updates and action selection.
arXiv Detail & Related papers (2024-05-16T17:58:44Z) - Instance-Conditioned Adaptation for Large-scale Generalization of Neural Combinatorial Optimization [15.842155380912002]
This work proposes a novel Instance-Conditioned Adaptation Model (ICAM) for better large-scale generalization of neural combinatorial optimization.
In particular, we design a powerful yet lightweight instance-conditioned routing adaptation module for the NCO model.
We develop an efficient three-stage reinforcement learning-based training scheme that enables the model to learn cross-scale features without any labeled optimal solution.
arXiv Detail & Related papers (2024-05-03T08:00:19Z) - RL-I2IT: Image-to-Image Translation with Deep Reinforcement Learning [54.40719981158774]
Image-to-Image Translation (I2IT) methods generate images in a single run of a deep learning (DL) model.
We reformulate I2IT as a step-wise decision-making problem via deep reinforcement learning (DRL).
We propose a novel framework that performs RL-based I2IT (RL-I2IT).
arXiv Detail & Related papers (2023-09-24T15:40:40Z) - Learning a model is paramount for sample efficiency in reinforcement learning control of PDEs [5.488334211013093]
We show that learning an actuated model in parallel to training the RL agent significantly reduces the total amount of required data sampled from the real system.
We also show that iteratively updating the model is of major importance to avoid biases in the RL training.
arXiv Detail & Related papers (2023-02-14T16:14:39Z) - Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for mean-field control (MFC).
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z) - Reinforcement Learning as One Big Sequence Modeling Problem [84.84564880157149]
Reinforcement learning (RL) is typically concerned with estimating single-step policies or single-step models.
We view RL as a sequence modeling problem, with the goal being to predict a sequence of actions that leads to a sequence of high rewards.
arXiv Detail & Related papers (2021-06-03T17:58:51Z) - Constrained Model-based Reinforcement Learning with Robust Cross-Entropy Method [30.407700996710023]
This paper studies the constrained/safe reinforcement learning problem with sparse indicator signals for constraint violations.
We employ the neural network ensemble model to estimate the prediction uncertainty and use model predictive control as the basic control framework.
The results show that our approach learns to complete the tasks with a much smaller number of constraint violations than state-of-the-art baselines.
arXiv Detail & Related papers (2020-10-15T18:19:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.