Multifidelity Reinforcement Learning with Control Variates
- URL: http://arxiv.org/abs/2206.05165v1
- Date: Fri, 10 Jun 2022 15:01:37 GMT
- Title: Multifidelity Reinforcement Learning with Control Variates
- Authors: Sami Khairy, Prasanna Balaprakash
- Abstract summary: In many computational science and engineering applications, the output of a system of interest corresponding to a given input can be queried at different levels of fidelity with different costs.
We study the reinforcement learning problem in the presence of multiple environments with different levels of fidelity for a given control task.
A multifidelity estimator that exploits the cross-correlations between the low- and high-fidelity returns is proposed to reduce the variance in the estimation of the state-action value function.
- Score: 3.2895195535353317
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In many computational science and engineering applications, the output of a
system of interest corresponding to a given input can be queried at different
levels of fidelity with different costs. Typically, low-fidelity data is cheap
and abundant, while high-fidelity data is expensive and scarce. In this work we
study the reinforcement learning (RL) problem in the presence of multiple
environments with different levels of fidelity for a given control task. We
focus on improving the RL agent's performance with multifidelity data.
Specifically, a multifidelity estimator that exploits the cross-correlations
between the low- and high-fidelity returns is proposed to reduce the variance
in the estimation of the state-action value function. The proposed estimator,
which is based on the method of control variates, is used to design a
multifidelity Monte Carlo RL (MFMCRL) algorithm that improves the learning of
the agent in the high-fidelity environment. The impacts of variance reduction
on policy evaluation and policy improvement are theoretically analyzed by using
probability bounds. Our theoretical analysis and numerical experiments
demonstrate that for a finite budget of high-fidelity data samples, our
proposed MFMCRL agent attains superior performance compared with that of a
standard RL agent that uses only the high-fidelity environment data for
learning the optimal policy.
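The core idea can be illustrated with a short sketch. The snippet below is a minimal, hedged example of a control-variate combination of high- and low-fidelity Monte Carlo returns for a single state-action pair; the function name, array interface, and coefficient formula follow the standard control-variates construction and are illustrative assumptions rather than the authors' exact MFMCRL implementation.

```python
import numpy as np

def mf_q_estimate(hf_returns, lf_returns_paired, lf_returns_extra):
    """Control-variate multifidelity estimate of Q(s, a) (illustrative sketch).

    hf_returns:        high-fidelity Monte Carlo returns for (s, a) (expensive, few)
    lf_returns_paired: low-fidelity returns collected alongside the high-fidelity ones
    lf_returns_extra:  additional cheap low-fidelity returns used to pin down
                       the low-fidelity mean
    """
    hf = np.asarray(hf_returns, dtype=float)
    lf = np.asarray(lf_returns_paired, dtype=float)
    lf_all = np.concatenate([lf, np.asarray(lf_returns_extra, dtype=float)])

    # Control-variate coefficient alpha* = Cov(G_HF, G_LF) / Var(G_LF),
    # which exploits the cross-correlation between the two fidelities.
    c = np.cov(hf, lf)
    alpha = c[0, 1] / max(c[1, 1], 1e-12)

    # Shift the high-fidelity sample mean by the (better-estimated) low-fidelity mean.
    return hf.mean() + alpha * (lf_all.mean() - lf.mean())
```

When the low- and high-fidelity returns are strongly correlated, the correction term cancels much of the high-fidelity sampling noise, which is the variance-reduction effect described in the abstract; with zero correlation the estimator reduces to the plain high-fidelity sample mean.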
Related papers
- Diffusion-based Reinforcement Learning via Q-weighted Variational Policy Optimization [55.97310586039358]
Diffusion models have garnered widespread attention in Reinforcement Learning (RL) for their powerful expressiveness and multimodality.
We propose a novel model-free diffusion-based online RL algorithm, Q-weighted Variational Policy Optimization (QVPO).
Specifically, we introduce the Q-weighted variational loss, which can be proved to be a tight lower bound of the policy objective in online RL under certain conditions.
We also develop an efficient behavior policy to enhance sample efficiency by reducing the variance of the diffusion policy during online interactions.
arXiv Detail & Related papers (2024-05-25T10:45:46Z)
- Stochastic Q-learning for Large Discrete Action Spaces [79.1700188160944]
In complex environments with discrete action spaces, effective decision-making is critical in reinforcement learning (RL).
We present value-based RL approaches which, as opposed to optimizing over the entire set of $n$ actions, only consider a variable set of actions, possibly as small as $\mathcal{O}(\log(n))$.
The presented value-based RL methods include, among others, Q-learning, StochDQN, StochDDQN, all of which integrate this approach for both value-function updates and action selection.
arXiv Detail & Related papers (2024-05-16T17:58:44Z)
- Multifidelity linear regression for scientific machine learning from scarce data [0.0]
We propose a new multifidelity training approach for scientific machine learning via linear regression.
We provide a bias and variance analysis of our new estimators that guarantees the approach's accuracy and improved robustness to scarce high-fidelity data.
arXiv Detail & Related papers (2024-03-13T15:40:17Z)
- Multi-Fidelity Residual Neural Processes for Scalable Surrogate Modeling [19.60087366873302]
Multi-fidelity surrogate modeling aims to learn an accurate surrogate at the highest fidelity level.
Deep learning approaches utilize neural network-based encoders and decoders to improve scalability.
We propose Multi-fidelity Residual Neural Processes (MFRNP), a novel multi-fidelity surrogate modeling framework.
arXiv Detail & Related papers (2024-02-29T04:40:25Z)
- Multi-fidelity reinforcement learning framework for shape optimization [0.8258451067861933]
We introduce a controlled transfer learning framework that leverages a multi-fidelity simulation setting.
Our strategy is deployed for an airfoil shape optimization problem at high Reynolds numbers.
Our results demonstrate this framework's applicability to other scientific DRL scenarios.
arXiv Detail & Related papers (2022-02-22T20:44:04Z)
- Distributional Reinforcement Learning for Multi-Dimensional Reward Functions [91.88969237680669]
We introduce Multi-Dimensional Distributional DQN (MD3QN) to model the joint return distribution from multiple reward sources.
As a by-product of joint distribution modeling, MD3QN can capture the randomness in returns for each source of reward.
In experiments, our method accurately models the joint return distribution in environments with richly correlated reward functions.
arXiv Detail & Related papers (2021-10-26T11:24:23Z)
- Balancing Value Underestimation and Overestimation with Realistic Actor-Critic [6.205681604290727]
This paper introduces a novel model-free algorithm, Realistic Actor-Critic (RAC), which can be incorporated with any off-policy RL algorithm to improve sample efficiency.
RAC employs Universal Value Function Approximators (UVFA) to simultaneously learn a policy family with the same neural network, each with different trade-offs between underestimation and overestimation.
We evaluate RAC on the MuJoCo benchmark, achieving 10x sample efficiency and 25% performance improvement on the most challenging Humanoid environment compared to SAC.
arXiv Detail & Related papers (2021-10-19T03:35:01Z)
- Adaptive Reliability Analysis for Multi-fidelity Models using a Collective Learning Strategy [6.368679897630892]
This study presents a new approach called adaptive multi-fidelity Gaussian process for reliability analysis (AMGPRA).
It is shown that the proposed method achieves similar or higher accuracy with reduced computational costs compared to state-of-the-art single and multi-fidelity methods.
A key application of AMGPRA is high-fidelity fragility modeling using complex and costly physics-based computational models.
arXiv Detail & Related papers (2021-09-21T14:42:58Z)
- Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for MFC.
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z)
- Policy Information Capacity: Information-Theoretic Measure for Task Complexity in Deep Reinforcement Learning [83.66080019570461]
We propose two environment-agnostic, algorithm-agnostic quantitative metrics for task difficulty.
We show that these metrics have higher correlations with normalized task solvability scores than a variety of alternatives.
These metrics can also be used for fast and compute-efficient optimizations of key design parameters.
arXiv Detail & Related papers (2021-03-23T17:49:50Z)
- Combining Pessimism with Optimism for Robust and Efficient Model-Based Deep Reinforcement Learning [56.17667147101263]
In real-world tasks, reinforcement learning agents encounter situations that are not present during training time.
To ensure reliable performance, the RL agents need to exhibit robustness against worst-case situations.
We propose the Robust Hallucinated Upper-Confidence RL (RH-UCRL) algorithm to provably solve this problem.
arXiv Detail & Related papers (2021-03-18T16:50:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.