DIAR: Diffusion-model-guided Implicit Q-learning with Adaptive Revaluation
- URL: http://arxiv.org/abs/2410.11338v1
- Date: Tue, 15 Oct 2024 07:09:56 GMT
- Title: DIAR: Diffusion-model-guided Implicit Q-learning with Adaptive Revaluation
- Authors: Jaehyun Park, Yunho Kim, Sejin Kim, Byung-Jun Lee, Sundong Kim
- Abstract summary: We propose a novel offline reinforcement learning (offline RL) approach, introducing the Diffusion-model-guided Implicit Q-learning with Adaptive Revaluation (DIAR) framework.
We leverage diffusion models to learn state-action sequence distributions and incorporate value functions for more balanced and adaptive decision-making.
As demonstrated in tasks like Maze2D, AntMaze, and Kitchen, DIAR consistently outperforms state-of-the-art algorithms in long-horizon, sparse-reward environments.
- Score: 10.645244994430483
- License:
- Abstract: We propose a novel offline reinforcement learning (offline RL) approach, introducing the Diffusion-model-guided Implicit Q-learning with Adaptive Revaluation (DIAR) framework. We address two key challenges in offline RL: out-of-distribution samples and long-horizon problems. We leverage diffusion models to learn state-action sequence distributions and incorporate value functions for more balanced and adaptive decision-making. DIAR introduces an Adaptive Revaluation mechanism that dynamically adjusts decision lengths by comparing current and future state values, enabling flexible long-term decision-making. Furthermore, we address Q-value overestimation by combining Q-network learning with a value function guided by a diffusion model. The diffusion model generates diverse latent trajectories, enhancing policy robustness and generalization. As demonstrated in tasks like Maze2D, AntMaze, and Kitchen, DIAR consistently outperforms state-of-the-art algorithms in long-horizon, sparse-reward environments.
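To make the Adaptive Revaluation idea concrete, the following is a minimal sketch (not the authors' implementation) of how a decision length could be adjusted by comparing current and predicted future state values under a diffusion-generated plan. The interfaces `value_fn` and `diffusion_planner`, the horizon values, and the specific replanning rule are illustrative assumptions.

```python
import torch

def adaptive_revaluation_step(state, value_fn, diffusion_planner, max_horizon=32):
    """Illustrative sketch of DIAR-style Adaptive Revaluation (assumed interfaces):
    keep following a diffusion-generated plan only while the predicted future
    state looks more valuable than the current one; otherwise shorten and replan."""
    # Sample a candidate state trajectory from the diffusion model.
    plan = diffusion_planner.sample(state, horizon=max_horizon)  # shape: (horizon, state_dim)

    with torch.no_grad():
        v_current = value_fn(state)    # value of the current state
        v_future = value_fn(plan[-1])  # value of the state the plan is expected to reach

    if v_future <= v_current:
        # The remaining plan no longer promises improvement, so cut the
        # decision length and replan over a shorter horizon.
        plan = diffusion_planner.sample(state, horizon=max(1, max_horizon // 2))

    # Execute only the first step of whichever plan survived the check.
    return plan[0]
```

In the paper, the value function used for this comparison is learned together with a Q-network and guided by latent trajectories generated by the diffusion model, which is what mitigates Q-value overestimation; the rule above only illustrates the value-comparison step.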
Related papers
- Distributional Refinement Network: Distributional Forecasting via Deep Learning [0.8142555609235358]
A key task in actuarial modelling involves modelling the distributional properties of losses.
We propose a Distributional Refinement Network (DRN), which combines an inherently interpretable baseline model with a flexible neural network.
DRN captures varying effects of features across all quantiles, improving predictive performance while maintaining adequate interpretability.
arXiv Detail & Related papers (2024-06-03T05:14:32Z)
- Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks.
We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level.
We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
arXiv Detail & Related papers (2024-02-09T07:45:26Z)
- Learning a Diffusion Model Policy from Rewards via Q-Score Matching [93.0191910132874]
We present a theoretical framework linking the structure of diffusion model policies to a learned Q-function.
We propose a new policy update method from this theory, which we denote Q-score matching.
arXiv Detail & Related papers (2023-12-18T23:31:01Z)
- Phasic Content Fusing Diffusion Model with Directional Distribution Consistency for Few-Shot Model Adaption [73.98706049140098]
We propose a novel phasic content-fusing few-shot diffusion model with a directional distribution consistency loss.
Specifically, we design a phasic training strategy with phasic content fusion to help our model learn content and style information when the diffusion time step t is large.
Finally, we propose a cross-domain structure guidance strategy that enhances structure consistency during domain adaptation.
arXiv Detail & Related papers (2023-09-07T14:14:11Z)
- Value-Distributional Model-Based Reinforcement Learning [59.758009422067]
Quantifying uncertainty about a policy's long-term performance is important for solving sequential decision-making tasks.
We study the problem from a model-based Bayesian reinforcement learning perspective.
We propose Epistemic Quantile-Regression (EQR), a model-based algorithm that learns a value distribution function.
arXiv Detail & Related papers (2023-08-12T14:59:19Z)
- POMDP inference and robust solution via deep reinforcement learning: An application to railway optimal maintenance [0.7046417074932257]
We propose a combined framework for inference and robust solution of POMDPs via deep RL.
First, all transition and observation model parameters are jointly inferred via Markov Chain Monte Carlo sampling of a hidden Markov model.
The POMDP with uncertain parameters is then solved via deep RL techniques with the parameter distributions incorporated into the solution via domain randomization.
arXiv Detail & Related papers (2023-07-16T15:44:58Z)
- Environment Transformer and Policy Optimization for Model-Based Offline Reinforcement Learning [25.684201757101267]
We propose an uncertainty-aware sequence modeling architecture called Environment Transformer.
Benefiting from the accurate modeling of the transition dynamics and reward function, Environment Transformer can be combined with arbitrary planning, dynamic programming, or policy optimization algorithms for offline RL.
arXiv Detail & Related papers (2023-03-07T11:26:09Z)
- Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning [70.20191211010847]
Offline reinforcement learning (RL) aims to learn an optimal policy using a previously collected static dataset.
We introduce Diffusion Q-learning (Diffusion-QL), which uses a conditional diffusion model to represent the policy; a minimal sketch of this objective appears after this list.
We show that our method can achieve state-of-the-art performance on the majority of the D4RL benchmark tasks.
arXiv Detail & Related papers (2022-08-12T09:54:11Z)
- Adaptive Rollout Length for Model-Based RL Using Model-Free Deep RL [39.58890668062184]
We frame the problem of tuning the rollout length as a meta-level sequential decision-making problem.
We use model-free deep reinforcement learning to solve the meta-level decision problem.
arXiv Detail & Related papers (2022-06-06T06:25:11Z)
- Model-Free Voltage Regulation of Unbalanced Distribution Network Based on Surrogate Model and Deep Reinforcement Learning [9.984416150031217]
This paper develops a model-free approach based on a surrogate model and deep reinforcement learning (DRL).
We have also extended it to deal with unbalanced three-phase scenarios.
arXiv Detail & Related papers (2020-06-24T18:49:41Z)
- An Ode to an ODE [78.97367880223254]
We present a new paradigm for Neural ODE algorithms, called ODEtoODE, where time-dependent parameters of the main flow evolve according to a matrix flow on the group O(d).
This nested system of two flows provides stability and effectiveness of training and provably solves the gradient vanishing-explosion problem.
arXiv Detail & Related papers (2020-06-19T22:05:19Z)
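As referenced in the Diffusion Policies entry above, the Diffusion-QL objective couples the denoising (behavior-cloning) loss of a conditional diffusion policy with a term that pushes sampled actions toward high Q-values. The snippet below is a schematic sketch under assumed interfaces (`diffusion_policy.denoising_loss`, `diffusion_policy.sample`, `q_net`, `alpha`), not the official implementation.

```python
def diffusion_ql_loss(diffusion_policy, q_net, states, actions, alpha=1.0):
    """Schematic Diffusion-QL policy loss (assumed interfaces, not the reference code):
    behavior-cloning denoising loss plus a Q-maximization term on sampled actions."""
    # Standard denoising / behavior-cloning loss on dataset actions.
    bc_loss = diffusion_policy.denoising_loss(states, actions)

    # Actions sampled from the current diffusion policy, kept differentiable
    # so the gradient of Q can shape the policy.
    sampled_actions = diffusion_policy.sample(states)

    # Encourage actions with high estimated Q-value; alpha trades off the two terms.
    q_term = q_net(states, sampled_actions).mean()

    return bc_loss - alpha * q_term
```

DIAR builds on this family of diffusion-policy methods but adds the diffusion-guided value function and Adaptive Revaluation described in the abstract above.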
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.