Non-Markovian Reinforcement Learning using Fractional Dynamics
- URL: http://arxiv.org/abs/2107.13790v1
- Date: Thu, 29 Jul 2021 07:35:13 GMT
- Title: Non-Markovian Reinforcement Learning using Fractional Dynamics
- Authors: Gaurav Gupta, Chenzhong Yin, Jyotirmoy V. Deshmukh, Paul Bogdan
- Abstract summary: Reinforcement learning (RL) is a technique to learn the control policy for an agent that interacts with an environment.
In this paper, we propose a model-based RL technique for a system that has non-Markovian dynamics.
Such environments are common in many real-world applications such as in human physiology, biological systems, material science, and population dynamics.
- Score: 3.000697999889031
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement learning (RL) is a technique to learn the control policy for an
agent that interacts with a stochastic environment. In any given state, the
agent takes some action, and the environment determines the probability
distribution over the next state as well as gives the agent some reward. Most
RL algorithms typically assume that the environment satisfies Markov
assumptions (i.e. the probability distribution over the next state depends only
on the current state). In this paper, we propose a model-based RL technique for
a system that has non-Markovian dynamics. Such environments are common in many
real-world applications such as in human physiology, biological systems,
material science, and population dynamics. Model-based RL (MBRL) techniques
typically try to learn a model of the environment from data while simultaneously
identifying an optimal policy for the learned model. We propose a technique in
which the non-Markovianity of the system is modeled through a fractional
dynamical system. We show that we can quantify the difference in performance
between an MBRL algorithm that uses bounded-horizon model predictive control
and the optimal policy. Finally, we demonstrate our proposed framework
on a pharmacokinetic model of human blood glucose dynamics and show that our
fractional models can capture distant correlations on real-world datasets.
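As a rough illustration (not the authors' code), the sketch below shows how a discrete fractional-order linear system produces the long-memory, non-Markovian behavior the abstract refers to: each new state depends on the entire state history through Grünwald-Letnikov weights rather than on the current state alone. The matrices A and B, the fractional exponents alpha, and the simulation settings are illustrative assumptions.
```python
import numpy as np

def gl_weights(alpha, K):
    """Grunwald-Letnikov weights psi(alpha, j) for j = 0..K, one column per state dim."""
    psi = np.ones((K + 1, alpha.size))
    for j in range(1, K + 1):
        # Recursion: psi(a, j) = psi(a, j-1) * (j - 1 - a) / j, with psi(a, 0) = 1.
        psi[j] = psi[j - 1] * (j - 1 - alpha) / j
    return psi

def simulate_fractional(A, B, alpha, x0, controls, noise_std=0.0, rng=None):
    """Simulate x[k+1] = A x[k] + B u[k] - sum_{j=1}^{k+1} psi(alpha, j) * x[k+1-j]."""
    if rng is None:
        rng = np.random.default_rng(0)
    K = len(controls)
    psi = gl_weights(alpha, K + 1)
    xs = [np.asarray(x0, dtype=float)]
    for k, u in enumerate(controls):
        # Markovian part: depends on the current state and input only.
        x_next = A @ xs[k] + B @ np.atleast_1d(u)
        # Long-memory part: every past state contributes through the psi weights,
        # which is what makes the dynamics non-Markovian.
        for j in range(1, k + 2):
            x_next = x_next - psi[j] * xs[k + 1 - j]
        x_next = x_next + noise_std * rng.standard_normal(x_next.shape)
        xs.append(x_next)
    return np.stack(xs)

if __name__ == "__main__":
    A = np.array([[0.05, 0.02], [0.00, 0.03]])   # assumed 2-state system matrix
    B = np.array([[0.10], [0.05]])               # assumed input matrix
    alpha = np.array([0.7, 0.9])                 # assumed fractional exponents per state
    traj = simulate_fractional(A, B, alpha, x0=[1.0, 0.5],
                               controls=np.zeros((50, 1)))
    print(traj[-1])                              # state after 50 steps of zero input
```
A bounded-horizon model predictive controller would replan over a learned model of this form at every step; the weighted dependence on the full history is what distinguishes the planning problem from the standard Markovian case.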
Related papers
- Understanding Reinforcement Learning-Based Fine-Tuning of Diffusion Models: A Tutorial and Review [63.31328039424469]
This tutorial provides a comprehensive survey of methods for fine-tuning diffusion models to optimize downstream reward functions.
We explain the application of various RL algorithms, including PPO, differentiable optimization, reward-weighted MLE, value-weighted sampling, and path consistency learning.
arXiv Detail & Related papers (2024-07-18T17:35:32Z)
- Synthetic location trajectory generation using categorical diffusion models [50.809683239937584]
Diffusion models (DPMs) have rapidly evolved to be one of the predominant generative models for the simulation of synthetic data.
We propose using DPMs for the generation of synthetic individual location trajectories (ILTs) which are sequences of variables representing physical locations visited by individuals.
arXiv Detail & Related papers (2024-02-19T15:57:39Z)
- HarmonyDream: Task Harmonization Inside World Models [93.07314830304193]
Model-based reinforcement learning (MBRL) holds the promise of sample-efficient learning.
We propose a simple yet effective approach, HarmonyDream, which automatically adjusts loss coefficients to maintain task harmonization.
arXiv Detail & Related papers (2023-09-30T11:38:13Z)
- Learning Environment Models with Continuous Stochastic Dynamics [0.0]
We aim to provide insights into the decisions faced by the agent by learning an automaton model of environmental behavior under the control of an agent.
In this work, we raise the capabilities of automata learning such that it is possible to learn models for environments that have complex and continuous dynamics.
We apply our automata learning framework on popular RL benchmarking environments in the OpenAI Gym, including LunarLander, CartPole, Mountain Car, and Acrobot.
arXiv Detail & Related papers (2023-06-29T12:47:28Z)
- A Unified Framework for Alternating Offline Model Training and Policy Learning [62.19209005400561]
In offline model-based reinforcement learning, we learn a dynamic model from historically collected data, and utilize the learned model and fixed datasets for policy learning.
We develop an iterative offline MBRL framework, where we maximize a lower bound of the true expected return.
With the proposed unified model-policy learning framework, we achieve competitive performance on a wide range of continuous-control offline reinforcement learning datasets.
arXiv Detail & Related papers (2022-10-12T04:58:51Z)
- Efficient Model-based Multi-agent Reinforcement Learning via Optimistic Equilibrium Computation [93.52573037053449]
H-MARL (Hallucinated Multi-Agent Reinforcement Learning) learns successful equilibrium policies after a few interactions with the environment.
We demonstrate our approach experimentally on an autonomous driving simulation benchmark.
arXiv Detail & Related papers (2022-03-14T17:24:03Z)
- Quantifying Multimodality in World Models [5.593667856320704]
We propose new metrics for the detection and quantification of multimodal uncertainty in RL based World Models.
The correct modelling & detection of uncertain future states lays the foundation for handling critical situations in a safe way.
arXiv Detail & Related papers (2021-12-14T09:52:18Z)
- Physics-informed Dyna-Style Model-Based Deep Reinforcement Learning for Dynamic Control [1.8275108630751844]
We propose to leverage the prior knowledge of underlying physics of the environment, where the governing laws are (partially) known.
By incorporating the prior information of the environment, the quality of the learned model can be notably improved.
arXiv Detail & Related papers (2021-07-31T02:19:36Z)
- Model-based micro-data reinforcement learning: what are the crucial model properties and which model to choose? [0.2836066255205732]
We contribute to micro-data model-based reinforcement learning (MBRL) by rigorously comparing popular generative models.
We find that on an environment that requires multimodal posterior predictives, mixture density nets outperform all other models by a large margin.
We also find that deterministic models are on par; in fact, they consistently (although non-significantly) outperform their probabilistic counterparts.
arXiv Detail & Related papers (2021-07-24T11:38:25Z)
- Information Theoretic Model Predictive Q-Learning [64.74041985237105]
We present a novel theoretical connection between information theoretic MPC and entropy regularized RL.
We develop a Q-learning algorithm that can leverage biased models.
arXiv Detail & Related papers (2019-12-31T00:29:22Z)