Bellman Meets Hawkes: Model-Based Reinforcement Learning via Temporal
Point Processes
- URL: http://arxiv.org/abs/2201.12569v1
- Date: Sat, 29 Jan 2022 11:53:40 GMT
- Title: Bellman Meets Hawkes: Model-Based Reinforcement Learning via Temporal
Point Processes
- Authors: Chao Qu, Xiaoyu Tan, Siqiao Xue, Xiaoming Shi, James Zhang, Hongyuan
Mei
- Abstract summary: We consider a sequential decision making problem where the agent faces the environment characterized by discrete events.
This problem exists ubiquitously in social media, finance and health informatics but is rarely investigated by conventional research in reinforcement learning.
We present a novel framework of model-based reinforcement learning where the agent's actions and observations are asynchronous discrete events occurring in continuous time.
- Score: 8.710154439846816
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a sequential decision making problem where the agent faces an
environment characterized by stochastic discrete events and seeks an optimal
intervention policy such that its long-term reward is maximized. This problem
exists ubiquitously in social media, finance and health informatics but is
rarely investigated by conventional research in reinforcement learning. To
this end, we present a novel framework of model-based reinforcement learning
where the agent's actions and observations are asynchronous stochastic
discrete events occurring in continuous time. We model the dynamics of the
environment by a Hawkes process with an external intervention control term and
develop an algorithm that embeds such a process in the Bellman equation, which
guides the direction of the value gradient. We demonstrate the superiority of
our method on both a synthetic simulator and a real-world problem.
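The abstract pairs a Hawkes-process dynamics model carrying an external intervention term with a Bellman-style value objective. The sketch below is a minimal illustration of that setup, not the authors' algorithm: it simulates a univariate Hawkes process with an exponential kernel and a constant additive control `u` via Ogata's thinning, then Monte-Carlo estimates a discounted return for that intervention. The kernel form, the per-event reward, and the control cost are all assumptions made for illustration.
```python
import numpy as np

def simulate_controlled_hawkes(mu, alpha, beta, u, horizon, rng):
    """Simulate a univariate Hawkes process with exponential kernel via
    Ogata's thinning.  The intensity is
        lambda(t) = mu + u + sum_{t_i < t} alpha * exp(-beta * (t - t_i)),
    where `u` is a (hypothetical) constant intervention term added by the agent."""
    events, t = [], 0.0
    while t < horizon:
        # Current intensity upper-bounds the future intensity, because the
        # exponential kernel only decays until the next accepted event.
        lam_bar = mu + u + sum(alpha * np.exp(-beta * (t - ti)) for ti in events)
        t += rng.exponential(1.0 / lam_bar)
        if t >= horizon:
            break
        lam_t = mu + u + sum(alpha * np.exp(-beta * (t - ti)) for ti in events)
        if rng.uniform() <= lam_t / lam_bar:       # accept with prob lambda(t)/lam_bar
            events.append(t)
    return np.array(events)

def discounted_event_reward(events, reward_per_event, control_cost, u, gamma, horizon):
    """Monte-Carlo return: each event yields a continuously discounted reward,
    and the intervention u incurs a running cost (an illustrative assumption)."""
    event_term = sum(reward_per_event * np.exp(-gamma * t) for t in events)
    control_term = control_cost * u * (1.0 - np.exp(-gamma * horizon)) / gamma
    return event_term - control_term

rng = np.random.default_rng(0)
returns = [discounted_event_reward(
               simulate_controlled_hawkes(mu=0.5, alpha=0.8, beta=1.5, u=0.2,
                                          horizon=20.0, rng=rng),
               reward_per_event=1.0, control_cost=0.5, u=0.2,
               gamma=0.1, horizon=20.0)
           for _ in range(200)]
print("estimated value of constant intervention u=0.2:", np.mean(returns))
```
In the paper's framework the interventions are themselves asynchronous events and the value gradient is propagated through the learned process model; the sketch only conveys how a controlled intensity and a long-term discounted reward fit together.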
Related papers
- Fast Value Tracking for Deep Reinforcement Learning [7.648784748888187]
Reinforcement learning (RL) tackles sequential decision-making problems by creating agents that interact with their environment.
Existing algorithms often view these problems as static, focusing on point estimates for model parameters to maximize expected rewards.
Our research leverages the Kalman paradigm to introduce a novel quantification and sampling algorithm called Langevinized Kalman Temporal Difference (LKTD).
arXiv Detail & Related papers (2024-03-19T22:18:19Z)
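The entry above frames value learning as tracking a distribution over parameters rather than a point estimate. The following is a generic Kalman temporal-difference sketch for a linear value function, shown only to illustrate that filtering view; it is not the LKTD algorithm from the paper, and all hyperparameters are placeholders.
```python
import numpy as np

class KalmanTD:
    """Kalman-filter-style temporal-difference learning for a linear value
    function V(s) = phi(s) @ w.  A generic Kalman TD sketch: it tracks a
    Gaussian posterior over the value weights instead of a point estimate."""

    def __init__(self, n_features, gamma=0.99, process_noise=1e-4, obs_noise=1.0):
        self.w = np.zeros(n_features)              # posterior mean of weights
        self.P = np.eye(n_features)                # posterior covariance
        self.gamma = gamma
        self.Q = process_noise * np.eye(n_features)
        self.R = obs_noise

    def update(self, phi_s, reward, phi_next, done):
        # TD "observation model": r ~ (phi(s) - gamma * phi(s')) @ w + noise
        h = phi_s - (0.0 if done else self.gamma) * phi_next
        self.P = self.P + self.Q                   # predict step (random-walk weights)
        innovation = reward - h @ self.w
        s = h @ self.P @ h + self.R                # innovation variance
        k = self.P @ h / s                         # Kalman gain
        self.w = self.w + k * innovation
        self.P = self.P - np.outer(k, h @ self.P)
        return innovation

    def value(self, phi_s):
        mean = phi_s @ self.w
        var = phi_s @ self.P @ phi_s               # predictive uncertainty of V(s)
        return mean, var

# Hypothetical usage on random features:
rng = np.random.default_rng(0)
agent = KalmanTD(n_features=8)
phi, phi_next = rng.normal(size=8), rng.normal(size=8)
agent.update(phi, reward=1.0, phi_next=phi_next, done=False)
print(agent.value(phi))
```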
- Entropic Matching for Expectation Propagation of Markov Jump Processes [38.60042579423602]
We propose a new tractable inference scheme based on an entropic matching framework.
We demonstrate the effectiveness of our method by providing closed-form results for a simple family of approximate distributions.
We derive expressions for point estimation of the underlying parameters using an approximate expectation procedure.
arXiv Detail & Related papers (2023-09-27T12:07:21Z)
- Distributionally Robust Model-based Reinforcement Learning with Large
State Spaces [55.14361269378122]
Three major challenges in reinforcement learning are the complex dynamical systems with large state spaces, the costly data acquisition processes, and the deviation of real-world dynamics from the training environment at deployment.
We study distributionally robust Markov decision processes with continuous state spaces under the widely used Kullback-Leibler, chi-square, and total variation uncertainty sets.
We propose a model-based approach that utilizes Gaussian Processes and the maximum variance reduction algorithm to efficiently learn multi-output nominal transition dynamics.
arXiv Detail & Related papers (2023-09-05T13:42:11Z)
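As a rough illustration of the model-learning loop described in the entry above (not the paper's method), the sketch below fits one Gaussian process per output dimension of a toy transition function and queries new state-action pairs by maximum predictive variance; the toy dynamics and all settings are assumptions.
```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Hypothetical 1-step dynamics on a 2-D state with a 1-D action: s' = f(s, a).
def true_dynamics(x):                      # x = [s1, s2, a]; stand-in simulator
    s, a = x[:2], x[2]
    return np.array([s[0] + 0.1 * s[1], s[1] + 0.1 * (a - 0.2 * s[0])])

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(5, 3))        # small seed dataset of (s, a) queries
Y = np.array([true_dynamics(x) for x in X])

candidates = rng.uniform(-1, 1, size=(500, 3))
for _ in range(20):
    # One GP per output dimension approximates the multi-output nominal dynamics.
    gps = [GaussianProcessRegressor(kernel=RBF(), alpha=1e-4).fit(X, Y[:, d])
           for d in range(Y.shape[1])]
    # Maximum-variance acquisition: query the candidate the model is most
    # uncertain about, summed over output dimensions.
    var = sum(gp.predict(candidates, return_std=True)[1] ** 2 for gp in gps)
    x_new = candidates[np.argmax(var)]
    X = np.vstack([X, x_new])
    Y = np.vstack([Y, true_dynamics(x_new)])

print("collected", len(X), "transitions for the nominal dynamics model")
```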
- Conditional Kernel Imitation Learning for Continuous State Environments [9.750698192309978]
We introduce a novel conditional kernel density estimation-based imitation learning framework.
We show consistently superior empirical performance over many state-of-the-art IL algorithms.
arXiv Detail & Related papers (2023-08-24T05:26:42Z)
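A conditional kernel density estimate of the expert's action distribution can be written down directly with Gaussian kernels; the sketch below shows that generic construction (a Nadaraya-Watson style policy), not the specific framework evaluated in the paper. The demonstration data and bandwidths are made up for illustration.
```python
import numpy as np

class ConditionalKDEPolicy:
    """Generic conditional kernel-density imitation policy.  With Gaussian
    kernels, p(a | s) is a mixture of Gaussians centred at the demonstrated
    actions, weighted by how close s is to the demonstrated states."""

    def __init__(self, demo_states, demo_actions, state_bw=0.2, action_bw=0.05):
        self.S = np.asarray(demo_states, dtype=float)
        self.A = np.asarray(demo_actions, dtype=float)
        self.state_bw = state_bw
        self.action_bw = action_bw

    def _weights(self, state):
        d2 = np.sum((self.S - state) ** 2, axis=1)
        w = np.exp(-0.5 * d2 / self.state_bw ** 2)
        return w / w.sum()

    def sample(self, state, rng):
        w = self._weights(np.asarray(state, dtype=float))
        i = rng.choice(len(self.A), p=w)           # pick a demo transition
        return self.A[i] + self.action_bw * rng.standard_normal(self.A.shape[1])

    def mean_action(self, state):
        w = self._weights(np.asarray(state, dtype=float))
        return w @ self.A                          # Nadaraya-Watson regression

# Usage with a toy 2-D state / 1-D action demonstration set (assumed data):
rng = np.random.default_rng(1)
states = rng.uniform(-1, 1, size=(200, 2))
actions = np.tanh(states[:, :1] - states[:, 1:])   # expert: a = tanh(s1 - s2)
policy = ConditionalKDEPolicy(states, actions)
print(policy.mean_action([0.3, -0.2]), policy.sample([0.3, -0.2], rng))
```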
- Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level
Stability and High-Level Behavior [51.60683890503293]
We propose a theoretical framework for studying behavior cloning of complex expert demonstrations using generative modeling.
We show that pure supervised cloning can generate trajectories matching the per-time step distribution of arbitrary expert trajectories.
arXiv Detail & Related papers (2023-07-27T04:27:26Z)
- Predictive Experience Replay for Continual Visual Control and
Forecasting [62.06183102362871]
We present a new continual learning approach for visual dynamics modeling and explore its efficacy in visual control and forecasting.
We first propose the mixture world model that learns task-specific dynamics priors with a mixture of Gaussians, and then introduce a new training strategy to overcome catastrophic forgetting.
Our model remarkably outperforms the naive combinations of existing continual learning and visual RL algorithms on DeepMind Control and Meta-World benchmarks with continual visual control tasks.
arXiv Detail & Related papers (2023-03-12T05:08:03Z)
- Guaranteed Conservation of Momentum for Learning Particle-based Fluid
Dynamics [96.9177297872723]
We present a novel method for guaranteeing linear momentum in learned physics simulations.
We enforce conservation of momentum with a hard constraint, which we realize via antisymmetrical continuous convolutional layers.
In combination, the proposed method allows us to increase the physical accuracy of the learned simulator substantially.
arXiv Detail & Related papers (2022-10-12T09:12:59Z)
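The hard momentum constraint in the entry above can be understood from a simple identity: if pairwise interaction terms satisfy F_ij = -F_ji, their sum over all particles is exactly zero regardless of the network weights. The sketch below antisymmetrises an arbitrary stand-in interaction to show this; it is not the antisymmetrical continuous convolution layer used in the paper.
```python
import numpy as np

def pairwise_interaction(xi, xj, w):
    """Arbitrary (here random-linear) stand-in for a learned pairwise term."""
    return np.tanh(w @ np.concatenate([xi, xj]))

def antisymmetric_forces(positions, w):
    """Antisymmetrise the learned interaction: F_ij = g(x_i, x_j) - g(x_j, x_i).
    Then F_ij = -F_ji by construction, so the summed momentum update is exactly
    zero -- a hard constraint, independent of the network weights."""
    n, d = positions.shape
    forces = np.zeros((n, d))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            g_ij = pairwise_interaction(positions[i], positions[j], w)
            g_ji = pairwise_interaction(positions[j], positions[i], w)
            forces[i] += g_ij - g_ji
    return forces

rng = np.random.default_rng(0)
pos = rng.normal(size=(6, 2))                # six particles in 2-D
w = rng.normal(size=(2, 4))                  # untrained "network" weights
dp = antisymmetric_forces(pos, w)            # per-particle momentum update
print("total momentum change:", dp.sum(axis=0))   # ~[0, 0] up to float error
```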
- Efficient Model-based Multi-agent Reinforcement Learning via Optimistic
Equilibrium Computation [93.52573037053449]
H-MARL (Hallucinated Multi-Agent Reinforcement Learning) learns successful equilibrium policies after a few interactions with the environment.
We demonstrate our approach experimentally on an autonomous driving simulation benchmark.
arXiv Detail & Related papers (2022-03-14T17:24:03Z)
- Online reinforcement learning with sparse rewards through an active
inference capsule [62.997667081978825]
This paper introduces an active inference agent which minimizes the novel free energy of the expected future.
Our model is capable of solving sparse-reward problems with a very high sample efficiency.
We also introduce a novel method for approximating the prior model from the reward function, which simplifies the expression of complex objectives.
arXiv Detail & Related papers (2021-06-04T10:03:36Z)
- Variational Dynamic for Self-Supervised Exploration in Deep Reinforcement Learning [12.76337275628074]
In this work, we propose a variational dynamic model based on conditional variational inference to model the multimodality and stochasticity of the environment dynamics.
We derive an upper bound of the negative log-likelihood of the environmental transition and use such an upper bound as the intrinsic reward for exploration.
Our method outperforms several state-of-the-art environment model-based exploration approaches.
arXiv Detail & Related papers (2020-10-17T09:54:51Z)
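The exploration bonus described in the entry above is, in essence, the negative evidence lower bound of a conditional latent-variable transition model. The sketch below computes that bound with random linear maps standing in for the encoder, prior, and decoder networks; the architecture, noise model, and parameter shapes are assumptions, not the paper's implementation.
```python
import numpy as np

rng = np.random.default_rng(0)
d_s, d_a, d_z = 4, 2, 3

# Stand-in "networks": random linear maps (a real model would learn these).
W_enc = rng.normal(scale=0.1, size=(2 * d_z, 2 * d_s + d_a))   # q(z | s, a, s')
W_prior = rng.normal(scale=0.1, size=(2 * d_z, d_s + d_a))     # p(z | s, a)
W_dec = rng.normal(scale=0.1, size=(d_s, d_z + d_s + d_a))     # p(s' | z, s, a)

def gaussian_params(W, x):
    out = W @ x
    mu, log_var = np.split(out, 2)
    return mu, log_var

def intrinsic_reward(s, a, s_next, n_samples=8):
    """Negative ELBO of the transition, i.e. an upper bound on -log p(s'|s,a).
    Large values mean the dynamics model finds the transition hard to explain,
    which is used as an exploration bonus (illustrative sketch only)."""
    mu_q, lv_q = gaussian_params(W_enc, np.concatenate([s, a, s_next]))
    mu_p, lv_p = gaussian_params(W_prior, np.concatenate([s, a]))
    # KL between two diagonal Gaussians, in closed form.
    kl = 0.5 * np.sum(lv_p - lv_q
                      + (np.exp(lv_q) + (mu_q - mu_p) ** 2) / np.exp(lv_p) - 1.0)
    # Monte-Carlo reconstruction term under q, with a unit-variance decoder.
    rec = 0.0
    for _ in range(n_samples):
        z = mu_q + np.exp(0.5 * lv_q) * rng.standard_normal(d_z)
        pred = W_dec @ np.concatenate([z, s, a])
        rec += 0.5 * np.sum((s_next - pred) ** 2)   # -log N(s'|pred, I) up to const
    return rec / n_samples + kl

s, a, s_next = rng.normal(size=d_s), rng.normal(size=d_a), rng.normal(size=d_s)
print("intrinsic reward:", intrinsic_reward(s, a, s_next))
```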
- Modeling of Spatio-Temporal Hawkes Processes with Randomized Kernels [15.556686221927501]
Inferring the dynamics of event processes has many practical applications, including crime prediction and traffic forecasting.
We focus on spatio-temporal Hawkes processes, which are commonly used due to their capability to capture excitations between event occurrences.
We replace the spatial kernel calculations with randomized transformations and use gradient descent to learn the process.
arXiv Detail & Related papers (2020-03-07T22:21:06Z)
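The randomized-kernel idea in the last entry amounts to replacing an exact spatial kernel evaluation with an inner product of random Fourier features. The sketch below shows that substitution inside a spatio-temporal Hawkes intensity with an exponential temporal kernel and a Gaussian spatial kernel; the kernel choices and all parameters are assumptions for illustration.
```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_kernel(x, y, sigma=0.5):
    return np.exp(-np.sum((x - y) ** 2, axis=-1) / (2 * sigma ** 2))

def random_features(X, W, b, n_features):
    """Random Fourier features: z(x) @ z(y) approximates the Gaussian kernel."""
    return np.sqrt(2.0 / n_features) * np.cos(X @ W.T + b)

# Synthetic event history: times and 2-D locations.
n_events, sigma, n_features = 300, 0.5, 200
times = np.sort(rng.uniform(0, 10, n_events))
locs = rng.uniform(0, 1, size=(n_events, 2))

W = rng.normal(scale=1.0 / sigma, size=(n_features, 2))   # spectral samples
b = rng.uniform(0, 2 * np.pi, n_features)

def excitation(t, x, mu=0.1, alpha=0.3, beta=1.0, randomized=True):
    """Spatio-temporal Hawkes intensity with an exponential temporal kernel and
    a Gaussian spatial kernel; the spatial part is optionally replaced by a
    random-feature inner product (the speed-up idea; parameters are assumed)."""
    past = times < t
    decay = alpha * np.exp(-beta * (t - times[past]))
    if randomized:
        zx = random_features(x[None, :], W, b, n_features)[0]
        zp = random_features(locs[past], W, b, n_features)
        spatial = zp @ zx
    else:
        spatial = gaussian_kernel(locs[past], x[None, :])
    return mu + np.sum(decay * spatial)

x_query = np.array([0.4, 0.6])
print("exact     :", excitation(5.0, x_query, randomized=False))
print("randomized:", excitation(5.0, x_query, randomized=True))
```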
This list is automatically generated from the titles and abstracts of the papers on this site.