FP-IRL: Fokker-Planck-based Inverse Reinforcement Learning -- A
Physics-Constrained Approach to Markov Decision Processes
- URL: http://arxiv.org/abs/2306.10407v1
- Date: Sat, 17 Jun 2023 18:28:03 GMT
- Title: FP-IRL: Fokker-Planck-based Inverse Reinforcement Learning -- A
Physics-Constrained Approach to Markov Decision Processes
- Authors: Chengyang Huang and Siddhartha Srivastava and Xun Huan and Krishna
Garikipati
- Abstract summary: Inverse Reinforcement Learning (IRL) is a technique for revealing the rationale underlying the behavior of autonomous agents.
IRL seeks to estimate the unknown reward function of a Markov decision process (MDP) from observed agent trajectories.
We create a novel IRL algorithm, FP-IRL, which can simultaneously infer the transition and reward functions using only observed trajectories.
- Score: 0.5735035463793008
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Inverse Reinforcement Learning (IRL) is a compelling technique for revealing
the rationale underlying the behavior of autonomous agents. IRL seeks to
estimate the unknown reward function of a Markov decision process (MDP) from
observed agent trajectories. However, IRL needs a transition function, and most
algorithms assume it is known or can be estimated in advance from data. It
therefore becomes even more challenging when such transition dynamics is not
known a priori, since it enters the estimation of the policy in addition to
determining the system's evolution. When the dynamics of these agents in the
state-action space is described by stochastic differential equations (SDE) in
Itô calculus, these transitions can be inferred from the mean-field theory
described by the Fokker-Planck (FP) equation. We conjecture there exists an
isomorphism between the time-discrete FP and MDP that extends beyond the
minimization of free energy (in FP) and maximization of the reward (in MDP). We
identify specific manifestations of this isomorphism and use them to create a
novel physics-aware IRL algorithm, FP-IRL, which can simultaneously infer the
transition and reward functions using only observed trajectories. We employ
variational system identification to infer the potential function in FP, which
consequently allows the evaluation of reward, transition, and policy by
leveraging the conjecture. We demonstrate the effectiveness of FP-IRL by
applying it to a synthetic benchmark and a biological problem of cancer cell
dynamics, where the transition function is inaccessible.
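As an illustration only (not the authors' implementation), the sketch below captures the core pipeline under simplifying assumptions: trajectories of an overdamped Langevin SDE are used to fit a potential V(x) over a polynomial basis by least-squares regression on finite-difference drift estimates, a simplified stand-in for variational system identification, and the reward is then read off as the negative potential, in the spirit of the conjectured FP-MDP correspondence. The double-well potential, the basis choice, and the reward identification are assumptions made for this example.

```python
# Minimal sketch (not the FP-IRL implementation): infer a potential V(x) from
# observed trajectories of an overdamped Langevin SDE
#     dx = -V'(x) dt + sqrt(2 D) dW,
# then take reward ~ -V(x) as an assumed instance of the FP -> MDP mapping.
import numpy as np

rng = np.random.default_rng(0)

# --- synthetic 1-D trajectories from a known double-well potential ---
def grad_V_true(x):
    return 4.0 * x**3 - 4.0 * x          # dV/dx for V(x) = x^4 - 2 x^2

dt, D, n_steps, n_traj = 1e-3, 0.5, 2000, 50
x = rng.normal(0.0, 0.5, size=n_traj)
xs = [x.copy()]
for _ in range(n_steps):
    x = x - grad_V_true(x) * dt + np.sqrt(2 * D * dt) * rng.normal(size=n_traj)
    xs.append(x.copy())
xs = np.stack(xs)                         # shape (n_steps + 1, n_traj)

# --- simplified system identification of the potential V ---
# Drift estimate b(x) ~ (x_{t+dt} - x_t) / dt; model b(x) = -d/dx V(x) with
# V(x) = sum_k c_k x^k for k = 1..4, and solve for the c_k by least squares.
X, dX = xs[:-1].ravel(), np.diff(xs, axis=0).ravel()
b_hat = dX / dt
powers = np.arange(1, 5)
basis_grad = np.stack([k * X**(k - 1) for k in powers], axis=1)   # d/dx of x^k
coeffs, *_ = np.linalg.lstsq(-basis_grad, b_hat, rcond=None)

V_hat = lambda x: sum(c * x**k for c, k in zip(coeffs, powers))
reward_hat = lambda x: -V_hat(x)   # assumed mapping: low potential <-> high reward

print("estimated coefficients:", np.round(coeffs, 2))   # roughly [0, -2, 0, 1]
```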
Related papers
- Quasi-potential and drift decomposition in stochastic systems by sparse identification [0.0]
The quasi-potential is a key concept in stochastic systems as it accounts for the long-term behavior of the dynamics of such systems.
This paper combines a sparse learning technique with action minimization methods in order to determine the quasi-potential.
We implement the proposed approach in 2- and 3-D systems, covering various types of potential landscapes and attractors.
arXiv Detail & Related papers (2024-09-10T22:02:15Z) - Sparse identification of quasipotentials via a combined data-driven method [4.599618895656792]
We leverage machine learning via the combination of two data-driven techniques, namely a neural network and a sparse regression algorithm, to obtain symbolic expressions of quasipotential functions.
We show that our approach discovers a parsimonious quasipotential equation for an archetypal model with a known exact quasipotential and for the dynamics of a nanomechanical resonator.
arXiv Detail & Related papers (2024-07-06T11:27:52Z) - DeltaPhi: Learning Physical Trajectory Residual for PDE Solving [54.13671100638092]
We propose and formulate the Physical Trajectory Residual Learning (DeltaPhi) framework.
We learn the surrogate model for the residual operator mapping based on existing neural operator networks.
We conclude that, compared to direct learning, physical residual learning is preferred for PDE solving.
arXiv Detail & Related papers (2024-06-14T07:45:07Z) - A Single Online Agent Can Efficiently Learn Mean Field Games [16.00164239349632]
Mean field games (MFGs) are a promising framework for modeling the behavior of large-population systems.
This paper introduces a novel online single-agent model-free learning scheme, which enables a single agent to learn the mean-field Nash equilibrium (MFNE) using online samples.
arXiv Detail & Related papers (2024-05-05T16:38:04Z) - Variational Sampling of Temporal Trajectories [39.22854981703244]
We introduce a mechanism to learn the distribution of trajectories by parameterizing the transition function $f$ explicitly as an element in a function space.
Our framework allows efficient synthesis of novel trajectories, while also directly providing a convenient tool for inference.
arXiv Detail & Related papers (2024-03-18T02:12:12Z) - Physics-Informed Solution of The Stationary Fokker-Plank Equation for a
Class of Nonlinear Dynamical Systems: An Evaluation Study [0.0]
An exact analytical solution of the Fokker-Planck (FP) equation is only available for a limited subset of dynamical systems.
To evaluate the potential of physics-informed learning for such systems, we present a data-free, physics-informed neural network (PINN) framework to solve the FP equation (a minimal illustrative sketch appears after this list).
arXiv Detail & Related papers (2023-09-25T13:17:34Z) - Formal Controller Synthesis for Markov Jump Linear Systems with
Uncertain Dynamics [64.72260320446158]
We propose a method for synthesising controllers for Markov jump linear systems.
Our method is based on a finite-state abstraction that captures both the discrete (mode-jumping) and continuous (stochastic linear) behaviour of the MJLS.
We apply our method to multiple realistic benchmark problems, in particular, a temperature control and an aerial vehicle delivery problem.
arXiv Detail & Related papers (2022-12-01T17:36:30Z) - Self-Consistency of the Fokker-Planck Equation [117.17004717792344]
The Fokker-Planck equation governs the density evolution of the Itô process.
The ground-truth velocity field can be shown to be the solution of a fixed-point equation.
In this paper, we exploit this concept to design a potential function of the hypothesis velocity fields.
arXiv Detail & Related papers (2022-06-02T03:44:23Z) - Pessimism meets VCG: Learning Dynamic Mechanism Design via Offline
Reinforcement Learning [114.36124979578896]
We design a dynamic mechanism using offline reinforcement learning algorithms.
Our algorithm is based on the pessimism principle and only requires a mild assumption on the coverage of the offline data set.
arXiv Detail & Related papers (2022-05-05T05:44:26Z) - Modular Deep Reinforcement Learning for Continuous Motion Planning with
Temporal Logic [59.94347858883343]
This paper investigates the motion planning of autonomous dynamical systems modeled by Markov decision processes (MDP).
The novelty is to design an embedded product MDP (EP-MDP) between a limit-deterministic generalized Büchi automaton (LDGBA) and the MDP.
The proposed LDGBA-based reward shaping and discounting schemes for the model-free reinforcement learning (RL) only depend on the EP-MDP states.
arXiv Detail & Related papers (2021-02-24T01:11:25Z) - Exploiting Submodular Value Functions For Scaling Up Active Perception [60.81276437097671]
In active perception tasks, an agent aims to select sensory actions that reduce uncertainty about one or more hidden variables.
Partially observable Markov decision processes (POMDPs) provide a natural model for such problems.
As the number of sensors available to the agent grows, the computational cost of POMDP planning grows exponentially.
arXiv Detail & Related papers (2020-09-21T09:11:36Z)
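The data-free PINN idea summarized above (Physics-Informed Solution of The Stationary Fokker-Plank Equation) can be illustrated with a short sketch; the network, the double-well drift, the domain, and the loss weights below are illustrative assumptions rather than the cited paper's setup.

```python
# Illustrative sketch only (not the cited paper's code): a small data-free PINN
# for the stationary 1-D Fokker-Planck equation
#     0 = d/dx [ V'(x) p(x) + D dp/dx ],
# with an assumed double-well potential V(x) = x^4 - 2 x^2 and D = 0.5.
# The loss penalizes the PDE residual at collocation points, decay at the
# domain boundary, and deviation of the total probability mass from 1.
import torch

torch.manual_seed(0)
D = 0.5
net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)
p = lambda x: torch.exp(net(x))                    # enforce positivity

def V_prime(x):                                    # dV/dx for V = x^4 - 2 x^2
    return 4 * x**3 - 4 * x

x_col = torch.linspace(-2.5, 2.5, 400).reshape(-1, 1)
x_bc = torch.tensor([[-2.5], [2.5]])
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(3000):
    x = x_col.clone().requires_grad_(True)
    px = p(x)
    dp = torch.autograd.grad(px, x, torch.ones_like(px), create_graph=True)[0]
    flux = V_prime(x) * px + D * dp                # probability flux (up to sign)
    dflux = torch.autograd.grad(flux, x, torch.ones_like(flux), create_graph=True)[0]
    pde_residual = (dflux ** 2).mean()
    boundary = (p(x_bc) ** 2).mean()               # p should vanish at the edges
    mass = torch.trapz(p(x_col).squeeze(), x_col.squeeze())
    loss = pde_residual + boundary + (mass - 1.0) ** 2
    opt.zero_grad(); loss.backward(); opt.step()

# Sanity check: up to normalization, p(x) should approach the Boltzmann
# density exp(-V(x) / D) implied by the stationary FP equation.
```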