Learning non-Markovian Decision-Making from State-only Sequences
- URL: http://arxiv.org/abs/2306.15156v3
- Date: Mon, 30 Oct 2023 06:18:02 GMT
- Title: Learning non-Markovian Decision-Making from State-only Sequences
- Authors: Aoyang Qin, Feng Gao, Qing Li, Song-Chun Zhu, Sirui Xie
- Abstract summary: We develop model-based imitation learning from state-only sequences with a non-Markov Decision Process (nMDP)
We demonstrate the efficacy of the proposed method in a path planning task with non-Markovian constraints.
- Score: 57.20193609153983
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conventional imitation learning assumes access to the actions of
demonstrators, but these motor signals are often non-observable in naturalistic
settings. Additionally, sequential decision-making behaviors in these settings
can deviate from the assumptions of a standard Markov Decision Process (MDP).
To address these challenges, we explore deep generative modeling of state-only
sequences with non-Markov Decision Process (nMDP), where the policy is an
energy-based prior in the latent space of the state transition generator. We
develop maximum likelihood estimation to achieve model-based imitation, which
involves short-run MCMC sampling from the prior and importance sampling for the
posterior. The learned model enables *decision-making as inference*: model-free
policy execution is equivalent to prior sampling, while model-based planning is
posterior sampling initialized from the policy. We demonstrate the
efficacy of the proposed method in a prototypical path planning task with
non-Markovian constraints and show that the learned model exhibits strong
performance in challenging domains from the MuJoCo suite.
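The abstract describes short-run MCMC sampling from an energy-based prior in the latent space. Below is a minimal, self-contained sketch of such a sampler in PyTorch; the network architecture, latent dimension, step size, and number of Langevin steps are illustrative assumptions, not the authors' settings, and the posterior sampling used for planning is not shown.

```python
# Sketch: short-run Langevin sampling from an energy-based latent prior
# (illustrative only; hyperparameters and architecture are assumptions).
import torch
import torch.nn as nn

class LatentEnergy(nn.Module):
    """Energy f(z) defining a tilted prior p(z) ∝ exp(-f(z)) N(z; 0, I)."""
    def __init__(self, z_dim=16, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, z):
        return self.net(z).squeeze(-1)

def short_run_langevin(energy, z_init, n_steps=20, step_size=0.1):
    """K-step Langevin dynamics targeting the EBM-tilted Gaussian prior."""
    z = z_init.clone().detach().requires_grad_(True)
    for _ in range(n_steps):
        # Negative log density of the tilted prior: f(z) + ||z||^2 / 2
        neg_log_p = energy(z).sum() + 0.5 * (z ** 2).sum()
        grad, = torch.autograd.grad(neg_log_p, z)
        noise = torch.randn_like(z)
        # Langevin update: gradient step on the log density plus Gaussian noise
        z = (z - 0.5 * step_size ** 2 * grad + step_size * noise)
        z = z.detach().requires_grad_(True)
    return z.detach()

if __name__ == "__main__":
    energy = LatentEnergy(z_dim=16)
    z0 = torch.randn(64, 16)                   # initialize from the base Gaussian
    z_prior = short_run_langevin(energy, z0)   # "policy execution" ~ prior sampling
    print(z_prior.shape)                       # torch.Size([64, 16])
```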
Related papers
- Constrained Reinforcement Learning with Average Reward Objective: Model-Based and Model-Free Algorithms [34.593772931446125]
This monograph explores various model-based and model-free approaches for constrained reinforcement learning within the context of average-reward Markov Decision Processes (MDPs)
The primal-dual policy gradient-based algorithm is explored as a solution for constrained MDPs.
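For context, a minimal sketch of the standard primal-dual Lagrangian for a cost-constrained, average-reward MDP is given below; this is a generic textbook form rather than the monograph's exact objective, and the budget b and step sizes are illustrative.

```latex
% Generic primal-dual Lagrangian for a cost-constrained, average-reward MDP.
% J_r(\pi): average reward, J_c(\pi): average cost, b: cost budget, \eta: step sizes.
\[
\max_{\pi}\ \min_{\lambda \ge 0}\ \mathcal{L}(\pi,\lambda)
  = J_r(\pi) - \lambda\,\bigl(J_c(\pi) - b\bigr)
\]
\[
\theta \leftarrow \theta + \eta_\theta \nabla_\theta \mathcal{L}(\pi_\theta,\lambda),
\qquad
\lambda \leftarrow \bigl[\lambda + \eta_\lambda \bigl(J_c(\pi_\theta) - b\bigr)\bigr]_{+}
\]
```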
arXiv Detail & Related papers (2024-06-17T12:46:02Z) - Predictable MDP Abstraction for Unsupervised Model-Based RL [93.91375268580806]
We propose predictable MDP abstraction (PMA)
Instead of training a predictive model on the original MDP, we train a model on a transformed MDP with a learned action space.
We theoretically analyze PMA and empirically demonstrate that PMA leads to significant improvements over prior unsupervised model-based RL approaches.
arXiv Detail & Related papers (2023-02-08T07:37:51Z) - Variational Latent Branching Model for Off-Policy Evaluation [23.073461349048834]
We propose a variational latent branching model (VLBM) to learn the transition function of Markov decision processes (MDPs)
We introduce a branching architecture to improve the model's robustness against randomly initialized model weights.
We show that the VLBM outperforms existing state-of-the-art OPE methods in general.
arXiv Detail & Related papers (2023-01-28T02:20:03Z) - Plan To Predict: Learning an Uncertainty-Foreseeing Model for
Model-Based Reinforcement Learning [32.24146877835396]
We propose Plan To Predict (P2P), a framework that treats the model rollout process as a sequential decision-making problem.
We show that P2P achieves state-of-the-art performance on several challenging benchmark tasks.
arXiv Detail & Related papers (2023-01-20T10:17:22Z) - Flow-based Recurrent Belief State Learning for POMDPs [20.860726518161204]
The Partially Observable Markov Decision Process (POMDP) provides a principled and generic framework for modeling real-world sequential decision-making processes.
The main challenge lies in how to accurately obtain the belief state, which is the probability distribution over the unobservable environment states.
Recent advances in deep learning techniques show great potential to learn good belief states.
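For reference, the belief state mentioned above is typically maintained with the standard recursive Bayes update sketched below; this is textbook background, not the cited paper's specific flow-based construction.

```latex
% Standard recursive POMDP belief update.
% T: transition model, O: observation model, \eta: normalizing constant.
\[
b_{t+1}(s') = \eta\, O(o_{t+1}\mid s', a_t) \sum_{s} T(s'\mid s, a_t)\, b_t(s)
\]
```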
arXiv Detail & Related papers (2022-05-23T05:29:55Z) - Modular Deep Reinforcement Learning for Continuous Motion Planning with
Temporal Logic [59.94347858883343]
This paper investigates the motion planning of autonomous dynamical systems modeled by Markov decision processes (MDPs).
The novelty is to design an embedded product MDP (EP-MDP) between a limit-deterministic generalized Büchi automaton (LDGBA) and the MDP.
The proposed LDGBA-based reward shaping and discounting schemes for the model-free reinforcement learning (RL) only depend on the EP-MDP states.
arXiv Detail & Related papers (2021-02-24T01:11:25Z) - Identification of Unexpected Decisions in Partially Observable
Monte-Carlo Planning: a Rule-Based Approach [78.05638156687343]
We propose a methodology for analyzing POMCP policies by inspecting their traces.
The proposed method explores local properties of policy behavior to identify unexpected decisions.
We evaluate our approach on Tiger, a standard benchmark for POMDPs, and a real-world problem related to mobile robot navigation.
arXiv Detail & Related papers (2020-12-23T15:09:28Z) - Stein Variational Model Predictive Control [130.60527864489168]
Decision making under uncertainty is critical to real-world, autonomous systems.
Model Predictive Control (MPC) methods have demonstrated favorable performance in practice, but remain limited when dealing with complex distributions.
We show that this framework leads to successful planning in challenging, nonconvex optimal control problems.
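For background, Stein Variational MPC builds on the standard SVGD particle update sketched below, which transports a set of control-sequence particles toward the target distribution; the notation here is generic rather than the paper's.

```latex
% Standard SVGD update: particles x_i approximate the posterior p over
% control sequences, k is a positive-definite kernel, \epsilon a step size.
\[
x_i \leftarrow x_i + \epsilon\, \hat{\phi}^{*}(x_i),
\qquad
\hat{\phi}^{*}(x) = \frac{1}{n}\sum_{j=1}^{n}
  \Bigl[k(x_j, x)\,\nabla_{x_j}\log p(x_j) + \nabla_{x_j} k(x_j, x)\Bigr]
\]
```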
arXiv Detail & Related papers (2020-11-15T22:36:59Z) - Control as Hybrid Inference [62.997667081978825]
We present an implementation of CHI which naturally mediates the balance between iterative and amortised inference.
We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines.
arXiv Detail & Related papers (2020-07-11T19:44:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.