Provable Hierarchical Imitation Learning via EM
- URL: http://arxiv.org/abs/2010.03133v2
- Date: Sun, 14 Feb 2021 04:01:16 GMT
- Title: Provable Hierarchical Imitation Learning via EM
- Authors: Zhiyu Zhang, Ioannis Paschalidis
- Abstract summary: We consider learning an options-type hierarchical policy from expert demonstrations.
We characterize the EM approach proposed by Daniel et al.
We prove that the proposed algorithm converges with high probability to a norm ball around the true parameter.
- Score: 2.864550757598007
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Due to recent empirical successes, the options framework for hierarchical
reinforcement learning is gaining increasing popularity. Rather than learning
from rewards, which suffers from the curse of dimensionality, we consider
learning an options-type hierarchical policy from expert demonstrations. Such a
problem is referred to as hierarchical imitation learning. Converting this
problem to parameter inference in a latent variable model, we theoretically
characterize the EM approach proposed by Daniel et al. (2016). The population
level algorithm is analyzed as an intermediate step, which is nontrivial due to
the samples being correlated. If the expert policy can be parameterized by a
variant of the options framework, then under regularity conditions, we prove
that the proposed algorithm converges with high probability to a norm ball
around the true parameter. To our knowledge, this is the first performance
guarantee for a hierarchical imitation learning algorithm that only observes
primitive state-action pairs.
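To make the latent-variable framing concrete, below is a minimal sketch of the EM recipe the abstract describes: the active option is treated as a hidden state, an exact E-step runs forward-backward over options on a demonstration, and a closed-form M-step re-estimates the tables. The tabular parameterization (discrete states, actions, and options, with termination and the high-level policy folded into a single option-transition table `T`), the helper names, and the synthetic data are illustrative assumptions; this is not the exact model of Daniel et al. (2016) nor the algorithm analyzed in the paper.
```python
import numpy as np

def normalize(x, axis=-1):
    """Normalize non-negative tables into conditional distributions."""
    x = x + 1e-12
    return x / x.sum(axis=axis, keepdims=True)

def e_step(states, actions, init, T, pi_lo):
    """Forward-backward over the latent option sequence of one demonstration."""
    n, n_opt = len(states), init.shape[0]
    emit = pi_lo[:, states, actions].T                  # emit[t, o] = P(a_t | s_t, o)
    alpha = np.zeros((n, n_opt))
    beta = np.ones((n, n_opt))
    alpha[0] = normalize(init * emit[0], 0)
    for t in range(1, n):                               # forward pass (rescaled)
        alpha[t] = normalize((alpha[t - 1] @ T[:, states[t], :]) * emit[t], 0)
    for t in range(n - 2, -1, -1):                      # backward pass (rescaled)
        beta[t] = normalize(T[:, states[t + 1], :] @ (emit[t + 1] * beta[t + 1]), 0)
    gamma = normalize(alpha * beta, axis=1)             # P(o_t | demonstration)
    xi = np.zeros((n - 1, n_opt, n_opt))                # P(o_{t-1}, o_t | demonstration)
    for t in range(1, n):
        m = alpha[t - 1][:, None] * T[:, states[t], :] * (emit[t] * beta[t])
        xi[t - 1] = m / m.sum()
    return gamma, xi

def m_step(states, actions, gamma, xi, n_s, n_a):
    """Closed-form re-estimation of the tables from the E-step posteriors."""
    n_opt = gamma.shape[1]
    pi_lo = np.full((n_opt, n_s, n_a), 1e-12)
    T = np.full((n_opt, n_s, n_opt), 1e-12)
    for t, (s, a) in enumerate(zip(states, actions)):
        pi_lo[:, s, a] += gamma[t]
    for t in range(1, len(states)):
        T[:, states[t], :] += xi[t - 1]
    return normalize(gamma[0], 0), normalize(T), normalize(pi_lo)

# Tiny synthetic demonstration, only to exercise the updates end to end.
rng = np.random.default_rng(0)
n_s, n_a, n_opt, horizon = 4, 3, 2, 200
states = rng.integers(n_s, size=horizon)
actions = rng.integers(n_a, size=horizon)
init = normalize(rng.random(n_opt), 0)                  # initial option distribution
T = normalize(rng.random((n_opt, n_s, n_opt)))          # P(o_t | o_{t-1}, s_t)
pi_lo = normalize(rng.random((n_opt, n_s, n_a)))        # P(a_t | s_t, o_t)
for _ in range(20):                                     # EM iterations
    gamma, xi = e_step(states, actions, init, T, pi_lo)
    init, T, pi_lo = m_step(states, actions, gamma, xi, n_s, n_a)
```
The per-step rescaling of `alpha` and `beta` is the standard Baum-Welch trick to avoid underflow on long demonstrations; the posteriors are unaffected because each `gamma` row and `xi` slice is re-normalized.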
Related papers
- A Provably Efficient Option-Based Algorithm for both High-Level and Low-Level Learning [54.20447310988282]
We present a meta-algorithm alternating between regret minimization algorithms instantiated at different (high and low) temporal abstractions.
At the higher level, we treat the problem as a Semi-Markov Decision Process (SMDP), with fixed low-level policies, while at a lower level, inner option policies are learned with a fixed high-level policy.
arXiv Detail & Related papers (2024-06-21T13:17:33Z) - Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, stochastic (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
arXiv Detail & Related papers (2024-05-03T16:45:15Z) - GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP,
and Beyond [101.5329678997916]
We study sample efficient reinforcement learning (RL) under the general framework of interactive decision making.
We propose a novel complexity measure, generalized eluder coefficient (GEC), which characterizes the fundamental tradeoff between exploration and exploitation.
We show that RL problems with low GEC form a remarkably rich class, which subsumes low Bellman eluder dimension problems, bilinear class, low witness rank problems, PO-bilinear class, and generalized regular PSR.
arXiv Detail & Related papers (2022-11-03T16:42:40Z) - A Boosting Approach to Reinforcement Learning [59.46285581748018]
We study efficient algorithms for reinforcement learning in decision processes whose complexity is independent of the number of states.
We give an efficient algorithm that is capable of improving the accuracy of such weak learning methods.
arXiv Detail & Related papers (2021-08-22T16:00:45Z) - A Low Rank Promoting Prior for Unsupervised Contrastive Learning [108.91406719395417]
We construct a novel probabilistic graphical model that effectively incorporates the low rank promoting prior into the framework of contrastive learning.
Our hypothesis explicitly requires that all the samples belonging to the same instance class lie on the same subspace with small dimension.
Empirical evidence shows that the proposed algorithm clearly surpasses state-of-the-art approaches on multiple benchmarks.
arXiv Detail & Related papers (2021-08-05T15:58:25Z) - Adversarial Option-Aware Hierarchical Imitation Learning [89.92994158193237]
We propose Option-GAIL, a novel method to learn skills over long horizons.
The key idea of Option-GAIL is to model the task hierarchy with options and train the policy via generative adversarial optimization.
Experiments show that Option-GAIL outperforms other counterparts consistently across a variety of tasks.
arXiv Detail & Related papers (2021-06-10T06:42:05Z) - Meta-Regularization: An Approach to Adaptive Choice of the Learning Rate
in Gradient Descent [20.47598828422897]
We propose Meta-Regularization, a novel approach for the adaptive choice of the learning rate in first-order descent methods.
Our approach modifies the objective function by adding a regularization term on the learning rate, and casts the joint updating of the parameters and the learning rate as a single optimization problem.
arXiv Detail & Related papers (2021-04-12T13:13:34Z) - Online Baum-Welch algorithm for Hierarchical Imitation Learning [7.271970309320002]
We propose an online algorithm to perform hierarchical imitation learning in the options framework; a generic sketch of this style of online update appears after this list.
We show that this approach works well in both discrete and continuous environments.
arXiv Detail & Related papers (2021-03-22T22:03:25Z) - Policy Gradient Methods for the Noisy Linear Quadratic Regulator over a
Finite Horizon [3.867363075280544]
We explore reinforcement learning methods for finding the optimal policy in the linear quadratic regulator (LQR) problem.
We produce a global linear convergence guarantee for the setting of finite time horizon and stochastic state dynamics under weak assumptions.
We show results for the case where we assume a model for the underlying dynamics and where we apply the method to the data directly.
arXiv Detail & Related papers (2020-11-20T09:51:49Z)
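As referenced in the Online Baum-Welch entry above, the following is a minimal sketch of one common way to make such an E-step online: propagate a filtered posterior over the active option as each expert pair (s_t, a_t) arrives and fold it into decaying running statistics (a stepwise-EM style update). The tabular model, the step-size schedule, and the helper names are illustrative assumptions; the cited paper's actual recursion may differ.
```python
import numpy as np

rng = np.random.default_rng(1)
n_s, n_a, n_opt = 4, 3, 2

def normalize(x, axis=-1):
    x = x + 1e-12
    return x / x.sum(axis=axis, keepdims=True)

# Same tabular simplification as the batch sketch above.
T = normalize(rng.random((n_opt, n_s, n_opt)))       # P(o_t | o_{t-1}, s_t)
pi_lo = normalize(rng.random((n_opt, n_s, n_a)))     # P(a_t | s_t, o_t)
filt = np.full(n_opt, 1.0 / n_opt)                   # filtered P(o_t | data so far)
count_T = np.full((n_opt, n_s, n_opt), 1e-3)         # running sufficient statistics
count_pi = np.full((n_opt, n_s, n_a), 1e-3)

def online_update(s, a, t, decay=0.6):
    """One streaming step: filter the option posterior, then refresh the tables."""
    global filt, T, pi_lo, count_T, count_pi
    # Joint posterior over (o_{t-1}, o_t) given the new pair and the past filter.
    joint = filt[:, None] * T[:, s, :] * pi_lo[:, s, a]
    joint /= joint.sum()
    new_filt = joint.sum(axis=0)
    eta = (t + 2) ** (-decay)                        # stochastic-approximation step size
    count_T *= 1.0 - eta
    count_pi *= 1.0 - eta
    count_T[:, s, :] += eta * joint
    count_pi[:, s, a] += eta * new_filt
    T, pi_lo, filt = normalize(count_T), normalize(count_pi), new_filt

# Stream a synthetic demonstration through the updates.
for t in range(500):
    online_update(int(rng.integers(n_s)), int(rng.integers(n_a)), t)
```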