Model-Based Imitation Learning Using Entropy Regularization of Model and
Policy
- URL: http://arxiv.org/abs/2206.10101v1
- Date: Tue, 21 Jun 2022 04:15:12 GMT
- Title: Model-Based Imitation Learning Using Entropy Regularization of Model and
Policy
- Authors: Eiji Uchibe
- Abstract summary: We propose model-based Entropy-Regularized Imitation Learning (MB-ERIL) under the entropy-regularized Markov decision process.
A policy discriminator distinguishes the actions generated by a robot from expert ones, and a model discriminator distinguishes the counterfactual state transitions generated by the model from the actual ones.
Computer simulations and real robot experiments show that MB-ERIL achieves competitive performance and significantly improves sample efficiency compared to baseline methods.
- Score: 0.456877715768796
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Approaches based on generative adversarial networks for imitation learning
are promising because they are sample efficient in terms of expert
demonstrations. However, training a generator requires many interactions with
the actual environment because model-free reinforcement learning is adopted to
update a policy. To improve the sample efficiency using model-based
reinforcement learning, we propose model-based Entropy-Regularized Imitation
Learning (MB-ERIL) under the entropy-regularized Markov decision process to
reduce the number of interactions with the actual environment. MB-ERIL uses two
discriminators. A policy discriminator distinguishes the actions generated by a
robot from expert ones, and a model discriminator distinguishes the
counterfactual state transitions generated by the model from the actual ones.
We derive the structured discriminators so that the learning of the policy and
the model is efficient. Computer simulations and real robot experiments show
that MB-ERIL achieves competitive performance and significantly improves
sample efficiency compared to baseline methods.
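To make the two-discriminator setup described above concrete, the following is a minimal PyTorch sketch of how the policy and model discriminators could be trained. It assumes plain GAIL-style binary discriminators and omits the structured, entropy-regularized discriminator forms that the paper actually derives; all network sizes, names, and the learning rate are illustrative.

```python
import torch
import torch.nn as nn

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 64), nn.Tanh(), nn.Linear(64, out_dim))

S, A = 8, 2  # illustrative state and action dimensions

policy_disc = mlp(S + A, 1)      # distinguishes robot actions from expert actions
model_disc  = mlp(2 * S + A, 1)  # distinguishes model transitions (s, a, s') from real ones

bce = nn.BCEWithLogitsLoss()
opt = torch.optim.Adam(list(policy_disc.parameters()) + list(model_disc.parameters()), lr=3e-4)

def discriminator_step(expert_sa, robot_sa, real_sas, model_sas):
    """One update: label expert/real data 1 and robot/model data 0."""
    d_pol_loss = bce(policy_disc(expert_sa), torch.ones(len(expert_sa), 1)) + \
                 bce(policy_disc(robot_sa), torch.zeros(len(robot_sa), 1))
    d_mod_loss = bce(model_disc(real_sas), torch.ones(len(real_sas), 1)) + \
                 bce(model_disc(model_sas), torch.zeros(len(model_sas), 1))
    loss = d_pol_loss + d_mod_loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

The policy and the dynamics model would then be updated against these discriminators, with imagined rollouts from the learned model substituting for many of the real-environment interactions.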
Related papers
- Model-based Policy Optimization using Symbolic World Model [46.42871544295734]
The application of learning-based control methods in robotics presents significant challenges.
One is the low sample efficiency with which model-free reinforcement learning algorithms use observation data.
We suggest approximating transition dynamics with symbolic expressions, which are generated via symbolic regression.
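As a rough illustration of fitting transition dynamics with symbolic regression (not the paper's implementation), the sketch below fits one symbolic expression per state dimension using the third-party gplearn library on made-up pendulum-like data; the dynamics, hyperparameters, and function set are all assumptions.

```python
import numpy as np
from gplearn.genetic import SymbolicRegressor  # pip install gplearn

# Toy transition data: inputs are (state, action), targets are the next state per dimension.
rng = np.random.default_rng(0)
theta, omega, torque = rng.uniform(-1, 1, (3, 500))
X = np.column_stack([theta, omega, torque])
dt = 0.05
next_theta = theta + dt * omega                       # toy "true" dynamics
next_omega = omega + dt * (-np.sin(theta) + torque)

models = []
for y in (next_theta, next_omega):                    # one regressor per state dimension
    sr = SymbolicRegressor(population_size=500, generations=20,
                           function_set=('add', 'sub', 'mul', 'sin'),
                           parsimony_coefficient=0.001, random_state=0)
    sr.fit(X, y)
    models.append(sr)
    print(sr._program)                                # the recovered symbolic expression

def predict_next_state(state, action):
    x = np.concatenate([state, action]).reshape(1, -1)
    return np.array([m.predict(x)[0] for m in models])
```

The appeal of a symbolic world model is that the resulting expressions are compact and interpretable, so they can be inspected and reused for planning.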
arXiv Detail & Related papers (2024-07-18T13:49:21Z)
- Revisiting Energy Based Models as Policies: Ranking Noise Contrastive Estimation and Interpolating Energy Models [18.949193683555237]
In this work, we revisit the choice of energy-based models (EBM) as a policy class.
We develop a training objective and algorithm for energy models which combines several key ingredients.
We show that the Implicit Behavior Cloning (IBC) objective is actually biased even at the population level.
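The sketch below illustrates the basic "EBM as a policy" setup being revisited: an energy network scores state-action pairs and is trained with an InfoNCE-style objective that contrasts the expert action against sampled negatives. This is the standard IBC-style objective whose population-level bias the paper analyzes, not the paper's corrected ranking estimator; the dimensions and the uniform negative sampler are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

S, A, K = 8, 2, 64            # illustrative dims; K negatives per expert pair
energy = nn.Sequential(nn.Linear(S + A, 128), nn.ReLU(), nn.Linear(128, 1))

def infonce_loss(states, expert_actions, action_low=-1.0, action_high=1.0):
    """Contrast each expert action against K uniformly sampled negative actions."""
    B = states.shape[0]
    negatives = torch.empty(B, K, A).uniform_(action_low, action_high)
    candidates = torch.cat([expert_actions.unsqueeze(1), negatives], dim=1)      # (B, K+1, A)
    pairs = torch.cat([states.unsqueeze(1).expand(-1, K + 1, -1), candidates], dim=-1)
    logits = -energy(pairs).squeeze(-1)        # lower energy => higher score
    labels = torch.zeros(B, dtype=torch.long)  # the expert action sits at index 0
    return F.cross_entropy(logits, labels)
```

At inference time, an action can be chosen by sampling candidates and taking the one with the lowest energy, as in implicit behavior cloning.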
arXiv Detail & Related papers (2023-09-11T20:13:47Z)
- When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL).
Our follow-up derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
- Mingling Foresight with Imagination: Model-Based Cooperative Multi-Agent Reinforcement Learning [15.12491397254381]
We propose an implicit model-based multi-agent reinforcement learning method based on value decomposition methods.
Under this method, agents can interact with the learned virtual environment and evaluate the current state value according to imagined future states.
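The value-decomposition and multi-agent machinery is beyond a short snippet, but the core step of evaluating the current state from imagined future states in a learned model can be sketched as follows; every callable here is a hypothetical placeholder for a learned component.

```python
def imagined_value(state, policy, model, reward_fn, value_fn, horizon=5, gamma=0.99):
    """Estimate the value of `state` by imagining `horizon` steps in a learned model.

    policy(s) -> a, model(s, a) -> s', reward_fn(s, a) -> r, value_fn(s) -> bootstrap value.
    All of these stand in for learned components; no real environment step is taken.
    """
    total, discount, s = 0.0, 1.0, state
    for _ in range(horizon):
        a = policy(s)
        total += discount * reward_fn(s, a)
        s = model(s, a)                        # imagined transition
        discount *= gamma
    return total + discount * value_fn(s)      # bootstrap with the learned value function
```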
arXiv Detail & Related papers (2022-04-20T12:16:27Z)
- DST: Dynamic Substitute Training for Data-free Black-box Attack [79.61601742693713]
We propose a novel dynamic substitute training attack method to encourage the substitute model to learn better and faster from the target model.
We introduce a task-driven, graph-based structure information learning constraint to improve the quality of the generated training data.
arXiv Detail & Related papers (2022-04-03T02:29:11Z)
- Sample-Efficient Reinforcement Learning via Conservative Model-Based Actor-Critic [67.00475077281212]
Model-based reinforcement learning algorithms are more sample efficient than their model-free counterparts.
We propose conservative model-based actor-critic (CMBAC), a novel approach that achieves high sample efficiency without strong reliance on accurate learned models.
We show that CMBAC significantly outperforms state-of-the-art approaches in terms of sample efficiency on several challenging tasks.
arXiv Detail & Related papers (2021-12-16T15:33:11Z)
- Uncertainty-Aware Model-Based Reinforcement Learning with Application to Autonomous Driving [2.3303341607459687]
We propose a novel uncertainty-aware model-based reinforcement learning framework, and then implement and validate it in autonomous driving.
The framework is developed based on the adaptive truncation approach, providing virtual interactions between the agent and environment model.
The developed algorithms are then implemented in end-to-end autonomous vehicle control tasks, validated and compared with state-of-the-art methods under various driving scenarios.
arXiv Detail & Related papers (2021-06-23T06:55:14Z)
- Momentum Pseudo-Labeling for Semi-Supervised Speech Recognition [55.362258027878966]
We present momentum pseudo-labeling (MPL) as a simple yet effective strategy for semi-supervised speech recognition.
MPL consists of a pair of online and offline models that interact and learn from each other, inspired by the mean teacher method.
The experimental results demonstrate that MPL effectively improves over the base model and is scalable to different semi-supervised scenarios.
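The online/offline interaction follows the mean-teacher pattern: the offline model's weights track an exponential moving average (EMA) of the online model's, and its predictions serve as pseudo-labels for unlabeled speech. A minimal sketch of just that momentum update (the decay value is an assumption):

```python
import copy
import torch

def make_offline_model(online_model):
    """The offline (teacher) model starts as a frozen copy of the online (student) model."""
    offline = copy.deepcopy(online_model)
    for p in offline.parameters():
        p.requires_grad_(False)
    return offline

@torch.no_grad()
def momentum_update(online_model, offline_model, alpha=0.999):
    """Move the offline weights toward the online weights by one EMA step."""
    for p_on, p_off in zip(online_model.parameters(), offline_model.parameters()):
        p_off.mul_(alpha).add_(p_on, alpha=1.0 - alpha)
```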
arXiv Detail & Related papers (2021-06-16T16:24:55Z)
- DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
- Strictly Batch Imitation Learning by Energy-based Distribution Matching [104.33286163090179]
Consider learning a policy purely on the basis of demonstrated behavior -- that is, with no access to reinforcement signals, no knowledge of transition dynamics, and no further interaction with the environment.
One solution is simply to retrofit existing algorithms for apprenticeship learning to work in the offline setting.
But such an approach leans heavily on off-policy evaluation or offline model estimation, and can be indirect and inefficient.
We argue that a good solution should be able to explicitly parameterize a policy, implicitly learn from rollout dynamics, and operate in an entirely offline fashion.
arXiv Detail & Related papers (2020-06-25T03:27:59Z)
- Improving Robot Dual-System Motor Learning with Intrinsically Motivated Meta-Control and Latent-Space Experience Imagination [17.356402088852423]
We present a novel dual-system motor learning approach where a meta-controller arbitrates online between model-based and model-free decisions.
We evaluate our approach against baseline and state-of-the-art methods on learning vision-based robotic grasping in simulation and real world.
arXiv Detail & Related papers (2020-04-19T12:14:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.