Reinforced Imitation Learning by Free Energy Principle
- URL: http://arxiv.org/abs/2107.11811v1
- Date: Sun, 25 Jul 2021 14:19:29 GMT
- Title: Reinforced Imitation Learning by Free Energy Principle
- Authors: Ryoya Ogishima, Izumi Karino, Yasuo Kuniyoshi
- Abstract summary: Reinforcement Learning (RL) requires a large amount of exploration, especially in sparse-reward settings.
Imitation Learning (IL) can learn from expert demonstrations without exploration.
We radically unify RL and IL based on the Free Energy Principle (FEP).
- Score: 2.9327503320877457
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement Learning (RL) requires a large amount of exploration, especially
in sparse-reward settings. Imitation Learning (IL) can learn from expert
demonstrations without exploration, but it never exceeds the expert's
performance and is also vulnerable to distributional shift between
demonstration and execution. In this paper, we radically unify RL and IL based
on the Free Energy Principle (FEP). FEP is a unified Bayesian theory of the brain
that explains perception, action, and model learning by a common fundamental
principle. We present a theoretical extension of FEP and derive an algorithm in
which an agent learns a world model that internalizes expert demonstrations
and, at the same time, uses the model to infer the current and future states and
actions that maximize rewards. The algorithm thus reduces exploration costs by
partially imitating experts while maximizing its return in a seamless way,
resulting in higher performance than the suboptimal expert. Our experimental
results show that this approach is promising in visual control tasks, especially
in sparse-reward environments.
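The abstract describes an agent that fits a world model to expert demonstrations while using that same model to infer reward-maximizing actions. As a rough illustration of how such a combination can be wired together (not the authors' FEP-derived objective), the following PyTorch sketch trains a toy dynamics-and-reward model on both expert and agent data and then plans actions under it by random shooting; all module names, dimensions, loss weights, and the planner are hypothetical assumptions.

```python
# Minimal illustrative sketch (not the paper's algorithm): a world model is fit
# on both expert demonstrations and the agent's own experience, and actions are
# then chosen by planning under that model to maximize predicted reward.
# All names, dimensions, and losses below are hypothetical assumptions.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, HORIZON = 8, 2, 5

class WorldModel(nn.Module):
    """Deterministic latent dynamics plus a reward head (stand-in world model)."""
    def __init__(self):
        super().__init__()
        self.dynamics = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(), nn.Linear(64, STATE_DIM))
        self.reward = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, state, action):
        next_state = self.dynamics(torch.cat([state, action], dim=-1))
        return next_state, self.reward(next_state)

def model_loss(model, batch):
    """One-step prediction loss; applying it to expert data 'internalizes' the demos."""
    s, a, r, s_next = batch
    pred_next, pred_r = model(s, a)
    return (nn.functional.mse_loss(pred_next, s_next)
            + nn.functional.mse_loss(pred_r.squeeze(-1), r))

def plan_action(model, state, n_candidates=256):
    """Random-shooting planner: return the first action of the candidate
    sequence with the highest predicted cumulative reward under the model."""
    with torch.no_grad():
        actions = torch.randn(n_candidates, HORIZON, ACTION_DIM)
        s = state.expand(n_candidates, STATE_DIM)
        total_r = torch.zeros(n_candidates)
        for t in range(HORIZON):
            s, r = model(s, actions[:, t])
            total_r += r.squeeze(-1)
        return actions[total_r.argmax(), 0]

# Toy usage with random stand-in data for expert demos and agent rollouts.
model = WorldModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
expert_batch = (torch.randn(32, STATE_DIM), torch.randn(32, ACTION_DIM),
                torch.randn(32), torch.randn(32, STATE_DIM))
agent_batch = (torch.randn(32, STATE_DIM), torch.randn(32, ACTION_DIM),
               torch.randn(32), torch.randn(32, STATE_DIM))
loss = model_loss(model, expert_batch) + model_loss(model, agent_batch)
opt.zero_grad(); loss.backward(); opt.step()
action = plan_action(model, torch.randn(1, STATE_DIM))
print(loss.item(), action.shape)
```

In the paper's formulation the imitation and reward-seeking terms arise from a single free-energy objective rather than the two separate losses and the planner used in this sketch.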
Related papers
- Efficient Reinforcement Learning via Decoupling Exploration and Utilization [6.305976803910899]
Reinforcement Learning (RL) has achieved remarkable success across multiple fields and applications, including gaming, robotics, and autonomous vehicles.
In this work, we aim to train an agent efficiently by decoupling exploration and utilization, so that the agent can escape the conundrum of suboptimal solutions.
The above idea is implemented in the proposed OPARL (Optimistic and Pessimistic Actor Reinforcement Learning) algorithm.
arXiv Detail & Related papers (2023-12-26T09:03:23Z) - Imitation Learning from Observation through Optimal Transport [25.398983671932154]
Imitation Learning from Observation (ILfO) is a setting in which a learner tries to imitate the behavior of an expert.
We show that existing methods can be simplified to generate a reward function without requiring learned models or adversarial learning.
We demonstrate the effectiveness of this simple approach on a variety of continuous control tasks and find that it surpasses the state of the art in the ILfO setting.
arXiv Detail & Related papers (2023-10-02T20:53:20Z) - Basis for Intentions: Efficient Inverse Reinforcement Learning using
Past Experience [89.30876995059168]
This paper addresses the problem of inverse reinforcement learning (IRL) -- inferring the reward function of an agent from observing its behavior.
arXiv Detail & Related papers (2022-08-09T17:29:49Z) - A Free Lunch from the Noise: Provable and Practical Exploration for
Representation Learning [55.048010996144036]
We show that under some noise assumption, we can obtain the linear spectral feature of its corresponding Markov transition operator in closed-form for free.
We propose Spectral Dynamics Embedding (SPEDE), which breaks the trade-off and completes optimistic exploration for representation learning by exploiting the structure of the noise.
arXiv Detail & Related papers (2021-11-22T19:24:57Z) - Deep Active Learning by Leveraging Training Dynamics [57.95155565319465]
We propose a theory-driven deep active learning method (dynamicAL) which selects samples to maximize training dynamics.
We show that dynamicAL not only outperforms other baselines consistently but also scales well on large deep learning models.
arXiv Detail & Related papers (2021-10-16T16:51:05Z) - Off-Policy Imitation Learning from Observations [78.30794935265425]
Learning from Observations (LfO) is a practical reinforcement learning scenario from which many applications can benefit.
We propose a sample-efficient LfO approach that enables off-policy optimization in a principled manner.
Our approach is comparable with the state of the art on locomotion tasks in terms of both sample efficiency and performance.
arXiv Detail & Related papers (2021-02-25T21:33:47Z) - Soft Expert Reward Learning for Vision-and-Language Navigation [94.86954695912125]
Vision-and-Language Navigation (VLN) requires an agent to find a specified spot in an unseen environment by following natural language instructions.
We introduce a Soft Expert Reward Learning (SERL) model to overcome the reward engineering and generalisation problems of the VLN task.
arXiv Detail & Related papers (2020-07-21T14:17:36Z) - Energy-Based Imitation Learning [29.55675131809474]
We tackle a common scenario in imitation learning (IL) where agents try to recover the optimal policy from expert demonstrations.
Inspired by recent progress in energy-based model (EBM), in this paper we propose a simplified IL framework named Energy-Based Imitation Learning (EBIL)
EBIL combines the idea of both EBM and occupancy measure matching, and via theoretic analysis we reveal that EBIL and Max-Entropy IRL (MaxEnt IRL) approaches are two sides of the same coin.
arXiv Detail & Related papers (2020-04-20T15:49:35Z) - Reinforcement Learning through Active Inference [62.997667081978825]
We show how ideas from active inference can augment traditional reinforcement learning approaches.
We develop and implement a novel objective for decision making, which we term the free energy of the expected future.
We demonstrate that the resulting algorithm successfully balances exploration and exploitation, simultaneously achieving robust performance on several challenging RL benchmarks with sparse, well-shaped, and no rewards.
arXiv Detail & Related papers (2020-02-28T10:28:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.