End-Effect Exploration Drive for Effective Motor Learning
- URL: http://arxiv.org/abs/2006.15960v2
- Date: Mon, 5 Oct 2020 14:43:05 GMT
- Title: End-Effect Exploration Drive for Effective Motor Learning
- Authors: Emmanuel Daucé
- Abstract summary: A key objective in reinforcement learning is to invert a target distribution of effects.
End-effect drives are proposed as an effective way to implement goal-directed motor learning.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Stemming from the idea that a key objective in reinforcement learning is to
invert a target distribution of effects, end-effect drives are proposed as an
effective way to implement goal-directed motor learning, in the absence of an
explicit forward model. An end-effect model relies on a simple statistical
recording of the effect of the current policy, here used as a substitute for
the more resource-demanding forward models. When combined with a reward
structure, it forms the core of a lightweight variational free energy
minimization setup. The main difficulty lies in the maintenance of this
simplified effect model together with the online update of the policy. When the
prior target distribution is uniform, it provides a way to learn an efficient
exploration policy, consistent with the intrinsic curiosity principles. When
combined with an extrinsic reward, our approach is finally shown to provide
faster training than traditional off-policy techniques.
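To make the "simple statistical recording" idea in the abstract concrete, here is a minimal Python sketch. It is not the paper's implementation: it assumes a small discrete set of end effects and a uniform target distribution, and the class name `EndEffectModel`, the Laplace smoothing, and the log-ratio drive are all illustrative assumptions.

```python
import math
from collections import Counter


class EndEffectModel:
    """Running histogram of discrete end effects (hypothetical helper)."""

    def __init__(self, n_effects, smoothing=1.0):
        self.n_effects = n_effects  # number of distinct end effects (assumed finite)
        self.smoothing = smoothing  # Laplace smoothing constant (assumption)
        self.counts = Counter()
        self.total = 0

    def record(self, effect):
        """Record the end effect reached by one rollout of the current policy."""
        self.counts[effect] += 1
        self.total += 1

    def prob(self, effect):
        """Smoothed empirical probability of reaching `effect`."""
        numerator = self.counts[effect] + self.smoothing
        denominator = self.total + self.smoothing * self.n_effects
        return numerator / denominator

    def drive(self, effect):
        """Log-ratio between a uniform target and the empirical effect model.

        Positive for under-visited effects, negative for over-visited ones.
        """
        target = 1.0 / self.n_effects
        return math.log(target) - math.log(self.prob(effect))


model = EndEffectModel(n_effects=4)
for effect in [0, 0, 0, 1]:
    model.record(effect)

# Effect 0 is over-visited, effect 2 has never been reached.
drives = {e: model.drive(e) for e in range(4)}
```

Under these assumptions, adding the drive to the reward of a rollout that ends in a rarely reached effect pushes the policy toward a more uniform coverage of effects, which is the exploration behaviour the abstract describes for a uniform prior.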
Related papers
- Learn from the Past: A Proxy Guided Adversarial Defense Framework with
Self Distillation Regularization [53.04697800214848]
Adversarial Training (AT) is pivotal in fortifying the robustness of deep learning models.
AT methods, relying on direct iterative updates for target model's defense, frequently encounter obstacles such as unstable training and catastrophic overfitting.
We present a general proxy guided defense framework, LAST (Learn from the Past).
(2023-10-19T13:13:41Z)
- Implicit Training of Energy Model for Structure Prediction [14.360826930970765]
In this work, we argue that the existing inference network based structure prediction methods are indirectly learning to optimize a dynamic loss objective parameterized by the energy model.
We then explore using implicit-gradient based technique to learn the corresponding dynamic objectives.
(2022-11-21T17:08:44Z)
- When to Update Your Model: Constrained Model-based Reinforcement
Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL).
Our follow-up derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically-varying number of explorations benefits the eventual returns.
(2022-10-15T17:57:43Z)
- Simplifying Model-based RL: Learning Representations, Latent-space
Models, and Policies with One Objective [142.36200080384145]
We propose a single objective which jointly optimizes a latent-space model and policy to achieve high returns while remaining self-consistent.
We demonstrate that the resulting algorithm matches or improves the sample-efficiency of the best prior model-based and model-free RL methods.
(2022-09-18T03:51:58Z)
- Imaginary Hindsight Experience Replay: Curious Model-based Learning for
Sparse Reward Tasks [9.078290260836706]
We propose a model-based method tailored for sparse-reward tasks that foregoes the need for complicated reward engineering.
This approach, termed Imaginary Hindsight Experience Replay, minimises real-world interactions by incorporating imaginary data into policy updates.
Upon evaluation, this approach provides an order of magnitude increase in data-efficiency on average versus the state-of-the-art model-free method in the benchmark OpenAI Gym Fetch Robotics tasks.
(2021-10-05T23:38:31Z)
- Online reinforcement learning with sparse rewards through an active
inference capsule [62.997667081978825]
This paper introduces an active inference agent which minimizes the novel free energy of the expected future.
Our model is capable of solving sparse-reward problems with a very high sample efficiency.
We also introduce a novel method for approximating the prior model from the reward function, which simplifies the expression of complex objectives.
(2021-06-04T10:03:36Z)
- Model Predictive Actor-Critic: Accelerating Robot Skill Acquisition with
Deep Reinforcement Learning [42.525696463089794]
Model Predictive Actor-Critic (MoPAC) is a hybrid model-based/model-free method that combines model predictive rollouts with policy optimization so as to mitigate model bias.
MoPAC guarantees optimal skill learning up to an approximation error and reduces necessary physical interaction with the environment.
(2021-03-25T13:50:24Z)
- Model-free and Bayesian Ensembling Model-based Deep Reinforcement
Learning for Particle Accelerator Control Demonstrated on the FERMI FEL [0.0]
This paper shows how reinforcement learning can be used on an operational level on accelerator physics problems.
We compare purely model-based to model-free reinforcement learning applied to the intensity optimisation on the FERMI FEL system.
We find that the model-based approach demonstrates higher representational power and sample-efficiency, while the performance of the model-free method is slightly superior.
(2020-12-17T16:57:27Z)
- Bridging Imagination and Reality for Model-Based Deep Reinforcement
Learning [72.18725551199842]
We propose a novel model-based reinforcement learning algorithm, called BrIdging Reality and Dream (BIRD).
It maximizes the mutual information between imaginary and real trajectories so that the policy improvement learned from imaginary trajectories can be easily generalized to real trajectories.
We demonstrate that our approach improves sample efficiency of model-based planning, and achieves state-of-the-art performance on challenging visual control benchmarks.
(2020-10-23T03:22:01Z)
- Strictly Batch Imitation Learning by Energy-based Distribution Matching [104.33286163090179]
Consider learning a policy purely on the basis of demonstrated behavior -- that is, with no access to reinforcement signals, no knowledge of transition dynamics, and no further interaction with the environment.
One solution is simply to retrofit existing algorithms for apprenticeship learning to work in the offline setting.
But such an approach leans heavily on off-policy evaluation or offline model estimation, and can be indirect and inefficient.
We argue that a good solution should be able to explicitly parameterize a policy, implicitly learn from rollout dynamics, and operate in an entirely offline fashion.
(2020-06-25T03:27:59Z)
- Ready Policy One: World Building Through Active Learning [35.358315617358976]
We introduce Ready Policy One (RP1), a framework that views Model-Based Reinforcement Learning as an active learning problem.
RP1 achieves this by utilizing a hybrid objective function, which crucially adapts during optimization.
We rigorously evaluate our method on a variety of continuous control tasks, and demonstrate statistically significant gains over existing approaches.
(2020-02-07T09:57:53Z)