Reward Conditioned Neural Movement Primitives for Population Based
Variational Policy Optimization
- URL: http://arxiv.org/abs/2011.04282v1
- Date: Mon, 9 Nov 2020 09:53:37 GMT
- Title: Reward Conditioned Neural Movement Primitives for Population Based
Variational Policy Optimization
- Authors: M. Tuluhan Akbulut, Utku Bozdogan, Ahmet Tekden and Emre Ugur
- Abstract summary: This paper studies the reward-based policy exploration problem using a supervised learning approach.
We show that our method provides stable learning progress and significantly better sample efficiency than a number of state-of-the-art robotic reinforcement learning methods.
- Score: 4.559353193715442
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The aim of this paper is to study the reward-based policy
exploration problem using a supervised learning approach and to enable robots
to form complex movement trajectories in challenging reward settings and
search spaces. For this, the experience of the robot, which can be
bootstrapped from demonstrated trajectories, is used to train a novel Neural
Processes-based deep network that samples from its latent space and generates
the required trajectories given desired rewards. Our framework generates
progressively improved trajectories by sampling them from high-reward
landscapes while gradually increasing the target reward. Variational
inference is used to create a stochastic latent space, so that varying
trajectories can be sampled when generating a population of trajectories for
a given target reward. Building on Evolutionary Strategies, we propose a
novel crossover operation applied in the self-organized latent space of the
individual policies, which allows blending individuals that may address
different factors of the reward function. Using a number of tasks that
require sequential reaching to multiple points or passing through gaps
between objects, we show that our method provides stable learning progress
and significantly better sample efficiency than a number of state-of-the-art
robotic reinforcement learning methods. Finally, we show the real-world
suitability of our method through real-robot executions involving obstacle
avoidance.
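The sketch below is a minimal, hypothetical illustration of the two core ideas in the abstract: a trajectory generator with a stochastic (variational) latent space conditioned on a desired reward, and a crossover that blends two individuals directly in that latent space. The layer sizes, the Gaussian latent over flattened trajectories, and the convex-combination crossover are illustrative assumptions, not the authors' exact Neural Processes-based architecture or operator.

```python
# Hedged sketch only: a reward-conditioned trajectory generator with a
# variational latent space, plus a latent-space crossover for a population.
import torch
import torch.nn as nn

class RewardConditionedTrajectoryVAE(nn.Module):
    def __init__(self, traj_dim: int, latent_dim: int = 8, hidden: int = 128):
        super().__init__()
        self.latent_dim = latent_dim
        # Encoder: (flattened trajectory, achieved reward) -> Gaussian latent parameters.
        self.encoder = nn.Sequential(
            nn.Linear(traj_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * latent_dim),  # mean and log-variance
        )
        # Decoder: (latent sample, desired reward) -> trajectory.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, traj_dim),
        )

    def encode(self, traj, reward):
        mu, log_var = self.encoder(torch.cat([traj, reward], dim=-1)).chunk(2, dim=-1)
        return mu, log_var

    def decode(self, z, desired_reward):
        return self.decoder(torch.cat([z, desired_reward], dim=-1))

    def loss(self, traj, reward):
        # Variational objective: reconstruction error plus KL regularizer on the latent.
        mu, log_var = self.encode(traj, reward)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * log_var)  # reparameterization
        recon = self.decode(z, reward)
        rec = ((recon - traj) ** 2).mean()
        kl = -0.5 * (1.0 + log_var - mu.pow(2) - log_var.exp()).mean()
        return rec + kl

def latent_crossover(model, traj_a, reward_a, traj_b, reward_b, target_reward, alpha=0.5):
    """Blend two parents in latent space and decode the child conditioned on a
    (typically higher) target reward. A convex combination is one plausible
    crossover operator; the paper's exact operator may differ."""
    with torch.no_grad():
        mu_a, _ = model.encode(traj_a, reward_a)
        mu_b, _ = model.encode(traj_b, reward_b)
        child_z = alpha * mu_a + (1.0 - alpha) * mu_b
        return model.decode(child_z, target_reward)

# Dummy usage: 2-D trajectories of 50 timesteps flattened to 100-D vectors,
# rewards as scalars of shape (batch, 1).
model = RewardConditionedTrajectoryVAE(traj_dim=100)
trajs, rewards = torch.randn(16, 100), torch.rand(16, 1)
model.loss(trajs, rewards).backward()  # one supervised gradient step
child = latent_crossover(model, trajs[:1], rewards[:1], trajs[1:2], rewards[1:2],
                         target_reward=torch.tensor([[1.0]]))
```

In an outer loop, one would condition on gradually increasing target rewards, sample several latents per target to form a population, execute the decoded trajectories, and feed the resulting (trajectory, achieved reward) pairs back into the training set, mirroring the progressive improvement described in the abstract.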
Related papers
- Action abstractions for amortized sampling [49.384037138511246]
We propose an approach to incorporate the discovery of action abstractions, or high-level actions, into the policy optimization process.
Our approach involves iteratively extracting action subsequences commonly used across many high-reward trajectories and 'chunking' them into a single action that is added to the action space.
arXiv Detail & Related papers (2024-10-19T19:22:50Z)
- Adaptive teachers for amortized samplers [76.88721198565861]
Amortized inference is the task of training a parametric model, such as a neural network, to approximate a distribution with a given unnormalized density where exact sampling is intractable.
Off-policy RL training facilitates the discovery of diverse, high-reward candidates, but existing methods still face challenges in efficient exploration.
We propose an adaptive training distribution (the Teacher) to guide the training of the primary amortized sampler (the Student) by prioritizing high-loss regions.
arXiv Detail & Related papers (2024-10-02T11:33:13Z)
- Evolutionary Swarm Robotics: Dynamic Subgoal-Based Path Formation and Task Allocation for Exploration and Navigation in Unknown Environments [0.0]
The paper presents a method called the sub-goal-based path formation, which establishes a path between two different locations by exploiting visually connected sub-goals.
The paper tackles the problem of collisions (traffic) among a large number of robots engaged in path formation, which negatively impacts the performance of the sub-goal-based method.
A task allocation strategy is proposed, leveraging local communication protocols and light signal-based communication.
arXiv Detail & Related papers (2023-12-27T15:13:56Z)
- Enhancing Robotic Navigation: An Evaluation of Single and Multi-Objective Reinforcement Learning Strategies [0.9208007322096532]
This study presents a comparative analysis between single-objective and multi-objective reinforcement learning methods for training a robot to navigate effectively to an end goal.
By modifying the reward function to return a vector of rewards, each pertaining to a distinct objective, the robot learns a policy that effectively balances the different goals (a minimal illustration of this idea appears after this list).
arXiv Detail & Related papers (2023-12-13T08:00:26Z)
- Robot Fine-Tuning Made Easy: Pre-Training Rewards and Policies for Autonomous Real-World Reinforcement Learning [58.3994826169858]
We introduce RoboFuME, a reset-free fine-tuning system for robotic reinforcement learning.
Our insights are to utilize offline reinforcement learning techniques to ensure efficient online fine-tuning of a pre-trained policy.
Our method can incorporate data from an existing robot dataset and improve on a target task within as little as 3 hours of autonomous real-world experience.
arXiv Detail & Related papers (2023-10-23T17:50:08Z)
- Continuous Trajectory Generation Based on Two-Stage GAN [50.55181727145379]
We propose a novel two-stage generative adversarial framework to generate the continuous trajectory on the road network.
Specifically, we build the generator under the human mobility hypothesis of the A* algorithm to learn the human mobility behavior.
For the discriminator, we combine the sequential reward with the mobility yaw reward to enhance the effectiveness of the generator.
arXiv Detail & Related papers (2023-01-16T09:54:02Z)
- Reaching Through Latent Space: From Joint Statistics to Path Planning in Manipulation [26.38185646091712]
We present a novel approach to path planning for robotic manipulators.
Paths are produced via iterative optimisation in the latent space of a generative model of robot poses.
Our models are trained in a task-agnostic manner on randomly sampled robot poses.
arXiv Detail & Related papers (2022-10-21T07:25:21Z)
- E2R: a Hierarchical-Learning inspired Novelty-Search method to generate diverse repertoires of grasping trajectories [0.0]
We introduce a new NS-based method that can generate large datasets of grasping trajectories in a platform-agnostic manner.
Inspired by the hierarchical learning paradigm, our method decouples approach and prehension to make the behavioral space smoother.
Experiments conducted on 3 different robot-gripper setups and several standard objects show that our method outperforms the state of the art.
arXiv Detail & Related papers (2022-10-14T15:13:10Z)
- Diversity-based Trajectory and Goal Selection with Hindsight Experience Replay [8.259694128526112]
We propose diversity-based trajectory and goal selection with HER (DTGSH).
We show that our method can learn more quickly and reach higher performance than other state-of-the-art approaches on all tasks.
arXiv Detail & Related papers (2021-08-17T21:34:24Z)
- Bayesian Meta-Learning for Few-Shot Policy Adaptation Across Robotic Platforms [60.59764170868101]
Reinforcement learning methods can achieve significant performance but require a large amount of training data collected on the same robotic platform.
We formulate policy adaptation to a new platform as a few-shot meta-learning problem where the goal is to find a model that captures the common structure shared across different robotic platforms.
We experimentally evaluate our framework on a simulated reaching and a real-robot picking task using 400 simulated robots.
arXiv Detail & Related papers (2021-03-05T14:16:20Z)
- Learning to Shift Attention for Motion Generation [55.61994201686024]
One challenge of motion generation using robot learning-from-demonstration techniques is that human demonstrations follow a distribution with multiple modes for a single task query.
Previous approaches fail to capture all modes or tend to average modes of the demonstrations and thus generate invalid trajectories.
We propose a motion generation model with extrapolation ability to overcome this problem.
arXiv Detail & Related papers (2021-02-24T09:07:52Z)
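As referenced in the "Enhancing Robotic Navigation" entry above, the following is a minimal, hypothetical illustration of a vector-valued reward with a simple weighted scalarization. The objective names and weights are invented for illustration only and are not taken from that paper.

```python
# Hedged sketch only: one reward component per objective, combined by a
# fixed weighted sum; the cited paper may balance objectives differently.
import numpy as np

def vector_reward(distance_to_goal: float, min_obstacle_clearance: float,
                  energy_used: float) -> np.ndarray:
    """Return one reward component per objective instead of a single scalar."""
    return np.array([
        -distance_to_goal,                  # objective 1: reach the goal
        min(min_obstacle_clearance, 1.0),   # objective 2: keep clear of obstacles
        -energy_used,                       # objective 3: be efficient
    ])

def scalarize(reward_vec: np.ndarray, weights=(1.0, 0.5, 0.1)) -> float:
    """One simple way to balance the objectives: a fixed weighted sum."""
    return float(np.dot(reward_vec, np.asarray(weights)))

# Example: close to the goal, good clearance, modest energy use.
r = vector_reward(distance_to_goal=0.2, min_obstacle_clearance=0.8, energy_used=0.3)
print(r, scalarize(r))
```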
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality or accuracy of this information and is not responsible for any consequences of its use.