Foundational Policy Acquisition via Multitask Learning for Motor Skill Generation
- URL: http://arxiv.org/abs/2308.16471v3
- Date: Thu, 2 May 2024 07:30:24 GMT
- Title: Foundational Policy Acquisition via Multitask Learning for Motor Skill Generation
- Authors: Satoshi Yamamori, Jun Morimoto
- Abstract summary: We propose a multitask reinforcement learning algorithm for foundational policy acquisition to generate novel motor skills.
Inspired by human sensorimotor adaptation mechanisms, we aim to train encoder-decoder networks that can be commonly used to learn novel motor skills.
- Score: 0.9668407688201356
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this study, we propose a multitask reinforcement learning algorithm for foundational policy acquisition to generate novel motor skills. Inspired by human sensorimotor adaptation mechanisms, we aim to train encoder-decoder networks that can be commonly used to learn novel motor skills in a single movement category. To train the policy network, we develop a multitask reinforcement learning method in which the policy must cope with changes in goals or environments, that is, with different reward functions or different physical parameters of the environment, in dynamic movement generation tasks. As a concrete task, we evaluated the proposed method on a ball heading task using a monopod robot model. The results showed that the proposed method could adapt to novel target positions or to previously unseen ball restitution coefficients. Furthermore, we demonstrated that the acquired foundational policy network, originally learned for heading motion, can be used to generate an entirely new overhead kicking skill.
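The abstract describes one policy trained across tasks that differ either in the reward function (a new target position) or in a physical parameter of the environment (the ball's restitution coefficient). The Python sketch below shows what such a task distribution could look like. The `HeadingTask` type, the sampling ranges, uniform sampling and the negative-distance reward are all illustrative assumptions, not details taken from the paper.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class HeadingTask:
    """One hypothetical task instance: a goal plus a physical parameter."""
    target: tuple[float, float]  # desired ball landing point (x, y), metres
    restitution: float           # ball restitution coefficient e, 0 < e <= 1


def sample_task(rng: random.Random,
                x_range=(1.0, 3.0),
                y_range=(-1.0, 1.0),
                e_range=(0.5, 0.9)) -> HeadingTask:
    """Draw a task uniformly. Varying the target changes the reward function;
    varying e changes the dynamics. The ranges are placeholders."""
    target = (rng.uniform(*x_range), rng.uniform(*y_range))
    return HeadingTask(target=target, restitution=rng.uniform(*e_range))


def rebound_normal_velocity(v_normal_in: float, e: float) -> float:
    """Normal velocity after impact with a stationary rigid surface: v_out = -e * v_in."""
    return -e * v_normal_in


def task_reward(landing: tuple[float, float], task: HeadingTask) -> float:
    """Illustrative reward: negative Euclidean distance to the target."""
    return -math.dist(landing, task.target)


if __name__ == "__main__":
    rng = random.Random(0)
    # In multitask training, each episode would be rolled out on a freshly sampled task.
    for task in (sample_task(rng) for _ in range(3)):
        print(task, rebound_normal_velocity(-3.0, task.restitution))
```

The rebound helper assumes a stationary contact surface; with a moving head, the restitution law applies to the relative normal velocity at contact instead.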
Related papers
- Model Evolution Framework with Genetic Algorithm for Multi-Task Reinforcement Learning [85.91908329457081]
Multi-task reinforcement learning employs a single policy to complete various tasks, aiming to develop an agent with generalizability across different scenarios.
Existing approaches typically use a routing network to generate specific routes for each task and reconstruct a set of modules into diverse models to complete multiple tasks simultaneously.
We propose a Model Evolution framework with Genetic Algorithm (MEGA), which enables the model to evolve during training according to the difficulty of the tasks.
(2025-02-19T09:22:34Z)
- Vision-Based Generic Potential Function for Policy Alignment in Multi-Agent Reinforcement Learning [14.68673479535835]
We propose a hierarchical vision-based reward shaping method to guide the policy of reinforcement learning to align with human common sense.
To help the policy adapt to uncertainty and changes in long-horizon tasks, the top layer features an adaptive skill selection module.
Our method achieves a higher win rate and effectively aligns the policy with human common sense.
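The title refers to a potential function used for reward shaping. As background, classical potential-based shaping adds F(s, s') = gamma * Phi(s') - Phi(s) to the environment reward, which leaves the optimal policy unchanged. A minimal Python sketch, assuming a hand-written distance potential as a stand-in for the paper's vision-based potential; `make_distance_potential` and the 2-D state are hypothetical.

```python
import math
from typing import Callable, Sequence

State = tuple[float, float]
Potential = Callable[[State], float]


def make_distance_potential(goal: State) -> Potential:
    """Hypothetical stand-in potential: higher (less negative) closer to the goal."""
    return lambda s: -math.dist(s, goal)


def shaping_term(phi: Potential, s: State, s_next: State, gamma: float) -> float:
    """Potential-based shaping term F(s, s') = gamma * phi(s') - phi(s)."""
    return gamma * phi(s_next) - phi(s)


def shaped_reward(r: float, phi: Potential, s: State, s_next: State, gamma: float) -> float:
    """Environment reward plus the shaping term."""
    return r + shaping_term(phi, s, s_next, gamma)


def discounted_shaping_sum(phi: Potential, states: Sequence[State], gamma: float) -> float:
    """Discounted sum of shaping terms along a trajectory. It telescopes to
    gamma**T * phi(s_T) - phi(s_0), depending only on the endpoints, which is
    the key step in the policy-invariance argument."""
    return sum(gamma ** t * shaping_term(phi, states[t], states[t + 1], gamma)
               for t in range(len(states) - 1))
```

For example, moving from (3, 4) straight to the goal at the origin with gamma = 0.9 earns a shaping bonus of 5 on top of the environment reward.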
(2025-02-19T05:04:10Z)
- Distilling Reinforcement Learning Policies for Interpretable Robot Locomotion: Gradient Boosting Machines and Symbolic Regression [53.33734159983431]
This paper introduces a novel approach to distill neural RL policies into more interpretable forms.
We train expert neural network policies using RL and distill them into (i) gradient boosting machines (GBMs), (ii) explainable boosting machines (EBMs), and (iii) symbolic policies.
(2024-03-21T11:54:45Z)
- A Central Motor System Inspired Pre-training Reinforcement Learning for Robotic Control [7.227887302864789]
We propose CMS-PRL, a pre-training reinforcement learning method inspired by the Central Motor System.
First, we introduce a fusion reward mechanism that combines the basic motor reward with mutual information reward.
Second, we design a skill encoding method inspired by the motor program of the basal ganglia, providing rich and continuous skill instructions.
Third, we propose a skill activity function to regulate motor skill activity, enabling the generation of skills with different activity levels.
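The summary does not give the fusion formula. A common way to build a mutual-information skill reward is the DIAYN-style pseudo-reward log q(z|s) - log p(z), where q is a learned skill discriminator and p(z) is the skill prior. The Python sketch below assumes that form, a uniform prior, and a simple weighted sum with a hypothetical weight `beta`; it illustrates the idea and is not the CMS-PRL reward itself.

```python
import math
from typing import Sequence


def mi_skill_reward(discriminator_probs: Sequence[float], skill: int) -> float:
    """DIAYN-style pseudo-reward log q(z|s) - log p(z) with a uniform prior over K skills.

    discriminator_probs[k] is a (hypothetical) discriminator's estimate of q(z=k | s).
    """
    k = len(discriminator_probs)
    log_prior = -math.log(k)  # log p(z) for a uniform prior over k skills
    return math.log(discriminator_probs[skill]) - log_prior


def fused_reward(motor_reward: float, discriminator_probs: Sequence[float],
                 skill: int, beta: float = 0.5) -> float:
    """Assumed fusion rule: basic motor reward plus beta times the MI reward."""
    return motor_reward + beta * mi_skill_reward(discriminator_probs, skill)
```

A discriminator that confidently recognises the active skill from the state yields a positive bonus (up to log K), while a uniform, uninformative discriminator yields zero, which pushes skills to become distinguishable.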
(2023-11-14T00:49:12Z)
- Robot Fine-Tuning Made Easy: Pre-Training Rewards and Policies for Autonomous Real-World Reinforcement Learning [58.3994826169858]
We introduce RoboFuME, a reset-free fine-tuning system for robotic reinforcement learning.
Our key insight is to use offline reinforcement learning techniques to enable efficient online fine-tuning of a pre-trained policy.
Our method can incorporate data from an existing robot dataset and improve on a target task within as little as 3 hours of autonomous real-world experience.
(2023-10-23T17:50:08Z)
- Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765]
We investigate the challenge of parametrizing policies for reinforcement learning in high-dimensional continuous action spaces.
We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories.
We present a practical model-based RL method, which leverages the multimodal policy parameterization and learned world model.
(2023-07-20T09:05:46Z)
- Latent-Conditioned Policy Gradient for Multi-Objective Deep Reinforcement Learning [2.1408617023874443]
We propose a novel multi-objective reinforcement learning (MORL) algorithm that trains a single neural network via policy gradient.
The proposed method works in both continuous and discrete action spaces with no design change of the policy network.
(2023-03-15T20:07:48Z)
- Learning Multi-Task Transferable Rewards via Variational Inverse Reinforcement Learning [10.782043595405831]
We extend an empowerment-based regularization technique to situations with multiple tasks based on the framework of a generative adversarial network.
Under the multitask environments with unknown dynamics, we focus on learning a reward and policy from unlabeled expert examples.
Our proposed method derives a variational lower bound on the situational mutual information and optimizes it.
(2022-06-19T22:32:41Z)
- Deep Reinforcement Learning with Adaptive Hierarchical Reward for Multi-Phase Multi-Objective Dexterous Manipulation [11.638614321552616]
Varying priorities among objectives make it hard, or even impossible, for a robot to learn an optimal policy with a deep reinforcement learning (DRL) method.
We develop a novel Adaptive Hierarchical Reward Mechanism (AHRM) to guide the DRL agent to learn manipulation tasks with multiple prioritized objectives.
The proposed method is validated in a multi-objective manipulation task with a JACO robot arm.
(2022-05-26T15:44:31Z)
- Planning to Practice: Efficient Online Fine-Tuning by Composing Goals in Latent Space [76.46113138484947]
General-purpose robots require diverse repertoires of behaviors to complete challenging tasks in real-world unstructured environments.
To address this issue, goal-conditioned reinforcement learning aims to acquire policies that can reach goals for a wide range of tasks on command.
We propose Planning to Practice, a method that makes it practical to train goal-conditioned policies for long-horizon tasks.
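Goal-conditioned RL, as described above, trains a policy that takes the goal as an extra input, pi(a | s, g). The Python sketch below shows a sparse goal-reaching reward and one simple way to chain a planned sequence of subgoals. The threshold `eps`, the function names and the in-order advancement rule are illustrative assumptions; this is not the Planning to Practice algorithm, which composes subgoals in a learned latent space.

```python
import math
from typing import Sequence

Point = tuple[float, ...]


def goal_reward(state: Point, goal: Point, eps: float = 0.05) -> float:
    """Sparse goal-conditioned reward: 0 within eps of the goal, -1 otherwise."""
    return 0.0 if math.dist(state, goal) <= eps else -1.0


def active_subgoal(state: Point, subgoals: Sequence[Point], index: int,
                   eps: float = 0.05) -> int:
    """Return the index of the subgoal the policy should be conditioned on.

    Subgoals are pursued in order; once the current one is reached, advance.
    The last subgoal is the final goal and is never skipped.
    """
    while index < len(subgoals) - 1 and math.dist(state, subgoals[index]) <= eps:
        index += 1
    return index
```

At each step, a (hypothetical) policy would be queried with the state and `subgoals[active_subgoal(state, subgoals, index)]`, so a long-horizon task is reduced to a series of shorter goal-reaching problems.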
(2022-05-17T06:58:17Z)
- Training and Evaluation of Deep Policies using Reinforcement Learning and Generative Models [67.78935378952146]
GenRL is a framework for solving sequential decision-making problems.
It exploits the combination of reinforcement learning and latent variable generative models.
We experimentally determine the characteristics of generative models that have the most influence on the performance of the final policy training.
(2022-04-18T22:02:32Z)
- Towards Exploiting Geometry and Time for Fast Off-Distribution Adaptation in Multi-Task Robot Learning [17.903462188570067]
We train policies for a base set of pre-training tasks, then experiment with adapting to new off-distribution tasks.
We find that combining low-complexity target policy classes, base policies as black-box priors, and simple optimization algorithms allows us to acquire new tasks outside the base task distribution.
(2021-06-24T02:13:50Z)
- An Open-Source Multi-Goal Reinforcement Learning Environment for Robotic Manipulation with Pybullet [38.8947981067233]
This work re-implements the OpenAI Gym multi-goal robotic manipulation environment, originally based on the commercial MuJoCo engine, on the open-source PyBullet engine.
We provide users with new APIs to access a joint control mode, image observations, and goals with a customisable camera and a built-in on-hand camera.
We also design a set of multi-step, multi-goal, long-horizon and sparse reward robotic manipulation tasks, aiming to inspire new goal-conditioned reinforcement learning algorithms for such challenges.
(2021-05-12T21:58:57Z)
- Bayesian Meta-Learning for Few-Shot Policy Adaptation Across Robotic Platforms [60.59764170868101]
Reinforcement learning methods can achieve significant performance but require a large amount of training data collected on the same robotic platform.
We formulate cross-platform policy adaptation as a few-shot meta-learning problem, where the goal is to find a model that captures the common structure shared across different robotic platforms.
We experimentally evaluate our framework on a simulated reaching and a real-robot picking task using 400 simulated robots.
(2021-03-05T14:16:20Z)
- Towards Coordinated Robot Motions: End-to-End Learning of Motion Policies on Transform Trees [63.31965375413414]
We propose to solve multi-task problems through learning structured policies from human demonstrations.
Our structured policy is inspired by RMPflow, a framework for combining subtask policies on different spaces.
We derive an end-to-end learning objective function that is suitable for the multi-task problem.
(2020-12-24T22:46:22Z)
- Neural Dynamic Policies for End-to-End Sensorimotor Learning [51.24542903398335]
The current dominant paradigm in sensorimotor control, whether imitation or reinforcement learning, is to train policies directly in raw action spaces.
We propose Neural Dynamic Policies (NDPs) that make predictions in trajectory distribution space.
NDPs outperform the prior state-of-the-art in terms of either efficiency or performance across several robotic control tasks.
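NDPs embed the structure of a dynamic movement primitive (DMP) in the policy, so the network outputs parameters of a second-order dynamical system (such as forcing-term weights and a goal) rather than raw actions. A minimal one-dimensional DMP rollout in Python, assuming the standard transformation system tau * dz/dt = alpha_z * (beta_z * (g - y) - z) + f(x), tau * dy/dt = z and canonical system tau * dx/dt = -alpha_x * x with Gaussian basis functions. The gains, basis widths and Euler step are illustrative choices, and the weights are passed in directly instead of being predicted by a network.

```python
import math
from typing import Sequence


def dmp_rollout(y0: float, goal: float, weights: Sequence[float],
                tau: float = 1.0, dt: float = 0.001, steps: int = 1000,
                alpha_z: float = 25.0, alpha_x: float = 4.0) -> list[float]:
    """Integrate a 1-D discrete DMP with explicit Euler and return positions y."""
    beta_z = alpha_z / 4.0  # critically damped spring-damper
    n = len(weights)
    # Basis centres evenly spaced over normalised time [0, 1], mapped to phase x.
    centres = [math.exp(-alpha_x * i / max(n - 1, 1)) for i in range(n)]
    # Width heuristic (an assumption): narrower bases at small x, where centres pack closer.
    widths = [n ** 1.5 / c / alpha_x for c in centres]

    y, z, x = y0, 0.0, 1.0  # position, scaled velocity, phase
    trajectory = [y]
    for _ in range(steps):
        psi = [math.exp(-h * (x - c) ** 2) for h, c in zip(widths, centres)]
        # Forcing term: normalised weighted basis, gated by phase so it fades as x -> 0.
        f = sum(p * w for p, w in zip(psi, weights)) / (sum(psi) + 1e-10)
        f *= x * (goal - y0)
        dz = (alpha_z * (beta_z * (goal - y) - z) + f) / tau
        dy = z / tau
        dx = -alpha_x * x / tau
        z, y, x = z + dz * dt, y + dy * dt, x + dx * dt
        trajectory.append(y)
    return trajectory
```

With all-zero weights the rollout is a plain critically damped reach to `goal`; non-zero weights reshape the path, and because the phase gate x decays, the forcing fades and the motion still settles close to the goal once x is small.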
(2020-12-04T18:59:32Z)
- Meta-Reinforcement Learning Robust to Distributional Shift via Model Identification and Experience Relabeling [126.69933134648541]
We present a meta-reinforcement learning algorithm that is both efficient and extrapolates well when faced with out-of-distribution tasks at test time.
Our method is based on a simple insight: we recognize that dynamics models can be adapted efficiently and consistently with off-policy data.
(2020-06-12T13:34:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.