Enhancing Robotic Manipulation: Harnessing the Power of Multi-Task
Reinforcement Learning and Single Life Reinforcement Learning in Meta-World
- URL: http://arxiv.org/abs/2311.12854v1
- Date: Mon, 23 Oct 2023 06:35:44 GMT
- Title: Enhancing Robotic Manipulation: Harnessing the Power of Multi-Task
Reinforcement Learning and Single Life Reinforcement Learning in Meta-World
- Authors: Ghadi Nehme, Ishan Sabane, Tejas Y. Deo
- Abstract summary: This research project aims to enable a robotic arm to execute seven distinct tasks within the Meta-World environment.
A trained model will serve as a source of prior data for the single-life RL algorithm.
An ablation study demonstrates that MT-QWALE successfully completes tasks with a slightly larger number of steps even after hiding the final goal position.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: At present, robots typically require extensive training to successfully
accomplish a single task. However, to truly enhance their usefulness in
real-world scenarios, robots should possess the capability to perform multiple
tasks effectively. To address this need, various multi-task reinforcement
learning (RL) algorithms have been developed, including multi-task proximal
policy optimization (PPO), multi-task trust region policy optimization (TRPO),
and multi-task soft actor-critic (SAC). Nevertheless, these algorithms
perform well only when the environment or observation space follows a
distribution similar to the one seen during training. In practice this is
often not the case, as robots may encounter scenarios or observations that
differ from those on which they were trained. Algorithms such as Q-Weighted
Adversarial Learning (QWALE) attempt to address this challenge, but they
train the base algorithm (which generates the prior data) on only a single
task, rendering them unsuitable for generalization across tasks.
The aim of this research project is therefore to enable a robotic arm to
successfully execute seven distinct tasks within the Meta-World environment.
To achieve this, a multi-task soft actor-critic (MT-SAC) is employed to train
the robotic arm, and the trained model then serves as the source of prior
data for the single-life RL algorithm. The effectiveness of the resulting
MT-QWALE algorithm is assessed by testing on various novel target positions.
Finally, a comparison between the trained MT-SAC and MT-QWALE algorithms is
provided, in which MT-QWALE performs better. An ablation study demonstrates
that MT-QWALE still completes the tasks, with only a slightly larger number
of steps, even after the final goal position is hidden.
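To make the pipeline concrete, below is a minimal sketch of the single-life adaptation step, assuming a gym-style Meta-World environment and a pre-trained MT-SAC agent. The `Discriminator` interface, the `prior_buffer` attributes, and the exact reward-shaping rule are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of single-life adaptation with a QWALE-style shaped reward.
import numpy as np

class Discriminator:
    """Distinguishes prior-data states from online states; QWALE weights
    the prior samples by their Q-values when fitting this classifier."""

    def score(self, state):
        # Placeholder: a trained classifier would return D(s) in (0, 1).
        return 0.5

    def update(self, prior_states, prior_q_values, online_states):
        pass  # fit the Q-weighted classifier here

def single_life_adaptation(env, agent, prior_buffer, max_steps=2000):
    """One 'life': adapt the pre-trained MT-SAC agent to a novel goal.

    `prior_buffer` is assumed to hold states and Q-values collected during
    MT-SAC training; the shaped reward pulls the agent back toward states
    the prior policy visited, the distribution-matching idea behind QWALE.
    """
    disc = Discriminator()
    state, online_states = env.reset(), []
    for t in range(max_steps):
        action = agent.act(state)                       # MT-SAC policy acts
        next_state, env_reward, done, info = env.step(action)
        online_states.append(next_state)
        # Bonus is high when the new state resembles the prior data.
        shaped = env_reward + np.log(disc.score(next_state) + 1e-8)
        agent.update(state, action, shaped, next_state, done)
        disc.update(prior_buffer.states, prior_buffer.q_values, online_states)
        state = next_state
        if done or info.get("success", 0):
            break
    return t + 1  # steps used in this single life
```

The key design choice, following QWALE, is that the shaped reward favors states resembling the prior data, so a single online "life" is steered back toward the distribution the multi-task policy already masters.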
Related papers
- Sample Efficient Myopic Exploration Through Multitask Reinforcement
Learning with Diverse Tasks [53.44714413181162]
This paper shows that when an agent is trained on a sufficiently diverse set of tasks, a generic policy-sharing algorithm with myopic exploration design can be sample-efficient.
To the best of our knowledge, this is the first theoretical demonstration of the "exploration benefits" of MTRL.
arXiv Detail & Related papers (2024-03-03T22:57:44Z) - Robot Fine-Tuning Made Easy: Pre-Training Rewards and Policies for
Autonomous Real-World Reinforcement Learning [58.3994826169858]
We introduce RoboFuME, a reset-free fine-tuning system for robotic reinforcement learning.
Our insight is to utilize offline reinforcement learning techniques for efficient online fine-tuning of a pre-trained policy.
Our method can incorporate data from an existing robot dataset and improve on a target task within as little as 3 hours of autonomous real-world experience.
arXiv Detail & Related papers (2023-10-23T17:50:08Z) - Reinforcement Learning with Foundation Priors: Let the Embodied Agent Efficiently Learn on Its Own [59.11934130045106]
We propose Reinforcement Learning with Foundation Priors (RLFP) to utilize guidance and feedback from policy, value, and success-reward foundation models.
Within this framework, we introduce the Foundation-guided Actor-Critic (FAC) algorithm, which enables embodied agents to explore more efficiently with automatic reward functions.
Our method achieves remarkable performances in various manipulation tasks on both real robots and in simulation.
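Based only on the summary above, a rough sketch of this loop might look as follows; the `policy_prior` and `success_model` interfaces are assumed for illustration and are not the paper's actual API.

```python
# Hedged sketch of a foundation-guided actor-critic rollout: a foundation
# policy prior guides exploration, and a success-prediction foundation model
# supplies the reward, replacing a hand-engineered reward function.
def fac_rollout(env, agent, policy_prior, success_model, task_text, steps=200):
    obs = env.reset()
    for _ in range(steps):
        # The foundation policy prior proposes an action to guide exploration.
        prior_action = policy_prior.propose(obs, task_text)
        action = agent.act(obs, guidance=prior_action)
        obs, _, done, _ = env.step(action)
        # Automatic reward: predicted task success from a foundation model.
        agent.observe(obs, action, reward=success_model.predict(obs, task_text))
        if done:
            break
```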
arXiv Detail & Related papers (2023-10-04T07:56:42Z) - Reinforcement Learning with Success Induced Task Prioritization [68.8204255655161]
We introduce Success Induced Task Prioritization (SITP), a framework for automatic curriculum learning.
The algorithm selects the order of tasks that provide the fastest learning for agents.
We demonstrate that SITP matches or surpasses the results of other curriculum design methods.
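As a hedged illustration of the idea, a curriculum sampler could prioritize tasks by their recent improvement in success rate; the scoring rule below is an assumption inferred from the summary, not the paper's exact formula.

```python
# Sketch of success-induced task prioritization: sample the next training
# task in proportion to its recent success-rate improvement.
import random
from collections import deque

class TaskPrioritizer:
    def __init__(self, tasks, window=20, eps=0.1):
        self.tasks = tasks
        self.eps = eps  # keep some uniform exploration over tasks
        self.history = {t: deque(maxlen=window) for t in tasks}

    def record(self, task, success):
        self.history[task].append(float(success))

    def _progress(self, task):
        h = list(self.history[task])
        if len(h) < 4:
            return 1.0  # barely-explored tasks get high priority
        half = len(h) // 2
        # Improvement of the recent success rate over the earlier half.
        return max(sum(h[half:]) / (len(h) - half) - sum(h[:half]) / half, 0.0)

    def sample(self):
        if random.random() < self.eps:
            return random.choice(self.tasks)
        scores = [self._progress(t) + 1e-6 for t in self.tasks]
        return random.choices(self.tasks, weights=scores, k=1)[0]
```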
arXiv Detail & Related papers (2022-12-30T12:32:43Z) - Discovering Unsupervised Behaviours from Full-State Trajectories [1.827510863075184]
We propose an analysis of Autonomous Robots Realising their Abilities, a Quality-Diversity algorithm that autonomously finds behavioural characterisations.
We evaluate this approach on a simulated robotic environment, where the robot has to autonomously discover its abilities from its full-state trajectories.
More specifically, the analysed approach autonomously finds policies that make the robot move to diverse positions, but also utilise its legs in diverse ways, and even perform half-rolls.
arXiv Detail & Related papers (2022-11-22T16:57:52Z) - On the Effectiveness of Fine-tuning Versus Meta-reinforcement Learning [71.55412580325743]
We show that multi-task pretraining with fine-tuning on new tasks performs equally as well, or better, than meta-pretraining with meta test-time adaptation.
This is encouraging for future research, as multi-task pretraining tends to be simpler and computationally cheaper than meta-RL.
arXiv Detail & Related papers (2022-06-07T13:24:00Z) - Deep Reinforcement Learning with Adaptive Hierarchical Reward for
Multi-Phase Multi-Objective Dexterous Manipulation [11.638614321552616]
Varying priorities make it hard, or even impossible, for a robot to learn an optimal policy with a deep reinforcement learning (DRL) method.
We develop a novel Adaptive Hierarchical Reward Mechanism (AHRM) to guide the DRL agent to learn manipulation tasks with multiple prioritized objectives.
The proposed method is validated in a multi-objective manipulation task with a JACO robot arm.
arXiv Detail & Related papers (2022-05-26T15:44:31Z) - Accelerating Robot Learning of Contact-Rich Manipulations: A Curriculum
Learning Study [4.045850174820418]
This paper presents a study for accelerating robot learning of contact-rich manipulation tasks based on Curriculum Learning combined with Domain Randomization (DR).
We tackle complex industrial assembly tasks with position-controlled robots, such as insertion tasks.
Results also show that even when training only in simulation with toy tasks, our method can learn policies that can be transferred to the real-world robot.
arXiv Detail & Related papers (2022-04-27T11:08:39Z) - Accelerating Robotic Reinforcement Learning via Parameterized Action
Primitives [92.0321404272942]
Reinforcement learning can be used to build general-purpose robotic systems.
However, training RL agents to solve robotics tasks still remains challenging.
In this work, we manually specify a library of robot action primitives (RAPS), parameterized with arguments that are learned by an RL policy.
We find that our simple change to the action interface substantially improves both the learning efficiency and task performance.
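A hedged sketch of such a primitive-based action interface is given below; the primitive names, the `controller` stub, and the policy signature are illustrative assumptions rather than the RAPS implementation.

```python
# Sketch of a primitive-based action interface: the policy outputs a
# primitive index plus continuous arguments, and a wrapper executes that
# primitive as a short low-level controller rollout.
import numpy as np

PRIMITIVES = ["move_to", "grasp", "lift", "release"]

def controller(name, args, env):
    # Placeholder for a hand-designed controller (e.g., a P-controller that
    # tracks the target pose encoded in `args`).
    return env.action_space.sample()

def execute_primitive(env, name, args, horizon=10):
    """Run one primitive for a few low-level simulator steps."""
    total_reward, obs, done = 0.0, None, False
    for _ in range(horizon):
        obs, reward, done, info = env.step(controller(name, args, env))
        total_reward += reward
        if done:
            break
    return obs, total_reward, done

def policy_step(env, policy, obs):
    """The RL policy acts in the primitive space instead of raw torques."""
    logits, args = policy(obs)              # learned primitive choice + args
    name = PRIMITIVES[int(np.argmax(logits))]
    return execute_primitive(env, name, args)
```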
arXiv Detail & Related papers (2021-10-28T17:59:30Z) - Adaptable Automation with Modular Deep Reinforcement Learning and Policy
Transfer [8.299945169799795]
This article develops and tests a Hyper-Actor Soft Actor-Critic (HASAC) RL framework based on the notions of task modularization and transfer learning.
The HASAC framework is tested on a new virtual robotic manipulation benchmark, Meta-World.
Numerical experiments show superior performance by HASAC over state-of-the-art deep RL algorithms in terms of reward value, success rate, and task completion time.
arXiv Detail & Related papers (2020-11-27T03:09:05Z) - SQUIRL: Robust and Efficient Learning from Video Demonstration of
Long-Horizon Robotic Manipulation Tasks [8.756012472587601]
Deep reinforcement learning (RL) can be used to learn complex manipulation tasks.
However, RL requires the robot to collect a large amount of real-world experience.
SQUIRL performs a new but related long-horizon task robustly given only a single video demonstration.
arXiv Detail & Related papers (2020-03-10T20:26:26Z)