A Two-stage Fine-tuning Strategy for Generalizable Manipulation Skill of
Embodied AI
- URL: http://arxiv.org/abs/2307.11343v1
- Date: Fri, 21 Jul 2023 04:15:36 GMT
- Title: A Two-stage Fine-tuning Strategy for Generalizable Manipulation Skill of
Embodied AI
- Authors: Fang Gao, XueTao Li, Jun Yu, Feng Shuang
- Abstract summary: We propose a novel two-stage fine-tuning strategy to enhance the generalization capability of our model on the ManiSkill2 benchmark.
Our findings highlight the potential of our method to improve the generalization abilities of Embodied AI models and pave the way for their practical applications in real-world scenarios.
- Score: 15.480968464853769
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The advent of ChatGPT has led to a surge of interest in Embodied AI.
However, many existing Embodied AI models heavily rely on massive interactions
with training environments, which may not be practical in real-world
situations. To this end, ManiSkill2 has introduced a full-physics
simulation benchmark for manipulating various 3D objects. This benchmark
enables agents to be trained using diverse datasets of demonstrations and
evaluates their ability to generalize to unseen scenarios in testing
environments. In this paper, we propose a novel two-stage fine-tuning strategy
that aims to further enhance the generalization capability of our model based
on the ManiSkill2 benchmark. Through extensive experiments, we demonstrate the
effectiveness of our approach by achieving the 1st prize in all three tracks of
the ManiSkill2 Challenge. Our findings highlight the potential of our method to
improve the generalization abilities of Embodied AI models and pave the way for
their practical applications in real-world scenarios. All code and models of
our solution are available at https://github.com/xtli12/GXU-LIPE.git
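The abstract names the method but not its recipe, so the following is a minimal illustrative sketch only: the two stages, the data split, and the learning rates are assumptions, not the authors' published procedure. The general pattern of a two-stage fine-tuning strategy is to first train on a broad mixture of demonstrations, then fine-tune on task-specific data with a smaller learning rate so that the specialization does not erase the generality gained in stage one. A toy 1-D version of that schedule:

```python
# Hypothetical sketch of a generic two-stage fine-tuning schedule.
# Stage 1: train on a broad demonstration mixture (higher learning rate).
# Stage 2: fine-tune on task-specific data (lower learning rate).

def sgd_stage(w, data, lr, epochs):
    """One fine-tuning stage: plain SGD on a 1-D linear model y = w * x."""
    for _ in range(epochs):
        for x, y in data:
            grad = 2 * (w * x - y) * x   # d/dw of the squared error (w*x - y)^2
            w -= lr * grad
    return w

def two_stage_finetune(broad_data, task_data):
    w = 0.0                                             # stands in for a pretrained weight
    w = sgd_stage(w, broad_data, lr=0.1, epochs=50)     # stage 1: broad mixture
    w = sgd_stage(w, task_data, lr=0.01, epochs=50)     # stage 2: low-lr specialization
    return w

broad = [(1.0, 2.0), (2.0, 4.0)]   # demonstrations of y = 2x
task = [(3.0, 6.3)]                # slightly shifted target task (y = 2.1x)
w = two_stage_finetune(broad, task)
```

With this toy data, stage 1 pulls the weight toward 2.0 and the low-learning-rate stage 2 then nudges it toward the shifted target 2.1; the same staged schedule applies unchanged when the model is a policy network and the stages are demonstration mixtures.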
Related papers
- Ag2Manip: Learning Novel Manipulation Skills with Agent-Agnostic Visual and Action Representations [77.31328397965653]
We introduce Ag2Manip (Agent-Agnostic representations for Manipulation), a framework aimed at surmounting challenges through two key innovations.
A novel agent-agnostic visual representation derived from human manipulation videos, with the specifics of embodiments obscured to enhance generalizability.
An agent-agnostic action representation abstracting a robot's kinematics to a universal agent proxy, emphasizing crucial interactions between end-effector and object.
arXiv Detail & Related papers (2024-04-26T16:40:17Z) - Part-Guided 3D RL for Sim2Real Articulated Object Manipulation [27.422878372169805]
We propose a part-guided 3D RL framework, which can learn to manipulate articulated objects without demonstrations.
We combine the strengths of 2D segmentation and 3D RL to improve the efficiency of RL policy training.
A single versatile RL policy can be trained on multiple articulated object manipulation tasks simultaneously in simulation.
arXiv Detail & Related papers (2024-04-26T10:18:17Z) - Real Evaluations Tractability using Continuous Goal-Directed Actions in
Smart City Applications [3.1158660854608824]
Continuous Goal-Directed Actions (CGDA) encodes actions as changes of any feature that can be extracted from the environment.
Current strategies involve performing evaluations in a simulation, transferring the final joint trajectory to the actual robot.
Two different approaches to reducing the number of evaluations using evolutionary algorithms (EA) are proposed and compared.
arXiv Detail & Related papers (2024-02-01T15:38:21Z) - GenH2R: Learning Generalizable Human-to-Robot Handover via Scalable Simulation, Demonstration, and Imitation [31.702907860448477]
GenH2R is a framework for learning generalizable vision-based human-to-robot (H2R) handover skills.
We acquire such generalizability by learning H2R handover at scale with a comprehensive solution.
We leverage large-scale 3D model repositories, dexterous grasp generation methods, and curve-based 3D animation.
arXiv Detail & Related papers (2024-01-01T18:20:43Z) - Transferring Foundation Models for Generalizable Robotic Manipulation [82.12754319808197]
We propose a novel paradigm that effectively leverages language-reasoning segmentation mask generated by internet-scale foundation models.
Our approach can effectively and robustly perceive object pose and enable sample-efficient generalization learning.
Demos can be found in our submitted video, and more comprehensive ones can be found in link1 or link2.
arXiv Detail & Related papers (2023-06-09T07:22:12Z) - Curriculum-Based Imitation of Versatile Skills [15.97723808124603]
Learning skills by imitation is a promising concept for the intuitive teaching of robots.
A common way to learn such skills is to learn a parametric model by maximizing the likelihood given the demonstrations.
Yet, human demonstrations are often multi-modal, i.e., the same task is solved in multiple ways.
arXiv Detail & Related papers (2023-04-11T12:10:41Z) - ProcTHOR: Large-Scale Embodied AI Using Procedural Generation [55.485985317538194]
ProcTHOR is a framework for procedural generation of Embodied AI environments.
We demonstrate state-of-the-art results across 6 embodied AI benchmarks for navigation, rearrangement, and arm manipulation.
arXiv Detail & Related papers (2022-06-14T17:09:35Z) - Silver-Bullet-3D at ManiSkill 2021: Learning-from-Demonstrations and
Heuristic Rule-based Methods for Object Manipulation [118.27432851053335]
This paper presents an overview and comparative analysis of our systems designed for the following two tracks in SAPIEN ManiSkill Challenge 2021: No Interaction Track.
The No Interaction track targets learning policies from pre-collected demonstration trajectories.
In this track, we design a Heuristic Rule-based Method (HRM) to trigger high-quality object manipulation by decomposing the task into a series of sub-tasks.
For each sub-task, simple rule-based control strategies are adopted to predict actions that can be applied to robotic arms.
arXiv Detail & Related papers (2022-06-13T16:20:42Z) - Demonstration-efficient Inverse Reinforcement Learning in Procedurally
Generated Environments [137.86426963572214]
Inverse Reinforcement Learning can extrapolate reward functions from expert demonstrations.
We show that our approach, DE-AIRL, is demonstration-efficient and still able to extrapolate reward functions which generalize to the fully procedural domain.
arXiv Detail & Related papers (2020-12-04T11:18:02Z) - Forgetful Experience Replay in Hierarchical Reinforcement Learning from
Demonstrations [55.41644538483948]
In this paper, we propose a combination of approaches that allow the agent to use low-quality demonstrations in complex vision-based environments.
Our proposed goal-oriented structuring of replay buffer allows the agent to automatically highlight sub-goals for solving complex hierarchical tasks in demonstrations.
The solution based on our algorithm beats all the solutions for the famous MineRL competition and allows the agent to mine a diamond in the Minecraft environment.
arXiv Detail & Related papers (2020-06-17T15:38:40Z) - Triple-GAIL: A Multi-Modal Imitation Learning Framework with Generative
Adversarial Nets [34.17829944466169]
Triple-GAIL is able to learn skill selection and imitation jointly from both expert demonstrations and continuously generated experiences, which serve as data augmentation.
Experiments on real driver trajectories and real-time strategy game datasets demonstrate that Triple-GAIL can better fit multi-modal behaviors close to those of the demonstrators.
arXiv Detail & Related papers (2020-05-19T03:24:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information (including all content) and is not responsible for any consequences.