A Two-stage Fine-tuning Strategy for Generalizable Manipulation Skill of
Embodied AI
- URL: http://arxiv.org/abs/2307.11343v1
- Date: Fri, 21 Jul 2023 04:15:36 GMT
- Title: A Two-stage Fine-tuning Strategy for Generalizable Manipulation Skill of
Embodied AI
- Authors: Fang Gao, XueTao Li, Jun Yu, Feng Shuang
- Abstract summary: We propose a novel two-stage fine-tuning strategy to enhance the generalization capability of our model on the ManiSkill2 benchmark.
Our findings highlight the potential of our method to improve the generalization abilities of Embodied AI models and pave the way for their practical applications in real-world scenarios.
- Score: 15.480968464853769
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The advent of Chat-GPT has led to a surge of interest in Embodied AI.
However, many existing Embodied AI models heavily rely on massive interactions
with training environments, which may not be practical in real-world
situations. To this end, ManiSkill2 has introduced a full-physics
simulation benchmark for manipulating various 3D objects. This benchmark
enables agents to be trained using diverse datasets of demonstrations and
evaluates their ability to generalize to unseen scenarios in testing
environments. In this paper, we propose a novel two-stage fine-tuning strategy
that aims to further enhance the generalization capability of our model based
on the ManiSkill2 benchmark. Through extensive experiments, we demonstrate the
effectiveness of our approach by achieving the 1st prize in all three tracks of
the ManiSkill2 Challenge. Our findings highlight the potential of our method to
improve the generalization abilities of Embodied AI models and pave the way for
their practical applications in real-world scenarios. All code and models of
our solution are available at https://github.com/xtli12/GXU-LIPE.git
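The abstract names the core technique, a two-stage fine-tuning strategy, without detailing what the stages consist of. As a purely illustrative aid, the sketch below shows one common way a two-stage fine-tuning loop for a behavior-cloned manipulation policy can be organized: a broad first stage over pooled demonstrations, then a lower-learning-rate second stage on target-task demonstrations. The model class, datasets, and hyperparameters are assumptions and are not taken from the paper or the GXU-LIPE repository.

```python
# Hypothetical sketch of a generic two-stage fine-tuning loop for a
# behavior-cloning manipulation policy. Names, datasets, and hyperparameters
# are illustrative only and are NOT taken from the paper or its repository.
import torch
from torch.utils.data import DataLoader

def run_stage(policy, dataset, lr, epochs, device="cuda"):
    """One training stage: supervised behavior cloning on demonstration data."""
    loader = DataLoader(dataset, batch_size=256, shuffle=True)
    optimizer = torch.optim.AdamW(policy.parameters(), lr=lr)
    policy.train()
    for _ in range(epochs):
        for obs, expert_action in loader:
            obs, expert_action = obs.to(device), expert_action.to(device)
            loss = torch.nn.functional.mse_loss(policy(obs), expert_action)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return policy

# Stage 1: train broadly on demonstrations pooled across tasks.
# Stage 2: fine-tune at a lower learning rate on the target task's demonstrations,
#          adapting to the target while preserving the broadly learned skill.
# policy = ManipulationPolicy(...)                            # hypothetical model
# policy = run_stage(policy, pooled_task_demos, lr=1e-3, epochs=50)
# policy = run_stage(policy, target_task_demos, lr=1e-4, epochs=10)
```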
Related papers
- Learning Generalizable 3D Manipulation With 10 Demonstrations [16.502781729164973]
We present a novel framework that learns manipulation skills from as few as 10 demonstrations.
We validate our framework through extensive experiments on both simulation benchmarks and real-world robotic systems.
This work shows significant potential for advancing efficient, generalizable manipulation skill learning in real-world applications.
arXiv Detail & Related papers (2024-11-15T14:01:02Z)
- DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning [38.749045283035365]
We present DINO World Model (DINO-WM), a new method to model visual dynamics without reconstructing the visual world.
We evaluate DINO-WM across various domains, including maze navigation, tabletop pushing, and particle manipulation.
arXiv Detail & Related papers (2024-11-07T18:54:37Z)
- Learning the Generalizable Manipulation Skills on Soft-body Tasks via Guided Self-attention Behavior Cloning Policy [9.345203561496552]
The GP2E behavior cloning policy can guide the agent to learn generalizable manipulation skills from soft-body tasks.
Our findings highlight the potential of our method to improve the generalization abilities of Embodied AI models.
arXiv Detail & Related papers (2024-10-08T07:31:10Z)
- Ag2Manip: Learning Novel Manipulation Skills with Agent-Agnostic Visual and Action Representations [77.31328397965653]
We introduce Ag2Manip (Agent-Agnostic representations for Manipulation), a framework aimed at surmounting challenges through two key innovations.
A novel agent-agnostic visual representation derived from human manipulation videos, with the specifics of embodiments obscured to enhance generalizability.
An agent-agnostic action representation abstracting a robot's kinematics to a universal agent proxy, emphasizing crucial interactions between end-effector and object.
arXiv Detail & Related papers (2024-04-26T16:40:17Z)
- GenH2R: Learning Generalizable Human-to-Robot Handover via Scalable Simulation, Demonstration, and Imitation [31.702907860448477]
GenH2R is a framework for learning generalizable vision-based human-to-robot (H2R) handover skills.
We acquire such generalizability by learning H2R handover at scale with a comprehensive solution.
We leverage large-scale 3D model repositories, dexterous grasp generation methods, and curve-based 3D animation.
arXiv Detail & Related papers (2024-01-01T18:20:43Z)
- Transferring Foundation Models for Generalizable Robotic Manipulation [82.12754319808197]
We propose a novel paradigm that effectively leverages language-reasoning segmentation masks generated by internet-scale foundation models.
Our approach can effectively and robustly perceive object pose and enable sample-efficient generalization learning.
Demos can be found in our submitted video, and more comprehensive ones can be found in link1 or link2.
arXiv Detail & Related papers (2023-06-09T07:22:12Z)
- ProcTHOR: Large-Scale Embodied AI Using Procedural Generation [55.485985317538194]
ProcTHOR is a framework for procedural generation of Embodied AI environments.
We demonstrate state-of-the-art results across 6 embodied AI benchmarks for navigation, rearrangement, and arm manipulation.
arXiv Detail & Related papers (2022-06-14T17:09:35Z)
- Silver-Bullet-3D at ManiSkill 2021: Learning-from-Demonstrations and Heuristic Rule-based Methods for Object Manipulation [118.27432851053335]
This paper presents an overview and comparative analysis of our systems designed for two tracks of the SAPIEN ManiSkill Challenge 2021, including the No Interaction track.
The No Interaction track targets learning policies from pre-collected demonstration trajectories.
In this track, we design a Heuristic Rule-based Method (HRM) that achieves high-quality object manipulation by decomposing the task into a series of sub-tasks.
For each sub-task, simple rule-based control strategies are adopted to predict the actions applied to the robotic arm (an illustrative sketch of this idea appears after this list).
arXiv Detail & Related papers (2022-06-13T16:20:42Z)
- Demonstration-efficient Inverse Reinforcement Learning in Procedurally Generated Environments [137.86426963572214]
Inverse Reinforcement Learning can extrapolate reward functions from expert demonstrations.
We show that our approach, DE-AIRL, is demonstration-efficient and still able to extrapolate reward functions which generalize to the fully procedural domain.
arXiv Detail & Related papers (2020-12-04T11:18:02Z)
- Forgetful Experience Replay in Hierarchical Reinforcement Learning from Demonstrations [55.41644538483948]
In this paper, we propose a combination of approaches that allow the agent to use low-quality demonstrations in complex vision-based environments.
Our proposed goal-oriented structuring of the replay buffer allows the agent to automatically highlight sub-goals for solving complex hierarchical tasks in demonstrations.
The solution based on our algorithm beats all the solutions for the famous MineRL competition and allows the agent to mine a diamond in the Minecraft environment.
arXiv Detail & Related papers (2020-06-17T15:38:40Z)
- Triple-GAIL: A Multi-Modal Imitation Learning Framework with Generative Adversarial Nets [34.17829944466169]
Triple-GAIL is able to learn skill selection and imitation jointly from both expert demonstrations and continuously generated experiences for data augmentation purposes.
Experiments on real driver trajectories and real-time strategy game datasets demonstrate that Triple-GAIL can better fit multi-modal behaviors close to the demonstrators.
arXiv Detail & Related papers (2020-05-19T03:24:24Z)
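As referenced in the Silver-Bullet-3D summary above, a heuristic rule-based method can be organized as an ordered list of sub-tasks, each handled by a simple hand-written controller. The sketch below is only an illustrative reconstruction of that general idea; the sub-task names, state fields, gains, and action layout are assumptions, not the authors' implementation.

```python
# Illustrative heuristic rule-based manipulation pipeline: the task is split into
# ordered sub-tasks, and each sub-task uses a simple rule to map the current
# state to an arm action. All names, fields, and gains are hypothetical.
import numpy as np

def move_above_object(state):
    # Hover the end-effector 10 cm above the object, gripper open (+1).
    target = state["object_pos"] + np.array([0.0, 0.0, 0.10])
    return np.append(2.0 * (target - state["ee_pos"]), 1.0)

def descend_and_grasp(state):
    # Descend onto the object; close the gripper (-1) once roughly aligned.
    delta = state["object_pos"] - state["ee_pos"]
    gripper = -1.0 if np.linalg.norm(delta) < 0.01 else 1.0
    return np.append(2.0 * delta, gripper)

def lift_to_goal(state):
    # Carry the grasped object toward the goal position, gripper closed.
    return np.append(2.0 * (state["goal_pos"] - state["ee_pos"]), -1.0)

SUB_TASKS = [move_above_object, descend_and_grasp, lift_to_goal]

def hrm_policy(state, stage):
    """Return the action for the current sub-task; the caller advances `stage`
    when a sub-task's completion condition (not shown here) is satisfied."""
    return SUB_TASKS[stage](state)
```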