ProcTHOR: Large-Scale Embodied AI Using Procedural Generation
- URL: http://arxiv.org/abs/2206.06994v1
- Date: Tue, 14 Jun 2022 17:09:35 GMT
- Title: ProcTHOR: Large-Scale Embodied AI Using Procedural Generation
- Authors: Matt Deitke, Eli VanderBilt, Alvaro Herrasti, Luca Weihs, Jordi
Salvador, Kiana Ehsani, Winson Han, Eric Kolve, Ali Farhadi, Aniruddha
Kembhavi, Roozbeh Mottaghi
- Abstract summary: ProcTHOR is a framework for procedural generation of Embodied AI environments.
We demonstrate state-of-the-art results across 6 embodied AI benchmarks for navigation, rearrangement, and arm manipulation.
- Score: 55.485985317538194
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Massive datasets and high-capacity models have driven many recent
advancements in computer vision and natural language understanding. This work
presents a platform to enable similar success stories in Embodied AI. We
propose ProcTHOR, a framework for procedural generation of Embodied AI
environments. ProcTHOR enables us to sample arbitrarily large datasets of
diverse, interactive, customizable, and performant virtual environments to
train and evaluate embodied agents across navigation, interaction, and
manipulation tasks. We demonstrate the power and potential of ProcTHOR via a
sample of 10,000 generated houses and a simple neural model. Models trained
using only RGB images on ProcTHOR, with no explicit mapping and no human task
supervision, produce state-of-the-art results across 6 embodied AI benchmarks
for navigation, rearrangement, and arm manipulation, including the presently
running Habitat 2022, AI2-THOR Rearrangement 2022, and RoboTHOR challenges. We
also demonstrate strong zero-shot results on these benchmarks via pre-training on
ProcTHOR with no fine-tuning on the downstream benchmark, often beating
previous state-of-the-art systems that access the downstream training data.
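For a concrete sense of the released artifacts, below is a minimal sketch of loading one of the sampled houses. It assumes the publicly released ProcTHOR-10K dataset and the `prior` and `ai2thor` Python packages, which are tooling around the paper rather than something described in the abstract itself.

```python
# Minimal sketch (not the authors' training pipeline): load a procedurally
# generated house from the released ProcTHOR-10K sample and step an agent in it.
# Assumes `pip install prior ai2thor` and public availability of the dataset.
import prior
from ai2thor.controller import Controller

dataset = prior.load_dataset("procthor-10k")   # train/val/test splits of house specifications
house = dataset["train"][0]                    # each entry is a JSON house specification

controller = Controller(scene=house)           # instantiate the house as an interactive scene
event = controller.step(action="RotateRight")  # agents act through discrete steps like this
print(event.metadata["agent"]["position"])
```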
Related papers
- In-Simulation Testing of Deep Learning Vision Models in Autonomous Robotic Manipulators [11.389756788049944]
Testing autonomous robotic manipulators is challenging due to the complex software interactions between vision and control components.
A crucial element of modern robotic manipulators is the deep learning based object detection model.
We propose the MARTENS framework, which integrates a photorealistic NVIDIA Isaac Sim simulator with evolutionary search to identify critical scenarios.
arXiv Detail & Related papers (2024-10-25T03:10:42Z)
- Navigating the Human Maze: Real-Time Robot Pathfinding with Generative Imitation Learning [0.0]
We introduce goal-conditioned autoregressive models to generate crowd behaviors, capturing intricate interactions among individuals.
The model processes potential robot trajectory samples and predicts the reactions of surrounding individuals, enabling proactive robotic navigation in complex scenarios.
arXiv Detail & Related papers (2024-08-07T14:32:41Z)
- GRUtopia: Dream General Robots in a City at Scale [65.08318324604116]
This paper introduces project GRUtopia, the first simulated interactive 3D society designed for various robots.
GRScenes includes 100k interactive, finely annotated scenes, which can be freely combined into city-scale environments.
GRResidents is a Large Language Model (LLM) driven Non-Player Character (NPC) system that is responsible for social interaction.
arXiv Detail & Related papers (2024-07-15T17:40:46Z)
- LLARVA: Vision-Action Instruction Tuning Enhances Robot Learning [50.99807031490589]
We introduce LLARVA, a model trained with a novel instruction tuning method to unify a range of robotic learning tasks, scenarios, and environments.
We generate 8.5M image-visual trace pairs from the Open X-Embodiment dataset in order to pre-train our model.
Experiments demonstrate that LLARVA performs strongly compared to several contemporary baselines.
arXiv Detail & Related papers (2024-06-17T17:55:29Z)
- Active Exploration in Bayesian Model-based Reinforcement Learning for Robot Manipulation [8.940998315746684]
We propose a model-based reinforcement learning (RL) approach for robotic arm end-tasks.
We employ Bayesian neural network models to represent, in a probabilistic way, both the belief and information encoded in the dynamic model during exploration.
Our experiments show the advantages of our Bayesian model-based RL approach, achieving results of similar quality to relevant alternatives.
arXiv Detail & Related papers (2024-04-02T11:44:37Z)
- Robot Learning with Sensorimotor Pre-training [98.7755895548928]
We present a self-supervised sensorimotor pre-training approach for robotics.
Our model, called RPT, is a Transformer that operates on sequences of sensorimotor tokens.
We find that sensorimotor pre-training consistently outperforms training from scratch, has favorable scaling properties, and enables transfer across different tasks, environments, and robots.
arXiv Detail & Related papers (2023-06-16T17:58:10Z)
- PACT: Perception-Action Causal Transformer for Autoregressive Robotics Pre-Training [25.50131893785007]
This work introduces a paradigm for pre-training a general purpose representation that can serve as a starting point for multiple tasks on a given robot.
We present the Perception-Action Causal Transformer (PACT), a generative transformer-based architecture that aims to build representations directly from robot data in a self-supervised fashion.
We show that finetuning small task-specific networks on top of the larger pretrained model results in significantly better performance compared to training a single model from scratch for all tasks simultaneously.
arXiv Detail & Related papers (2022-09-22T16:20:17Z)
- Few-Shot Visual Grounding for Natural Human-Robot Interaction [0.0]
We propose a software architecture that segments a target object from a crowded scene, indicated verbally by a human user.
At the core of our system, we employ a multi-modal deep neural network for visual grounding.
We evaluate the performance of the proposed model on real RGB-D data collected from public scene datasets.
arXiv Detail & Related papers (2021-03-17T15:24:02Z)
- Deep Imitation Learning for Bimanual Robotic Manipulation [70.56142804957187]
We present a deep imitation learning framework for robotic bimanual manipulation.
A core challenge is to generalize the manipulation skills to objects in different locations.
We propose to (i) decompose the multi-modal dynamics into elemental movement primitives, (ii) parameterize each primitive using a recurrent graph neural network to capture interactions, and (iii) integrate a high-level planner that composes primitives sequentially and a low-level controller to combine primitive dynamics and inverse kinematics control.
arXiv Detail & Related papers (2020-10-11T01:40:03Z)
- RoboTHOR: An Open Simulation-to-Real Embodied AI Platform [56.50243383294621]
We introduce RoboTHOR to democratize research in interactive and embodied visual AI.
We show that a significant performance gap exists when models trained in simulation are evaluated in simulation versus in their carefully constructed physical analogs.
arXiv Detail & Related papers (2020-04-14T20:52:49Z)