GenEx: Generating an Explorable World
- URL: http://arxiv.org/abs/2412.09624v4
- Date: Mon, 20 Jan 2025 16:51:42 GMT
- Title: GenEx: Generating an Explorable World
- Authors: Taiming Lu, Tianmin Shu, Junfei Xiao, Luoxin Ye, Jiahao Wang, Cheng Peng, Chen Wei, Daniel Khashabi, Rama Chellappa, Alan Yuille, Jieneng Chen
- Abstract summary: We introduce GenEx, a system capable of planning complex embodied world exploration, guided by its generative imagination.
GenEx generates an entire 3D-consistent imaginative environment from as little as a single RGB image.
GPT-assisted agents are equipped to perform complex embodied tasks, including both goal-agnostic exploration and goal-driven navigation.
- Abstract: Understanding, navigating, and exploring the 3D physical real world has long been a central challenge in the development of artificial intelligence. In this work, we take a step toward this goal by introducing GenEx, a system capable of planning complex embodied world exploration, guided by its generative imagination that forms priors (expectations) about the surrounding environments. GenEx generates an entire 3D-consistent imaginative environment from as little as a single RGB image, bringing it to life through panoramic video streams. Leveraging scalable 3D world data curated from Unreal Engine, our generative model is grounded in the physical world. It captures a continuous 360-degree environment with little effort, offering a boundless landscape for AI agents to explore and interact with. GenEx achieves high-quality world generation, robust loop consistency over long trajectories, and demonstrates strong 3D capabilities such as 3D consistency and active 3D mapping. Powered by generative imagination of the world, GPT-assisted agents are equipped to perform complex embodied tasks, including both goal-agnostic exploration and goal-driven navigation. These agents utilize predictive expectation regarding unseen parts of the physical world to refine their beliefs, simulate different outcomes based on potential decisions, and make more informed choices. In summary, we demonstrate that GenEx provides a transformative platform for advancing embodied AI in imaginative spaces and brings potential for extending these capabilities to real-world exploration.
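The abstract's core planning idea is that an agent imagines the observation each candidate action would produce, scores those imagined outcomes, and only then commits to an action. A minimal sketch of that loop is shown below; all function names (`imagine_observation`, `belief_score`, `plan_step`) and the toy scoring logic are illustrative assumptions, not the paper's actual API, and the real system would generate panoramic views with a video model rather than a toy hash.

```python
def imagine_observation(world_view, action):
    """Toy stand-in for GenEx's generative imagination: produce the
    (hypothetical) view the agent would see after taking `action`.
    In the actual system this would be a generated panoramic video."""
    return hash((world_view, action)) % 100  # toy proxy for an imagined view

def belief_score(imagined_view, goal):
    """Score how promising an imagined view looks for reaching `goal`.
    Here a toy negative distance; a real agent would query a policy
    or, as in the paper, a GPT-assisted evaluator."""
    return -abs(imagined_view - goal)

def plan_step(world_view, actions, goal):
    """Simulate the outcome of every candidate action in imagination
    and pick the one whose imagined result best matches the goal."""
    return max(
        actions,
        key=lambda a: belief_score(imagine_observation(world_view, a), goal),
    )

best = plan_step(world_view="start", actions=["forward", "left", "right"], goal=42)
print(best)
```

The design point this sketch captures is that the environment model is only ever queried in imagination: no real step is taken until every candidate has been mentally rolled out and scored.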
Related papers
- UnrealZoo: Enriching Photo-realistic Virtual Worlds for Embodied AI [37.47562766916571]
We introduce UnrealZoo, a rich collection of photo-realistic 3D virtual worlds built on Unreal Engine.
We offer a variety of playable entities for embodied AI agents.
arXiv Detail & Related papers (2024-12-30T14:31:01Z) - Generative World Explorer [28.135416905073313]
Planning with partial observation is a central challenge in embodied AI.
We introduce the Generative World Explorer (Genex), an egocentric world exploration framework.
Genex allows agents to mentally explore a large-scale 3D world and acquire imagined observations to update their beliefs.
arXiv Detail & Related papers (2024-11-18T18:59:31Z) - CityX: Controllable Procedural Content Generation for Unbounded 3D Cities [50.10101235281943]
Current generative methods fall short in either diversity, controllability, or fidelity.
In this work, we resort to the procedural content generation (PCG) technique for high-fidelity generation.
We develop a multi-agent framework to transform multi-modal instructions, including OSM, semantic maps, and satellite images, into executable programs.
Our method, named CityX, demonstrates its superiority in creating diverse, controllable, and realistic 3D urban scenes.
arXiv Detail & Related papers (2024-07-24T18:05:13Z) - Volumetric Environment Representation for Vision-Language Navigation [66.04379819772764]
Vision-language navigation (VLN) requires an agent to navigate through a 3D environment based on visual observations and natural language instructions.
We introduce a Volumetric Environment Representation (VER), which voxelizes the physical world into structured 3D cells.
VER predicts 3D occupancy, 3D room layout, and 3D bounding boxes jointly.
arXiv Detail & Related papers (2024-03-21T06:14:46Z) - 3D-VLA: A 3D Vision-Language-Action Generative World Model [68.0388311799959]
Recent vision-language-action (VLA) models rely on 2D inputs, lacking integration with the broader realm of the 3D physical world.
We propose 3D-VLA by introducing a new family of embodied foundation models that seamlessly link 3D perception, reasoning, and action.
Our experiments on held-in datasets demonstrate that 3D-VLA significantly improves the reasoning, multimodal generation, and planning capabilities in embodied environments.
arXiv Detail & Related papers (2024-03-14T17:58:41Z) - Progress and Prospects in 3D Generative AI: A Technical Overview including 3D human [51.58094069317723]
This paper aims to provide a comprehensive overview and summary of the relevant papers published mostly during the latter half of 2023.
It will begin by discussing the AI generated object models in 3D, followed by the generated 3D human models, and finally, the generated 3D human motions, culminating in a conclusive summary and a vision for the future.
arXiv Detail & Related papers (2024-01-05T03:41:38Z) - Beyond Reality: The Pivotal Role of Generative AI in the Metaverse [98.1561456565877]
This paper offers a comprehensive exploration of how generative AI technologies are shaping the Metaverse.
We delve into the applications of text generation models like ChatGPT and GPT-3, which are enhancing conversational interfaces with AI-generated characters.
We also examine the potential of 3D model generation technologies like Point-E and Lumirithmic in creating realistic virtual objects.
arXiv Detail & Related papers (2023-07-28T05:44:20Z) - GINA-3D: Learning to Generate Implicit Neural Assets in the Wild [38.51391650845503]
GINA-3D is a generative model that uses real-world driving data from camera and LiDAR sensors to create 3D implicit neural assets of diverse vehicles and pedestrians.
We construct a large-scale object-centric dataset containing over 1.2M images of vehicles and pedestrians.
We demonstrate that it achieves state-of-the-art performance in quality and diversity for both generated images and geometries.
arXiv Detail & Related papers (2023-04-04T23:41:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.