Generative World Explorer
- URL: http://arxiv.org/abs/2411.11844v2
- Date: Tue, 19 Nov 2024 18:59:42 GMT
- Title: Generative World Explorer
- Authors: Taiming Lu, Tianmin Shu, Alan Yuille, Daniel Khashabi, Jieneng Chen
- Abstract summary: Planning with partial observation is a central challenge in embodied AI.
We introduce the $\textit{Generative World Explorer (Genex)}$, an egocentric world exploration framework.
Genex allows an agent to mentally explore a large-scale 3D world and acquire imagined observations to update its belief.
- Score: 28.135416905073313
- Abstract: Planning with partial observation is a central challenge in embodied AI. A majority of prior works have tackled this challenge by developing agents that physically explore their environment to update their beliefs about the world state. In contrast, humans can $\textit{imagine}$ unseen parts of the world through a mental exploration and $\textit{revise}$ their beliefs with imagined observations. Such updated beliefs can allow them to make more informed decisions, without necessitating the physical exploration of the world at all times. To achieve this human-like ability, we introduce the $\textit{Generative World Explorer (Genex)}$, an egocentric world exploration framework that allows an agent to mentally explore a large-scale 3D world (e.g., urban scenes) and acquire imagined observations to update its belief. This updated belief will then help the agent to make a more informed decision at the current step. To train $\textit{Genex}$, we create a synthetic urban scene dataset, Genex-DB. Our experimental results demonstrate that (1) $\textit{Genex}$ can generate high-quality and consistent observations during long-horizon exploration of a large virtual physical world and (2) the beliefs updated with the generated observations can inform an existing decision-making model (e.g., an LLM agent) to make better plans.
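The abstract's core idea, updating a belief over world states with observations that are imagined rather than physically gathered, can be illustrated with a minimal Bayesian sketch. Everything below is a hypothetical illustration, not the paper's actual method or API: the state names, the `update_belief` helper, and the likelihood values are all invented for the example.

```python
def normalize(belief):
    """Rescale a dict of state -> weight so the weights sum to 1."""
    total = sum(belief.values())
    return {s: w / total for s, w in belief.items()}

def update_belief(belief, likelihoods):
    """Bayesian update: weight each state's prior probability by the
    likelihood of the observation under that state, then renormalize."""
    posterior = {s: belief[s] * likelihoods.get(s, 0.0) for s in belief}
    return normalize(posterior)

# Prior: the agent is unsure which of three street layouts it occupies.
belief = normalize({"layout_A": 1.0, "layout_B": 1.0, "layout_C": 1.0})

# An imagined observation (e.g. a generated view around the corner) is
# far more consistent with layout_A than with the alternatives.
imagined_likelihood = {"layout_A": 0.8, "layout_B": 0.15, "layout_C": 0.05}
belief = update_belief(belief, imagined_likelihood)

print(max(belief, key=belief.get))  # most probable layout after the update
```

The point of the sketch is only that a generative model can supply the likelihood term without the agent moving; in Genex the "imagined observation" is a generated egocentric view rather than a hand-coded likelihood table.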
Related papers
- GenEx: Generating an Explorable World [59.0666303068111]
We introduce GenEx, a system capable of planning complex embodied world exploration, guided by its generative imagination.
GenEx generates an entire 3D-consistent imaginative environment from as little as a single RGB image.
GPT-assisted agents are equipped to perform complex embodied tasks, including both goal-agnostic exploration and goal-driven navigation.
arXiv Detail & Related papers (2024-12-12T18:59:57Z)
- DISCOVERYWORLD: A Virtual Environment for Developing and Evaluating Automated Scientific Discovery Agents [49.74065769505137]
We introduce DISCOVERYWORLD, the first virtual environment for developing and benchmarking an agent's ability to perform complete cycles of novel scientific discovery.
It includes 120 different challenge tasks spanning eight topics each with three levels of difficulty and several parametric variations.
We find that strong baseline agents that perform well in prior published environments struggle on most DISCOVERYWORLD tasks.
arXiv Detail & Related papers (2024-06-10T20:08:44Z)
- Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond [101.15395503285804]
General world models represent a crucial pathway toward achieving Artificial General Intelligence (AGI).
In this survey, we embark on a comprehensive exploration of the latest advancements in world models.
We examine challenges and limitations of world models, and discuss their potential future directions.
arXiv Detail & Related papers (2024-05-06T14:37:07Z)
- V-IRL: Grounding Virtual Intelligence in Real Life [65.87750250364411]
V-IRL is a platform that enables agents to interact with the real world in a virtual yet realistic environment.
Our platform serves as a playground for developing agents that can accomplish various practical tasks.
arXiv Detail & Related papers (2024-02-05T18:59:36Z)
- Generative agents in the streets: Exploring the use of Large Language Models (LLMs) in collecting urban perceptions [0.0]
This study explores the current advancements in Generative agents powered by large language models (LLMs).
The experiment employs Generative agents to interact with the urban environments using street view images to plan their journey toward specific goals.
Since LLMs are not embodied, lack access to the visual realm, and have no sense of motion or direction, we designed movement and visual modules that help agents gain an overall understanding of their surroundings.
arXiv Detail & Related papers (2023-12-20T15:45:54Z)
- Neural World Models for Computer Vision [2.741266294612776]
We present a framework to train a world model and a policy, parameterised by deep neural networks.
We leverage important computer vision concepts such as geometry, semantics, and motion to scale world models to complex urban driving scenes.
Our model can jointly predict static scene, dynamic scene, and ego-behaviour in an urban driving environment.
arXiv Detail & Related papers (2023-06-15T14:58:21Z)
- The Seven Worlds and Experiences of the Wireless Metaverse: Challenges and Opportunities [58.42198877478623]
The wireless metaverse will create diverse user experiences at the intersection of the physical, digital, and virtual worlds.
We present a holistic vision of a limitless, wireless metaverse that distills the metaverse into an intersection of seven worlds and experiences.
We highlight the need for end-to-end synchronization of digital twins (DTs), and the role of human-level AI and reasoning abilities for cognitive avatars.
arXiv Detail & Related papers (2023-04-20T13:04:52Z)
- The ThreeDWorld Transport Challenge: A Visually Guided Task-and-Motion Planning Benchmark for Physically Realistic Embodied AI [96.86091264553613]
We introduce a visually-guided and physics-driven task-and-motion planning benchmark, which we call the ThreeDWorld Transport Challenge.
In this challenge, an embodied agent equipped with two 9-DOF articulated arms is spawned randomly in a simulated physical home environment.
The agent is required to find a small set of objects scattered around the house, pick them up, and transport them to a desired final location.
arXiv Detail & Related papers (2021-03-25T17:59:08Z)
- Active World Model Learning with Progress Curiosity [12.077052764803163]
World models are self-supervised predictive models of how the world evolves.
In this work, we study how to design such a curiosity-driven Active World Model Learning system.
We propose an AWML system driven by $\gamma$-Progress: a scalable and effective learning progress-based curiosity signal.
arXiv Detail & Related papers (2020-07-15T17:19:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.