UnrealZoo: Enriching Photo-realistic Virtual Worlds for Embodied AI
- URL: http://arxiv.org/abs/2412.20977v1
- Date: Mon, 30 Dec 2024 14:31:01 GMT
- Authors: Fangwei Zhong, Kui Wu, Churan Wang, Hao Chen, Hai Ci, Zhoujun Li, Yizhou Wang
- Abstract summary: We introduce UnrealZoo, a rich collection of photo-realistic 3D virtual worlds built on Unreal Engine.
We offer a variety of playable entities for embodied AI agents.
- Abstract: We introduce UnrealZoo, a rich collection of photo-realistic 3D virtual worlds built on Unreal Engine, designed to reflect the complexity and variability of open worlds. Additionally, we offer a variety of playable entities for embodied AI agents. Based on UnrealCV, we provide a suite of easy-to-use Python APIs and tools for various potential applications, such as data collection, environment augmentation, distributed training, and benchmarking. We optimize the rendering and communication efficiency of UnrealCV to support advanced applications, such as multi-agent interaction. Our experiments benchmark agents in various complex scenes, focusing on visual navigation and tracking, which are fundamental capabilities for embodied visual intelligence. The results yield valuable insights into the advantages of diverse training environments for reinforcement learning (RL) agents and the challenges faced by current embodied vision agents, including those based on RL and large vision-language models (VLMs), in open worlds. These challenges involve latency in closed-loop control in dynamic scenes and reasoning about 3D spatial structures in unstructured terrain.
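The abstract mentions Python APIs built on UnrealCV. As a rough illustration, UnrealCV drives a running Unreal Engine scene through plain text commands such as `vget/camera/0/lit png`. The sketch below only builds such command strings, so it runs without an engine; the field names follow the public UnrealCV command convention, but treat the exact commands as illustrative rather than UnrealZoo's definitive API.

```python
# Sketch of the text-command style UnrealCV exposes to Python clients.
# Commands are plain strings ("vget" reads state, "vset" writes state);
# this helper only constructs them, so no Unreal Engine server is needed.

def unrealcv_cmd(action: str, camera_id: int, field: str) -> str:
    """Build a UnrealCV camera command string ('get' reads, 'set' writes)."""
    assert action in ('get', 'set'), "UnrealCV uses vget/vset verbs"
    return f'v{action}/camera/{camera_id}/{field}'

# Typical requests an embodied agent might issue each control step:
rgb_cmd   = unrealcv_cmd('get', 0, 'lit png')    # RGB observation
depth_cmd = unrealcv_cmd('get', 0, 'depth npy')  # depth map
pose_cmd  = unrealcv_cmd('get', 0, 'location')   # camera position
```

With the real `unrealcv` package, such strings are sent to a running engine instance via a TCP client (e.g. `Client(('localhost', 9000)).request(rgb_cmd)`); host, port, and camera id here are assumptions for illustration.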
Related papers
- GenEx: Generating an Explorable World [59.0666303068111]
We introduce GenEx, a system capable of planning complex embodied world exploration, guided by its generative imagination.
GenEx generates an entire 3D-consistent imaginative environment from as little as a single RGB image.
GPT-assisted agents are equipped to perform complex embodied tasks, including both goal-agnostic exploration and goal-driven navigation.
arXiv Detail & Related papers (2024-12-12T18:59:57Z)
- EmbodiedCity: A Benchmark Platform for Embodied Agent in Real-world City Environment [38.14321677323052]
Embodied artificial intelligence emphasizes the role of an agent's body in generating human-like behaviors.
In this paper, we construct a benchmark platform for embodied intelligence evaluation in real-world city environments.
arXiv Detail & Related papers (2024-10-12T17:49:26Z)
- Multimodal 3D Fusion and In-Situ Learning for Spatially Aware AI [10.335943413484815]
Seamless integration of virtual and physical worlds in augmented reality benefits from a system that semantically "understands" the physical environment.
We introduce a multimodal 3D object representation that unifies both semantic and linguistic knowledge with the geometric representation.
We demonstrate the usefulness of the proposed system through two real-world AR applications on Magic Leap 2: a) spatial search in physical environments with natural language and b) an intelligent inventory system that tracks object changes over time.
arXiv Detail & Related papers (2024-10-06T23:25:21Z)
- Learning Robot Soccer from Egocentric Vision with Deep Reinforcement Learning [17.906144781244336]
We train end-to-end robot soccer policies with fully onboard computation and sensing via egocentric RGB vision.
This paper constitutes a first demonstration of end-to-end training for multi-agent robot soccer.
arXiv Detail & Related papers (2024-05-03T18:41:13Z)
- Scaling Instructable Agents Across Many Simulated Worlds [70.97268311053328]
Our goal is to develop an agent that can accomplish anything a human can do in any simulated 3D environment.
Our approach focuses on language-driven generality while imposing minimal assumptions.
Our agents interact with environments in real-time using a generic, human-like interface.
arXiv Detail & Related papers (2024-03-13T17:50:32Z)
- Self-supervised novel 2D view synthesis of large-scale scenes with efficient multi-scale voxel carving [77.07589573960436]
We introduce an efficient multi-scale voxel carving method to generate novel views of real scenes.
Our final high-resolution output is efficiently self-trained on data automatically generated by the voxel carving module.
We demonstrate the effectiveness of our method on highly complex and large-scale scenes in real environments.
arXiv Detail & Related papers (2023-06-26T13:57:05Z)
- ArK: Augmented Reality with Knowledge Interactive Emergent Ability [115.72679420999535]
We develop an infinite agent that learns to transfer knowledge memory from general foundation models to novel domains.
The heart of our approach is an emerging mechanism, dubbed Augmented Reality with Knowledge Inference Interaction (ArK).
We show that our ArK approach, combined with large foundation models, significantly improves the quality of generated 2D/3D scenes.
arXiv Detail & Related papers (2023-05-01T17:57:01Z)
- WILD-SCAV: Benchmarking FPS Gaming AI on Unity3D-based Environments [5.020816812380825]
Recent advances in deep reinforcement learning (RL) have demonstrated complex decision-making capabilities in simulation environments.
However, they hardly generalize to more complicated problems, due to the lack of complexity and variation in the environments they are trained and tested on.
We developed WILD-SCAV, a powerful open-world environment based on a 3D FPS game, to bridge the gap.
It provides realistic 3D environments of variable complexity, various tasks, and multiple modes of interaction, where agents can learn to perceive 3D environments, navigate and plan, and compete and cooperate in a human-like manner.
arXiv Detail & Related papers (2022-10-14T13:39:41Z)
- Evaluating Continual Learning Algorithms by Generating 3D Virtual Environments [66.83839051693695]
Continual learning refers to the ability of humans and animals to incrementally learn over time in a given environment.
We propose to leverage recent advances in 3D virtual environments in order to approach the automatic generation of potentially life-long dynamic scenes with photo-realistic appearance.
A novel element of this paper is that scenes are described in a parametric way, thus allowing the user to fully control the visual complexity of the input stream the agent perceives.
arXiv Detail & Related papers (2021-09-16T10:37:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.