UnrealZoo: Enriching Photo-realistic Virtual Worlds for Embodied AI
- URL: http://arxiv.org/abs/2412.20977v2
- Date: Tue, 12 Aug 2025 11:56:32 GMT
- Title: UnrealZoo: Enriching Photo-realistic Virtual Worlds for Embodied AI
- Authors: Fangwei Zhong, Kui Wu, Churan Wang, Hao Chen, Hai Ci, Zhoujun Li, Yizhou Wang,
- Abstract summary: We introduce UnrealZoo, a collection of over 100 photo-realistic 3D virtual worlds built on Unreal Engine.<n>We also provide a rich variety of playable entities, including humans, animals, robots, and vehicles for embodied AI research.
- Score: 37.47562766916571
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce UnrealZoo, a collection of over 100 photo-realistic 3D virtual worlds built on Unreal Engine, designed to reflect the complexity and variability of open-world environments. We also provide a rich variety of playable entities, including humans, animals, robots, and vehicles for embodied AI research. We extend UnrealCV with optimized APIs and tools for data collection, environment augmentation, distributed training, and benchmarking. These improvements achieve significant improvements in the efficiency of rendering and communication, enabling advanced applications such as multi-agent interactions. Our experimental evaluation across visual navigation and tracking tasks reveals two key insights: 1) environmental diversity provides substantial benefits for developing generalizable reinforcement learning (RL) agents, and 2) current embodied agents face persistent challenges in open-world scenarios, including navigation in unstructured terrain, adaptation to unseen morphologies, and managing latency in the close-loop control systems for interacting in highly dynamic objects. UnrealZoo thus serves as both a comprehensive testing ground and a pathway toward developing more capable embodied AI systems for real-world deployment.
Related papers
- TwinOR: Photorealistic Digital Twins of Dynamic Operating Rooms for Embodied AI Research [9.65694006177344]
Developing embodied AI for intelligent surgical systems requires safe, controllable environments for continual learning and evaluation.<n>Digital twins provide high-fidelity, risk-free environments for exploration and training.<n>We introduce TwinOR, a framework for constructing photorealistic, dynamic digital twins of operating rooms for embodied AI research.
arXiv Detail & Related papers (2025-11-10T18:57:09Z) - Dyna-Mind: Learning to Simulate from Experience for Better AI Agents [62.21219817256246]
We argue that current AI agents need ''vicarious trial and error'' - the capacity to mentally simulate alternative futures before acting.<n>We introduce Dyna-Mind, a two-stage training framework that explicitly teaches (V)LM agents to integrate such simulation into their reasoning.
arXiv Detail & Related papers (2025-10-10T17:30:18Z) - Edge General Intelligence Through World Models and Agentic AI: Fundamentals, Solutions, and Challenges [87.02855999212817]
Edge General Intelligence (EGI) represents a transformative evolution of edge computing, where distributed agents possess the capability to perceive, reason, and act autonomously.<n>World models act as proactive internal simulators that not only predict but also actively imagine future trajectories, reason under uncertainty, and plan multi-step actions with foresight.<n>This survey bridges the gap by offering a comprehensive analysis of how world models can empower agentic artificial intelligence (AI) systems at the edge.
arXiv Detail & Related papers (2025-08-13T07:29:40Z) - GenEx: Generating an Explorable World [59.0666303068111]
We introduce GenEx, a system capable of planning complex embodied world exploration, guided by its generative imagination.
GenEx generates an entire 3D-consistent imaginative environment from as little as a single RGB image.
GPT-assisted agents are equipped to perform complex embodied tasks, including both goal-agnostic exploration and goal-driven navigation.
arXiv Detail & Related papers (2024-12-12T18:59:57Z) - EmbodiedCity: A Benchmark Platform for Embodied Agent in Real-world City Environment [38.14321677323052]
Embodied artificial intelligence emphasizes the role of an agent's body in generating human-like behaviors.
In this paper, we construct a benchmark platform for embodied intelligence evaluation in real-world city environments.
arXiv Detail & Related papers (2024-10-12T17:49:26Z) - Multimodal 3D Fusion and In-Situ Learning for Spatially Aware AI [10.335943413484815]
seamless integration of virtual and physical worlds in augmented reality benefits from the system semantically "understanding" the physical environment.
We introduce a multimodal 3D object representation that unifies both semantic and linguistic knowledge with the geometric representation.
We demonstrate the usefulness of the proposed system through two real-world AR applications on Magic Leap 2: a) spatial search in physical environments with natural language and b) an intelligent inventory system that tracks object changes over time.
arXiv Detail & Related papers (2024-10-06T23:25:21Z) - Learning Robot Soccer from Egocentric Vision with Deep Reinforcement Learning [17.906144781244336]
We train end-to-end robot soccer policies with fully onboard computation and sensing via egocentric RGB vision.
This paper constitutes a first demonstration of end-to-end training for multi-agent robot soccer.
arXiv Detail & Related papers (2024-05-03T18:41:13Z) - Scaling Instructable Agents Across Many Simulated Worlds [70.97268311053328]
Our goal is to develop an agent that can accomplish anything a human can do in any simulated 3D environment.
Our approach focuses on language-driven generality while imposing minimal assumptions.
Our agents interact with environments in real-time using a generic, human-like interface.
arXiv Detail & Related papers (2024-03-13T17:50:32Z) - Self-supervised novel 2D view synthesis of large-scale scenes with
efficient multi-scale voxel carving [77.07589573960436]
We introduce an efficient multi-scale voxel carving method to generate novel views of real scenes.
Our final high-resolution output is efficiently self-trained on data automatically generated by the voxel carving module.
We demonstrate the effectiveness of our method on highly complex and large-scale scenes in real environments.
arXiv Detail & Related papers (2023-06-26T13:57:05Z) - ArK: Augmented Reality with Knowledge Interactive Emergent Ability [115.72679420999535]
We develop an infinite agent that learns to transfer knowledge memory from general foundation models to novel domains.
The heart of our approach is an emerging mechanism, dubbed Augmented Reality with Knowledge Inference Interaction (ArK)
We show that our ArK approach, combined with large foundation models, significantly improves the quality of generated 2D/3D scenes.
arXiv Detail & Related papers (2023-05-01T17:57:01Z) - WILD-SCAV: Benchmarking FPS Gaming AI on Unity3D-based Environments [5.020816812380825]
Recent advances in deep reinforcement learning (RL) have demonstrated complex decision-making capabilities in simulation environments.
However, they are hardly to more complicated problems, due to the lack of complexity and variations in the environments they are trained and tested on.
We developed WILD-SCAV, a powerful and open-world environment based on a 3D open-world FPS game to bridge the gap.
It provides realistic 3D environments of variable complexity, various tasks, and multiple modes of interaction, where agents can learn to perceive 3D environments, navigate and plan, compete and cooperate in a human-like manner
arXiv Detail & Related papers (2022-10-14T13:39:41Z) - Evaluating Continual Learning Algorithms by Generating 3D Virtual
Environments [66.83839051693695]
Continual learning refers to the ability of humans and animals to incrementally learn over time in a given environment.
We propose to leverage recent advances in 3D virtual environments in order to approach the automatic generation of potentially life-long dynamic scenes with photo-realistic appearance.
A novel element of this paper is that scenes are described in a parametric way, thus allowing the user to fully control the visual complexity of the input stream the agent perceives.
arXiv Detail & Related papers (2021-09-16T10:37:21Z) - iGibson, a Simulation Environment for Interactive Tasks in Large
Realistic Scenes [54.04456391489063]
iGibson is a novel simulation environment to develop robotic solutions for interactive tasks in large-scale realistic scenes.
Our environment contains fifteen fully interactive home-sized scenes populated with rigid and articulated objects.
iGibson features enable the generalization of navigation agents, and that the human-iGibson interface and integrated motion planners facilitate efficient imitation learning of simple human demonstrated behaviors.
arXiv Detail & Related papers (2020-12-05T02:14:17Z) - AI Online Filters to Real World Image Recognition [4.874719076317905]
We study a novel approach to add reinforcement controls onto the image recognition reflex models to attain better overall performance.
Follow a common infrastructure with environment sensing and AI based modeling of self-adaptive agents, we implement multiple types of AI control agents.
arXiv Detail & Related papers (2020-02-11T08:23:14Z) - CRAVES: Controlling Robotic Arm with a Vision-based Economic System [96.56564257199474]
Training a robotic arm to accomplish real-world tasks has been attracting increasing attention in both academia and industry.<n>This work discusses the role of computer vision algorithms in this field.<n>We present an alternative solution, which uses a 3D model to create a large number of synthetic data.
arXiv Detail & Related papers (2018-12-03T13:28:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.