EmbodiedGen: Towards a Generative 3D World Engine for Embodied Intelligence
- URL: http://arxiv.org/abs/2506.10600v2
- Date: Mon, 16 Jun 2025 08:50:31 GMT
- Title: EmbodiedGen: Towards a Generative 3D World Engine for Embodied Intelligence
- Authors: Xinjie Wang, Liu Liu, Yu Cao, Ruiqi Wu, Wenkang Qin, Dehui Wang, Wei Sui, Zhizhong Su,
- Abstract summary: EmbodiedGen is a foundational platform for interactive 3D world generation. It enables the scalable generation of high-quality, controllable and photorealistic 3D assets at low cost.
- Score: 8.987157387248317
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Constructing a physically realistic and accurately scaled simulated 3D world is crucial for training and evaluating embodied intelligence tasks. The diversity, realism, low-cost accessibility, and affordability of 3D data assets are critical for achieving generalization and scalability in embodied AI. However, most current embodied intelligence tasks still rely heavily on traditional 3D computer graphics assets that are manually created and annotated, which suffer from high production costs and limited realism. These limitations significantly hinder the scalability of data-driven approaches. We present EmbodiedGen, a foundational platform for interactive 3D world generation. It enables the scalable, low-cost generation of high-quality, controllable, and photorealistic 3D assets with accurate physical properties and real-world scale in the Unified Robotics Description Format (URDF). These assets can be directly imported into various physics simulation engines for fine-grained physical control, supporting downstream training and evaluation tasks. EmbodiedGen is an easy-to-use, full-featured toolkit composed of six key modules: Image-to-3D, Text-to-3D, Texture Generation, Articulated Object Generation, Scene Generation, and Layout Generation. EmbodiedGen generates diverse, interactive 3D worlds composed of generative 3D assets, leveraging generative AI to address the generalization and evaluation challenges of embodied intelligence research. Code is available at https://horizonrobotics.github.io/robot_lab/embodied_gen/index.html.
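Because the generated assets are exported as standard URDF, they can be loaded into off-the-shelf physics engines without conversion. The sketch below illustrates this with PyBullet; the file name `generated_asset.urdf` is a hypothetical placeholder for an EmbodiedGen export, and the actual file layout may differ.

```python
# Minimal sketch: loading a generated URDF asset into PyBullet.
# "generated_asset.urdf" is a hypothetical placeholder for an
# EmbodiedGen export; the actual output layout may differ.
import pybullet as p
import pybullet_data

p.connect(p.DIRECT)  # headless physics server; use p.GUI for a viewer
p.setAdditionalSearchPath(pybullet_data.getDataPath())
p.setGravity(0, 0, -9.81)

p.loadURDF("plane.urdf")  # ground plane shipped with pybullet_data
asset_id = p.loadURDF("generated_asset.urdf", basePosition=[0, 0, 0.5])

# Step the simulation: the real-world scale and mass/inertia encoded
# in the URDF determine how the asset settles under gravity.
for _ in range(240):  # one second at the default 240 Hz timestep
    p.stepSimulation()

position, orientation = p.getBasePositionAndOrientation(asset_id)
print(position, orientation)
p.disconnect()
```

PyBullet is used here only because it consumes URDF directly; any other URDF-aware simulator could play the same role.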
Related papers
- PhysX-3D: Physical-Grounded 3D Asset Generation [48.78065667043986]
Existing 3D generation primarily emphasizes geometry and textures while neglecting physics-grounded modeling. We present PhysXNet, the first physics-grounded 3D dataset systematically annotated across five foundational dimensions. We also propose PhysXGen, a feed-forward framework for physics-grounded image-to-3D asset generation.
arXiv Detail & Related papers (2025-07-16T17:59:35Z) - 3D-Generalist: Self-Improving Vision-Language-Action Models for Crafting 3D Worlds [23.329458437342684]
We propose a scalable method for generating high-quality 3D environments that can serve as training data for foundation models. Our proposed framework, 3D-Generalist, trains Vision-Language Models to generate more prompt-aligned 3D environments. We demonstrate its quality and scalability in synthetic data generation by pretraining a vision foundation model on the generated data.
arXiv Detail & Related papers (2025-07-09T02:00:17Z) - Aligning Text, Images, and 3D Structure Token-by-Token [8.521599463802637]
We investigate the potential of autoregressive models for structured 3D scenes. We propose a unified LLM framework that aligns language, images, and 3D scenes. We show our model's effectiveness on real-world 3D object recognition tasks.
arXiv Detail & Related papers (2025-06-09T17:59:37Z) - R3D2: Realistic 3D Asset Insertion via Diffusion for Autonomous Driving Simulation [78.26308457952636]
This paper introduces R3D2, a lightweight, one-step diffusion model designed to overcome limitations in autonomous driving simulation. It enables realistic insertion of complete 3D assets into existing scenes by generating plausible rendering effects, such as shadows and consistent lighting, in real time. We show that R3D2 significantly enhances the realism of inserted assets, enabling use cases like text-to-3D asset insertion and cross-scene/dataset object transfer.
arXiv Detail & Related papers (2025-06-09T14:50:19Z) - Generative AI Framework for 3D Object Generation in Augmented Reality [0.0]
This thesis integrates state-of-the-art generative AI models for real-time creation of 3D objects in augmented reality (AR) environments. The framework demonstrates applications across industries such as gaming, education, retail, and interior design. A significant contribution is democratizing 3D model creation, making advanced AI tools accessible to a broader audience.
arXiv Detail & Related papers (2025-02-21T17:01:48Z) - GenEx: Generating an Explorable World [59.0666303068111]
We introduce GenEx, a system capable of planning complex embodied world exploration, guided by its generative imagination. GenEx generates an entire 3D-consistent imaginative environment from as little as a single RGB image. GPT-assisted agents are equipped to perform complex embodied tasks, including both goal-agnostic exploration and goal-driven navigation.
arXiv Detail & Related papers (2024-12-12T18:59:57Z) - Atlas3D: Physically Constrained Self-Supporting Text-to-3D for Simulation and Fabrication [50.541882834405946]
We introduce Atlas3D, an automatic and easy-to-implement text-to-3D method.
Our approach combines a novel differentiable simulation-based loss function with physically inspired regularization.
We verify Atlas3D's efficacy through extensive generation tasks and validate the resulting 3D models in both simulated and real-world environments.
arXiv Detail & Related papers (2024-05-28T18:33:18Z) - GINA-3D: Learning to Generate Implicit Neural Assets in the Wild [38.51391650845503]
GINA-3D is a generative model that uses real-world driving data from camera and LiDAR sensors to create 3D implicit neural assets of diverse vehicles and pedestrians.
We construct a large-scale object-centric dataset containing over 1.2M images of vehicles and pedestrians.
We demonstrate that it achieves state-of-the-art performance in quality and diversity for both generated images and geometries.
arXiv Detail & Related papers (2023-04-04T23:41:20Z) - GET3D: A Generative Model of High Quality 3D Textured Shapes Learned from Images [72.15855070133425]
We introduce GET3D, a Generative model that directly generates Explicit Textured 3D meshes with complex topology, rich geometric details, and high-fidelity textures.
GET3D is able to generate high-quality 3D textured meshes, ranging from cars, chairs, animals, motorbikes and human characters to buildings.
arXiv Detail & Related papers (2022-09-22T17:16:19Z) - CRAVES: Controlling Robotic Arm with a Vision-based Economic System [96.56564257199474]
Training a robotic arm to accomplish real-world tasks has been attracting increasing attention in both academia and industry. This work discusses the role of computer vision algorithms in this field. We present an alternative solution, which uses a 3D model to create a large amount of synthetic data.
arXiv Detail & Related papers (2018-12-03T13:28:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.