Generative AI Framework for 3D Object Generation in Augmented Reality
- URL: http://arxiv.org/abs/2502.15869v1
- Date: Fri, 21 Feb 2025 17:01:48 GMT
- Title: Generative AI Framework for 3D Object Generation in Augmented Reality
- Authors: Majid Behravan
- Abstract summary: This thesis integrates state-of-the-art generative AI models for real-time creation of 3D objects in augmented reality (AR) environments. The framework demonstrates applications across industries such as gaming, education, retail, and interior design. A significant contribution is democratizing 3D model creation, making advanced AI tools accessible to a broader audience.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This thesis presents a framework that integrates state-of-the-art generative AI models for real-time creation of three-dimensional (3D) objects in augmented reality (AR) environments. The primary goal is to convert diverse inputs, such as images and speech, into accurate 3D models, enhancing user interaction and immersion. Key components include advanced object detection algorithms, user-friendly interaction techniques, and robust AI models like Shap-E for 3D generation. Leveraging Vision Language Models (VLMs) and Large Language Models (LLMs), the system captures spatial details from images and processes textual information to generate comprehensive 3D objects, seamlessly integrating virtual objects into real-world environments. The framework demonstrates applications across industries such as gaming, education, retail, and interior design. It allows players to create personalized in-game assets, customers to see products in their environments before purchase, and designers to convert real-world objects into 3D models for real-time visualization. A significant contribution is democratizing 3D model creation, making advanced AI tools accessible to a broader audience, fostering creativity and innovation. The framework addresses challenges like handling multilingual inputs, diverse visual data, and complex environments, improving object detection and model generation accuracy, as well as loading 3D models in AR space in real-time. In conclusion, this thesis integrates generative AI and AR for efficient 3D model generation, enhancing accessibility and paving the way for innovative applications and improved user interactions in AR environments.
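The abstract describes a pipeline that converts speech and image inputs into a text prompt (via speech-to-text, a VLM, and an LLM) and feeds it to a text-to-3D model such as Shap-E. A minimal sketch of that flow, with stub stages standing in for the real models (all function names and stub logic here are illustrative assumptions, not the thesis implementation):

```python
# Hypothetical sketch of the multimodal input -> 3D generation pipeline.
# Each stage is a stand-in: the real system would use multilingual
# speech-to-text, a Vision Language Model, an LLM, and Shap-E.

from dataclasses import dataclass
from typing import Optional


@dataclass
class Model3D:
    """Placeholder for a generated 3D asset (mesh, point cloud, etc.)."""
    prompt: str
    format: str = "mesh"


def transcribe_speech(audio: bytes) -> str:
    # Stand-in for a multilingual speech-to-text model.
    return audio.decode("utf-8")


def describe_image(pixels: list) -> str:
    # Stand-in for a VLM that extracts spatial details from an image.
    return f"an object with {len(pixels)} salient regions"


def build_prompt(parts: list) -> str:
    # Stand-in for an LLM that merges modalities into one text prompt.
    return "; ".join(p for p in parts if p)


def generate_3d_model(prompt: str) -> Model3D:
    # Stand-in for a text-to-3D generator such as Shap-E.
    return Model3D(prompt=prompt)


def pipeline(audio: Optional[bytes] = None,
             pixels: Optional[list] = None) -> Model3D:
    parts = []
    if audio is not None:
        parts.append(transcribe_speech(audio))
    if pixels is not None:
        parts.append(describe_image(pixels))
    return generate_3d_model(build_prompt(parts))


model = pipeline(audio=b"a red office chair", pixels=[1, 2, 3])
print(model.prompt)
```

The design point the sketch captures is that every input modality is normalized to text before 3D generation, which is what lets a single text-to-3D backbone serve speech, image, and mixed inputs.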
Related papers
- From Voices to Worlds: Developing an AI-Powered Framework for 3D Object Generation in Augmented Reality [0.7388329684634598]
Matrix is an advanced AI-powered framework designed for real-time 3D object generation in Augmented Reality (AR) environments.
By integrating a cutting-edge text-to-3D generative AI model, multilingual speech-to-text translation, and large language models, the system enables seamless user interactions through spoken commands.
arXiv Detail & Related papers (2025-03-04T06:31:51Z) - CLAS: A Machine Learning Enhanced Framework for Exploring Large 3D Design Datasets [1.281023989926633]
We propose a machine learning (ML) enhanced framework, CLAS, to enable fully automatic retrieval of 3D objects. As a proof of concept, we created and showcased a search system with a web user interface (UI) for retrieving 6,778 3D objects of chairs. In a closed-set retrieval setting, our retrieval method achieves a mean reciprocal rank (MRR) of 0.58, top-1 accuracy of 42.27%, and top-10 accuracy of 89.64%.
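The MRR figure above averages, over all queries, the reciprocal of the rank at which the correct object first appears. A quick sketch of the computation (the ranks below are made up for illustration, not CLAS data):

```python
# Mean reciprocal rank: for each query, take 1/rank of the first correct
# result, then average across queries. Example ranks are illustrative.

def mean_reciprocal_rank(ranks):
    """ranks: list of 1-based ranks of the first correct hit per query."""
    return sum(1.0 / r for r in ranks) / len(ranks)


# Three queries whose correct chair appeared at ranks 1, 2, and 5:
print(mean_reciprocal_rank([1, 2, 5]))  # (1 + 0.5 + 0.2) / 3 = 0.5666...
```

An MRR of 0.58 therefore means the correct object tends to appear near the top of the ranking on average, even when it is not the first result (consistent with top-1 accuracy of 42.27%).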
arXiv Detail & Related papers (2024-12-04T03:29:56Z) - Coral Model Generation from Single Images for Virtual Reality Applications [22.18438294137604]
This paper introduces a deep-learning framework that generates high-precision 3D coral models from a single image.
The project incorporates Explainable AI (XAI) to transform AI-generated models into interactive "artworks"
arXiv Detail & Related papers (2024-09-04T01:54:20Z) - SUGAR: Pre-training 3D Visual Representations for Robotics [85.55534363501131]
We introduce a novel 3D pre-training framework for robotics named SUGAR.
SUGAR captures semantic, geometric and affordance properties of objects through 3D point clouds.
We show that SUGAR's 3D representation outperforms state-of-the-art 2D and 3D representations.
arXiv Detail & Related papers (2024-04-01T21:23:03Z) - 3D-VLA: A 3D Vision-Language-Action Generative World Model [68.0388311799959]
Recent vision-language-action (VLA) models rely on 2D inputs, lacking integration with the broader realm of the 3D physical world.
We propose 3D-VLA by introducing a new family of embodied foundation models that seamlessly link 3D perception, reasoning, and action.
Our experiments on held-in datasets demonstrate that 3D-VLA significantly improves the reasoning, multimodal generation, and planning capabilities in embodied environments.
arXiv Detail & Related papers (2024-03-14T17:58:41Z) - Large-Vocabulary 3D Diffusion Model with Transformer [57.076986347047]
We introduce a diffusion-based feed-forward framework for synthesizing massive categories of real-world 3D objects with a single generative model.
We propose a novel triplane-based 3D-aware Diffusion model with TransFormer, DiffTF, for handling challenges via three aspects.
Experiments on ShapeNet and OmniObject3D convincingly demonstrate that a single DiffTF model achieves state-of-the-art large-vocabulary 3D object generation performance.
arXiv Detail & Related papers (2023-09-14T17:59:53Z) - AutoDecoding Latent 3D Diffusion Models [95.7279510847827]
We present a novel approach to the generation of static and articulated 3D assets that has a 3D autodecoder at its core.
The 3D autodecoder framework embeds properties learned from the target dataset in the latent space.
We then identify the appropriate intermediate volumetric latent space, and introduce robust normalization and de-normalization operations.
arXiv Detail & Related papers (2023-07-07T17:59:14Z) - Pushing the Limits of 3D Shape Generation at Scale [65.24420181727615]
We present a significant breakthrough in 3D shape generation by scaling it to unprecedented dimensions.
We have developed a model with an astounding 3.6 billion trainable parameters, establishing it as the largest 3D shape generation model to date, named Argus-3D.
arXiv Detail & Related papers (2023-06-20T13:01:19Z) - OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation [107.71752592196138]
We propose OmniObject3D, a large vocabulary 3D object dataset with massive high-quality real-scanned 3D objects.
It comprises 6,000 scanned objects in 190 daily categories, sharing common classes with popular 2D datasets.
Each 3D object is captured with both 2D and 3D sensors, providing textured meshes, point clouds, multiview rendered images, and multiple real-captured videos.
arXiv Detail & Related papers (2023-01-18T18:14:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.