From Voices to Worlds: Developing an AI-Powered Framework for 3D Object Generation in Augmented Reality
- URL: http://arxiv.org/abs/2503.16474v1
- Date: Tue, 04 Mar 2025 06:31:51 GMT
- Title: From Voices to Worlds: Developing an AI-Powered Framework for 3D Object Generation in Augmented Reality
- Authors: Majid Behravan, Denis Gracanin,
- Abstract summary: Matrix is an advanced AI-powered framework designed for real-time 3D object generation in Augmented Reality (AR) environments.<n>By integrating a cutting-edge text-to-3D generative AI model, multilingual speech-to-text translation, and large language models, the system enables seamless user interactions through spoken commands.
- Score: 0.7388329684634598
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents Matrix, an advanced AI-powered framework designed for real-time 3D object generation in Augmented Reality (AR) environments. By integrating a cutting-edge text-to-3D generative AI model, multilingual speech-to-text translation, and large language models (LLMs), the system enables seamless user interactions through spoken commands. The framework processes speech inputs, generates 3D objects, and provides object recommendations based on contextual understanding, enhancing AR experiences. A key feature of this framework is its ability to optimize 3D models by reducing mesh complexity, resulting in significantly smaller file sizes and faster processing on resource-constrained AR devices. Our approach addresses the challenges of high GPU usage, large model output sizes, and real-time system responsiveness, ensuring a smoother user experience. Moreover, the system is equipped with a pre-generated object repository, further reducing GPU load and improving efficiency. We demonstrate the practical applications of this framework in various fields such as education, design, and accessibility, and discuss future enhancements including image-to-3D conversion, environmental object detection, and multimodal support. The open-source nature of the framework promotes ongoing innovation and its utility across diverse industries.
Related papers
- Text To 3D Object Generation For Scalable Room Assembly [9.275648239993703]
We propose an end-to-end system for synthetic data generation for scalable, high-quality, and customizable 3D indoor scenes.
This system generates highfidelity 3D object assets from text prompts and incorporates them into pre-defined floor plans using a rendering tool.
arXiv Detail & Related papers (2025-04-12T20:13:07Z) - LandMarkSystem Technical Report [4.885906902650898]
3D reconstruction is vital for applications in autonomous driving, virtual reality, augmented reality, and the metaverse.<n>Recent advancements such as Neural Radiance Fields(NeRF) and 3D Gaussian Splatting (3DGS) have transformed the field, yet traditional deep learning frameworks struggle to meet the increasing demands for scene quality and scale.<n>This paper introduces LandMarkSystem, a novel computing framework designed to enhance multi-scale scene reconstruction and rendering.
arXiv Detail & Related papers (2025-03-27T10:55:36Z) - Generative AI Framework for 3D Object Generation in Augmented Reality [0.0]
This thesis integrates state-of-the-art generative AI models for real-time creation of 3D objects in augmented reality (AR) environments.<n>The framework demonstrates applications across industries such as gaming, education, retail, and interior design.<n>A significant contribution is democratizing 3D model creation, making advanced AI tools accessible to a broader audience.
arXiv Detail & Related papers (2025-02-21T17:01:48Z) - Pandora3D: A Comprehensive Framework for High-Quality 3D Shape and Texture Generation [56.862552362223425]
This report presents a comprehensive framework for generating high-quality 3D shapes and textures from diverse input prompts.
The framework consists of 3D shape generation and texture generation.
This report details the system architecture, experimental results, and potential future directions to improve and expand the framework.
arXiv Detail & Related papers (2025-02-20T04:22:30Z) - Multimodal 3D Fusion and In-Situ Learning for Spatially Aware AI [10.335943413484815]
seamless integration of virtual and physical worlds in augmented reality benefits from the system semantically "understanding" the physical environment.
We introduce a multimodal 3D object representation that unifies both semantic and linguistic knowledge with the geometric representation.
We demonstrate the usefulness of the proposed system through two real-world AR applications on Magic Leap 2: a) spatial search in physical environments with natural language and b) an intelligent inventory system that tracks object changes over time.
arXiv Detail & Related papers (2024-10-06T23:25:21Z) - Articulate-Anything: Automatic Modeling of Articulated Objects via a Vision-Language Foundation Model [35.184607650708784]
Articulate-Anything automates the articulation of diverse, complex objects from many input modalities, including text, images, and videos.<n>Our system exploits existing 3D asset datasets via a mesh retrieval mechanism, along with an actor-critic system that iteratively proposes, evaluates, and refines solutions.
arXiv Detail & Related papers (2024-10-03T19:42:16Z) - Coral Model Generation from Single Images for Virtual Reality Applications [22.18438294137604]
This paper introduces a deep-learning framework that generates high-precision 3D coral models from a single image.
The project incorporates Explainable AI (XAI) to transform AI-generated models into interactive "artworks"
arXiv Detail & Related papers (2024-09-04T01:54:20Z) - VolumeDiffusion: Flexible Text-to-3D Generation with Efficient Volumetric Encoder [56.59814904526965]
This paper introduces a pioneering 3D encoder designed for text-to-3D generation.
A lightweight network is developed to efficiently acquire feature volumes from multi-view images.
The 3D volumes are then trained on a diffusion model for text-to-3D generation using a 3D U-Net.
arXiv Detail & Related papers (2023-12-18T18:59:05Z) - Federated Multi-View Synthesizing for Metaverse [52.59476179535153]
The metaverse is expected to provide immersive entertainment, education, and business applications.
Virtual reality (VR) transmission over wireless networks is data- and computation-intensive.
We have developed a novel multi-view synthesizing framework that can efficiently provide synthesizing, storage, and communication resources for wireless content delivery in the metaverse.
arXiv Detail & Related papers (2023-12-18T13:51:56Z) - AutoDecoding Latent 3D Diffusion Models [95.7279510847827]
We present a novel approach to the generation of static and articulated 3D assets that has a 3D autodecoder at its core.
The 3D autodecoder framework embeds properties learned from the target dataset in the latent space.
We then identify the appropriate intermediate volumetric latent space, and introduce robust normalization and de-normalization operations.
arXiv Detail & Related papers (2023-07-07T17:59:14Z) - Pushing the Limits of 3D Shape Generation at Scale [65.24420181727615]
We present a significant breakthrough in 3D shape generation by scaling it to unprecedented dimensions.
We have developed a model with an astounding 3.6 billion trainable parameters, establishing it as the largest 3D shape generation model to date, named Argus-3D.
arXiv Detail & Related papers (2023-06-20T13:01:19Z) - Object Scene Representation Transformer [56.40544849442227]
We introduce Object Scene Representation Transformer (OSRT), a 3D-centric model in which individual object representations naturally emerge through novel view synthesis.
OSRT scales to significantly more complex scenes with larger diversity of objects and backgrounds than existing methods.
It is multiple orders of magnitude faster at compositional rendering thanks to its light field parametrization and the novel Slot Mixer decoder.
arXiv Detail & Related papers (2022-06-14T15:40:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.