LLM-to-Phy3D: Physically Conform Online 3D Object Generation with LLMs
- URL: http://arxiv.org/abs/2506.11148v1
- Date: Wed, 11 Jun 2025 10:06:21 GMT
- Title: LLM-to-Phy3D: Physically Conform Online 3D Object Generation with LLMs
- Authors: Melvin Wong, Yueming Lyu, Thiago Rios, Stefan Menzel, Yew-Soon Ong
- Abstract summary: We introduce LLM-to-Phy3D, a physically conforming online 3D object generation framework that enables existing LLM-to-3D models to produce conforming 3D objects on the fly. Systematic evaluations of LLM-to-Phy3D, supported by ablation studies in vehicle design optimization, show improvements of 4.5% to 106.7% across various LLMs. The encouraging results suggest the potential general use of LLM-to-Phy3D in Physical AI for scientific and engineering applications.
- Score: 25.95070778191463
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The emergence of generative artificial intelligence (GenAI) and large language models (LLMs) has revolutionized the landscape of digital content creation across modalities. However, their potential use in Physical AI for engineering design, where the production of physically viable artifacts is paramount, remains vastly underexplored. The absence of physical knowledge in existing LLM-to-3D models often results in outputs detached from real-world physical constraints. To address this gap, we introduce LLM-to-Phy3D, a physically conforming online 3D object generation framework that enables existing LLM-to-3D models to produce physically conforming 3D objects on the fly. LLM-to-Phy3D introduces a novel online black-box refinement loop that empowers LLMs through synergistic visual and physics-based evaluations. By delivering directional feedback in an iterative refinement process, LLM-to-Phy3D actively drives the discovery of prompts that yield 3D artifacts with enhanced physical performance and greater geometric novelty relative to reference objects, marking a substantial contribution to AI-driven generative design. Systematic evaluations of LLM-to-Phy3D, supported by ablation studies in vehicle design optimization, show improvements of 4.5% to 106.7% across various LLMs in producing physically conforming, target-domain 3D designs over conventional LLM-to-3D models. The encouraging results suggest the potential general use of LLM-to-Phy3D in Physical AI for scientific and engineering applications.
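As an illustration of the online black-box refinement loop the abstract describes, here is a minimal Python sketch under stated assumptions: the callables generate_3d, evaluate_physics, evaluate_visual, and llm_refine_prompt are hypothetical placeholders (the paper does not publish this API), and the simple additive scoring stands in for whatever combination of visual and physics-based evaluations the authors actually use.

```python
# Minimal sketch of an online black-box refinement loop in the spirit of
# LLM-to-Phy3D. All callables below are hypothetical placeholders, not the
# authors' published API.

def refine_3d_design(initial_prompt, generate_3d, evaluate_physics,
                     evaluate_visual, llm_refine_prompt, n_iters=10):
    """Iteratively search for prompts whose generated 3D objects score well
    on combined physical and visual evaluations."""
    best = (float("-inf"), initial_prompt, None)   # (score, prompt, object)
    prompt = initial_prompt
    for _ in range(n_iters):
        obj = generate_3d(prompt)                  # black-box LLM-to-3D model
        phys = evaluate_physics(obj)               # physics-based score
        vis = evaluate_visual(obj)                 # visual/geometric score
        score = phys + vis                         # assumed additive combination
        best = max(best, (score, prompt, obj), key=lambda t: t[0])
        # Directional feedback: report the scores so the LLM can propose a
        # refined prompt for the next iteration.
        feedback = f"physics score: {phys:.3f}; visual score: {vis:.3f}"
        prompt = llm_refine_prompt(prompt, feedback)
    return best
```

In this sketch the LLM never sees the evaluators' internals, only their scores, which is what makes the loop black-box.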
Related papers
- SpatialLM: Training Large Language Models for Structured Indoor Modeling [34.0957676434764]
SpatialLM is a large language model designed to process 3D point cloud data and generate structured 3D scene understanding outputs. We collect a large-scale, high-quality synthetic dataset consisting of the point clouds of 12,328 indoor scenes with ground-truth 3D annotations. Our model gives state-of-the-art performance in layout estimation and competitive results in 3D object detection.
arXiv Detail & Related papers (2025-06-09T07:10:58Z)
- Large Language-Geometry Model: When LLM meets Equivariance [53.8505081745406]
We propose EquiLLM, a novel framework for representing 3D physical systems. We show that EquiLLM delivers significant improvements over previous methods across molecular dynamics simulation, human motion simulation, and antibody design.
arXiv Detail & Related papers (2025-02-16T14:50:49Z)
- 3D-MoE: A Mixture-of-Experts Multi-modal LLM for 3D Vision and Pose Diffusion via Rectified Flow [69.94527569577295]
3D vision and spatial reasoning have long been recognized as preferable for accurately perceiving our three-dimensional world. Due to the difficulties in collecting high-quality 3D data, research in this area has only recently gained momentum. We propose converting existing densely activated LLMs into mixture-of-experts (MoE) models, which have proven effective for multi-modal data processing.
arXiv Detail & Related papers (2025-01-28T04:31:19Z)
- LLMI3D: MLLM-based 3D Perception from a Single 2D Image [77.13869413871028]
Multimodal large language models (MLLMs) excel in general capability but underperform in 3D tasks. In this paper, we propose solutions for weak 3D local spatial object perception, poor text-based geometric numerical output, and the inability to handle camera focal variations. We employ parameter-efficient fine-tuning for a pre-trained MLLM and develop LLMI3D, a powerful 3D perception MLLM.
arXiv Detail & Related papers (2024-08-14T10:00:16Z)
- Atlas3D: Physically Constrained Self-Supporting Text-to-3D for Simulation and Fabrication [50.541882834405946]
We introduce Atlas3D, an automatic and easy-to-implement text-to-3D method.
Our approach combines a novel differentiable simulation-based loss function with physically inspired regularization.
We verify Atlas3D's efficacy through extensive generation tasks and validate the resulting 3D models in both simulated and real-world environments.
arXiv Detail & Related papers (2024-05-28T18:33:18Z)
- Is a 3D-Tokenized LLM the Key to Reliable Autonomous Driving? [66.6886931183372]
We introduce DETR-style 3D perceptrons as 3D tokenizers, which connect to an LLM through a one-layer linear projector.
Despite its simplicity, Atlas demonstrates superior performance in both 3D detection and ego planning tasks.
arXiv Detail & Related papers (2024-05-28T16:57:44Z)
- DO3D: Self-supervised Learning of Decomposed Object-aware 3D Motion and Depth from Monocular Videos [76.01906393673897]
We propose a self-supervised method to jointly learn 3D motion and depth from monocular videos.
Our system contains a depth estimation module to predict depth, and a new decomposed object-wise 3D motion (DO3D) estimation module to predict ego-motion and 3D object motion.
Our model delivers superior performance in all evaluated settings.
arXiv Detail & Related papers (2024-03-09T12:22:46Z)
- 3D-PreMise: Can Large Language Models Generate 3D Shapes with Sharp Features and Parametric Control? [8.893200442359518]
We introduce a framework that employs Large Language Models to generate text-driven 3D shapes.
We present 3D-PreMise, a dataset specifically tailored for 3D parametric modeling of industrial shapes.
arXiv Detail & Related papers (2024-01-12T08:07:52Z)
- Towards Language-guided Interactive 3D Generation: LLMs as Layout Interpreter with Generative Feedback [20.151147653552155]
Large Language Models (LLMs) have demonstrated impressive reasoning, conversational, and zero-shot generation abilities.
We propose a novel language-guided interactive 3D generation system, dubbed LI3D, that integrates LLMs as a 3D layout interpreter.
Our system also incorporates LLaVA, a large language and vision assistant, to provide generative feedback from the visual aspect for improving the visual quality of generated content.
arXiv Detail & Related papers (2023-05-25T07:43:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.