Prompt-to-Parts: Generative AI for Physical Assembly and Scalable Instructions
- URL: http://arxiv.org/abs/2512.15743v1
- Date: Wed, 10 Dec 2025 05:55:33 GMT
- Title: Prompt-to-Parts: Generative AI for Physical Assembly and Scalable Instructions
- Authors: David Noever
- Abstract summary: We present a framework for generating physically realizable assembly instructions from natural language descriptions. Using LDraw as a text-rich intermediate representation, we demonstrate that large language models can be guided with tools to produce valid step-by-step construction sequences. We introduce a Python library for programmatic model generation and evaluate buildable outputs across complex satellite, aircraft, and architectural domains.
- Score: 3.0620527758972496
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a framework for generating physically realizable assembly instructions from natural language descriptions. Unlike unconstrained text-to-3D approaches, our method operates within a discrete parts vocabulary, enforcing geometric validity, connection constraints, and buildability ordering. Using LDraw as a text-rich intermediate representation, we demonstrate that large language models can be guided with tools to produce valid step-by-step construction sequences and assembly instructions for brick-based prototypes comprising more than 3,000 assembly parts. We introduce a Python library for programmatic model generation and evaluate buildable outputs across complex satellite, aircraft, and architectural domains. The approach aims for demonstrable scalability, modularity, and fidelity, bridging the gap between semantic design intent and manufacturable output so that physical prototyping follows directly from natural language specifications. The work proposes a novel elemental lingua franca as a key piece missing from previous pixel-based diffusion methods and computer-aided design (CAD) models, which fail to support complex assembly instructions or component exchange. Across four original designs, this "bag of bricks" method functions as a physical API: a constrained vocabulary connecting precisely oriented brick locations to a "bag of words" through which arbitrary functional requirements compile into material reality. Such a consistent and repeatable AI representation opens new design options while guiding natural language implementations in manufacturing and engineering prototyping.
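Because LDraw is a plain-text format, the kind of "bag of bricks" intermediate representation described above can be emitted directly by a small script. The following is a minimal sketch, not the paper's actual library: the helper names (Placement, write_model) and the two-brick example are hypothetical, while the LDraw conventions they encode (line type 1 part references, colour codes, the `0 STEP` meta command, part 3001 as a 2x4 brick, the downward-pointing y axis) follow the public LDraw specification.

```python
from dataclasses import dataclass
from typing import List

BRICK_HEIGHT = 24  # height of a standard brick in LDraw units (LDU)
IDENTITY = "1 0 0 0 1 0 0 0 1"  # row-major 3x3 rotation matrix (no rotation)

@dataclass
class Placement:
    part: str     # LDraw part file, e.g. "3001.dat" (a 2x4 brick)
    colour: int   # LDraw colour code, e.g. 4 = red, 14 = yellow
    x: float
    y: float
    z: float

    def to_ldraw(self) -> str:
        # LDraw line type 1: a sub-file (part) reference with position + rotation
        return f"1 {self.colour} {self.x} {self.y} {self.z} {IDENTITY} {self.part}"

def write_model(steps: List[List[Placement]], path: str, name: str = "model") -> None:
    """Write placements, grouped into build steps, as a minimal .ldr file."""
    lines = [f"0 {name}", f"0 Name: {name}.ldr"]
    for step in steps:
        lines.extend(p.to_ldraw() for p in step)
        lines.append("0 STEP")  # step boundary read by instruction renderers
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")

# Example: a two-step stack of 2x4 bricks; LDraw's y axis points down,
# so the second brick sits one brick height above the first (y = -24).
steps = [
    [Placement("3001.dat", 4, 0, 0, 0)],
    [Placement("3001.dat", 14, 0, -BRICK_HEIGHT, 0)],
]
write_model(steps, "stack.ldr")
```

Each `0 STEP` boundary is what lets a renderer break the model into the step-by-step assembly sequence the abstract describes, and constraining a language model to emit only such part-reference lines over a fixed parts vocabulary is the sense in which the representation acts as a "physical API."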
Related papers
- Natural Language Interface for Firewall Configuration [0.0]
This paper presents the design and prototype implementation of a natural language interface for configuring enterprise firewalls. The framework allows administrators to express access control policies in plain language, which are then translated into vendor-specific policies.
arXiv Detail & Related papers (2025-12-11T16:33:33Z)
- Part-X-MLLM: Part-aware 3D Multimodal Large Language Model [35.75184591224847]
Part-X-MLLM is a native 3D multimodal large language model. It unifies diverse 3D tasks by formulating them as programs in a structured, executable grammar.
arXiv Detail & Related papers (2025-11-17T17:59:52Z)
- $I^2G$: Generating Instructional Illustrations via Text-Conditioned Diffusion [31.2362624526101]
We propose a language-driven framework that decomposes procedural text into coherent visual instructions. Our approach models the linguistic structure of instructional content by organizing it into goal statements and sequential steps, then conditioning visual generation on these linguistic elements. This work contributes to the growing body of research on grounding procedural language in visual content, with applications spanning education, task guidance, and multimodal language understanding.
arXiv Detail & Related papers (2025-05-22T09:10:09Z)
- Generating Physically Stable and Buildable Brick Structures from Text [63.75381708299733]
BrickGPT is the first approach for generating physically stable assembly models from text prompts. We release our dataset, StableText2Brick, containing over 7,000 3D textured brick structures.
arXiv Detail & Related papers (2025-05-08T17:58:18Z)
- Langformers: Unified NLP Pipelines for Language Models [3.690904966341072]
Langformers is an open-source Python library designed to streamline NLP pipelines. It integrates conversational AI, pretraining, text classification, sentence embedding/reranking, data labelling, semantic search, and knowledge distillation into a cohesive API.
arXiv Detail & Related papers (2025-04-12T10:17:49Z)
- Establishing tool support for a concept DSL [0.0]
This thesis describes Conceptual, a DSL for modeling the behavior of software systems using self-contained and highly reusable units of concepts. The suggested strategy is then implemented with a simple compiler, allowing developers to access and utilize Alloy's existing analysis tools for program reasoning.
arXiv Detail & Related papers (2025-03-07T09:18:31Z)
- CMULAB: An Open-Source Framework for Training and Deployment of Natural Language Processing Models [59.91221728187576]
This paper introduces the CMU Linguistic Annotation Backend, an open-source framework that simplifies model deployment and continuous human-in-the-loop fine-tuning of NLP models.
CMULAB enables users to leverage the power of multilingual models to quickly adapt and extend existing tools for speech recognition, OCR, translation, and syntactic analysis to new languages.
arXiv Detail & Related papers (2024-04-03T02:21:46Z)
- L3GO: Language Agents with Chain-of-3D-Thoughts for Generating Unconventional Objects [53.4874127399702]
We propose a language agent with chain-of-3D-thoughts (L3GO), an inference-time approach that can reason about part-based 3D mesh generation.
We develop a new benchmark, Unconventionally Feasible Objects (UFO), as well as SimpleBlenv, a wrapper environment built on top of Blender.
Our approach surpasses the standard GPT-4 and other language agents for 3D mesh generation on ShapeNet.
arXiv Detail & Related papers (2024-02-14T09:51:05Z)
- Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model [63.66204449776262]
Instruct2Act is a framework that maps multi-modal instructions to sequential actions for robotic manipulation tasks.
Our approach is adjustable and flexible in accommodating various instruction modalities and input types.
Our zero-shot method outperformed many state-of-the-art learning-based policies in several tasks.
arXiv Detail & Related papers (2023-05-18T17:59:49Z)
- Learning to Solve Voxel Building Embodied Tasks from Pixels and Natural Language Instructions [53.21504989297547]
We propose a new method that combines a language model and reinforcement learning for the task of building objects in a Minecraft-like environment.
Our method first generates a set of consistently achievable sub-goals from the instructions and then completes associated sub-tasks with a pre-trained RL policy.
arXiv Detail & Related papers (2022-11-01T18:30:42Z)
- Pre-Trained Language Models for Interactive Decision-Making [72.77825666035203]
We describe a framework for imitation learning in which goals and observations are represented as a sequence of embeddings.
We demonstrate that this framework enables effective generalization across different environments.
For test tasks involving novel goals or novel scenes, initializing policies with language models improves task completion rates by 43.6%.
arXiv Detail & Related papers (2022-02-03T18:55:52Z)