Don't Mesh with Me: Generating Constructive Solid Geometry Instead of Meshes by Fine-Tuning a Code-Generation LLM
- URL: http://arxiv.org/abs/2411.15279v2
- Date: Tue, 06 May 2025 14:25:00 GMT
- Title: Don't Mesh with Me: Generating Constructive Solid Geometry Instead of Meshes by Fine-Tuning a Code-Generation LLM
- Authors: Maximilian Mews, Ansar Aynetdinov, Vivian Schiller, Peter Eisert, Alan Akbik
- Abstract summary: This paper introduces a novel approach for the generation of 3D geometry that generates surface-based Constructive Solid Geometry (CSG). First, we create a dataset of 3D mechanical parts represented as code scripts by converting Boundary Representation geometry (BREP) into CSG-based Python scripts. Second, we create annotations in natural language using GPT-4. The resulting dataset is used to fine-tune a code-generation LLM.
- Score: 3.925328332747599
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While recent advancements in machine learning, such as LLMs, are revolutionizing software development and creative industries, they have had minimal impact on engineers designing mechanical parts, which remains largely a manual process. Existing approaches to generating 3D geometry most commonly use meshes as a 3D representation. While meshes are suitable for assets in video games or animations, they lack sufficient precision and adaptability for mechanical engineering purposes. This paper introduces a novel approach for the generation of 3D geometry that generates surface-based Constructive Solid Geometry (CSG) by leveraging a code-generation LLM. First, we create a dataset of 3D mechanical parts represented as code scripts by converting Boundary Representation geometry (BREP) into CSG-based Python scripts. Second, we create annotations in natural language using GPT-4. The resulting dataset is used to fine-tune a code-generation LLM. The fine-tuned LLM can complete geometries based on positional input and natural language in a plausible way, demonstrating geometric understanding.
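The abstract describes representing each mechanical part as a CSG-based Python script, where solids are built by composing primitives with boolean operations. The paper's actual scripts are surface-based CSG converted from BREP; as a hedged illustration of the compositional idea only, the following self-contained sketch models volumetric CSG with a toy point-membership test (all class and method names here are hypothetical, not the paper's API):

```python
from dataclasses import dataclass

# Toy CSG abstraction: primitives plus boolean operators, evaluated
# as point-membership tests. This illustrates the script structure an
# LLM would generate, not the paper's surface-based representation.

class Node:
    def union(self, other):
        return Union(self, other)

    def difference(self, other):
        return Difference(self, other)

@dataclass
class Box(Node):
    cx: float; cy: float; cz: float   # center of the box
    w: float; h: float; d: float      # full extents along x, y, z

    def contains(self, p):
        x, y, z = p
        return (abs(x - self.cx) <= self.w / 2 and
                abs(y - self.cy) <= self.h / 2 and
                abs(z - self.cz) <= self.d / 2)

@dataclass
class Cylinder(Node):
    cx: float; cy: float              # axis position (axis along z)
    r: float; h: float                # radius and height, base at z = 0

    def contains(self, p):
        x, y, z = p
        in_disc = (x - self.cx) ** 2 + (y - self.cy) ** 2 <= self.r ** 2
        return in_disc and 0 <= z <= self.h

@dataclass
class Union(Node):
    a: Node; b: Node
    def contains(self, p):
        return self.a.contains(p) or self.b.contains(p)

@dataclass
class Difference(Node):
    a: Node; b: Node
    def contains(self, p):
        return self.a.contains(p) and not self.b.contains(p)

# A typical mechanical-part script: a 20x20x2 plate with a
# radius-3 hole drilled through its center.
plate = Box(0, 0, 1, 20, 20, 2).difference(Cylinder(0, 0, 3, 2))
```

A fine-tuned code-generation LLM would emit scripts shaped like the last line, which keeps the geometry parametric and editable in a way meshes are not.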
Related papers
- MeshCoder: LLM-Powered Structured Mesh Code Generation from Point Clouds [50.98900790623827]
MeshCoder is a novel framework that reconstructs complex 3D objects from point clouds into editable Blender Python scripts. We train a multimodal large language model (LLM) that translates 3D point clouds into executable Blender Python scripts. Our approach achieves superior performance in shape-to-code reconstruction tasks and also facilitates intuitive geometric and topological editing.
arXiv Detail & Related papers (2025-08-20T17:50:15Z) - On Geometry-Enhanced Parameter-Efficient Fine-Tuning for 3D Scene Segmentation [52.96632954620623]
We introduce a novel geometry-aware PEFT module specifically designed for 3D point cloud transformers. Our approach sets a new benchmark for efficient, scalable, and geometry-aware fine-tuning of large-scale 3D point cloud models.
arXiv Detail & Related papers (2025-05-28T15:08:36Z) - FirePlace: Geometric Refinements of LLM Common Sense Reasoning for 3D Object Placement [42.2054752179292]
Multimodal Large Language Models (MLLMs) excel at semantic tasks, but their application to 3D scene generation is hindered by their limited grounding on 3D geometry.
We introduce a novel framework, FirePlace, that applies existing MLLMs in (1) 3D geometric reasoning and the extraction of relevant geometric details from the 3D scene, (2) constructing and solving geometric constraints on the extracted low-level geometry, and (3) pruning for final placements that conform to common sense.
arXiv Detail & Related papers (2025-03-06T19:34:15Z) - 3D Part Segmentation via Geometric Aggregation of 2D Visual Features [57.20161517451834]
Supervised 3D part segmentation models are tailored for a fixed set of objects and parts, limiting their transferability to open-set, real-world scenarios.
Recent works have explored vision-language models (VLMs) as a promising alternative, using multi-view rendering and textual prompting to identify object parts.
To address these limitations, we propose COPS, a COmprehensive model for Parts that blends semantics extracted from visual concepts and 3D geometry to effectively identify object parts.
arXiv Detail & Related papers (2024-12-05T15:27:58Z) - LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models [62.85566496673856]
This work explores expanding the capabilities of large language models (LLMs) pretrained on text to generate 3D meshes within a unified model.
A primary challenge is effectively tokenizing 3D mesh data into discrete tokens that LLMs can process seamlessly.
Our work is the first to demonstrate that LLMs can be fine-tuned to acquire complex spatial knowledge for 3D mesh generation in a text-based format.
arXiv Detail & Related papers (2024-11-14T17:08:23Z) - GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation [65.33726478659304]
We introduce the Geometry-Aware Large Reconstruction Model (GeoLRM), an approach which can predict high-quality assets with 512k Gaussians and 21 input images in only 11 GB GPU memory.
Previous works neglect the inherent sparsity of 3D structure and do not utilize explicit geometric relationships between 3D and 2D images.
GeoLRM tackles these issues by incorporating a novel 3D-aware transformer structure that directly processes 3D points and uses deformable cross-attention mechanisms.
arXiv Detail & Related papers (2024-06-21T17:49:31Z) - GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting [52.150502668874495]
We present GALA3D, generative 3D GAussians with LAyout-guided control, for effective compositional text-to-3D generation.
GALA3D is a user-friendly, end-to-end framework for state-of-the-art scene-level 3D content generation and controllable editing.
arXiv Detail & Related papers (2024-02-11T13:40:08Z) - 3D-PreMise: Can Large Language Models Generate 3D Shapes with Sharp
Features and Parametric Control? [8.893200442359518]
We introduce a framework that employs Large Language Models to generate text-driven 3D shapes.
We present 3D-PreMise, a dataset specifically tailored for 3D parametric modeling of industrial shapes.
arXiv Detail & Related papers (2024-01-12T08:07:52Z) - Locally Adaptive Neural 3D Morphable Models [38.38400553022714]
We present the Locally Adaptive Morphable Model (LAMM), a framework for learning to generate and manipulate 3D meshes.
A very efficient computational graph allows our network to train with only a fraction of the memory required by previous methods.
We further leverage local geometry control as a primitive for higher level editing operations and present a set of derivative capabilities.
arXiv Detail & Related papers (2024-01-05T18:28:51Z) - GPT4Point: A Unified Framework for Point-Language Understanding and Generation [76.61439685940272]
GPT4Point is a groundbreaking point-language multimodal model for unified 3D object understanding and generation within the MLLM framework.
As a powerful 3D MLLM, GPT4Point can seamlessly execute a variety of point-text reference tasks such as point-cloud captioning and Q&A.
It can produce high-quality results from low-quality point-text features while maintaining geometric shapes and colors.
arXiv Detail & Related papers (2023-12-05T18:59:55Z) - DSG-Net: Learning Disentangled Structure and Geometry for 3D Shape Generation [98.96086261213578]
We introduce DSG-Net, a deep neural network that learns a disentangled structured and geometric mesh representation for 3D shapes.
This supports a range of novel shape generation applications with disentangled control, such as varying structure while keeping geometry unchanged, and vice versa.
Our method not only supports controllable generation applications but also produces high-quality synthesized shapes, outperforming state-of-the-art methods.
arXiv Detail & Related papers (2020-08-12T17:06:51Z) - Deep Geometric Texture Synthesis [83.9404865744028]
We propose a novel framework for synthesizing geometric textures.
It learns texture statistics from local neighborhoods of a single reference 3D model.
Our network displaces mesh vertices in any direction, enabling synthesis of geometric textures.
arXiv Detail & Related papers (2020-06-30T19:36:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.