MeshCoder: LLM-Powered Structured Mesh Code Generation from Point Clouds
- URL: http://arxiv.org/abs/2508.14879v2
- Date: Fri, 22 Aug 2025 16:12:04 GMT
- Title: MeshCoder: LLM-Powered Structured Mesh Code Generation from Point Clouds
- Authors: Bingquan Dai, Li Ray Luo, Qihong Tang, Jie Wang, Xinyu Lian, Hao Xu, Minghan Qin, Xudong Xu, Bo Dai, Haoqian Wang, Zhaoyang Lyu, Jiangmiao Pang,
- Abstract summary: MeshCoder is a novel framework that reconstructs complex 3D objects from point clouds into editable Blender Python scripts. We train a multimodal large language model (LLM) that translates 3D point clouds into executable Blender Python scripts. Our approach achieves superior performance in shape-to-code reconstruction tasks and also facilitates intuitive geometric and topological editing.
- Score: 50.98900790623827
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reconstructing 3D objects into editable programs is pivotal for applications like reverse engineering and shape editing. However, existing methods often rely on limited domain-specific languages (DSLs) and small-scale datasets, restricting their ability to model complex geometries and structures. To address these challenges, we introduce MeshCoder, a novel framework that reconstructs complex 3D objects from point clouds into editable Blender Python scripts. We develop a comprehensive set of expressive Blender Python APIs capable of synthesizing intricate geometries. Leveraging these APIs, we construct a large-scale paired object-code dataset, where the code for each object is decomposed into distinct semantic parts. Subsequently, we train a multimodal large language model (LLM) that translates 3D point clouds into executable Blender Python scripts. Our approach not only achieves superior performance in shape-to-code reconstruction tasks but also facilitates intuitive geometric and topological editing through convenient code modifications. Furthermore, our code-based representation enhances the reasoning capabilities of LLMs in 3D shape understanding tasks. Together, these contributions establish MeshCoder as a powerful and flexible solution for programmatic 3D shape reconstruction and understanding. The project homepage is available at https://daibingquan.github.io/MeshCoder.
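The abstract's core idea is representing an object as a per-part script rather than a raw mesh. A minimal sketch of that "shape as code" representation is below; the part-level API names (`add_box`, `add_cylinder`) and the chair decomposition are illustrative assumptions, not the paper's actual Blender Python interface.

```python
# Hypothetical sketch: each semantic part of an object becomes one call to an
# expressive Blender Python API, and the whole object is an executable script.
# API names (add_box, add_cylinder) are assumed for illustration.

def emit_part(name, primitive, **params):
    """Render one semantic part as a single line of Blender Python."""
    args = ", ".join(f"{k}={v!r}" for k, v in params.items())
    return f"{primitive}(name={name!r}, {args})"

def emit_object_script(parts):
    """Assemble per-part calls into an executable Blender Python script."""
    lines = ["import bpy", ""]
    lines += [emit_part(**part) for part in parts]
    return "\n".join(lines)

# A toy chair decomposed into named semantic parts, as the paired
# object-code dataset in the abstract decomposes each object.
chair = [
    {"name": "seat", "primitive": "add_box",
     "size": (0.4, 0.4, 0.05), "location": (0.0, 0.0, 0.45)},
    {"name": "leg_front_left", "primitive": "add_cylinder",
     "radius": 0.02, "depth": 0.45, "location": (0.18, 0.18, 0.225)},
]

script = emit_object_script(chair)
print(script)
```

Because the output is ordinary code, editing a part's parameters or deleting a part line is a direct, interpretable shape edit, which is what makes the code-based representation convenient for the editing and reasoning tasks the abstract describes.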
Related papers
- PatchAlign3D: Local Feature Alignment for Dense 3D Shape understanding [67.15800065888887]
Current foundation models for 3D shapes excel at global tasks (retrieval, classification) but transfer poorly to local part-level reasoning. We introduce an encoder-only 3D model that produces language-aligned patch-level features directly from point clouds. Our 3D encoder achieves zero-shot 3D part segmentation with fast single-pass inference without any test-time multi-view rendering.
arXiv Detail & Related papers (2026-01-05T18:55:45Z) - VULCAN: Tool-Augmented Multi Agents for Iterative 3D Object Arrangement [66.13644883379087]
We tackle three key challenges in the 3D object arrangement task using MLLMs. First, to address the weak visual grounding of MLLMs, we introduce an MCP-based API. Second, we augment the MLLM's 3D scene understanding with a suite of specialized visual tools. Third, to manage the iterative, error-prone updates, we propose a collaborative multi-agent framework.
arXiv Detail & Related papers (2025-12-26T19:22:39Z) - LL3M: Large Language 3D Modelers [18.23329430829059]
We present LL3M, a system that generates 3D assets by writing interpretable Python code in Blender. We reformulate shape generation as a code-writing task, enabling greater modularity, editability, and integration with artist workflows in Blender. Our experiments showcase the power of code as a generative and interpretable medium for 3D asset creation.
arXiv Detail & Related papers (2025-08-11T17:48:02Z) - MeshLLM: Empowering Large Language Models to Progressively Understand and Generate 3D Mesh [79.20802127426003]
MeshLLM is a framework that leverages large language models (LLMs) to understand and generate text-serialized 3D meshes. We introduce a Primitive-Mesh decomposition strategy, which divides 3D meshes into structurally meaningful subunits. Experiments show that MeshLLM outperforms the state-of-the-art LLaMA-Mesh in both mesh generation quality and shape understanding.
arXiv Detail & Related papers (2025-08-02T07:37:37Z) - Object-X: Learning to Reconstruct Multi-Modal 3D Object Representations [112.29763628638112]
Object-X is a versatile multi-modal 3D representation framework. It encodes rich object embeddings and decodes them back into geometric and visual reconstructions. It supports a range of downstream tasks, including scene alignment, single-image 3D object reconstruction, and localization.
arXiv Detail & Related papers (2025-06-05T09:14:42Z) - 3D Part Segmentation via Geometric Aggregation of 2D Visual Features [57.20161517451834]
Supervised 3D part segmentation models are tailored for a fixed set of objects and parts, limiting their transferability to open-set, real-world scenarios. Recent works have explored vision-language models (VLMs) as a promising alternative, using multi-view rendering and textual prompting to identify object parts. To address these limitations, we propose COPS, a COmprehensive model for Parts that blends semantics extracted from visual concepts and 3D geometry to effectively identify object parts.
arXiv Detail & Related papers (2024-12-05T15:27:58Z) - Don't Mesh with Me: Generating Constructive Solid Geometry Instead of Meshes by Fine-Tuning a Code-Generation LLM [3.925328332747599]
This paper introduces a novel approach that generates 3D geometry as surface-based Constructive Solid Geometry (CSG). First, we create a dataset of 3D mechanical parts represented as code scripts by converting Boundary Representation geometry (BREP) into CSG-based Python scripts. Second, we create annotations in natural language using GPT-4. The resulting dataset is used to fine-tune a code-generation LLM.
arXiv Detail & Related papers (2024-11-22T15:29:12Z) - Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model [108.35777542298224]
Reason3D processes point cloud data and text prompts to produce textual responses and segmentation masks. We propose a hierarchical mask decoder that employs a coarse-to-fine approach to segment objects within expansive scenes.
arXiv Detail & Related papers (2024-05-27T17:59:41Z) - PyTorchGeoNodes: Enabling Differentiable Shape Programs for 3D Shape Reconstruction [11.9747147315069]
We propose PyTorchGeoNodes, a differentiable module for reconstructing 3D objects using interpretable shape programs. We show that combining PyTorchGeoNodes with a genetic algorithm is an effective way to optimize both discrete and continuous shape program parameters.
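The PyTorchGeoNodes summary above hinges on searching over shape-program parameters that mix discrete choices (e.g., part counts) with continuous values (e.g., dimensions), which gradient descent alone cannot handle. A minimal self-contained sketch of that mixed optimization with an elitist genetic algorithm is below; the toy chair parameters and fitness function are assumptions for illustration, not the paper's actual setup.

```python
import random

# Toy sketch: a genetic algorithm optimizing a shape program with one
# discrete gene (number of legs) and one continuous gene (seat height).
# TARGET stands in for the reconstruction objective against an observed shape.
random.seed(0)

TARGET = {"n_legs": 4, "seat_height": 0.45}  # assumed ground-truth parameters

def fitness(cand):
    """Lower is better: discrete mismatch penalty plus continuous error."""
    return (abs(cand["n_legs"] - TARGET["n_legs"])
            + abs(cand["seat_height"] - TARGET["seat_height"]))

def random_candidate():
    return {"n_legs": random.choice([3, 4, 5, 6]),
            "seat_height": random.uniform(0.2, 0.8)}

def mutate(cand):
    child = dict(cand)
    if random.random() < 0.3:
        child["n_legs"] = random.choice([3, 4, 5, 6])   # discrete mutation
    child["seat_height"] += random.gauss(0.0, 0.05)     # continuous mutation
    return child

# Elitist GA: keep the best 10, refill the population with their mutants.
pop = [random_candidate() for _ in range(30)]
for _ in range(50):
    pop.sort(key=fitness)
    parents = pop[:10]
    pop = parents + [mutate(random.choice(parents)) for _ in range(20)]

best = min(pop, key=fitness)
```

In the paper's setting the continuous genes could instead be refined by gradients through the differentiable shape program, with the GA handling only the discrete structure; this sketch collapses both into mutation for brevity.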
arXiv Detail & Related papers (2024-04-16T14:43:33Z) - ShapeLLM: Universal 3D Object Understanding for Embodied Interaction [37.0434133128805]
This paper presents ShapeLLM, the first 3D Multimodal Large Language Model (LLM) designed for embodied interaction.
ShapeLLM is built upon an improved 3D encoder by extending ReCon to ReCon++.
ShapeLLM is trained on constructed instruction-following data and tested on our newly human-curated benchmark, 3D MM-Vet.
arXiv Detail & Related papers (2024-02-27T18:57:12Z) - GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting [52.150502668874495]
We present GALA3D, generative 3D GAussians with LAyout-guided control, for effective compositional text-to-3D generation.
GALA3D is a user-friendly, end-to-end framework for state-of-the-art scene-level 3D content generation and controllable editing.
arXiv Detail & Related papers (2024-02-11T13:40:08Z) - LASR: Learning Articulated Shape Reconstruction from a Monocular Video [97.92849567637819]
We introduce a template-free approach to learn 3D shapes from a single video.
Our method faithfully reconstructs nonrigid 3D structures from videos of humans, animals, and objects of unknown classes.
arXiv Detail & Related papers (2021-05-06T21:41:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.