MeshCoder: LLM-Powered Structured Mesh Code Generation from Point Clouds
- URL: http://arxiv.org/abs/2508.14879v2
- Date: Fri, 22 Aug 2025 16:12:04 GMT
- Title: MeshCoder: LLM-Powered Structured Mesh Code Generation from Point Clouds
- Authors: Bingquan Dai, Li Ray Luo, Qihong Tang, Jie Wang, Xinyu Lian, Hao Xu, Minghan Qin, Xudong Xu, Bo Dai, Haoqian Wang, Zhaoyang Lyu, Jiangmiao Pang,
- Abstract summary: MeshCoder is a novel framework that reconstructs complex 3D objects from point clouds into editable Blender Python scripts. We train a multimodal large language model (LLM) that translates 3D point clouds into executable Blender Python scripts. Our approach achieves superior performance in shape-to-code reconstruction tasks and also facilitates intuitive geometric and topological editing.
- Score: 50.98900790623827
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reconstructing 3D objects into editable programs is pivotal for applications like reverse engineering and shape editing. However, existing methods often rely on limited domain-specific languages (DSLs) and small-scale datasets, restricting their ability to model complex geometries and structures. To address these challenges, we introduce MeshCoder, a novel framework that reconstructs complex 3D objects from point clouds into editable Blender Python scripts. We develop a comprehensive set of expressive Blender Python APIs capable of synthesizing intricate geometries. Leveraging these APIs, we construct a large-scale paired object-code dataset, where the code for each object is decomposed into distinct semantic parts. Subsequently, we train a multimodal large language model (LLM) that translates 3D point clouds into executable Blender Python scripts. Our approach not only achieves superior performance in shape-to-code reconstruction tasks but also facilitates intuitive geometric and topological editing through convenient code modifications. Furthermore, our code-based representation enhances the reasoning capabilities of LLMs in 3D shape understanding tasks. Together, these contributions establish MeshCoder as a powerful and flexible solution for programmatic 3D shape reconstruction and understanding. The project homepage is available at https://daibingquan.github.io/MeshCoder.
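The abstract's core idea is representing an object as a per-part script rather than a raw mesh. A minimal sketch of that "shape as code" representation is below; the part-level API names (`add_box`, `add_cylinder`) and the chair decomposition are illustrative assumptions, not the paper's actual Blender Python interface.

```python
# Hypothetical sketch: each semantic part of an object becomes one call to an
# expressive Blender Python API, and the whole object is an executable script.
# API names (add_box, add_cylinder) are assumed for illustration.

def emit_part(name, primitive, **params):
    """Render one semantic part as a single line of Blender Python."""
    args = ", ".join(f"{k}={v!r}" for k, v in params.items())
    return f"{primitive}(name={name!r}, {args})"

def emit_object_script(parts):
    """Assemble per-part calls into an executable Blender Python script."""
    lines = ["import bpy", ""]
    lines += [emit_part(**part) for part in parts]
    return "\n".join(lines)

# A toy chair decomposed into named semantic parts, as the paired
# object-code dataset in the abstract decomposes each object.
chair = [
    {"name": "seat", "primitive": "add_box",
     "size": (0.4, 0.4, 0.05), "location": (0.0, 0.0, 0.45)},
    {"name": "leg_front_left", "primitive": "add_cylinder",
     "radius": 0.02, "depth": 0.45, "location": (0.18, 0.18, 0.225)},
]

script = emit_object_script(chair)
print(script)
```

Because the output is ordinary code, editing a part's parameters or deleting a part line is a direct, interpretable shape edit, which is what makes the code-based representation convenient for the editing and reasoning tasks the abstract describes.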
Related papers
- PatchAlign3D: Local Feature Alignment for Dense 3D Shape understanding [67.15800065888887]
Current foundation models for 3D shapes excel at global tasks (retrieval, classification) but transfer poorly to local part-level reasoning. We introduce an encoder-only 3D model that produces language-aligned patch-level features directly from point clouds. Our 3D encoder achieves zero-shot 3D part segmentation with fast single-pass inference without any test-time multi-view rendering.
arXiv Detail & Related papers (2026-01-05T18:55:45Z) - VULCAN: Tool-Augmented Multi Agents for Iterative 3D Object Arrangement [66.13644883379087]
We tackle three key challenges in the 3D object arrangement task using MLLMs. First, to address the weak visual grounding of MLLMs, we introduce an MCP-based API. Second, we augment the MLLM's 3D scene understanding with a suite of specialized visual tools. Third, to manage the iterative, error-prone updates, we propose a collaborative multi-agent framework.
arXiv Detail & Related papers (2025-12-26T19:22:39Z) - LL3M: Large Language 3D Modelers [18.23329430829059]
We present LL3M, a system that generates 3D assets by writing interpretable Python code in Blender. We reformulate shape generation as a code-writing task, enabling greater modularity, editability, and integration with artist workflows in Blender. Our experiments showcase the power of code as a generative and interpretable medium for 3D asset creation.
arXiv Detail & Related papers (2025-08-11T17:48:02Z) - MeshLLM: Empowering Large Language Models to Progressively Understand and Generate 3D Mesh [79.20802127426003]
MeshLLM is a framework that leverages large language models (LLMs) to understand and generate text-serialized 3D meshes. We introduce a Primitive-Mesh decomposition strategy, which divides 3D meshes into structurally meaningful subunits. Experiments show that MeshLLM outperforms the state-of-the-art LLaMA-Mesh in both mesh generation quality and shape understanding.
arXiv Detail & Related papers (2025-08-02T07:37:37Z) - Object-X: Learning to Reconstruct Multi-Modal 3D Object Representations [112.29763628638112]
Object-X is a versatile multi-modal 3D representation framework. It encodes rich object embeddings and decodes them back into geometric and visual reconstructions. It supports a range of downstream tasks, including scene alignment, single-image 3D object reconstruction, and localization.
arXiv Detail & Related papers (2025-06-05T09:14:42Z) - 3D Part Segmentation via Geometric Aggregation of 2D Visual Features [57.20161517451834]
Supervised 3D part segmentation models are tailored for a fixed set of objects and parts, limiting their transferability to open-set, real-world scenarios. Recent works have explored vision-language models (VLMs) as a promising alternative, using multi-view rendering and textual prompting to identify object parts. To address these limitations, we propose COPS, a COmprehensive model for Parts that blends semantics extracted from visual concepts and 3D geometry to effectively identify object parts.
arXiv Detail & Related papers (2024-12-05T15:27:58Z) - Don't Mesh with Me: Generating Constructive Solid Geometry Instead of Meshes by Fine-Tuning a Code-Generation LLM [3.925328332747599]
This paper introduces a novel approach that generates 3D geometry as surface-based Constructive Solid Geometry (CSG). First, we create a dataset of 3D mechanical parts represented as code scripts by converting Boundary Representation geometry (BREP) into CSG-based Python scripts. Second, we create annotations in natural language using GPT-4. The resulting dataset is used to fine-tune a code-generation LLM.
arXiv Detail & Related papers (2024-11-22T15:29:12Z) - Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model [108.35777542298224]
Reason3D processes point cloud data and text prompts to produce textual responses and segmentation masks. We propose a hierarchical mask decoder that employs a coarse-to-fine approach to segment objects within expansive scenes.
arXiv Detail & Related papers (2024-05-27T17:59:41Z) - PyTorchGeoNodes: Enabling Differentiable Shape Programs for 3D Shape Reconstruction [11.9747147315069]
We propose PyTorchGeoNodes, a differentiable module for reconstructing 3D objects using interpretable shape programs. We show that combining PyTorchGeoNodes with a genetic algorithm is an effective way to optimize both discrete and continuous shape program parameters.
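The PyTorchGeoNodes summary above hinges on searching over shape-program parameters that mix discrete choices (e.g., part counts) with continuous values (e.g., dimensions), which gradient descent alone cannot handle. A minimal self-contained sketch of that mixed optimization with an elitist genetic algorithm is below; the toy chair parameters and fitness function are assumptions for illustration, not the paper's actual setup.

```python
import random

# Toy sketch: a genetic algorithm optimizing a shape program with one
# discrete gene (number of legs) and one continuous gene (seat height).
# TARGET stands in for the reconstruction objective against an observed shape.
random.seed(0)

TARGET = {"n_legs": 4, "seat_height": 0.45}  # assumed ground-truth parameters

def fitness(cand):
    """Lower is better: discrete mismatch penalty plus continuous error."""
    return (abs(cand["n_legs"] - TARGET["n_legs"])
            + abs(cand["seat_height"] - TARGET["seat_height"]))

def random_candidate():
    return {"n_legs": random.choice([3, 4, 5, 6]),
            "seat_height": random.uniform(0.2, 0.8)}

def mutate(cand):
    child = dict(cand)
    if random.random() < 0.3:
        child["n_legs"] = random.choice([3, 4, 5, 6])   # discrete mutation
    child["seat_height"] += random.gauss(0.0, 0.05)     # continuous mutation
    return child

# Elitist GA: keep the best 10, refill the population with their mutants.
pop = [random_candidate() for _ in range(30)]
for _ in range(50):
    pop.sort(key=fitness)
    parents = pop[:10]
    pop = parents + [mutate(random.choice(parents)) for _ in range(20)]

best = min(pop, key=fitness)
```

In the paper's setting the continuous genes could instead be refined by gradients through the differentiable shape program, with the GA handling only the discrete structure; this sketch collapses both into mutation for brevity.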
arXiv Detail & Related papers (2024-04-16T14:43:33Z) - ShapeLLM: Universal 3D Object Understanding for Embodied Interaction [37.0434133128805]
This paper presents ShapeLLM, the first 3D Multimodal Large Language Model (LLM) designed for embodied interaction.
ShapeLLM is built upon an improved 3D encoder by extending ReCon to ReCon++.
ShapeLLM is trained on constructed instruction-following data and tested on our newly human-curated benchmark, 3D MM-Vet.
arXiv Detail & Related papers (2024-02-27T18:57:12Z) - GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting [52.150502668874495]
We present GALA3D, generative 3D GAussians with LAyout-guided control, for effective compositional text-to-3D generation.
GALA3D is a user-friendly, end-to-end framework for state-of-the-art scene-level 3D content generation and controllable editing.
arXiv Detail & Related papers (2024-02-11T13:40:08Z) - LASR: Learning Articulated Shape Reconstruction from a Monocular Video [97.92849567637819]
We introduce a template-free approach to learn 3D shapes from a single video.
Our method faithfully reconstructs nonrigid 3D structures from videos of humans, animals, and objects of unknown classes.
arXiv Detail & Related papers (2021-05-06T21:41:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.