CAD-MLLM: Unifying Multimodality-Conditioned CAD Generation With MLLM
- URL: http://arxiv.org/abs/2411.04954v1
- Date: Thu, 07 Nov 2024 18:31:08 GMT
- Title: CAD-MLLM: Unifying Multimodality-Conditioned CAD Generation With MLLM
- Authors: Jingwei Xu, Chenyu Wang, Zibo Zhao, Wen Liu, Yi Ma, Shenghua Gao
- Abstract summary: We introduce CAD-MLLM, the first system capable of generating parametric CAD models conditioned on multimodal input.
We use advanced large language models (LLMs) to align the feature space across diverse multimodal data and the vectorized representations of CAD models.
Our resulting dataset, named Omni-CAD, is the first multimodal CAD dataset that contains a textual description, multi-view images, points, and a command sequence for each CAD model.
- Score: 39.113795259823476
- Abstract: This paper aims to design a unified Computer-Aided Design (CAD) generation system that can easily generate CAD models based on the user's inputs in the form of textual descriptions, images, point clouds, or even a combination of them. Towards this goal, we introduce CAD-MLLM, the first system capable of generating parametric CAD models conditioned on multimodal input. Specifically, within the CAD-MLLM framework, we leverage the command sequences of CAD models and then employ advanced large language models (LLMs) to align the feature space across this diverse multimodal data and the vectorized representations of CAD models. To facilitate model training, we design a comprehensive data construction and annotation pipeline that equips each CAD model with corresponding multimodal data. Our resulting dataset, named Omni-CAD, is the first multimodal CAD dataset that contains a textual description, multi-view images, points, and a command sequence for each CAD model. It contains approximately 450K instances and their CAD construction sequences. To thoroughly evaluate the quality of our generated CAD models, we go beyond current evaluation metrics that focus on reconstruction quality by introducing additional metrics that assess topology quality and surface enclosure extent. Extensive experimental results demonstrate that CAD-MLLM significantly outperforms existing conditional generative methods and remains highly robust to noise and missing points. The project page and more visualizations can be found at: https://cad-mllm.github.io/
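To make the command-sequence representation concrete, below is a minimal, hypothetical sketch of how a sketch-and-extrude CAD program can be expressed as a sequence of parameterized commands and quantized into tokens (in the spirit of DeepCAD-style representations). The command vocabulary, fields, and tokenization are illustrative assumptions, not CAD-MLLM's actual format.

```python
from dataclasses import dataclass
from typing import List, Union

# Illustrative sketch-and-extrude command vocabulary. Real systems
# typically quantize continuous parameters into discrete tokens so a
# language model can emit the sequence autoregressively.

@dataclass
class Line:            # straight sketch segment ending at (x, y)
    x: float
    y: float

@dataclass
class Extrude:         # lift the closed 2D profile into a solid
    distance: float

Command = Union[Line, Extrude]

# A unit cube as a command sequence: a closed square profile, then extrude.
cube: List[Command] = [
    Line(1.0, 0.0), Line(1.0, 1.0), Line(0.0, 1.0), Line(0.0, 0.0),
    Extrude(1.0),
]

def to_tokens(program: List[Command], bins: int = 256) -> List[int]:
    """Quantize parameters in [0, 1] into integer tokens (toy scheme)."""
    tokens = []
    for cmd in program:
        if isinstance(cmd, Line):
            tokens += [0, int(cmd.x * (bins - 1)), int(cmd.y * (bins - 1))]
        else:
            tokens += [1, int(cmd.distance * (bins - 1)), 0]
    return tokens

print(to_tokens(cube))
```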
Related papers
- Text2CAD: Text to 3D CAD Generation via Technical Drawings [45.3611544056261]
Text2CAD is a novel framework that employs stable diffusion models tailored to automate the generation process.
We show that Text2CAD effectively generates technical drawings that are accurately translated into high-quality 3D CAD models.
arXiv Detail & Related papers (2024-11-09T15:12:06Z)
- FlexCAD: Unified and Versatile Controllable CAD Generation with Fine-tuned Large Language Models [22.010338370150738]
There is a growing interest in creating computer-aided design (CAD) models based on user intent.
Existing work offers limited controllability and needs separate models for different types of control.
We propose FlexCAD, a unified model obtained by fine-tuning large language models.
arXiv Detail & Related papers (2024-11-05T05:45:26Z)
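As a rough illustration of the fine-tuning idea, the toy snippet below serializes a CAD model as structured text and masks one field so a language model can learn to fill it in. The serialization format and mask token are assumptions for illustration, not FlexCAD's actual scheme.

```python
# Toy illustration: serialize a CAD model as text and mask one field so a
# fine-tuned LLM can regenerate it. Format and mask token are hypothetical.
cad_text = "sketch: circle(cx=8, cy=8, r=4); extrude: depth=12"
masked = cad_text.replace("r=4", "r=<MASK>")  # user controls the radius

# A (prompt, target) pair for supervised fine-tuning:
example = {
    "prompt": f"Complete the CAD program: {masked}",
    "target": cad_text,
}
print(example["prompt"])
```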
- Img2CAD: Conditioned 3D CAD Model Generation from Single Image with Structured Visual Geometry [12.265852643914439]
We present Img2CAD, the first approach that uses 2D image inputs to generate CAD models with editable parameters.
Img2CAD enables seamless integration between AI 3D reconstruction and CAD representation.
arXiv Detail & Related papers (2024-10-04T13:27:52Z)
- CadVLM: Bridging Language and Vision in the Generation of Parametric CAD Sketches [24.239470848849418]
Parametric Computer-Aided Design (CAD) is central to contemporary mechanical design.
We propose CadVLM, an end-to-end vision language model for CAD generation.
arXiv Detail & Related papers (2024-09-26T01:22:29Z)
- PS-CAD: Local Geometry Guidance via Prompting and Selection for CAD Reconstruction [86.726941702182]
We introduce geometric guidance into the reconstruction network PS-CAD.
First, we provide, as a point cloud, the geometry of the surfaces where the current reconstruction differs from the complete model.
Second, we use geometric analysis to extract a set of planar prompts that correspond to candidate surfaces.
arXiv Detail & Related papers (2024-05-24T03:43:55Z)
- ContrastCAD: Contrastive Learning-based Representation Learning for Computer-Aided Design Models [0.7373617024876725]
We propose a contrastive learning-based approach to learning CAD models, named ContrastCAD.
ContrastCAD effectively captures semantic information within the construction sequences of the CAD model.
We also propose a new CAD data augmentation method, called Random Replace and Extrude (RRE), to enhance the learning performance of the model.
arXiv Detail & Related papers (2024-04-02T05:30:39Z)
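For reference, here is a minimal sketch of the SimCLR-style InfoNCE objective that contrastive approaches like this typically optimize over two augmented views (e.g., two RRE-perturbed versions of the same construction sequence). The function below is a generic illustration, not ContrastCAD's exact loss.

```python
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1):
    """Generic InfoNCE loss over two views of the same batch.

    z1, z2: (B, D) embeddings of two augmentations of the same B CAD
    construction sequences. Matching rows form the positive pairs; all
    other rows in the batch act as negatives.
    """
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature          # (B, B) cosine similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)     # positives on the diagonal
```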
- AutoCAD: Automatically Generating Counterfactuals for Mitigating Shortcut Learning [70.70393006697383]
We present AutoCAD, a fully automatic and task-agnostic framework for generating counterfactually augmented data (CAD).
arXiv Detail & Related papers (2022-11-29T13:39:53Z)
- CADOps-Net: Jointly Learning CAD Operation Types and Steps from Boundary-Representations [17.051792180335354]
This paper proposes a new deep neural network, CADOps-Net, that jointly learns the CAD operation types and the decomposition into different CAD operation steps.
Compared to existing datasets, the complexity and variety of the accompanying CC3D-Ops models are closer to those of models used for industrial purposes.
arXiv Detail & Related papers (2022-08-22T19:12:20Z)
- Patch2CAD: Patchwise Embedding Learning for In-the-Wild Shape Retrieval from a Single Image [58.953160501596805]
We propose a novel approach towards constructing a joint embedding space between 2D images and 3D CAD models in a patch-wise fashion.
Our approach is more robust than the state of the art in real-world scenarios without any exact CAD matches.
arXiv Detail & Related papers (2021-08-20T20:58:52Z)
- Mask2CAD: 3D Shape Prediction by Learning to Segment and Retrieve [54.054575408582565]
We propose to leverage existing large-scale datasets of 3D models to understand the underlying 3D structure of objects seen in an image.
We present Mask2CAD, which jointly detects objects in real-world images and, for each detected object, optimizes for the most similar CAD model and its pose.
This produces a clean, lightweight representation of the objects in an image.
arXiv Detail & Related papers (2020-07-26T00:08:37Z)
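The retrieval step common to Patch2CAD and Mask2CAD reduces to a nearest-neighbor lookup in a learned joint embedding space. Below is a minimal, generic sketch of that lookup (cosine similarity over an embedding bank), not the papers' actual code.

```python
import numpy as np

def retrieve_cad(query: np.ndarray, cad_bank: np.ndarray) -> int:
    """Return the index of the CAD model whose embedding has the highest
    cosine similarity to the detected object's embedding.

    query:    (D,) embedding of the detected object (or image patch)
    cad_bank: (N, D) embeddings of the candidate CAD models
    """
    q = query / np.linalg.norm(query)
    c = cad_bank / np.linalg.norm(cad_bank, axis=1, keepdims=True)
    return int(np.argmax(c @ q))

# Example: 1000 candidate models with 128-dim embeddings.
bank = np.random.randn(1000, 128)
obj = np.random.randn(128)
print(retrieve_cad(obj, bank))
```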
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.