Generating CAD Code with Vision-Language Models for 3D Designs
- URL: http://arxiv.org/abs/2410.05340v2
- Date: Fri, 28 Feb 2025 04:28:23 GMT
- Title: Generating CAD Code with Vision-Language Models for 3D Designs
- Authors: Kamel Alrashedy, Pradyumna Tambwekar, Zulfiqar Zaidi, Megan Langwasser, Wei Xu, Matthew Gombolay
- Abstract summary: We introduce CADCodeVerify, a novel approach to iteratively verify and improve 3D objects generated from CAD code. Our approach works by producing ameliorative feedback by prompting a Vision-Language Model to generate and answer a set of validation questions. Our findings show that CADCodeVerify improves VLM performance by providing visual feedback, enhancing the structure of the 3D objects, and increasing the success rate of the compiled program.
- Score: 6.532952167132679
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative AI has transformed the fields of Design and Manufacturing by providing efficient and automated methods for generating and modifying 3D objects. One approach involves using Large Language Models (LLMs) to generate Computer-Aided Design (CAD) scripting code, which can then be executed to render a 3D object; however, the resulting 3D object may not meet the specified requirements. Testing the correctness of CAD-generated code is challenging because the complexity and structure of 3D objects (e.g., shapes, surfaces, and dimensions) are not feasible to check in code. In this paper, we introduce CADCodeVerify, a novel approach to iteratively verify and improve 3D objects generated from CAD code. Our approach produces ameliorative feedback by prompting a Vision-Language Model (VLM) to generate and answer a set of validation questions that verify the generated object, and then prompts the VLM to correct deviations. To evaluate CADCodeVerify, we introduce CADPrompt, the first benchmark for CAD code generation, consisting of 200 natural language prompts paired with expert-annotated scripting code for 3D objects to benchmark progress. Our findings show that CADCodeVerify improves VLM performance by providing visual feedback, enhancing the structure of the 3D objects, and increasing the success rate of the compiled program. When applied to GPT-4, CADCodeVerify achieved a 7.30% reduction in Point Cloud distance and a 5.0% improvement in success rate compared to prior work.
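The verify-and-refine loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `refine_cad_code`, the four callables passed into it, the yes/no answer convention, and the `max_rounds` default are all hypothetical stand-ins for the VLM calls and the CAD renderer.

```python
from typing import Callable, List, Tuple


def refine_cad_code(
    prompt: str,
    generate_code: Callable[[str, str], str],      # hypothetical: (prompt, feedback) -> CAD code
    render: Callable[[str], object],               # hypothetical: executes code, returns a rendering
    ask_questions: Callable[[str], List[str]],     # hypothetical: prompt -> validation questions
    answer_question: Callable[[str, object], str], # hypothetical: (question, rendering) -> "yes"/"no"
    max_rounds: int = 2,                           # assumed default, not taken from the paper
) -> Tuple[str, int]:
    """Return the final CAD code and the number of refinement rounds used."""
    code = generate_code(prompt, "")
    for round_idx in range(max_rounds):
        try:
            rendering = render(code)
        except Exception as exc:
            # Code that fails to execute becomes feedback for the next attempt.
            code = generate_code(prompt, f"The code failed to execute: {exc}")
            continue
        # Generate validation questions, then answer them against the rendering.
        questions = ask_questions(prompt)
        failed = [
            q for q in questions
            if answer_question(q, rendering).strip().lower() != "yes"
        ]
        if not failed:
            return code, round_idx  # every check passed; stop refining
        # Turn the failed checks into corrective feedback and regenerate.
        feedback = "Fix these issues: " + "; ".join(failed)
        code = generate_code(prompt, feedback)
    return code, max_rounds
```

Passing the model calls in as plain callables keeps the sketch independent of any particular VLM API or CAD library; in practice each callable would wrap a model request or a script execution step.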
Related papers
- CADmium: Fine-Tuning Code Language Models for Text-Driven Sequential CAD Design [10.105055422074734]
We introduce a new large-scale pipeline of more than 170k CAD models annotated with human-like descriptions. Our experiments and ablation studies on both synthetic and human-annotated data demonstrate that CADmium is able to automate CAD design.
arXiv Detail & Related papers (2025-07-13T21:11:53Z) - CADReview: Automatically Reviewing CAD Programs with Error Detection and Correction [11.33947758511237]
We introduce the CAD review task to automatically detect and correct potential errors. In this paper, we propose the CAD program repairer (ReCAD) framework to effectively detect program errors. We create a dataset, CADReview, consisting of over 20K program-image pairs, with diverse errors for the CAD review task.
arXiv Detail & Related papers (2025-05-28T12:41:00Z) - CAD-Coder: An Open-Source Vision-Language Model for Computer-Aided Design Code Generation [4.092348452904736]
This paper introduces CAD-Coder, an open-source Vision-Language Model (VLM) explicitly fine-tuned to generate editable CAD code (CadQuery Python) directly from visual input. Leveraging a novel dataset that we created--GenCAD-Code, consisting of over 163k CAD-model image and code pairs--CAD-Coder outperforms state-of-the-art VLM baselines.
arXiv Detail & Related papers (2025-05-20T17:34:44Z) - CADCrafter: Generating Computer-Aided Design Models from Unconstrained Images [69.7768227804928]
CADCrafter is an image-to-parametric CAD model generation framework that trains solely on synthetic textureless CAD data.
We introduce a geometry encoder to accurately capture diverse geometric features.
Our approach can robustly handle real unconstrained CAD images, and even generalize to unseen general objects.
arXiv Detail & Related papers (2025-04-07T06:01:35Z) - TAR3D: Creating High-Quality 3D Assets via Next-Part Prediction [137.34863114016483]
TAR3D is a novel framework that consists of a 3D-aware Vector Quantized-Variational AutoEncoder (VQ-VAE) and a Generative Pre-trained Transformer (GPT).
We show that TAR3D can achieve superior generation quality over existing methods in text-to-3D and image-to-3D tasks.
arXiv Detail & Related papers (2024-12-22T08:28:20Z) - CAD-Recode: Reverse Engineering CAD Code from Point Clouds [12.864274930732055]
3D CAD reverse engineering consists of reconstructing the sketch and CAD operation sequences from 3D representations such as point clouds.
The proposed CAD-Recode translates a point cloud into Python code that, when executed, reconstructs the CAD model.
We show that our CAD Python code output is interpretable by off-the-shelf LLMs, enabling CAD editing and CAD-specific question answering from point clouds.
arXiv Detail & Related papers (2024-12-18T16:55:42Z) - Img2CAD: Conditioned 3D CAD Model Generation from Single Image with Structured Visual Geometry [12.265852643914439]
We present Img2CAD, the first approach to our knowledge that uses 2D image inputs to generate editable parameters.
Img2CAD enables seamless integration between AI 3D reconstruction and CAD representation.
arXiv Detail & Related papers (2024-10-04T13:27:52Z) - GenCAD: Image-Conditioned Computer-Aided Design Generation with
Transformer-Based Contrastive Representation and Diffusion Priors [4.485378844492069]
GenCAD is a generative model that transforms image inputs into parametric CAD command sequences.
It significantly outperforms existing state-of-the-art methods in terms of the precision and modifiability of generated 3D shapes.
arXiv Detail & Related papers (2024-09-08T23:49:11Z) - OpenECAD: An Efficient Visual Language Model for Editable 3D-CAD Design [1.481550828146527]
We fine-tuned pre-trained models to create OpenECAD models (0.55B, 0.89B, 2.4B and 3.1B).
OpenECAD models can process images of 3D designs as input and generate highly structured 2D sketches and 3D construction commands.
These outputs can be directly used with existing CAD tools' APIs to generate project files.
arXiv Detail & Related papers (2024-06-14T10:47:52Z) - GPT4Point: A Unified Framework for Point-Language Understanding and
Generation [76.61439685940272]
GPT4Point is a groundbreaking point-language multimodal model for unified 3D object understanding and generation within the MLLM framework.
As a powerful 3D MLLM, GPT4Point can seamlessly execute a variety of point-text reference tasks such as point-cloud captioning and Q&A.
It can obtain high-quality results from low-quality point-text features while maintaining geometric shapes and colors.
arXiv Detail & Related papers (2023-12-05T18:59:55Z) - CC3D: Layout-Conditioned Generation of Compositional 3D Scenes [49.281006972028194]
We introduce CC3D, a conditional generative model that synthesizes complex 3D scenes conditioned on 2D semantic scene layouts.
Our evaluations on synthetic 3D-FRONT and real-world KITTI-360 datasets demonstrate that our model generates scenes of improved visual and geometric quality.
arXiv Detail & Related papers (2023-03-21T17:59:02Z) - 3D-TOGO: Towards Text-Guided Cross-Category 3D Object Generation [107.46972849241168]
The 3D-TOGO model generates 3D objects in the form of neural radiance fields with good texture.
Experiments on the largest 3D object dataset (i.e., ABO) are conducted to verify that 3D-TOGO can better generate high-quality 3D objects.
arXiv Detail & Related papers (2022-12-02T11:31:49Z) - AutoCAD: Automatically Generating Counterfactuals for Mitigating
Shortcut Learning [70.70393006697383]
We present AutoCAD, a fully automatic and task-agnostic CAD generation framework.
arXiv Detail & Related papers (2022-11-29T13:39:53Z) - Point2Seq: Detecting 3D Objects as Sequences [58.63662049729309]
We present a simple and effective framework, named Point2Seq, for 3D object detection from point clouds.
We view each 3D object as a sequence of words and reformulate the 3D object detection task as decoding words from 3D scenes in an auto-regressive manner.
arXiv Detail & Related papers (2022-03-25T00:20:31Z) - DeepCAD: A Deep Generative Network for Computer-Aided Design Models [37.655225142981564]
We present the first 3D generative model for a drastically different shape representation -- describing a shape as a sequence of computer-aided design (CAD) operations.
Drawing an analogy between CAD operations and natural language, we propose a CAD generative network based on the Transformer.
arXiv Detail & Related papers (2021-05-20T03:29:18Z) - PvDeConv: Point-Voxel Deconvolution for Autoencoding CAD Construction in
3D [23.87757211847093]
We learn to synthesize high-resolution point clouds of 10k points that densely describe the underlying geometry of Computer Aided Design (CAD) models.
We introduce a new dedicated dataset, the CC3D, containing 50k+ pairs of CAD models and their corresponding 3D meshes.
This dataset is used to learn a convolutional autoencoder for point clouds sampled from the pairs of 3D scans - CAD models.
arXiv Detail & Related papers (2021-01-12T14:14:13Z) - Mask2CAD: 3D Shape Prediction by Learning to Segment and Retrieve [54.054575408582565]
We propose to leverage existing large-scale datasets of 3D models to understand the underlying 3D structure of objects seen in an image.
We present Mask2CAD, which jointly detects objects in real-world images and, for each detected object, optimizes for the most similar CAD model and its pose.
This produces a clean, lightweight representation of the objects in an image.
arXiv Detail & Related papers (2020-07-26T00:08:37Z)