Related papers: CAD-Coder: An Open-Source Vision-Language Model for Computer-Aided Design Code Generation

URL: http://arxiv.org/abs/2505.14646v1
Date: Tue, 20 May 2025 17:34:44 GMT
Title: CAD-Coder: An Open-Source Vision-Language Model for Computer-Aided Design Code Generation
Authors: Anna C. Doris, Md Ferdous Alam, Amin Heyrani Nobari, Faez Ahmed
Abstract summary: This paper introduces CAD-Coder, an open-source Vision-Language Model (VLM) explicitly fine-tuned to generate editable CAD code (CadQuery Python) directly from visual input. Leveraging a novel dataset that we created--GenCAD-Code, consisting of over 163k CAD-model image and code pairs--CAD-Coder outperforms state-of-the-art VLM baselines.
Score: 4.092348452904736
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Efficient creation of accurate and editable 3D CAD models is critical in engineering design, significantly impacting cost and time-to-market in product innovation. Current manual workflows remain highly time-consuming and demand extensive user expertise. While recent developments in AI-driven CAD generation show promise, existing models are limited by incomplete representations of CAD operations, inability to generalize to real-world images, and low output accuracy. This paper introduces CAD-Coder, an open-source Vision-Language Model (VLM) explicitly fine-tuned to generate editable CAD code (CadQuery Python) directly from visual input. Leveraging a novel dataset that we created--GenCAD-Code, consisting of over 163k CAD-model image and code pairs--CAD-Coder outperforms state-of-the-art VLM baselines such as GPT-4.5 and Qwen2.5-VL-72B, achieving a 100% valid syntax rate and the highest accuracy in 3D solid similarity. Notably, our VLM demonstrates some signs of generalizability, successfully generating CAD code from real-world images and executing CAD operations unseen during fine-tuning. The performance and adaptability of CAD-Coder highlight the potential of VLMs fine-tuned on code to streamline CAD workflows for engineers and designers. CAD-Coder is publicly available at: https://github.com/anniedoris/CAD-Coder.
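
As a rough illustration of the "valid syntax rate" mentioned in the abstract, the sketch below counts how many generated scripts parse as Python. This is an assumption about what the metric could mean, not the paper's evaluation code: the authors may instead require that the script executes in CadQuery and yields a solid. The helper names (`is_valid_python`, `valid_syntax_rate`) and the two sample strings are hypothetical; only the standard-library `ast` module is used, so CadQuery does not need to be installed.

```python
import ast


def is_valid_python(source: str) -> bool:
    """Return True if `source` parses as Python. Nothing is executed."""
    try:
        ast.parse(source)
        return True
    except (SyntaxError, ValueError):
        return False


def valid_syntax_rate(samples: list[str]) -> float:
    """Fraction of samples that parse; 0.0 for an empty list (a convention chosen here)."""
    if not samples:
        return 0.0
    return sum(is_valid_python(s) for s in samples) / len(samples)


# Hypothetical model outputs: one well-formed CadQuery script, one with an unclosed parenthesis.
generated = [
    "import cadquery as cq\nresult = cq.Workplane('XY').box(10, 20, 5)",
    "import cadquery as cq\nresult = cq.Workplane('XY').box(10, 20, 5",
]

print(valid_syntax_rate(generated))  # prints 0.5
```

Parsing alone is a weaker check than execution: a script can be syntactically valid and still fail inside the CAD kernel, so a stricter evaluation would also run each script in a sandbox.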

Related papers

CADmium: Fine-Tuning Code Language Models for Text-Driven Sequential CAD Design [10.105055422074734]
We introduce a new large-scale pipeline of more than 170k CAD models annotated with human-like descriptions. Our experiments and ablation studies on both synthetic and human-annotated data demonstrate that CADmium is able to automate CAD design.
arXiv Detail & Related papers (2025-07-13T21:11:53Z)
CReFT-CAD: Boosting Orthographic Projection Reasoning for CAD via Reinforcement Fine-Tuning [50.867869718716555]
We introduce CReFT-CAD, a two-stage fine-tuning paradigm that first employs a curriculum-driven reinforcement learning stage with difficulty-aware rewards to build reasoning ability steadily. We release TriView2CAD, the first large-scale, open-source benchmark for orthographic projection reasoning.
arXiv Detail & Related papers (2025-05-31T13:52:56Z)
CADCrafter: Generating Computer-Aided Design Models from Unconstrained Images [69.7768227804928]
CADCrafter is an image-to-parametric CAD model generation framework that trains solely on synthetic textureless CAD data. We introduce a geometry encoder to accurately capture diverse geometric features. Our approach can robustly handle real unconstrained CAD images, and even generalize to unseen general objects.
arXiv Detail & Related papers (2025-04-07T06:01:35Z)
CAD-Recode: Reverse Engineering CAD Code from Point Clouds [12.864274930732055]
3D CAD reverse engineering consists of reconstructing the sketch and CAD operation sequences from 3D representations such as point clouds. The proposed CAD-Recode translates a point cloud into Python code that, when executed, reconstructs the CAD model. We show that our CAD Python code output is interpretable by off-the-shelf LLMs, enabling CAD editing and CAD-specific question answering from point clouds.
arXiv Detail & Related papers (2024-12-18T16:55:42Z)
BlenderLLM: Training Large Language Models for Computer-Aided Design with Self-improvement [45.19076032719869]
We present BlenderLLM, a framework for training Large Language Models (LLMs) in Computer-Aided Design (CAD). Our results reveal that existing models demonstrate significant limitations in generating accurate CAD scripts. Through minimal instruction-based fine-tuning and iterative self-improvement, BlenderLLM significantly surpasses these models in both functionality and accuracy of CAD script generation.
arXiv Detail & Related papers (2024-12-16T14:34:02Z)
Text2CAD: Text to 3D CAD Generation via Technical Drawings [45.3611544056261]
Text2CAD is a novel framework that employs stable diffusion models tailored to automate the generation process. We show that Text2CAD effectively generates technical drawings that are accurately translated into high-quality 3D CAD models.
arXiv Detail & Related papers (2024-11-09T15:12:06Z)
Generating CAD Code with Vision-Language Models for 3D Designs [6.532952167132679]
We introduce CADCodeVerify, a novel approach to iteratively verify and improve 3D objects generated from CAD code. Our approach works by producing ameliorative feedback by prompting a Vision-Language Model to generate and answer a set of validation questions. Our findings show that CADCodeVerify improves VLM performance by providing visual feedback, enhancing the structure of the 3D objects, and increasing the success rate of the compiled program.
arXiv Detail & Related papers (2024-10-07T02:44:50Z)
GenCAD: Image-Conditioned Computer-Aided Design Generation with Transformer-Based Contrastive Representation and Diffusion Priors [3.796768352477804]
The creation of manufacturable and editable 3D shapes through Computer-Aided Design (CAD) remains a highly manual and time-consuming task. This paper introduces GenCAD, a generative model that employs autoregressive transformers with a contrastive learning framework and latent diffusion models to transform image inputs into parametric CAD command sequences.
arXiv Detail & Related papers (2024-09-08T23:49:11Z)
OpenECAD: An Efficient Visual Language Model for Editable 3D-CAD Design [1.481550828146527]
We fine-tuned pre-trained models to create OpenECAD models (0.55B, 0.89B, 2.4B and 3.1B). OpenECAD models can process images of 3D designs as input and generate highly structured 2D sketches and 3D construction commands. These outputs can be directly used with existing CAD tools' APIs to generate project files.
arXiv Detail & Related papers (2024-06-14T10:47:52Z)
Geometric Deep Learning for Computer-Aided Design: A Survey [85.79012726689511]
This survey offers a comprehensive overview of learning-based methods in computer-aided design. It includes similarity analysis and retrieval, 2D and 3D CAD model synthesis, and CAD generation from point clouds. It provides a complete list of benchmark datasets and their characteristics, along with open-source codes that have propelled research in this domain.
arXiv Detail & Related papers (2024-02-27T17:11:35Z)
AutoCAD: Automatically Generating Counterfactuals for Mitigating Shortcut Learning [70.70393006697383]
We present AutoCAD, a fully automatic and task-agnostic CAD generation framework.
arXiv Detail & Related papers (2022-11-29T13:39:53Z)

This list is automatically generated from the titles and abstracts of the papers on this site.