From Intent to Execution: Multimodal Chain-of-Thought Reinforcement Learning for Precise CAD Code Generation
- URL: http://arxiv.org/abs/2508.10118v2
- Date: Mon, 18 Aug 2025 09:54:00 GMT
- Title: From Intent to Execution: Multimodal Chain-of-Thought Reinforcement Learning for Precise CAD Code Generation
- Authors: Ke Niu, Haiyang Yu, Zhuofan Chen, Mengyang Zhao, Teng Fu, Bin Li, Xiangyang Xue
- Abstract summary: We propose CAD-RL, a multimodal Chain-of-Thought guided reinforcement learning framework for CAD modeling code generation. Our method combines Cold Start with goal-driven reinforcement learning post-training using three task-specific rewards. Experiments demonstrate that CAD-RL achieves significant improvements in reasoning quality, output precision, and code executability.
- Score: 47.67703214044401
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Computer-Aided Design (CAD) plays a vital role in engineering and manufacturing, yet current CAD workflows require extensive domain expertise and manual modeling effort. Recent advances in large language models (LLMs) have made it possible to generate code from natural language, opening new opportunities for automating parametric 3D modeling. However, directly translating human design intent into executable CAD code remains highly challenging because it requires logical reasoning, syntactic correctness, and numerical precision. In this work, we propose CAD-RL, a multimodal Chain-of-Thought (CoT) guided reinforcement learning post-training framework for CAD modeling code generation. Our method combines CoT-based Cold Start with goal-driven reinforcement learning post-training using three task-specific rewards: an executability reward, a geometric accuracy reward, and an external evaluation reward. To ensure stable policy learning under sparse and high-variance reward conditions, we introduce three targeted optimization strategies: Trust Region Stretch for improved exploration, Precision Token Loss for more accurate dimension parameters, and Overlong Filtering to reduce noisy supervision. To support training and benchmarking, we release ExeCAD, a novel dataset comprising 16,540 real-world CAD examples with paired natural language and structured design language descriptions, executable CADQuery scripts, and rendered 3D models. Experiments demonstrate that CAD-RL achieves significant improvements in reasoning quality, output precision, and code executability over existing VLMs.
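The three rewards named in the abstract can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names, the relative-dimension-error proxy for geometric accuracy, the clipping of the external score to [0, 1], and the weighted average used to combine the terms are all hypothetical.

```python
from typing import Sequence


def executability_reward(script: str) -> float:
    """Return 1.0 if the generated script compiles and runs, else 0.0.

    Assumption: a binary reward. exec() on untrusted code is unsafe;
    a real pipeline would run the script in a sandboxed process.
    """
    try:
        exec(compile(script, "<generated>", "exec"), {"__name__": "__generated__"})
    except Exception:
        return 0.0
    return 1.0


def geometric_reward(pred_dims: Sequence[float], target_dims: Sequence[float]) -> float:
    """Hypothetical proxy: 1 minus the mean relative dimension error, clipped to [0, 1]."""
    if len(pred_dims) != len(target_dims) or not target_dims:
        return 0.0
    errors = [abs(p - t) / max(abs(t), 1e-9) for p, t in zip(pred_dims, target_dims)]
    return max(0.0, 1.0 - sum(errors) / len(errors))


def combined_reward(
    script: str,
    pred_dims: Sequence[float],
    target_dims: Sequence[float],
    external_score: float,
    weights: Sequence[float] = (1.0, 1.0, 1.0),
) -> float:
    """Weighted average of the three reward terms (the weighting scheme is an assumption)."""
    rewards = (
        executability_reward(script),
        geometric_reward(pred_dims, target_dims),
        min(1.0, max(0.0, external_score)),  # external evaluator score, clipped
    )
    return sum(w * r for w, r in zip(weights, rewards)) / sum(weights)
```

For example, `combined_reward("x = 1", [10.0], [10.0], 0.5)` averages a passing executability term, a perfect dimension match, and an external score of 0.5. The executability term is the sparse one: a single syntax error drops it to zero, which is consistent with the abstract's emphasis on stabilizing training under sparse, high-variance rewards.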
Related papers
- CME-CAD: Heterogeneous Collaborative Multi-Expert Reinforcement Learning for CAD Code Generation [30.08737988265254]
Existing methods that reconstruct 3D models from sketches often produce non-editable and approximate models. We propose the Heterogeneous Collaborative Multi-Expert Reinforcement Learning (CME-CAD) paradigm, a novel training paradigm for CAD code generation. We introduce a two-stage training process: Multi-Expert Fine-Tuning (MEFT), and Multi-Expert Reinforcement Learning (MERL).
arXiv Detail & Related papers (2025-12-29T09:37:53Z)
- ReCAD: Reinforcement Learning Enhanced Parametric CAD Model Generation with Vision-Language Models [16.220781575918256]
ReCAD is a reinforcement learning (RL) framework that bootstraps pretrained large models (PLMs) to generate precise parametric computer-aided design (CAD) models from multimodal inputs. We employ a hierarchical primitive learning process to teach structured and compositional skills under a unified reward function. ReCAD sets a new state-of-the-art in both text-to-CAD and image-to-CAD tasks, significantly improving geometric accuracy across in-distribution and out-of-distribution settings.
arXiv Detail & Related papers (2025-12-06T07:12:56Z)
- RAG-6DPose: Retrieval-Augmented 6D Pose Estimation via Leveraging CAD as Knowledge Base [112.72361202480154]
We present RAG-6DPose, a retrieval-augmented approach that leverages 3D CAD models as a knowledge base. Experimental results on standard benchmarks and real-world robotic tasks demonstrate the effectiveness and robustness of our approach.
arXiv Detail & Related papers (2025-06-23T17:19:41Z)
- CReFT-CAD: Boosting Orthographic Projection Reasoning for CAD via Reinforcement Fine-Tuning [50.867869718716555]
We introduce CReFT-CAD, a two-stage fine-tuning paradigm that first employs a curriculum-driven reinforcement learning stage with difficulty-aware rewards to build reasoning ability steadily. We release TriView2CAD, the first large-scale, open-source benchmark for orthographic projection reasoning.
arXiv Detail & Related papers (2025-05-31T13:52:56Z)
- cadrille: Multi-modal CAD Reconstruction with Online Reinforcement Learning [55.16668009268005]
We propose a multi-modal CAD reconstruction model that simultaneously processes all three input modalities. Inspired by large language model (LLM) training paradigms, we adopt a two-stage pipeline: supervised fine-tuning (SFT) on large-scale procedurally generated data, followed by reinforcement learning (RL) fine-tuning using online feedback obtained programmatically. In the DeepCAD benchmark, our SFT model outperforms existing single-modal approaches in all three input modalities simultaneously.
arXiv Detail & Related papers (2025-05-28T22:32:31Z)
- Seek-CAD: A Self-refined Generative Modeling for 3D Parametric CAD Using Local Inference via DeepSeek [19.441404313543227]
This study is the first investigation to incorporate both visual and Chain-of-Thought (CoT) feedback within the self-refinement mechanism for generating CAD models. We present an innovative 3D CAD model dataset structured around the SSR (Sketch, Sketch-based feature, and Refinements) triple design paradigm.
arXiv Detail & Related papers (2025-05-23T10:11:19Z)
- CAD-Coder: An Open-Source Vision-Language Model for Computer-Aided Design Code Generation [4.092348452904736]
This paper introduces CAD-Coder, an open-source Vision-Language Model (VLM) explicitly fine-tuned to generate editable CAD code (CadQuery Python) directly from visual input. Leveraging a novel dataset that we created, GenCAD-Code, consisting of over 163k CAD-model image and code pairs, CAD-Coder outperforms state-of-the-art VLM baselines.
arXiv Detail & Related papers (2025-05-20T17:34:44Z)
- CADCrafter: Generating Computer-Aided Design Models from Unconstrained Images [69.7768227804928]
CADCrafter is an image-to-parametric CAD model generation framework that trains solely on synthetic textureless CAD data. We introduce a geometry encoder to accurately capture diverse geometric features. Our approach can robustly handle real unconstrained CAD images, and even generalize to unseen general objects.
arXiv Detail & Related papers (2025-04-07T06:01:35Z)
- BlenderLLM: Training Large Language Models for Computer-Aided Design with Self-improvement [45.19076032719869]
We present BlenderLLM, a framework for training Large Language Models (LLMs) in Computer-Aided Design (CAD). Our results reveal that existing models demonstrate significant limitations in generating accurate CAD scripts. Through minimal instruction-based fine-tuning and iterative self-improvement, BlenderLLM significantly surpasses these models in both functionality and accuracy of CAD script generation.
arXiv Detail & Related papers (2024-12-16T14:34:02Z)
- AutoCAD: Automatically Generating Counterfactuals for Mitigating Shortcut Learning [70.70393006697383]
We present AutoCAD, a fully automatic and task-agnostic CAD generation framework.
arXiv Detail & Related papers (2022-11-29T13:39:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.