ReCAD: Reinforcement Learning Enhanced Parametric CAD Model Generation with Vision-Language Models
- URL: http://arxiv.org/abs/2512.06328v1
- Date: Sat, 06 Dec 2025 07:12:56 GMT
- Title: ReCAD: Reinforcement Learning Enhanced Parametric CAD Model Generation with Vision-Language Models
- Authors: Jiahao Li, Yusheng Luo, Yunzhong Lou, Xiangdong Zhou
- Abstract summary: ReCAD is a reinforcement learning (RL) framework that bootstraps pretrained large models (PLMs) to generate precise parametric computer-aided design (CAD) models from multimodal inputs. We employ a hierarchical primitive learning process to teach structured and compositional skills under a unified reward function. ReCAD sets a new state of the art in both text-to-CAD and image-to-CAD tasks, significantly improving geometric accuracy across in-distribution and out-of-distribution settings.
- Score: 16.220781575918256
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present ReCAD, a reinforcement learning (RL) framework that bootstraps pretrained large models (PLMs) to generate precise parametric computer-aided design (CAD) models from multimodal inputs by leveraging their inherent generative capabilities. With access only to simple functional interfaces (e.g., point coordinates), our approach enables the emergence of complex CAD operations (e.g., pattern replication and mirroring). This stands in contrast to previous methods, which typically rely on knowledge injected through supervised fine-tuning (SFT), offer limited support for editability, and fail to exploit the strong generative priors of PLMs. Specifically, the ReCAD framework begins by fine-tuning vision-language models (VLMs) to equip them with basic CAD model generation capabilities, where we rewrite CAD scripts into parameterized code that is leveraged to generate accurate textual descriptions for supervision. Then, we propose a novel RL strategy that incorporates parameterized code as guidance to enhance the model's reasoning on challenging questions. Furthermore, we employ a hierarchical primitive learning process to progressively teach structured and compositional skills under a unified reward function that ensures both geometric accuracy and semantic fidelity. ReCAD sets a new state of the art in both text-to-CAD and image-to-CAD tasks, significantly improving geometric accuracy across in-distribution and out-of-distribution settings. In the image-to-CAD task, for instance, it reduces the mean Chamfer Distance from 73.47 to 29.61 (in-distribution) and from 272.06 to 80.23 (out-of-distribution), outperforming existing baselines by a substantial margin.
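The geometric-accuracy metric quoted above is the mean Chamfer Distance between the generated and reference models. A minimal NumPy sketch of the symmetric Chamfer Distance between two sampled point clouds follows; the sampling density and any scale factor used in the paper's benchmark are assumptions here, not taken from the source:

```python
import numpy as np

def chamfer_distance(p, q):
    """Symmetric Chamfer Distance between point sets p (N, 3) and q (M, 3)."""
    # Pairwise squared distances, shape (N, M).
    d2 = np.sum((p[:, None, :] - q[None, :, :]) ** 2, axis=-1)
    # Mean nearest-neighbor squared distance, in both directions.
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

# Example: a predicted point set slightly offset from the reference.
pred = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
ref  = np.array([[0.0, 0.0, 0.0], [1.0, 0.1, 0.0]])
print(chamfer_distance(pred, ref))  # small value -> close geometry
```

Lower is better: identical point sets score 0, and the paper reports large reductions in this metric on both in-distribution and out-of-distribution inputs.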
Related papers
- STEP-LLM: Generating CAD STEP Models from Natural Language with Large Language Models [16.811723701941546]
We introduce novel preprocessing tailored for the graph-structured format of STEP. We show consistent gains of our STEP-LLM in geometric fidelity over the Text2CAD baseline.
arXiv Detail & Related papers (2026-01-19T01:10:49Z) - CME-CAD: Heterogeneous Collaborative Multi-Expert Reinforcement Learning for CAD Code Generation [30.08737988265254]
Existing methods that reconstruct 3D models from sketches often produce non-editable and approximate models. We propose Heterogeneous Collaborative Multi-Expert Reinforcement Learning (CME-CAD), a novel training paradigm for CAD code generation. We introduce a two-stage training process: Multi-Expert Fine-Tuning (MEFT) and Multi-Expert Reinforcement Learning (MERL).
arXiv Detail & Related papers (2025-12-29T09:37:53Z) - GACO-CAD: Geometry-Augmented and Conciseness-Optimized CAD Model Generation from Single Image [11.612167656421079]
Multi-modal large language models (MLLMs) still struggle with accurately inferring 3D geometry from 2D images. We introduce GACO-CAD, a novel two-stage post-training framework. Experiments on the DeepCAD and Fusion360 datasets show that GACO-CAD achieves state-of-the-art performance.
arXiv Detail & Related papers (2025-10-20T04:57:20Z) - From Intent to Execution: Multimodal Chain-of-Thought Reinforcement Learning for Precise CAD Code Generation [47.67703214044401]
We propose CAD-RL, a multimodal Chain-of-Thought-guided reinforcement learning framework for CAD modeling code generation. Our method combines Cold Start with goal-driven reinforcement learning post-training using three task-specific rewards. Experiments demonstrate that CAD-RL achieves significant improvements in reasoning quality, output precision, and code executability.
arXiv Detail & Related papers (2025-08-13T18:30:49Z) - CReFT-CAD: Boosting Orthographic Projection Reasoning for CAD via Reinforcement Fine-Tuning [31.342222156939403]
We introduce CReFT-CAD, a two-stage fine-tuning paradigm that first employs a curriculum-driven reinforcement learning stage with difficulty-aware rewards to build reasoning ability steadily. We release TriView2CAD, the first large-scale, open-source benchmark for orthographic projection reasoning.
arXiv Detail & Related papers (2025-05-31T13:52:56Z) - cadrille: Multi-modal CAD Reconstruction with Online Reinforcement Learning [55.16668009268005]
We propose a multi-modal CAD reconstruction model that simultaneously processes all three input modalities. Inspired by large language model (LLM) training paradigms, we adopt a two-stage pipeline: supervised fine-tuning (SFT) on large-scale procedurally generated data, followed by reinforcement learning (RL) fine-tuning using online feedback obtained programmatically. On the DeepCAD benchmark, our SFT model outperforms existing single-modal approaches in all three input modalities simultaneously.
arXiv Detail & Related papers (2025-05-28T22:32:31Z) - Seek-CAD: A Self-refined Generative Modeling for 3D Parametric CAD Using Local Inference via DeepSeek [19.441404313543227]
This study is the first investigation to incorporate both visual and Chain-of-Thought (CoT) feedback within the self-refinement mechanism for generating CAD models. We present an innovative 3D CAD model dataset structured around the SSR (Sketch, Sketch-based feature, and Refinements) triple design paradigm.
arXiv Detail & Related papers (2025-05-23T10:11:19Z) - CADCrafter: Generating Computer-Aided Design Models from Unconstrained Images [69.7768227804928]
CADCrafter is an image-to-parametric CAD model generation framework that trains solely on synthetic textureless CAD data. We introduce a geometry encoder to accurately capture diverse geometric features. Our approach can robustly handle real unconstrained CAD images and even generalize to unseen general objects.
arXiv Detail & Related papers (2025-04-07T06:01:35Z) - CADSpotting: Robust Panoptic Symbol Spotting on Large-Scale CAD Drawings [56.05238657033198]
We introduce CADSpotting, an effective method for panoptic symbol spotting in large-scale architectural CAD drawings. We also propose a novel Sliding Window Aggregation (SWA) technique that combines weighted voting and Non-Maximum Suppression (NMS). Experiments on FloorPlanCAD and LS-CAD demonstrate that CADSpotting significantly outperforms existing methods.
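One ingredient of the SWA technique named above is standard Non-Maximum Suppression. The following is an illustrative sketch of plain NMS over axis-aligned boxes, not the paper's weighted-voting aggregation; the box format, scoring, and threshold are assumptions:

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def nms(boxes, scores, thresh=0.5):
    """Greedily keep highest-scoring boxes, dropping overlaps above `thresh`."""
    order = np.argsort(scores)[::-1]  # indices from highest to lowest score
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
    return keep
```

For example, given two heavily overlapping detections of the same symbol and one distant detection, `nms` keeps the higher-scoring of the overlapping pair plus the distant box.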
arXiv Detail & Related papers (2024-12-10T10:22:17Z) - PS-CAD: Local Geometry Guidance via Prompting and Selection for CAD Reconstruction [86.726941702182]
We introduce geometric guidance into the reconstruction network PS-CAD. First, we provide, as a point cloud, the geometry of surfaces where the current reconstruction differs from the complete model. Second, we use geometric analysis to extract a set of planar prompts that correspond to candidate surfaces.
arXiv Detail & Related papers (2024-05-24T03:43:55Z) - AutoCAD: Automatically Generating Counterfactuals for Mitigating Shortcut Learning [70.70393006697383]
In this paper, we present AutoCAD, a fully automatic and task-agnostic CAD generation framework.
arXiv Detail & Related papers (2022-11-29T13:39:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.