cadrille: Multi-modal CAD Reconstruction with Online Reinforcement Learning
- URL: http://arxiv.org/abs/2505.22914v1
- Date: Wed, 28 May 2025 22:32:31 GMT
- Title: cadrille: Multi-modal CAD Reconstruction with Online Reinforcement Learning
- Authors: Maksim Kolodiazhnyi, Denis Tarasov, Dmitrii Zhemchuzhnikov, Alexander Nikulin, Ilya Zisman, Anna Vorontsova, Anton Konushin, Vladislav Kurenkov, Danila Rukhovich
- Abstract summary: We propose a multi-modal CAD reconstruction model that simultaneously processes all three input modalities. Inspired by large language model (LLM) training paradigms, we adopt a two-stage pipeline: supervised fine-tuning (SFT) on large-scale procedurally generated data, followed by reinforcement learning (RL) fine-tuning using online feedback, obtained programmatically. In the DeepCAD benchmark, our SFT model outperforms existing single-modal approaches in all three input modalities simultaneously.
- Score: 41.24641565316878
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Computer-Aided Design (CAD) plays a central role in engineering and manufacturing, making it possible to create precise and editable 3D models. Using a variety of sensor or user-provided data as inputs for CAD reconstruction can democratize access to design applications. However, existing methods typically focus on a single input modality, such as point clouds, images, or text, which limits their generalizability and robustness. Leveraging recent advances in vision-language models (VLM), we propose a multi-modal CAD reconstruction model that simultaneously processes all three input modalities. Inspired by large language model (LLM) training paradigms, we adopt a two-stage pipeline: supervised fine-tuning (SFT) on large-scale procedurally generated data, followed by reinforcement learning (RL) fine-tuning using online feedback, obtained programmatically. Furthermore, we are the first to explore RL fine-tuning of LLMs for CAD tasks, demonstrating that online RL algorithms such as Group Relative Policy Optimization (GRPO) outperform offline alternatives. In the DeepCAD benchmark, our SFT model outperforms existing single-modal approaches in all three input modalities simultaneously. More importantly, after RL fine-tuning, cadrille sets a new state of the art on three challenging datasets, including a real-world one.
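The abstract names GRPO as the online RL algorithm but gives no formulas. As a rough illustration only, the sketch below shows the group-relative advantage normalization that GRPO is generally known for: several outputs are sampled for the same input, each receives a scalar reward, and each reward is standardized against the group's mean and standard deviation. The function name, the `eps` constant, and the reward values are hypothetical and are not taken from the paper; how cadrille actually computes its rewards is not described on this page.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of samples for the same input.

    Assumption: one scalar reward per sampled output. `eps` is a
    hypothetical constant that avoids division by zero when all
    rewards in the group are identical.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four CAD programs sampled for one input.
rewards = [0.9, 0.5, 0.1, 0.5]
advantages = group_relative_advantages(rewards)
```

Samples that score above the group mean get a positive advantage and those below get a negative one, so no separately learned value function is needed to form the baseline.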
Related papers
- Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs [51.21041884010009]
Ring-lite is a Mixture-of-Experts (MoE)-based large language model optimized via reinforcement learning (RL). Our approach matches the performance of state-of-the-art (SOTA) small-scale reasoning models on challenging benchmarks.
arXiv Detail & Related papers (2025-06-17T17:12:34Z)
- CReFT-CAD: Boosting Orthographic Projection Reasoning for CAD via Reinforcement Fine-Tuning [50.867869718716555]
We introduce CReFT-CAD, a two-stage fine-tuning paradigm that first employs a curriculum-driven reinforcement learning stage with difficulty-aware rewards to build reasoning ability steadily. We release TriView2CAD, the first large-scale, open-source benchmark for orthographic projection reasoning.
arXiv Detail & Related papers (2025-05-31T13:52:56Z)
- Seek-CAD: A Self-refined Generative Modeling for 3D Parametric CAD Using Local Inference via DeepSeek [19.441404313543227]
This study is the first investigation to incorporate both visual and Chain-of-Thought (CoT) feedback within the self-refinement mechanism for generating CAD models. We present an innovative 3D CAD model dataset structured around the SSR (Sketch, Sketch-based feature, and Refinements) triple design paradigm.
arXiv Detail & Related papers (2025-05-23T10:11:19Z)
- CAD-Llama: Leveraging Large Language Models for Computer-Aided Design Parametric 3D Model Generation [16.212242362122947]
This study investigates the generation of parametric sequences for computer-aided design (CAD) models using Large Language Models (LLMs). We present CAD-Llama, a framework designed to enhance pretrained LLMs for generating parametric 3D CAD models.
arXiv Detail & Related papers (2025-05-07T14:52:02Z)
- Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning [62.984693936073974]
Value-based reinforcement learning can learn effective policies for a wide range of multi-turn problems. Current value-based RL methods have proven particularly challenging to scale to the setting of large language models. We propose a novel offline RL algorithm that addresses these drawbacks, casting Q-learning as a modified supervised fine-tuning problem.
arXiv Detail & Related papers (2024-11-07T21:36:52Z)
- NVLM: Open Frontier-Class Multimodal LLMs [64.00053046838225]
We introduce NVLM 1.0, a family of frontier-class multimodal large language models (LLMs) that achieve state-of-the-art results on vision-language tasks.
We propose a novel architecture that enhances both training efficiency and multimodal reasoning capabilities.
We develop production-grade multimodality for the NVLM-1.0 models, enabling them to excel in vision-language tasks.
arXiv Detail & Related papers (2024-09-17T17:59:06Z)
- When Parameter-efficient Tuning Meets General-purpose Vision-language Models [65.19127815275307]
PETAL revolutionizes the training process by requiring only 0.5% of the total parameters, achieved through a unique mode approximation technique.
Our experiments reveal that PETAL not only outperforms current state-of-the-art methods in most scenarios but also surpasses full fine-tuning models in effectiveness.
arXiv Detail & Related papers (2023-12-16T17:13:08Z)
- Learning Versatile 3D Shape Generation with Improved AR Models [91.87115744375052]
Auto-regressive (AR) models have achieved impressive results in 2D image generation by modeling joint distributions in the grid space.
We propose the Improved Auto-regressive Model (ImAM) for 3D shape generation, which applies discrete representation learning based on a latent vector instead of volumetric grids.
arXiv Detail & Related papers (2023-03-26T12:03:18Z)
- Optimizing CAD Models with Latent Space Manipulation [4.180840853105103]
We extend StyleCLIP to work with CAD models in the form of voxel models.
We demonstrate the ability of our system for the optimization of automation-related features by optimizing the grabability of various CAD models.
arXiv Detail & Related papers (2023-03-09T08:25:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.