GenCAD-3D: CAD Program Generation using Multimodal Latent Space Alignment and Synthetic Dataset Balancing
- URL: http://arxiv.org/abs/2509.15246v1
- Date: Wed, 17 Sep 2025 19:10:44 GMT
- Title: GenCAD-3D: CAD Program Generation using Multimodal Latent Space Alignment and Synthetic Dataset Balancing
- Authors: Nomi Yu, Md Ferdous Alam, A. John Hart, Faez Ahmed,
- Abstract summary: We introduce GenCAD-3D, a multimodal generative framework to generate 3D CAD programs.<n>We also present SynthBal, a synthetic data augmentation strategy designed to balance and expand datasets.<n>Our experiments show that SynthBal significantly boosts reconstruction accuracy, reduces the generation of invalid CAD models, and markedly improves performance on high-complexity geometries.
- Score: 3.5539239472975583
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: CAD programs, structured as parametric sequences of commands that compile into precise 3D geometries, are fundamental to accurate and efficient engineering design processes. Generating these programs from nonparametric data such as point clouds and meshes remains a crucial yet challenging task, typically requiring extensive manual intervention. Current deep generative models aimed at automating CAD generation are significantly limited by imbalanced and insufficiently large datasets, particularly those lacking representation for complex CAD programs. To address this, we introduce GenCAD-3D, a multimodal generative framework utilizing contrastive learning for aligning latent embeddings between CAD and geometric encoders, combined with latent diffusion models for CAD sequence generation and retrieval. Additionally, we present SynthBal, a synthetic data augmentation strategy specifically designed to balance and expand datasets, notably enhancing representation of complex CAD geometries. Our experiments show that SynthBal significantly boosts reconstruction accuracy, reduces the generation of invalid CAD models, and markedly improves performance on high-complexity geometries, surpassing existing benchmarks. These advancements hold substantial implications for streamlining reverse engineering and enhancing automation in engineering design. We will publicly release our datasets and code, including a set of 51 3D-printed and laser-scanned parts on our project site.
Related papers
- PLLM: Pseudo-Labeling Large Language Models for CAD Program Synthesis [16.542567548166968]
We introduce synthesisM, a self-training framework for CAD program synthesis from unlabeled 3D shapes.<n>Given a shape dataset, synthesisM iteratively samples candidate programs, selects high-fidelity executions, and augments programs to construct synthetic program-shape pairs for fine-tuning.<n>We experiment on adapting CAD-Recode from DeepCAD to the unlabeled ABC dataset show consistent improvements in geometric fidelity and program diversity.
arXiv Detail & Related papers (2026-02-13T03:20:19Z) - CME-CAD: Heterogeneous Collaborative Multi-Expert Reinforcement Learning for CAD Code Generation [30.08737988265254]
Existing methods that reconstruct 3D models from sketches often produce non-editable and approximate models.<n>We propose the Heterogeneous Collaborative Multi-Expert Reinforcement Learning (CME-CAD) paradigm, a novel training paradigm for CAD code generation.<n>We introduce a two-stage training process: Multi-Expert Fine-Tuning (MEFT), and Multi-Expert Reinforcement Learning (MERL)
arXiv Detail & Related papers (2025-12-29T09:37:53Z) - Lemon: A Unified and Scalable 3D Multimodal Model for Universal Spatial Understanding [80.66591664266744]
Lemon is a unified transformer architecture that processes 3D point cloud patches and language tokens as a single sequence.<n>To handle the complexity of 3D data, we develop a structured patchification and tokenization scheme that preserves spatial context.<n>Lemon establishes new state-of-the-art performance across comprehensive 3D understanding and reasoning tasks.
arXiv Detail & Related papers (2025-12-14T20:02:43Z) - CADKnitter: Compositional CAD Generation from Text and Geometry Guidance [8.644079160190175]
We propose CADKnitter, a compositional CAD generation framework with a geometry-guided diffusion sampling strategy.<n>CADKnitter is able to generate a complementary CAD part that follows both the geometric constraints of the given CAD model and the semantic constraints of the desired design text prompt.<n>We also curate a dataset, so-called KnitCAD, containing over 310,000 samples of CAD models, along with textual prompts and assembly metadata.
arXiv Detail & Related papers (2025-12-12T01:06:38Z) - Drawing2CAD: Sequence-to-Sequence Learning for CAD Generation from Vector Drawings [13.135028604324754]
Computer-Aided Design (CAD) generative modeling is driving significant innovations across industrial applications.<n>Recent works have shown remarkable progress in creating solid models from various inputs such as point clouds, meshes, and text descriptions.<n>These methods fundamentally diverge from traditional industrial drawings that begin with 2D engineering drawings.<n>The automatic generation of parametric CAD models from these 2D vector drawings remains underexplored despite being a critical step in engineering design.
arXiv Detail & Related papers (2025-08-26T07:01:58Z) - CADmium: Fine-Tuning Code Language Models for Text-Driven Sequential CAD Design [10.105055422074734]
We introduce a new large-scale pipeline of more than 170k CAD models annotated with human-like descriptions.<n>Our experiments and ablation studies on both synthetic and human-annotated data demonstrate that CADmium is able to automate CAD design.
arXiv Detail & Related papers (2025-07-13T21:11:53Z) - CReFT-CAD: Boosting Orthographic Projection Reasoning for CAD via Reinforcement Fine-Tuning [50.867869718716555]
We introduce CReFT-CAD, a two-stage fine-tuning paradigm that first employs a curriculum-driven reinforcement learning stage with difficulty-aware rewards to build reasoning ability steadily.<n>We release TriView2CAD, the first large-scale, open-source benchmark for orthographic projection reasoning.
arXiv Detail & Related papers (2025-05-31T13:52:56Z) - GenCAD-Self-Repairing: Feasibility Enhancement for 3D CAD Generation [1.757434918993298]
GenCAD is a notable model in this domain, leveraging an autoregressive transformer-based architecture to generate CAD programs.<n>We propose GenCAD-Self-Repairing, a framework that enhances the feasibility of generative CAD models through diffusion guidance and a self-repairing pipeline.
arXiv Detail & Related papers (2025-05-29T09:39:19Z) - CADCrafter: Generating Computer-Aided Design Models from Unconstrained Images [69.7768227804928]
CADCrafter is an image-to-parametric CAD model generation framework that trains solely on synthetic textureless CAD data.<n>We introduce a geometry encoder to accurately capture diverse geometric features.<n>Our approach can robustly handle real unconstrained CAD images, and even generalize to unseen general objects.
arXiv Detail & Related papers (2025-04-07T06:01:35Z) - PHT-CAD: Efficient CAD Parametric Primitive Analysis with Progressive Hierarchical Tuning [52.681829043446044]
ParaCAD comprises over 10 million annotated drawings for training and 3,000 real-world industrial drawings with complex topological structures and physical constraints for test.<n> PHT-CAD is a novel 2D PPA framework that harnesses the modality alignment and reasoning capabilities of Vision-Language Models.
arXiv Detail & Related papers (2025-03-23T17:24:32Z) - CADSpotting: Robust Panoptic Symbol Spotting on Large-Scale CAD Drawings [56.05238657033198]
We introduce CADSpotting, an effective method for panoptic symbol spotting in large-scale architectural CAD drawings.<n>We also propose a novel Sliding Window Aggregation (SWA) technique that combines weighted voting and Non-Maximum Suppression (NMS)<n>Experiments on FloorPlanCAD and LS-CAD demonstrate that CADSpotting significantly outperforms existing methods.
arXiv Detail & Related papers (2024-12-10T10:22:17Z) - GenCAD: Image-Conditioned Computer-Aided Design Generation with Transformer-Based Contrastive Representation and Diffusion Priors [3.796768352477804]
The creation of manufacturable and editable 3D shapes through Computer-Aided Design (CAD) remains a highly manual and time-consuming task.<n>This paper introduces GenCAD, a generative model that employs autoregressive transformers with a contrastive learning framework and latent diffusion models to transform image inputs into parametric CAD command sequences.
arXiv Detail & Related papers (2024-09-08T23:49:11Z) - Pushing Auto-regressive Models for 3D Shape Generation at Capacity and Scalability [118.26563926533517]
Auto-regressive models have achieved impressive results in 2D image generation by modeling joint distributions in grid space.
We extend auto-regressive models to 3D domains, and seek a stronger ability of 3D shape generation by improving auto-regressive models at capacity and scalability simultaneously.
arXiv Detail & Related papers (2024-02-19T15:33:09Z) - Learning Versatile 3D Shape Generation with Improved AR Models [91.87115744375052]
Auto-regressive (AR) models have achieved impressive results in 2D image generation by modeling joint distributions in the grid space.
We propose the Improved Auto-regressive Model (ImAM) for 3D shape generation, which applies discrete representation learning based on a latent vector instead of volumetric grids.
arXiv Detail & Related papers (2023-03-26T12:03:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.