GACO-CAD: Geometry-Augmented and Conciseness-Optimized CAD Model Generation from Single Image
- URL: http://arxiv.org/abs/2510.17157v1
- Date: Mon, 20 Oct 2025 04:57:20 GMT
- Title: GACO-CAD: Geometry-Augmented and Conciseness-Optimized CAD Model Generation from Single Image
- Authors: Yinghui Wang, Xinyu Zhang, Peng Du,
- Abstract summary: Multi-modal large language models (MLLMs) still struggle with accurately inferring 3D geometry from 2D images. We introduce GACO-CAD, a novel two-stage post-training framework. Experiments on the DeepCAD and Fusion360 datasets show that GACO-CAD achieves state-of-the-art performance.
- Score: 11.612167656421079
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generating editable, parametric CAD models from a single image holds great potential to lower the barriers of industrial concept design. However, current multi-modal large language models (MLLMs) still struggle with accurately inferring 3D geometry from 2D images due to limited spatial reasoning capabilities. We address this limitation by introducing GACO-CAD, a novel two-stage post-training framework. It is designed to achieve a joint objective: simultaneously improving the geometric accuracy of the generated CAD models and encouraging the use of more concise modeling procedures. First, during supervised fine-tuning, we leverage depth and surface normal maps as dense geometric priors, combining them with the RGB image to form a multi-channel input. In the context of single-view reconstruction, these priors provide complementary spatial cues that help the MLLM more reliably recover 3D geometry from 2D observations. Second, during reinforcement learning, we introduce a group length reward that, while preserving high geometric fidelity, promotes the generation of more compact and less redundant parametric modeling sequences. A simple dynamic weighting strategy is adopted to stabilize training. Experiments on the DeepCAD and Fusion360 datasets show that GACO-CAD achieves state-of-the-art performance under the same MLLM backbone, consistently outperforming existing methods in terms of code validity, geometric accuracy, and modeling conciseness.
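Both stages of the framework lend themselves to a brief illustration. The Python sketch below shows one plausible way to (a) stack depth and surface normal maps with the RGB image into a multi-channel input and (b) combine a geometric-fidelity reward with a group-relative length bonus under a dynamic weight. The 7-channel layout, the reward form, and the linear weight ramp are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def build_multichannel_input(rgb, depth, normals):
    """Stack dense geometric priors with the RGB image.

    rgb:     (H, W, 3) float array in [0, 1]
    depth:   (H, W)    float array (e.g. from a monocular depth estimator)
    normals: (H, W, 3) float array of unit surface normals
    Returns an (H, W, 7) multi-channel input; the 7-channel layout is an
    assumption, the paper only states that depth and normals are combined
    with the RGB image.
    """
    depth = (depth - depth.min()) / (depth.max() - depth.min() + 1e-8)  # -> [0, 1]
    normals = (normals + 1.0) / 2.0                                     # [-1, 1] -> [0, 1]
    return np.concatenate([rgb, depth[..., None], normals], axis=-1)

def group_length_reward(lengths, geom_rewards, valid, step, total_steps,
                        max_weight=0.2):
    """Toy group-relative length reward with a dynamic weight.

    lengths:      token lengths of the CAD sequences sampled for one prompt
    geom_rewards: per-sample geometric-fidelity rewards in [0, 1]
    valid:        per-sample code-validity flags (invalid code gets no bonus)
    The 'shorter than the group mean' bonus and the linear weight schedule
    are illustrative assumptions about how a conciseness term could be
    folded into the fidelity reward without sacrificing geometry.
    """
    lengths = np.asarray(lengths, dtype=float)
    geom_rewards = np.asarray(geom_rewards, dtype=float)
    valid = np.asarray(valid, dtype=bool)

    # Reward sequences shorter than the group average, but only when they
    # are valid and geometrically faithful.
    rel = (lengths.mean() - lengths) / (lengths.std() + 1e-8)
    length_bonus = np.clip(rel, -1.0, 1.0) * valid * (geom_rewards > 0.5)

    w = max_weight * min(1.0, step / max(1, total_steps))  # dynamic weighting
    return geom_rewards + w * length_bonus
```

In practice the fidelity reward would come from executing the generated CAD program and comparing the resulting geometry against the target; here it is simply an input array.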
Related papers
- CADKnitter: Compositional CAD Generation from Text and Geometry Guidance [8.644079160190175]
We propose CADKnitter, a compositional CAD generation framework with a geometry-guided diffusion sampling strategy. CADKnitter generates a complementary CAD part that follows both the geometric constraints of the given CAD model and the semantic constraints of the desired design text prompt. We also curate a dataset, KnitCAD, containing over 310,000 samples of CAD models along with textual prompts and assembly metadata.
arXiv Detail & Related papers (2025-12-12T01:06:38Z)
- ReCAD: Reinforcement Learning Enhanced Parametric CAD Model Generation with Vision-Language Models [16.220781575918256]
ReCAD is a reinforcement learning (RL) framework that bootstraps pretrained large models (PLMs) to generate precise parametric computer-aided design (CAD) models from multimodal inputs. We employ a hierarchical primitive learning process to teach structured and compositional skills under a unified reward function. ReCAD sets a new state-of-the-art in both text-to-CAD and image-to-CAD tasks, significantly improving geometric accuracy across in-distribution and out-of-distribution settings.
arXiv Detail & Related papers (2025-12-06T07:12:56Z)
- GeoWorld: Unlocking the Potential of Geometry Models to Facilitate High-Fidelity 3D Scene Generation [68.02988074681427]
Previous works leveraging video models for image-to-3D scene generation tend to suffer from geometric distortions and blurry content. In this paper, we renovate the pipeline of image-to-3D scene generation by unlocking the potential of geometry models. Our GeoWorld can generate high-fidelity 3D scenes from a single image and a given camera trajectory, outperforming prior methods both qualitatively and quantitatively.
arXiv Detail & Related papers (2025-11-28T13:55:45Z)
- Seeing 3D Through 2D Lenses: 3D Few-Shot Class-Incremental Learning via Cross-Modal Geometric Rectification [59.17489431187807]
We propose a framework that enhances 3D geometric fidelity by leveraging CLIP's hierarchical spatial semantics. Our method significantly improves 3D few-shot class-incremental learning, achieving superior geometric coherence and robustness to texture bias.
arXiv Detail & Related papers (2025-09-18T13:45:08Z)
- CADCrafter: Generating Computer-Aided Design Models from Unconstrained Images [69.7768227804928]
CADCrafter is an image-to-parametric CAD model generation framework that trains solely on synthetic textureless CAD data. We introduce a geometry encoder to accurately capture diverse geometric features. Our approach can robustly handle real unconstrained CAD images, and even generalize to unseen general objects.
arXiv Detail & Related papers (2025-04-07T06:01:35Z)
- CADDreamer: CAD Object Generation from Single-view Images [43.59340035126575]
Existing 3D generative models often produce overly dense and unstructured meshes. We introduce CADDreamer, a novel approach for generating boundary representations (B-rep) of CAD objects from a single image. Results demonstrate that our method effectively recovers high-quality CAD objects from single-view images.
arXiv Detail & Related papers (2025-02-28T05:30:29Z)
- CAD-GPT: Synthesising CAD Construction Sequence with Spatial Reasoning-Enhanced Multimodal LLMs [15.505120320280007]
This work introduces CAD-GPT, a CAD synthesis method with a spatial reasoning-enhanced MLLM. It maps 3D spatial positions and 3D sketch plane rotation angles into a 1D linguistic feature space using a specialized spatial unfolding mechanism. It also discretizes 2D sketch coordinates into an appropriate planar space to enable precise determination of spatial starting position, sketch orientation, and 2D sketch coordinate translations.
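As a rough illustration of the coordinate discretization idea mentioned above, the minimal sketch below quantizes continuous sketch-plane coordinates into integer token indices; the value range and bin count are assumptions, not CAD-GPT's actual tokenization.

```python
import numpy as np

def discretize_sketch_coords(xy, n_bins=256, lo=-1.0, hi=1.0):
    """Map continuous 2D sketch coordinates to integer token indices.

    xy: (N, 2) array of sketch-plane coordinates.
    The [-1, 1] range and 256 bins are illustrative assumptions about how a
    planar coordinate space could be quantized for a language-model vocabulary.
    """
    xy = np.clip(xy, lo, hi)
    return np.round((xy - lo) / (hi - lo) * (n_bins - 1)).astype(int)

def undiscretize_sketch_coords(tokens, n_bins=256, lo=-1.0, hi=1.0):
    """Inverse mapping from token indices back to approximate coordinates."""
    return tokens / (n_bins - 1) * (hi - lo) + lo
```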
arXiv Detail & Related papers (2024-12-27T14:19:36Z)
- GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation [65.33726478659304]
We introduce the Geometry-Aware Large Reconstruction Model (GeoLRM), an approach that can predict high-quality assets with 512k Gaussians from 21 input images using only 11 GB of GPU memory.
Previous works neglect the inherent sparsity of 3D structure and do not utilize explicit geometric relationships between 3D and 2D images.
GeoLRM tackles these issues by incorporating a novel 3D-aware transformer structure that directly processes 3D points and uses deformable cross-attention mechanisms.
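The "explicit geometric relationship" between 3D points and 2D images can be pictured as projecting each 3D query point into an input view and sampling image features at the projected location. The minimal PyTorch sketch below shows only that projection-and-sampling step (the camera convention and feature shapes are assumptions), not GeoLRM's deformable cross-attention module.

```python
import torch
import torch.nn.functional as F

def sample_image_features(points, feats, K, w2c):
    """Project 3D points into one view and gather per-point 2D features.

    points: (N, 3) 3D points in world coordinates
    feats:  (C, H, W) feature map of one input view
    K:      (3, 3) camera intrinsics
    w2c:    (4, 4) world-to-camera extrinsic matrix
    Returns (N, C) features bilinearly sampled at the projected pixels.
    """
    N = points.shape[0]
    homog = torch.cat([points, torch.ones(N, 1, dtype=points.dtype)], dim=-1)  # (N, 4)
    cam = (w2c @ homog.T).T[:, :3]                      # (N, 3) camera-space points
    uv = (K @ cam.T).T                                  # (N, 3) homogeneous pixels
    uv = uv[:, :2] / uv[:, 2:3].clamp(min=1e-6)         # perspective divide

    _, H, W = feats.shape
    # Normalize pixel coordinates to [-1, 1] for grid_sample.
    grid = torch.stack([uv[:, 0] / (W - 1), uv[:, 1] / (H - 1)], dim=-1) * 2 - 1
    grid = grid.view(1, N, 1, 2)
    sampled = F.grid_sample(feats.unsqueeze(0), grid,
                            mode="bilinear", align_corners=True)  # (1, C, N, 1)
    return sampled[0, :, :, 0].T                        # (N, C)
```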
arXiv Detail & Related papers (2024-06-21T17:49:31Z)
- GeoGS3D: Single-view 3D Reconstruction via Geometric-aware Diffusion Model and Gaussian Splatting [81.03553265684184]
We introduce GeoGS3D, a framework for reconstructing detailed 3D objects from single-view images.
We propose a novel metric, Gaussian Divergence Significance (GDS), to prune unnecessary operations during optimization.
Experiments demonstrate that GeoGS3D generates images with high consistency across views and reconstructs high-quality 3D objects.
arXiv Detail & Related papers (2024-03-15T12:24:36Z)
- Learning Versatile 3D Shape Generation with Improved AR Models [91.87115744375052]
Auto-regressive (AR) models have achieved impressive results in 2D image generation by modeling joint distributions in the grid space.
We propose the Improved Auto-regressive Model (ImAM) for 3D shape generation, which applies discrete representation learning based on a latent vector instead of volumetric grids.
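One generic form of "discrete representation learning based on a latent vector" is nearest-codebook quantization of shape latents, whose integer indices an autoregressive transformer can then model as token sequences. The sketch below shows that quantization step only; the codebook size and latent dimensionality are assumptions, not ImAM's configuration.

```python
import torch

def quantize_latents(latents, codebook):
    """Assign each continuous latent vector to its nearest codebook entry.

    latents:  (T, D) sequence of latent vectors describing one shape
    codebook: (K, D) learned embedding table
    Returns (indices, quantized): discrete token ids (T,) suitable for
    autoregressive modeling, and the matching codebook vectors (T, D).
    """
    d = torch.cdist(latents, codebook)   # (T, K) pairwise Euclidean distances
    indices = d.argmin(dim=-1)           # nearest code per latent
    return indices, codebook[indices]
```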
arXiv Detail & Related papers (2023-03-26T12:03:18Z)