Related papers: CAD-GPT: Synthesising CAD Construction Sequence with Spatial Reasoning-Enhanced Multimodal LLMs

CAD-GPT: Synthesising CAD Construction Sequence with Spatial Reasoning-Enhanced Multimodal LLMs

URL: http://arxiv.org/abs/2412.19663v1
Date: Fri, 27 Dec 2024 14:19:36 GMT
Title: CAD-GPT: Synthesising CAD Construction Sequence with Spatial Reasoning-Enhanced Multimodal LLMs
Authors: Siyu Wang, Cailian Chen, Xinyi Le, Qimin Xu, Lei Xu, Yanzhou Zhang, Jie Yang,
Abstract summary: This work introduces CAD-GPT, a CAD synthesis method with spatial reasoning-enhanced MLLM.<n>It maps 3D spatial positions and 3D sketch plane rotation angles into a 1D linguistic feature space using a specialized spatial unfolding mechanism.<n>It also discretizes 2D sketch coordinates into an appropriate planar space to enable precise determination of spatial starting position, sketch orientation, and 2D sketch coordinate translations.
Score: 15.505120320280007
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Computer-aided design (CAD) significantly enhances the efficiency, accuracy, and innovation of design processes by enabling precise 2D and 3D modeling, extensive analysis, and optimization. Existing methods for creating CAD models rely on latent vectors or point clouds, which are difficult to obtain and costly to store. Recent advances in Multimodal Large Language Models (MLLMs) have inspired researchers to use natural language instructions and images for CAD model construction. However, these models still struggle with inferring accurate 3D spatial location and orientation, leading to inaccuracies in determining the spatial 3D starting points and extrusion directions for constructing geometries. This work introduces CAD-GPT, a CAD synthesis method with spatial reasoning-enhanced MLLM that takes either a single image or a textual description as input. To achieve precise spatial inference, our approach introduces a 3D Modeling Spatial Mechanism. This method maps 3D spatial positions and 3D sketch plane rotation angles into a 1D linguistic feature space using a specialized spatial unfolding mechanism, while discretizing 2D sketch coordinates into an appropriate planar space to enable precise determination of spatial starting position, sketch orientation, and 2D sketch coordinate translations. Extensive experiments demonstrate that CAD-GPT consistently outperforms existing state-of-the-art methods in CAD model synthesis, both quantitatively and qualitatively.

Related papers

CADCrafter: Generating Computer-Aided Design Models from Unconstrained Images [69.7768227804928]
CADCrafter is an image-to-parametric CAD model generation framework that trains solely on synthetic textureless CAD data. We introduce a geometry encoder to accurately capture diverse geometric features. Our approach can robustly handle real unconstrained CAD images, and even generalize to unseen general objects.
arXiv Detail & Related papers (2025-04-07T06:01:35Z)
GaussianCAD: Robust Self-Supervised CAD Reconstruction from Three Orthographic Views Using 3D Gaussian Splatting [17.173300756707917]
Existing 3D reconstruction methods typically require natural images and corresponding camera poses as inputs. We transform CAD sketches into representations resembling natural images and extract corresponding masks. We employ a customized sparse-view 3D reconstruction method to achieve high-quality reconstructions from aligned orthographic views.
arXiv Detail & Related papers (2025-03-07T05:55:50Z)
CADDreamer: CAD Object Generation from Single-view Images [43.59340035126575]
Existing 3D generative models often produce overly dense and unstructured meshes. We introduce CADDreamer, a novel approach for generating boundary representations (B-rep) of CAD objects from a single image. Results demonstrate that our method effectively recovers high-quality CAD objects from single-view images.
arXiv Detail & Related papers (2025-02-28T05:30:29Z)
CADSpotting: Robust Panoptic Symbol Spotting on Large-Scale CAD Drawings [21.512025767558498]
CADSpotting represents primitives through densely sampled points with attributes like coordinates and colors. We propose a novel Sliding Window Aggregation (SWA) technique, combining weighted voting and Non-Maximum Suppression (NMS) Experiments on FloorPlanCAD and LS-CAD datasets show that CADSpotting significantly outperforms existing methods.
arXiv Detail & Related papers (2024-12-10T10:22:17Z)
Img2CAD: Conditioned 3D CAD Model Generation from Single Image with Structured Visual Geometry [12.265852643914439]
We present Img2CAD, the first knowledge that uses 2D image inputs to generate editable parameters. Img2CAD enables seamless integration between AI 3D reconstruction and CAD representation.
arXiv Detail & Related papers (2024-10-04T13:27:52Z)
NeuSDFusion: A Spatial-Aware Generative Model for 3D Shape Completion, Reconstruction, and Generation [52.772319840580074]
3D shape generation aims to produce innovative 3D content adhering to specific conditions and constraints. Existing methods often decompose 3D shapes into a sequence of localized components, treating each element in isolation. We introduce a novel spatial-aware 3D shape generation framework that leverages 2D plane representations for enhanced 3D shape modeling.
arXiv Detail & Related papers (2024-03-27T04:09:34Z)
NDC-Scene: Boost Monocular 3D Semantic Scene Completion in Normalized Device Coordinates Space [77.6067460464962]
Monocular 3D Semantic Scene Completion (SSC) has garnered significant attention in recent years due to its potential to predict complex semantics and geometry shapes from a single image, requiring no 3D inputs. We identify several critical issues in current state-of-the-art methods, including the Feature Ambiguity of projected 2D features in the ray to the 3D space, the Pose Ambiguity of the 3D convolution, and the Imbalance in the 3D convolution across different depth levels. We devise a novel Normalized Device Coordinates scene completion network (NDC-Scene) that directly extends the 2
arXiv Detail & Related papers (2023-09-26T02:09:52Z)
PlankAssembly: Robust 3D Reconstruction from Three Orthographic Views with Learnt Shape Programs [24.09764733540401]
We develop a new method to automatically convert 2D line drawings from three orthographic views into 3D CAD models. We leverage the attention mechanism in a Transformer-based sequence generation model to learn flexible mappings between the input and output. Our method significantly outperforms existing ones when the inputs are noisy or incomplete.
arXiv Detail & Related papers (2023-08-10T17:59:34Z)
NeurOCS: Neural NOCS Supervision for Monocular 3D Object Localization [80.3424839706698]
We present NeurOCS, a framework that uses instance masks 3D boxes as input to learn 3D object shapes by means of differentiable rendering. Our approach rests on insights in learning a category-level shape prior directly from real driving scenes. We make critical design choices to learn object coordinates more effectively from an object-centric view.
arXiv Detail & Related papers (2023-05-28T16:18:41Z)
CC3D: Layout-Conditioned Generation of Compositional 3D Scenes [49.281006972028194]
We introduce CC3D, a conditional generative model that synthesizes complex 3D scenes conditioned on 2D semantic scene layouts. Our evaluations on synthetic 3D-FRONT and real-world KITTI-360 datasets demonstrate that our model generates scenes of improved visual and geometric quality.
arXiv Detail & Related papers (2023-03-21T17:59:02Z)
SECAD-Net: Self-Supervised CAD Reconstruction by Learning Sketch-Extrude Operations [21.000539206470897]
SECAD-Net is an end-to-end neural network aimed at reconstructing compact and easy-to-edit CAD models. We show superiority over state-of-the-art alternatives including the closely related method for supervised CAD reconstruction.
arXiv Detail & Related papers (2023-03-19T09:26:03Z)
ROCA: Robust CAD Model Retrieval and Alignment from a Single Image [22.03752392397363]
We present ROCA, a novel end-to-end approach that retrieves and aligns 3D CAD models from a shape database to a single input image. experiments on challenging, real-world imagery from ScanNet show that ROCA significantly improves on state of the art, from 9.5% to 17.6% in retrieval-aware CAD alignment accuracy.
arXiv Detail & Related papers (2021-12-03T16:02:32Z)
Patch2CAD: Patchwise Embedding Learning for In-the-Wild Shape Retrieval from a Single Image [58.953160501596805]
We propose a novel approach towards constructing a joint embedding space between 2D images and 3D CAD models in a patch-wise fashion. Our approach is more robust than state of the art in real-world scenarios without any exact CAD matches.
arXiv Detail & Related papers (2021-08-20T20:58:52Z)
Mask2CAD: 3D Shape Prediction by Learning to Segment and Retrieve [54.054575408582565]
We propose to leverage existing large-scale datasets of 3D models to understand the underlying 3D structure of objects seen in an image. We present Mask2CAD, which jointly detects objects in real-world images and for each detected object, optimize for the most similar CAD model and its pose. This produces a clean, lightweight representation of the objects in an image.
arXiv Detail & Related papers (2020-07-26T00:08:37Z)
CAD-Deform: Deformable Fitting of CAD Models to 3D Scans [30.451330075135076]
We introduce CAD-Deform, a method which obtains more accurate CAD-to-scan fits by non-rigidly deforming retrieved CAD models. A series of experiments demonstrate that our method achieves significantly tighter scan-to-CAD fits, allowing a more accurate digital replica of the scanned real-world environment.
arXiv Detail & Related papers (2020-07-23T12:30:20Z)

This list is automatically generated from the titles and abstracts of the papers in this site.