From 2D CAD Drawings to 3D Parametric Models: A Vision-Language Approach
- URL: http://arxiv.org/abs/2412.11892v2
- Date: Tue, 17 Dec 2024 04:38:28 GMT
- Title: From 2D CAD Drawings to 3D Parametric Models: A Vision-Language Approach
- Authors: Xilin Wang, Jia Zheng, Yuanchao Hu, Hao Zhu, Qian Yu, Zihan Zhou
- Abstract summary: We present CAD2Program, a new method for reconstructing 3D parametric models from 2D CAD drawings.
We treat the 2D CAD drawing as an image, regardless of its original format, and encode the image with a standard ViT model.
On the output side, our method auto-regressively predicts a general-purpose language describing 3D parametric models in text form.
- Score: 15.785592359384292
- Abstract: In this paper, we present CAD2Program, a new method for reconstructing 3D parametric models from 2D CAD drawings. Our proposed method is inspired by recent successes in vision-language models (VLMs), and departs from traditional methods which rely on task-specific data representations and/or algorithms. Specifically, on the input side, we simply treat the 2D CAD drawing as a raster image, regardless of its original format, and encode the image with a standard ViT model. We show that such an encoding scheme achieves competitive performance against existing methods that operate on vector-graphics inputs, while imposing substantially fewer restrictions on the 2D drawings. On the output side, our method auto-regressively predicts a general-purpose language describing 3D parametric models in text form. Compared to other sequence modeling methods for CAD which use domain-specific sequence representations with fixed-size slots, our text-based representation is more flexible, and can be easily extended to arbitrary geometric entities and semantic or functional properties. Experimental results on a large-scale dataset of cabinet models demonstrate the effectiveness of our method.
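To make the described pipeline concrete, here is a minimal toy sketch of the two stages the abstract names: patchifying a rasterized drawing for a ViT-style encoder, and auto-regressively emitting a text program describing a 3D parametric model. All shapes, the token vocabulary, and the decoding policy are hypothetical stand-ins, not the authors' actual model.

```python
import numpy as np

# Toy sketch of the CAD2Program pipeline (hypothetical shapes and names):
# 1) rasterize the drawing, 2) split it into ViT-style patches,
# 3) auto-regressively emit a text program describing the 3D model.

def patchify(image, patch=16):
    """Split an HxW grayscale image into flattened non-overlapping patches,
    the standard ViT input tokenization."""
    h, w = image.shape
    patches = [
        image[i:i + patch, j:j + patch].ravel()
        for i in range(0, h, patch)
        for j in range(0, w, patch)
    ]
    return np.stack(patches)  # (num_patches, patch * patch)

def decode_program(patch_tokens, vocab, max_len=8):
    """Placeholder for the autoregressive decoder: a real model would
    condition each token on the image encoding and the tokens generated
    so far; here a dummy policy just walks the vocabulary in order."""
    program = []
    for t in range(max_len):
        next_token = vocab[t % len(vocab)]  # dummy policy, not a trained model
        program.append(next_token)
        if next_token == "<end>":
            break
    return " ".join(program)

drawing = np.zeros((64, 64))   # stand-in for a rasterized CAD drawing
tokens = patchify(drawing)     # 16 patches, 256 values each
vocab = ["cuboid(", "w=600,", "h=720,", "d=560", ")", "<end>"]
print(tokens.shape)            # (16, 256)
print(decode_program(tokens, vocab))
```

The point of the text-based output is visible even in this toy: extending the "language" to new entity types or semantic properties only means enlarging the vocabulary, not redesigning fixed-size slots.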
Related papers
- Img2CAD: Conditioned 3D CAD Model Generation from Single Image with Structured Visual Geometry [12.265852643914439]
We present Img2CAD, the first approach that uses 2D image inputs to generate CAD models with editable parameters.
Img2CAD enables seamless integration between AI 3D reconstruction and CAD representation.
arXiv Detail & Related papers (2024-10-04T13:27:52Z)
- Sketch3D: Style-Consistent Guidance for Sketch-to-3D Generation [55.73399465968594]
This paper proposes a novel generation paradigm Sketch3D to generate realistic 3D assets with shape aligned with the input sketch and color matching the textual description.
Three strategies are designed to optimize 3D Gaussians, i.e., structural optimization via a distribution transfer mechanism, color optimization with a straightforward MSE loss and sketch similarity optimization with a CLIP-based geometric similarity loss.
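The three optimization terms listed above can be sketched as a single weighted objective. This is an illustrative reconstruction under stated assumptions: the distribution-transfer term is omitted, the embeddings stand in for CLIP features, and the weights are invented, so this is not the paper's actual loss.

```python
import numpy as np

def mse_color_loss(pred_rgb, target_rgb):
    """Straightforward MSE color term, as described in the summary."""
    return float(np.mean((pred_rgb - target_rgb) ** 2))

def clip_similarity_loss(img_emb, text_emb):
    """CLIP-style similarity term: 1 - cosine similarity between an image
    embedding and a text embedding (stand-ins for real CLIP features)."""
    cos = np.dot(img_emb, text_emb) / (
        np.linalg.norm(img_emb) * np.linalg.norm(text_emb))
    return float(1.0 - cos)

def total_loss(pred_rgb, target_rgb, img_emb, text_emb,
               w_color=1.0, w_clip=0.5):
    # Structural (distribution-transfer) term omitted for brevity;
    # the weights here are hypothetical.
    return (w_color * mse_color_loss(pred_rgb, target_rgb)
            + w_clip * clip_similarity_loss(img_emb, text_emb))

pred = np.array([0.2, 0.4, 0.6])
tgt = np.array([0.2, 0.4, 0.6])
emb_a = np.array([1.0, 0.0])
emb_b = np.array([1.0, 0.0])
print(total_loss(pred, tgt, emb_a, emb_b))  # 0.0 when both terms match
```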
arXiv Detail & Related papers (2024-04-02T11:03:24Z)
- PlankAssembly: Robust 3D Reconstruction from Three Orthographic Views with Learnt Shape Programs [24.09764733540401]
We develop a new method to automatically convert 2D line drawings from three orthographic views into 3D CAD models.
We leverage the attention mechanism in a Transformer-based sequence generation model to learn flexible mappings between the input and output.
Our method significantly outperforms existing ones when the inputs are noisy or incomplete.
arXiv Detail & Related papers (2023-08-10T17:59:34Z)
- Reconstructing editable prismatic CAD from rounded voxel models [16.03976415868563]
We introduce a novel neural network architecture to solve this challenging task.
Our method reconstructs the input geometry in the voxel space by decomposing the shape.
During inference, we obtain the CAD data by first searching a database of 2D constrained sketches.
arXiv Detail & Related papers (2022-09-02T16:44:10Z)
- MvDeCor: Multi-view Dense Correspondence Learning for Fine-grained 3D Segmentation [91.6658845016214]
We propose to utilize self-supervised techniques in the 2D domain for fine-grained 3D shape segmentation tasks.
We render a 3D shape from multiple views, and set up a dense correspondence learning task within the contrastive learning framework.
As a result, the learned 2D representations are view-invariant and geometrically consistent.
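The dense correspondence objective described above is typically an InfoNCE-style contrastive loss: features of corresponding pixels across views are pulled together, non-corresponding ones pushed apart. The sketch below is a generic toy version with invented feature vectors, not MvDeCor's exact formulation.

```python
import numpy as np

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Toy InfoNCE loss for cross-view pixel correspondence.
    anchor:    (d,) feature of a pixel in view A
    positive:  (d,) feature of the corresponding pixel in view B
    negatives: (n, d) features of non-corresponding pixels
    """
    def sim(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    logits = np.array([sim(anchor, positive)]
                      + [sim(anchor, n) for n in negatives]) / temperature
    logits -= logits.max()  # numerical stability before softmax
    probs = np.exp(logits) / np.exp(logits).sum()
    return float(-np.log(probs[0]))  # the positive sits at index 0

pos = np.array([1.0, 0.0])
negs = np.array([[0.0, 1.0], [0.0, -1.0]])
# Matching features give a low loss; mismatched ones a high loss.
print(info_nce(np.array([1.0, 0.0]), pos, negs))
print(info_nce(np.array([0.0, 1.0]), pos, negs))
```

Minimizing this across many sampled pixel pairs is what drives the learned 2D representations toward view invariance.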
arXiv Detail & Related papers (2022-08-18T00:48:15Z)
- Interactive 3D Character Modeling from 2D Orthogonal Drawings with Annotations [9.83187539596669]
We propose an interactive 3D character modeling approach from orthographic drawings based on 2D-space annotations.
The system builds partial correspondences between the input drawings and generates a base mesh with sweeping splines according to edge information in 2D images.
By repeating the 2D-space operations (i.e., revising and modifying the annotations), users can design a desired character model.
arXiv Detail & Related papers (2022-01-27T02:34:32Z)
- Patch2CAD: Patchwise Embedding Learning for In-the-Wild Shape Retrieval from a Single Image [58.953160501596805]
We propose a novel approach towards constructing a joint embedding space between 2D images and 3D CAD models in a patch-wise fashion.
Our approach is more robust than state of the art in real-world scenarios without any exact CAD matches.
arXiv Detail & Related papers (2021-08-20T20:58:52Z)
- Translational Symmetry-Aware Facade Parsing for 3D Building Reconstruction [11.263458202880038]
In this paper, we present a novel translational-symmetry-based approach to improving deep neural networks for facade parsing.
We propose a novel scheme to fuse anchor-free detection into a single-stage network, which enables efficient training and better convergence.
We employ an off-the-shelf rendering engine such as Blender to reconstruct realistic, high-quality 3D models using procedural modeling.
arXiv Detail & Related papers (2021-06-02T03:10:51Z)
- Improved Modeling of 3D Shapes with Multi-view Depth Maps [48.8309897766904]
We present a general-purpose framework for modeling 3D shapes using CNNs.
Using just a single depth image of the object, we can output a dense multi-view depth map representation of 3D objects.
arXiv Detail & Related papers (2020-09-07T17:58:27Z)
- Mask2CAD: 3D Shape Prediction by Learning to Segment and Retrieve [54.054575408582565]
We propose to leverage existing large-scale datasets of 3D models to understand the underlying 3D structure of objects seen in an image.
We present Mask2CAD, which jointly detects objects in real-world images and, for each detected object, optimizes for the most similar CAD model and its pose.
This produces a clean, lightweight representation of the objects in an image.
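The retrieval step described above amounts to a nearest-neighbor search in a joint embedding space. The sketch below shows that idea with cosine similarity over invented 2-D embeddings; the real system learns high-dimensional embeddings and also regresses pose, which is omitted here.

```python
import numpy as np

def retrieve_cad(query_emb, cad_embs):
    """Return the index of the CAD model whose embedding has the highest
    cosine similarity to the detected object's embedding (toy version)."""
    q = query_emb / np.linalg.norm(query_emb)
    c = cad_embs / np.linalg.norm(cad_embs, axis=1, keepdims=True)
    sims = c @ q                 # cosine similarity to each database entry
    return int(np.argmax(sims))

# Hypothetical 2-D embeddings for three database CAD models.
cad_db = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [0.7, 0.7]])
print(retrieve_cad(np.array([0.9, 0.1]), cad_db))  # → 0
```

Because the output is an index into a curated CAD database, the scene representation stays clean and lightweight, as the summary notes.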
arXiv Detail & Related papers (2020-07-26T00:08:37Z)
- Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2d detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.