Text-to-CadQuery: A New Paradigm for CAD Generation with Scalable Large Model Capabilities
- URL: http://arxiv.org/abs/2505.06507v1
- Date: Sat, 10 May 2025 04:47:08 GMT
- Title: Text-to-CadQuery: A New Paradigm for CAD Generation with Scalable Large Model Capabilities
- Authors: Haoyang Xie, Feng Ju
- Abstract summary: Computer-aided design (CAD) is fundamental to modern engineering and manufacturing, but creating CAD models still requires expert knowledge and specialized software. Recent advances in large language models (LLMs) open up the possibility of generative CAD, where natural language is directly translated into parametric 3D models. We propose generating CadQuery code directly from text, leveraging the strengths of pretrained LLMs to produce 3D models without intermediate representations.
- Score: 4.093726588615417
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Computer-aided design (CAD) is fundamental to modern engineering and manufacturing, but creating CAD models still requires expert knowledge and specialized software. Recent advances in large language models (LLMs) open up the possibility of generative CAD, where natural language is directly translated into parametric 3D models. However, most existing methods generate task-specific command sequences that pretrained models cannot directly handle. These sequences must be converted into CAD representations such as CAD vectors before a 3D model can be produced, which requires training models from scratch and adds unnecessary complexity. To tackle this issue, we propose generating CadQuery code directly from text, leveraging the strengths of pretrained LLMs to produce 3D models without intermediate representations, using this Python-based scripting language. Since LLMs already excel at Python generation and spatial reasoning, fine-tuning them on Text-to-CadQuery data proves highly effective. Given that these capabilities typically improve with scale, we hypothesize that larger models will perform better after fine-tuning. To enable this, we augment the Text2CAD dataset with 170,000 CadQuery annotations. We fine-tune six open-source LLMs of varying sizes and observe consistent improvements. Our best model achieves a top-1 exact match of 69.3%, up from 58.8%, and reduces Chamfer Distance by 48.6%. Project page: https://github.com/Text-to-CadQuery/Text-to-CadQuery.
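Concretely, because CadQuery targets are plain Python, a Text-to-CadQuery training pair can be represented and syntax-checked with the standard library alone. The prompt wording, dimensions, and the specific modeling chain below are illustrative assumptions, not examples drawn from the augmented dataset; the API calls (Workplane, box, faces, workplane, hole) follow CadQuery's documented fluent interface.

```python
import ast
import json

# A hypothetical (text, CadQuery) supervision pair of the kind the paper
# fine-tunes on. The natural-language prompt describes a part; the target
# is executable CadQuery code that would build it.
training_pair = {
    "text": (
        "Create a rectangular plate 40 mm long, 20 mm wide, and 5 mm thick, "
        "with a 6 mm diameter hole through the center of the top face."
    ),
    "cadquery": (
        "import cadquery as cq\n"
        "result = (\n"
        "    cq.Workplane(\"XY\")\n"
        "    .box(40, 20, 5)\n"
        "    .faces(\">Z\")\n"
        "    .workplane()\n"
        "    .hole(6)\n"
        ")\n"
    ),
}

# Because the target is ordinary Python, its syntactic validity can be
# checked without a CAD kernel -- one advantage of skipping task-specific
# intermediate representations.
ast.parse(training_pair["cadquery"])

# Serialized as a JSON line, the pair is ready for supervised fine-tuning.
line = json.dumps(training_pair)
print(line[:60] + "...")
```

Executing the target string itself would require the CadQuery package and its geometry kernel; the sketch above deliberately stops at syntax validation, which is all the standard library can provide.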
Related papers
- CADmium: Fine-Tuning Code Language Models for Text-Driven Sequential CAD Design [10.105055422074734]
We introduce a new large-scale pipeline of more than 170k CAD models annotated with human-like descriptions. Our experiments and ablation studies on both synthetic and human-annotated data demonstrate that CADmium is able to automate CAD design.
arXiv Detail & Related papers (2025-07-13T21:11:53Z)
- CAD-Recode: Reverse Engineering CAD Code from Point Clouds [12.864274930732055]
3D CAD reverse engineering consists of reconstructing the sketch and CAD operation sequences from 3D representations such as point clouds. The proposed CAD-Recode translates a point cloud into Python code that, when executed, reconstructs the CAD model. We show that our CAD Python code output is interpretable by off-the-shelf LLMs, enabling CAD editing and CAD-specific question answering from point clouds.
arXiv Detail & Related papers (2024-12-18T16:55:42Z)
- BlenderLLM: Training Large Language Models for Computer-Aided Design with Self-improvement [45.19076032719869]
We present BlenderLLM, a framework for training Large Language Models (LLMs) in Computer-Aided Design (CAD). Our results reveal that existing models demonstrate significant limitations in generating accurate CAD scripts. Through minimal instruction-based fine-tuning and iterative self-improvement, BlenderLLM significantly surpasses these models in both functionality and accuracy of CAD script generation.
arXiv Detail & Related papers (2024-12-16T14:34:02Z)
- FlexCAD: Unified and Versatile Controllable CAD Generation with Fine-tuned Large Language Models [22.010338370150738]
We propose FlexCAD, a unified controllable CAD generation model built by fine-tuning large language models (LLMs). We represent a CAD model as structured text by abstracting each hierarchy as a sequence of text tokens. During inference, the user intent is converted into a CAD text with a mask token replacing the part the user wants to modify.
arXiv Detail & Related papers (2024-11-05T05:45:26Z)
- Text2CAD: Generating Sequential CAD Models from Beginner-to-Expert Level Text Prompts [12.63158811936688]
We propose Text2CAD, the first AI framework for generating parametric CAD models from text prompts.
Our proposed framework shows great potential in AI-aided design applications.
arXiv Detail & Related papers (2024-09-25T17:19:33Z)
- SuperGaussian: Repurposing Video Models for 3D Super Resolution [67.19266415499139]
We present a simple, modular, and generic method that upsamples coarse 3D models by adding geometric and appearance details.
We demonstrate that it is possible to directly repurpose existing (pretrained) video models for 3D super-resolution.
arXiv Detail & Related papers (2024-06-02T03:44:50Z)
- GigaPose: Fast and Robust Novel Object Pose Estimation via One Correspondence [64.77224422330737]
GigaPose is a fast, robust, and accurate method for CAD-based novel object pose estimation in RGB images.
Our approach samples templates in only a two-degrees-of-freedom space instead of the usual three.
It achieves state-of-the-art accuracy and can be seamlessly integrated with existing refinement methods.
arXiv Detail & Related papers (2023-11-23T18:55:03Z)
- Model2Scene: Learning 3D Scene Representation via Contrastive Language-CAD Models Pre-training [105.3421541518582]
Current successful methods of 3D scene perception rely on large-scale annotated point clouds.
We propose Model2Scene, a novel paradigm that learns free 3D scene representation from Computer-Aided Design (CAD) models and languages.
Model2Scene yields impressive label-free 3D object salient detection with an average mAP of 46.08% and 55.49% on the ScanNet and S3DIS datasets, respectively.
arXiv Detail & Related papers (2023-09-29T03:51:26Z)
- Prompt2Model: Generating Deployable Models from Natural Language Instructions [74.19816829003729]
Large language models (LLMs) enable system builders to create competent NLP systems through prompting.
However, in other ways, LLMs are a step backward from traditional special-purpose NLP models. We propose Prompt2Model, a general-purpose method that takes a natural language task description, like the prompts provided to LLMs, and uses it to train a small model that is conducive to deployment.
arXiv Detail & Related papers (2023-08-23T17:28:21Z)
- CAD-Estate: Large-scale CAD Model Annotation in RGB Videos [34.63782303927944]
We propose a method for annotating videos of complex multi-object scenes with a globally-consistent 3D representation of the objects.
We annotate each object with a CAD model from a database, and place it in the 3D coordinate frame of the scene with a 9-DoF pose transformation.
Our method is semi-automatic and works on commonly-available RGB videos, without requiring a depth sensor.
arXiv Detail & Related papers (2023-06-15T10:12:02Z)
- LongForm: Effective Instruction Tuning with Reverse Instructions [74.14035528786997]
We introduce the LongForm-C dataset, created by reverse instructions: we use LLMs to generate instructions for human-written corpus examples.
Our models outperform 10x larger language models without instruction tuning on tasks such as story/recipe generation and long-form question answering.
arXiv Detail & Related papers (2023-04-17T17:36:35Z)
- Scaling Up Models and Data with $\texttt{t5x}$ and $\texttt{seqio}$ [118.04625413322827]
$\texttt{t5x}$ and $\texttt{seqio}$ are open source software libraries for building and training language models.
These libraries have been used to train models with hundreds of billions of parameters on datasets with multiple terabytes of training data.
arXiv Detail & Related papers (2022-03-31T17:12:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.