EvoCAD: Evolutionary CAD Code Generation with Vision Language Models
- URL: http://arxiv.org/abs/2510.11631v1
- Date: Mon, 13 Oct 2025 17:12:02 GMT
- Title: EvoCAD: Evolutionary CAD Code Generation with Vision Language Models
- Authors: Tobias Preintner, Weixuan Yuan, Adrian König, Thomas Bäck, Elena Raponi, Niki van Stein
- Abstract summary: EvoCAD is a method for generating computer-aided design (CAD) objects through their symbolic representations. We introduce two new metrics based on topological properties defined by the Euler characteristic, which capture a form of semantic similarity between 3D objects.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Combining large language models with evolutionary computation algorithms represents a promising research direction, pairing the remarkable generative and in-context learning capabilities of LLMs with the strengths of evolutionary algorithms. In this work, we present EvoCAD, a method for generating computer-aided design (CAD) objects through their symbolic representations using vision language models and evolutionary optimization. Our method samples multiple CAD objects, which are then optimized using an evolutionary approach with vision language and reasoning language models. We assess our method using GPT-4V and GPT-4o, evaluating it on the CADPrompt benchmark dataset and comparing it to prior methods. Additionally, we introduce two new metrics based on topological properties defined by the Euler characteristic, which capture a form of semantic similarity between 3D objects. Our results demonstrate that EvoCAD outperforms previous approaches on multiple metrics, particularly in generating topologically correct objects, which can be efficiently evaluated using our two novel metrics that complement existing spatial metrics.
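The abstract does not spell out how the two topological metrics are computed, so the following is a minimal sketch of the underlying quantity only: the Euler characteristic χ = V − E + F of a triangle mesh, in plain NumPy (the function name is ours). A topological comparison could then, for instance, check whether a generated mesh matches the χ of the ground-truth object.

```python
import numpy as np

def euler_characteristic(vertices: np.ndarray, faces: np.ndarray) -> int:
    """Compute chi = V - E + F for a triangle mesh.

    vertices: (V, 3) float array; faces: (F, 3) int array of vertex indices.
    Each undirected edge is counted once by sorting its endpoint indices.
    """
    V = len(vertices)
    F = len(faces)
    # The three edges of every triangle, as sorted index pairs.
    edges = np.sort(faces[:, [0, 1, 1, 2, 2, 0]].reshape(-1, 2), axis=1)
    E = len(np.unique(edges, axis=0))
    return V - E + F

# A tetrahedron: 4 vertices, 6 edges, 4 faces -> chi = 2 (genus 0).
verts = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], float)
faces = np.array([[0, 1, 2], [0, 1, 3], [0, 2, 3], [1, 2, 3]])
assert euler_characteristic(verts, faces) == 2
```

For a closed orientable surface χ = 2 − 2g, so agreement in χ amounts to agreement in genus, which is insensitive to the spatial alignment that existing metrics measure.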
Related papers
- VFM-VLM: Vision Foundation Model and Vision Language Model based Visual Comparison for 3D Pose Estimation [7.044221981512693]
Vision Foundation Models (VFMs) and Vision Language Models (VLMs) have revolutionized computer vision by providing rich semantic and geometric representations. This paper presents a comprehensive visual comparison between CLIP based and DINOv2 based approaches for 3D pose estimation in hand object grasping scenarios.
arXiv Detail & Related papers (2025-12-08T06:54:16Z) - Hierarchical Neural Semantic Representation for 3D Semantic Correspondence [72.8101601086805]
First, we design the hierarchical neural semantic representation (HNSR), which consists of a global semantic feature to capture high-level structure and multi-resolution local geometric features. Second, we design a progressive global-to-local matching strategy, which establishes coarse semantic correspondence using the global semantic feature. Third, our framework is training-free and broadly compatible with various pre-trained 3D generative backbones, demonstrating strong generalization across diverse shape categories.
arXiv Detail & Related papers (2025-09-22T07:23:07Z) - Active Learning and Explainable AI for Multi-Objective Optimization of Spin Coated Polymers [0.1486780669929473]
Spin coating polymer thin films to achieve specific mechanical properties is inherently a multi-objective optimization problem. We present a framework that integrates an active Pareto front learning algorithm (PyePAL) with visualization and explainable AI techniques to optimize processing parameters.
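PyePAL's own API is not reproduced here; as an illustration of the subroutine every Pareto-front method relies on, the sketch below filters candidate points down to the non-dominated set with NumPy (all objectives assumed maximized; names are ours).

```python
import numpy as np

def pareto_front(scores: np.ndarray) -> np.ndarray:
    """Boolean mask of non-dominated rows in an (n, m) score array,
    assuming every objective is to be maximized."""
    n = len(scores)
    mask = np.ones(n, dtype=bool)
    for i in range(n):
        # Point j dominates i if it is >= in all objectives and > in at least one.
        dominated = np.all(scores >= scores[i], axis=1) & np.any(scores > scores[i], axis=1)
        if dominated.any():
            mask[i] = False
    return mask

# Toy trade-off: (stiffness, toughness) of candidate process settings.
pts = np.array([[1.0, 5.0], [2.0, 4.0], [1.5, 4.5], [1.0, 4.0]])
print(pareto_front(pts))  # [ True  True  True False]
```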
arXiv Detail & Related papers (2025-09-10T20:35:59Z) - Human-in-the-Loop: Quantitative Evaluation of 3D Models Generation by Large Language Models [0.0]
This paper introduces a human-in-the-loop framework for the quantitative evaluation of 3D models generated by Large Language Models. We propose a comprehensive suite of similarity and complexity metrics, including volumetric accuracy, surface alignment, dimensional fidelity, and topological intricacy. Our findings demonstrate improved generation fidelity with increased semantic richness, with code-level prompts achieving perfect reconstruction.
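The summary names the metric families but not their formulas. One common instantiation of a surface-alignment metric (our choice, not necessarily the paper's exact definition) is the symmetric Chamfer distance between point clouds sampled on the two surfaces:

```python
import numpy as np
from scipy.spatial import cKDTree

def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer distance between two (N, 3) point clouds:
    mean nearest-neighbour distance from a to b plus from b to a."""
    d_ab, _ = cKDTree(b).query(a)  # for each point in a, nearest point in b
    d_ba, _ = cKDTree(a).query(b)
    return d_ab.mean() + d_ba.mean()

rng = np.random.default_rng(0)
a = rng.normal(size=(500, 3))
b = a + 0.01 * rng.normal(size=(500, 3))  # slightly perturbed copy
print(chamfer_distance(a, b))             # small value for well-aligned surfaces
```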
arXiv Detail & Related papers (2025-09-06T11:04:15Z) - E-Gen: Leveraging E-Graphs to Improve Continuous Representations of Symbolic Expressions [0.33748750222488655]
We introduce E-Gen, a novel e-graph-based dataset generation scheme that synthesizes large and diverse mathematical expression datasets. We train embedding models using two strategies: generating mathematically equivalent expressions, and contrastive learning to explicitly group equivalent expressions. We demonstrate that our embedding-based approach outperforms state-of-the-art large language models on several tasks.
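To make "mathematically equivalent expressions" concrete, here is a minimal SymPy check of the kind of equivalence such pairs encode; this illustrates the positive pairs a contrastive objective would group, not E-Gen's e-graph machinery.

```python
import sympy as sp

x = sp.symbols("x")
# Two syntactically different but mathematically equivalent expressions,
# the kind of positive pair a contrastive embedding objective would group.
e1 = (x + 1) ** 2
e2 = x**2 + 2 * x + 1
assert sp.simplify(e1 - e2) == 0  # the difference simplifies to zero
```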
arXiv Detail & Related papers (2025-01-24T22:39:08Z) - From 2D CAD Drawings to 3D Parametric Models: A Vision-Language Approach [15.785592359384292]
We present CAD2Program, a new method for reconstructing 3D parametric models from 2D CAD drawings. We treat the 2D CAD drawing as an image, regardless of its original format, and encode the image with a standard ViT model. On the output side, our method auto-regressively predicts a general-purpose language describing 3D parametric models in text form.
arXiv Detail & Related papers (2024-12-16T15:41:14Z) - The Geometry of Self-supervised Learning Models and its Impact on Transfer Learning [62.601681746034956]
Self-supervised learning (SSL) has emerged as a desirable paradigm in computer vision.
We propose a data-driven geometric strategy to analyze different SSL models using local neighborhoods in the feature space induced by each.
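The summary does not give the exact neighborhood statistic; a common proxy for comparing feature-space geometry (our assumption) is the mean Jaccard overlap of k-nearest-neighbor sets between two embeddings of the same samples:

```python
import numpy as np
from scipy.spatial import cKDTree

def knn_overlap(feats_a: np.ndarray, feats_b: np.ndarray, k: int = 10) -> float:
    """Mean Jaccard overlap of k-NN neighbourhoods for two embeddings
    of the same n samples (shapes (n, d1) and (n, d2))."""
    # Query k+1 neighbours and drop the first (each point is its own nearest).
    _, nn_a = cKDTree(feats_a).query(feats_a, k=k + 1)
    _, nn_b = cKDTree(feats_b).query(feats_b, k=k + 1)
    overlaps = []
    for ia, ib in zip(nn_a[:, 1:], nn_b[:, 1:]):
        sa, sb = set(ia), set(ib)
        overlaps.append(len(sa & sb) / len(sa | sb))
    return float(np.mean(overlaps))
```

A value near 1 means the two SSL models induce similar local geometry on the data; a value near 0 means their neighborhoods barely agree.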
arXiv Detail & Related papers (2022-09-18T18:15:38Z) - UViM: A Unified Modeling Approach for Vision with Learned Guiding Codes [91.24112204588353]
We introduce UViM, a unified approach capable of modeling a wide range of computer vision tasks.
In contrast to previous models, UViM has the same functional form for all tasks.
We demonstrate the effectiveness of UViM on three diverse and challenging vision tasks.
arXiv Detail & Related papers (2022-05-20T17:47:59Z) - UnProjection: Leveraging Inverse-Projections for Visual Analytics of High-Dimensional Data [63.74032987144699]
We present NNInv, a deep learning technique with the ability to approximate the inverse of any projection or mapping.
NNInv learns to reconstruct high-dimensional data from any arbitrary point on a 2D projection space, giving users the ability to interact with the learned high-dimensional representation in a visual analytics system.
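As a rough sketch of the NNInv idea, one can fit any regressor from 2D projection coordinates back to the original high-dimensional space; below, an off-the-shelf scikit-learn MLP stands in for the paper's deep network (architecture and data are our placeholders, not the authors' setup).

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))           # stand-in high-dimensional data
P = PCA(n_components=2).fit_transform(X)  # any 2D projection works here

# Learn the inverse mapping 2D -> 50D, as NNInv does with a deep network.
inv = MLPRegressor(hidden_layer_sizes=(128, 128), max_iter=500, random_state=0)
inv.fit(P, X)

# Reconstruct high-dimensional points for arbitrary 2D locations.
X_hat = inv.predict(P[:5])
print(X_hat.shape)  # (5, 50)
```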
arXiv Detail & Related papers (2021-11-02T17:11:57Z) - Analogous to Evolutionary Algorithm: Designing a Unified Sequence Model [58.17021225930069]
We explain the rationality of the Vision Transformer by analogy with the proven and practical Evolutionary Algorithm (EA).
We propose a more efficient EAT model, and design task-related heads to deal with different tasks more flexibly.
Our approach achieves state-of-the-art results on the ImageNet classification task compared with recent vision transformer works.
arXiv Detail & Related papers (2021-05-31T16:20:03Z) - Self-supervised Geometric Perception [96.89966337518854]
Self-supervised geometric perception is a framework to learn a feature descriptor for correspondence matching without any ground-truth geometric model labels.
We show that SGP achieves state-of-the-art performance that is on-par or superior to the supervised oracles trained using ground-truth labels.
arXiv Detail & Related papers (2021-03-04T15:34:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.