Text2VP: Generative AI for Visual Programming and Parametric Modeling
- URL: http://arxiv.org/abs/2407.07732v2
- Date: Mon, 19 May 2025 04:05:01 GMT
- Title: Text2VP: Generative AI for Visual Programming and Parametric Modeling
- Authors: Guangxi Feng, Wei Yan,
- Abstract summary: This study introduces Text-to-Visual Programming (Text2VP) GPT, a novel generative AI derived from GPT-4.1.<n>Tests demonstrate Text2VP's capability in generating functional parametric models, although higher complexity models present increased error rates.<n>Ultimately, Text2VP aims to enable designers to easily create and modify parametric models without extensive training in specialized platforms like Grasshopper.
- Score: 6.531561475204309
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The integration of generative artificial intelligence (AI) into architectural design has advanced significantly, enabling the generation of text, images, and 3D models. However, prior AI applications lack support for text-to-parametric models, essential for generating and optimizing diverse parametric design options. This study introduces Text-to-Visual Programming (Text2VP) GPT, a novel generative AI derived from GPT-4.1, designed to automate graph-based visual programming workflows, parameters, and their interconnections. Text2VP leverages detailed documentation, specific instructions, and example-driven few-shot learning to reflect user intentions accurately and facilitate interactive parameter adjustments. Testing demonstrates Text2VP's capability in generating functional parametric models, although higher complexity models present increased error rates. This research highlights generative AI's potential in visual programming and parametric modeling, laying groundwork for future improvements to manage complex modeling tasks. Ultimately, Text2VP aims to enable designers to easily create and modify parametric models without extensive training in specialized platforms like Grasshopper.
Related papers
- URDF-Anything: Constructing Articulated Objects with 3D Multimodal Language Model [76.08429266631823]
We propose an end-to-end automatic reconstruction framework based on a 3D multimodal large language model (MLLM)<n>URDF-Anything utilizes an autoregressive prediction framework based on point-cloud and text multimodal input to jointly optimize geometric segmentation and kinematic parameter prediction.<n> Experiments on both simulated and real-world datasets demonstrate that our method significantly outperforms existing approaches.
arXiv Detail & Related papers (2025-11-02T13:45:51Z) - VEHME: A Vision-Language Model For Evaluating Handwritten Mathematics Expressions [11.210768330027674]
We introduce VEHME-a Vision-Language Model for evaluating handwritten math responses with high accuracy and interpretable reasoning traces.<n> VEHME integrates a two-phase training pipeline: (i) supervised fine-tuning using structured reasoning data, and (ii) reinforcement learning that aligns model outputs with multi-dimensional grading objectives.<n> VEHME achieves state-of-the-art performance among open-source models and approaches the accuracy of proprietary systems.
arXiv Detail & Related papers (2025-10-26T19:03:27Z) - TIGeR: Tool-Integrated Geometric Reasoning in Vision-Language Models for Robotics [53.442362491589726]
We present TIGeR (Tool-Integrated Geometric Reasoning), a novel framework that transforms Vision-Language Models (VLMs) into geometric computers.<n>Rather than attempting to internalize complex geometric operations within neural networks, TIGeR empowers models to recognize geometric reasoning requirements.<n>We show that TIGeR achieves SOTA performance on geometric reasoning benchmarks while demonstrating centimeter-level precision in real-world robotic manipulation tasks.
arXiv Detail & Related papers (2025-10-08T16:20:23Z) - A Systematic Literature Review of Parameter-Efficient Fine-Tuning for Large Code Models [2.171120568435925]
Large Language Models (LLMs) for code require significant computational resources for training and fine-tuning.
To address this, the research community has increasingly turned to Efficient Fine-Tuning (PEFT)
PEFT enables the adaptation of large models by updating only a small subset of parameters, rather than the entire model.
Our study synthesizes findings from 27 peer-reviewed papers, identifying patterns in configuration strategies and adaptation trade-offs.
arXiv Detail & Related papers (2025-04-29T16:19:25Z) - Parametric-ControlNet: Multimodal Control in Foundation Models for Precise Engineering Design Synthesis [9.900586490845694]
This paper introduces a generative model designed for multimodal control over text-to-image foundation generative AI models such as Stable Diffusion.<n>Our model proposes parametric, image, and text control modalities to enhance design precision and diversity.
arXiv Detail & Related papers (2024-12-06T01:40:10Z) - Generative Design through Quality-Diversity Data Synthesis and Language Models [5.196236145367301]
Two fundamental challenges face generative models in engineering applications: the acquisition of high-performing, diverse datasets, and the adherence to precise constraints in generated designs.
We propose a novel approach combining optimization, constraint satisfaction, and language models to tackle these challenges in architectural design.
arXiv Detail & Related papers (2024-05-16T11:30:08Z) - Multimodal Large Language Model is a Human-Aligned Annotator for Text-to-Image Generation [87.50120181861362]
VisionPrefer is a high-quality and fine-grained preference dataset that captures multiple preference aspects.
We train a reward model VP-Score over VisionPrefer to guide the training of text-to-image generative models and the preference prediction accuracy of VP-Score is comparable to human annotators.
arXiv Detail & Related papers (2024-04-23T14:53:15Z) - Contrastive Transformer Learning with Proximity Data Generation for
Text-Based Person Search [60.626459715780605]
Given a descriptive text query, text-based person search aims to retrieve the best-matched target person from an image gallery.
Such a cross-modal retrieval task is quite challenging due to significant modality gap, fine-grained differences and insufficiency of annotated data.
In this paper, we propose a simple yet effective dual Transformer model for text-based person search.
arXiv Detail & Related papers (2023-11-15T16:26:49Z) - Automating Human Tutor-Style Programming Feedback: Leveraging GPT-4 Tutor Model for Hint Generation and GPT-3.5 Student Model for Hint Validation [25.317788211120362]
We investigate the role of generative AI models in providing human tutor-style programming hints.
Recent works have benchmarked state-of-the-art models for various feedback generation scenarios.
We develop a novel technique, GPT4Hints-GPT3.5Val, to push the limits of generative AI models.
arXiv Detail & Related papers (2023-10-05T17:02:59Z) - AI-Generated Images as Data Source: The Dawn of Synthetic Era [61.879821573066216]
generative AI has unlocked the potential to create synthetic images that closely resemble real-world photographs.
This paper explores the innovative concept of harnessing these AI-generated images as new data sources.
In contrast to real data, AI-generated data exhibit remarkable advantages, including unmatched abundance and scalability.
arXiv Detail & Related papers (2023-10-03T06:55:19Z) - InstructCV: Instruction-Tuned Text-to-Image Diffusion Models as Vision Generalists [66.85125112199898]
We develop a unified language interface for computer vision tasks that abstracts away task-specific design choices.
Our model, dubbed InstructCV, performs competitively compared to other generalist and task-specific vision models.
arXiv Detail & Related papers (2023-09-30T14:26:43Z) - RenAIssance: A Survey into AI Text-to-Image Generation in the Era of
Large Model [93.8067369210696]
Text-to-image generation (TTI) refers to the usage of models that could process text input and generate high fidelity images based on text descriptions.
Diffusion models are one prominent type of generative model used for the generation of images through the systematic introduction of noises with repeating steps.
In the era of large models, scaling up model size and the integration with large language models have further improved the performance of TTI models.
arXiv Detail & Related papers (2023-09-02T03:27:20Z) - Large Language and Text-to-3D Models for Engineering Design Optimization [0.1740313383876245]
We study the potential of deep text-to-3D models in the engineering domain.
We use Shap-E, a text-to-3D asset network by OpenAI, in the context of aerodynamic vehicle optimization.
arXiv Detail & Related papers (2023-07-03T07:54:09Z) - BatGPT: A Bidirectional Autoregessive Talker from Generative Pre-trained
Transformer [77.28871523946418]
BatGPT is a large-scale language model designed and trained jointly by Wuhan University and Shanghai Jiao Tong University.
It is capable of generating highly natural and fluent text in response to various types of input, including text prompts, images, and audio.
arXiv Detail & Related papers (2023-07-01T15:10:01Z) - Visual Programming for Text-to-Image Generation and Evaluation [73.12069620086311]
We propose two novel interpretable/explainable visual programming frameworks for text-to-image (T2I) generation and evaluation.
First, we introduce VPGen, an interpretable step-by-step T2I generation framework that decomposes T2I generation into three steps: object/count generation, layout generation, and image generation.
Second, we introduce VPEval, an interpretable and explainable evaluation framework for T2I generation based on visual programming.
arXiv Detail & Related papers (2023-05-24T16:42:17Z) - AutoML-GPT: Automatic Machine Learning with GPT [74.30699827690596]
We propose developing task-oriented prompts and automatically utilizing large language models (LLMs) to automate the training pipeline.
We present the AutoML-GPT, which employs GPT as the bridge to diverse AI models and dynamically trains models with optimized hyper parameters.
This approach achieves remarkable results in computer vision, natural language processing, and other challenging areas.
arXiv Detail & Related papers (2023-05-04T02:09:43Z) - Beyond Statistical Similarity: Rethinking Metrics for Deep Generative
Models in Engineering Design [10.531935694354448]
This paper doubles as a review and practical guide to evaluation metrics for deep generative models (DGMs) in engineering design.
We first summarize the well-accepted classic' evaluation metrics for deep generative models grounded in machine learning theory.
Next, we curate a set of design-specific metrics which can be used for evaluating deep generative models.
arXiv Detail & Related papers (2023-02-06T16:34:16Z) - An Overview on Controllable Text Generation via Variational
Auto-Encoders [15.97186478109836]
Recent advances in neural-based generative modeling have reignited the hopes of having computer systems capable of conversing with humans.
Latent variable models (LVM) such as variational auto-encoders (VAEs) are designed to characterize the distributional pattern of textual data.
This overview gives an introduction to existing generation schemes, problems associated with text variational auto-encoders, and a review of several applications about the controllable generation.
arXiv Detail & Related papers (2022-11-15T07:36:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.