From Concept to Manufacturing: Evaluating Vision-Language Models for Engineering Design
- URL: http://arxiv.org/abs/2311.12668v2
- Date: Thu, 8 Aug 2024 02:13:06 GMT
- Title: From Concept to Manufacturing: Evaluating Vision-Language Models for Engineering Design
- Authors: Cyril Picard, Kristen M. Edwards, Anna C. Doris, Brandon Man, Giorgio Giannone, Md Ferdous Alam, Faez Ahmed
- Abstract summary: This paper presents a comprehensive evaluation of vision-language models (VLMs) across a spectrum of engineering design tasks.
Specifically in this paper, we assess the capabilities of two VLMs, GPT-4V and LLaVA 1.6 34B, in design tasks such as sketch similarity analysis, CAD generation, topology optimization, manufacturability assessment, and engineering textbook problems.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Engineering design is undergoing a transformative shift with the advent of AI, marking a new era in how we approach product, system, and service planning. Large language models have demonstrated impressive capabilities in enabling this shift. Yet, with text as their only input modality, they cannot leverage the large body of visual artifacts that engineers have used for centuries and are accustomed to. This gap is addressed with the release of multimodal vision-language models (VLMs), such as GPT-4V, enabling AI to impact many more types of tasks. Our work presents a comprehensive evaluation of VLMs across a spectrum of engineering design tasks, categorized into four main areas: Conceptual Design, System-Level and Detailed Design, Manufacturing and Inspection, and Engineering Education Tasks. Specifically in this paper, we assess the capabilities of two VLMs, GPT-4V and LLaVA 1.6 34B, in design tasks such as sketch similarity analysis, CAD generation, topology optimization, manufacturability assessment, and engineering textbook problems. Through this structured evaluation, we not only explore VLMs' proficiency in handling complex design challenges but also identify their limitations in complex engineering design applications. Our research establishes a foundation for future assessments of vision language models. It also contributes a set of benchmark testing datasets, with more than 1000 queries, for ongoing advancements and applications in this field.
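The abstract describes a benchmark of 1000+ queries organized into four task areas. As a minimal, hypothetical sketch of how such queries might be represented, the record below pairs a task category with a prompt and an optional image reference; the field names and example prompts are illustrative assumptions, not the paper's released format.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class DesignQuery:
    category: str                     # one of the four evaluation areas
    task: str                         # e.g. "sketch similarity", "CAD generation"
    prompt: str                       # text shown to the VLM
    image_path: Optional[str] = None  # visual artifact, if the task needs one

def by_category(queries):
    """Group queries by top-level category for per-area evaluation."""
    groups = {}
    for q in queries:
        groups.setdefault(q.category, []).append(q)
    return groups

# Illustrative queries only; paths and wording are made up for the sketch.
queries = [
    DesignQuery("Conceptual Design", "sketch similarity",
                "Which of these two sketches is closer to the reference?",
                image_path="sketches/pair_001.png"),
    DesignQuery("System-Level and Detailed Design", "CAD generation",
                "Write CAD code that reproduces the part in this image.",
                image_path="parts/bracket.png"),
    DesignQuery("Engineering Education Tasks", "textbook problem",
                "Solve the statics problem shown in the figure."),
]

grouped = by_category(queries)
```

Grouping by category mirrors the paper's per-area reporting: each model can then be scored on each of the four areas independently.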
Related papers
- What matters when building vision-language models?
We develop Idefics2, an efficient foundational vision-language model with 8 billion parameters.
Idefics2 achieves state-of-the-art performance within its size category across various multimodal benchmarks.
We release the model (base, instructed, and chat) along with the datasets created for its training.
arXiv Detail & Related papers (2024-05-03T17:00:00Z)
- DesignQA: A Multimodal Benchmark for Evaluating Large Language Models' Understanding of Engineering Documentation
This research introduces DesignQA, a novel benchmark aimed at evaluating the proficiency of multimodal large language models (MLLMs) in comprehending and applying engineering requirements in technical documentation.
DesignQA uniquely combines multimodal data-including textual design requirements, CAD images, and engineering drawings-derived from the Formula SAE student competition.
arXiv Detail & Related papers (2024-04-11T16:59:54Z)
- Design2Code: How Far Are We From Automating Front-End Engineering?
We formalize this as a Design2Code task and conduct comprehensive benchmarking.
Specifically, we manually curate a benchmark of 484 diverse real-world webpages as test cases.
We develop a suite of multimodal prompting methods and show their effectiveness on GPT-4V and Gemini Pro Vision.
Both human evaluation and automatic metrics show that GPT-4V performs the best on this task compared to other models.
arXiv Detail & Related papers (2024-03-05T17:56:27Z)
- Geometric Deep Learning for Computer-Aided Design: A Survey
This survey offers a comprehensive overview of learning-based methods in computer-aided design.
It includes similarity analysis and retrieval, 2D and 3D CAD model synthesis, and CAD generation from point clouds.
It provides a complete list of benchmark datasets and their characteristics, along with open-source codes that have propelled research in this domain.
arXiv Detail & Related papers (2024-02-27T17:11:35Z)
- Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models
Visually-conditioned language models (VLMs) have seen growing adoption in applications such as visual dialogue, scene understanding, and robotic task planning.
Despite the volume of new releases, key design decisions around image preprocessing, architecture, and optimization are under-explored.
arXiv Detail & Related papers (2024-02-12T18:21:14Z)
- Visual Instruction Tuning towards General-Purpose Multimodal Model: A Survey
Traditional computer vision generally solves each single task independently by a dedicated model with the task instruction implicitly designed in the model architecture.
Visual Instruction Tuning (VIT), which finetunes a large vision model with language as task instructions, has been intensively studied recently.
This work aims to provide a systematic review of visual instruction tuning, covering (1) the background that presents computer vision task paradigms and the development of VIT; (2) the foundations of VIT that introduce commonly used network architectures, visual instruction tuning frameworks and objectives, and evaluation setups and tasks; and (3) the commonly used datasets in visual instruction tuning and evaluation.
arXiv Detail & Related papers (2023-12-27T14:54:37Z)
- How Can Large Language Models Help Humans in Design and Manufacturing?
Large Language Models (LLMs), including GPT-4, provide exciting new opportunities for generative design.
We scrutinize the utility of LLMs in tasks such as: converting a text-based prompt into a design specification, transforming a design into manufacturing instructions, producing a design space and design variations, computing the performance of a design, and searching for designs predicated on performance.
By exposing these limitations, we aspire to catalyze the continued improvement and progression of these models.
arXiv Detail & Related papers (2023-07-25T17:30:38Z)
- Challenges and Practices of Deep Learning Model Reengineering: A Case Study on Computer Vision
Many engineering organizations are reimplementing and extending deep neural networks from the research community.
Deep learning model reengineering is challenging for reasons including under-documented reference models, changing requirements, and the cost of implementation and testing.
Our study examines reengineering activities from a "process" view, focusing on engineers specifically engaged in the reengineering process.
arXiv Detail & Related papers (2023-03-13T21:23:43Z)
- Design Space Exploration and Explanation via Conditional Variational Autoencoders in Meta-model-based Conceptual Design of Pedestrian Bridges
This paper provides a performance-driven design exploration framework to augment the human designer through a Conditional Variational Autoencoder (CVAE).
The CVAE is trained on 18'000 synthetically generated instances of a pedestrian bridge in Switzerland.
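As a rough illustration of the objective such a CVAE optimizes (not the paper's actual architecture or data), the sketch below computes a one-sample ELBO estimate with random linear maps standing in for the trained encoder and decoder; the design vector, condition vector, and all dimensions are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: a bridge design as a small parameter vector x,
# conditioned on a performance/context vector c.
X_DIM, C_DIM, Z_DIM = 8, 3, 2

# Randomly initialized linear maps stand in for trained networks.
W_enc = rng.normal(size=(X_DIM + C_DIM, 2 * Z_DIM)) * 0.1
W_dec = rng.normal(size=(Z_DIM + C_DIM, X_DIM)) * 0.1

def cvae_loss(x, c):
    """One-sample ELBO estimate: reconstruction MSE + KL(q(z|x,c) || N(0, I))."""
    h = np.concatenate([x, c]) @ W_enc
    mu, logvar = h[:Z_DIM], h[Z_DIM:]
    # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)
    eps = rng.normal(size=Z_DIM)
    z = mu + np.exp(0.5 * logvar) * eps
    x_hat = np.concatenate([z, c]) @ W_dec
    recon = np.mean((x - x_hat) ** 2)
    # Closed-form KL between a diagonal Gaussian and the standard normal
    kl = 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)
    return recon + kl, kl

x = rng.normal(size=X_DIM)   # a synthetic design vector
c = rng.normal(size=C_DIM)   # its performance condition
loss, kl = cvae_loss(x, c)
```

At generation time, the designer would instead fix a condition c, sample z from the prior, and decode to obtain candidate designs meeting that condition.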
arXiv Detail & Related papers (2022-11-29T17:28:31Z)
- Designing Machine Learning Toolboxes: Concepts, Principles and Patterns
We provide an overview of key patterns in the design of AI modeling toolboxes.
Our analysis can not only explain the design of existing toolboxes, but also guide the development of new ones.
arXiv Detail & Related papers (2021-01-13T08:55:15Z)
- Engineering AI Systems: A Research Agenda
We provide a conceptualization of the typical evolution patterns that companies experience when employing machine learning.
The main contribution of the paper is a research agenda for AI engineering that provides an overview of the key engineering challenges surrounding ML solutions.
arXiv Detail & Related papers (2020-01-16T20:29:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.