From Concept to Manufacturing: Evaluating Vision-Language Models for
Engineering Design
- URL: http://arxiv.org/abs/2311.12668v1
- Date: Tue, 21 Nov 2023 15:20:48 GMT
- Title: From Concept to Manufacturing: Evaluating Vision-Language Models for
Engineering Design
- Authors: Cyril Picard, Kristen M. Edwards, Anna C. Doris, Brandon Man, Giorgio
Giannone, Md Ferdous Alam, and Faez Ahmed
- Abstract summary: This paper presents a comprehensive evaluation of GPT-4V, a vision language model, across a wide spectrum of engineering design tasks.
Our study assesses GPT-4V's capabilities in design tasks such as sketch similarity analysis, concept selection using Pugh Charts, material selection, engineering drawing analysis, CAD generation, topology optimization, design for additive and subtractive manufacturing, spatial reasoning challenges, and textbook problems.
- Score: 5.268919870502001
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Engineering Design is undergoing a transformative shift with the advent of
AI, marking a new era in how we approach product, system, and service planning.
Large language models have demonstrated impressive capabilities in enabling
this shift. Yet, with text as their only input modality, they cannot leverage
the large body of visual artifacts that engineers have used for centuries and
are accustomed to. This gap is addressed with the release of multimodal vision
language models, such as GPT-4V, enabling AI to impact many more types of
tasks. In light of these advancements, this paper presents a comprehensive
evaluation of GPT-4V, a vision language model, across a wide spectrum of
engineering design tasks, categorized into four main areas: Conceptual Design,
System-Level and Detailed Design, Manufacturing and Inspection, and Engineering
Education Tasks. Our study assesses GPT-4V's capabilities in design tasks such
as sketch similarity analysis, concept selection using Pugh Charts, material
selection, engineering drawing analysis, CAD generation, topology optimization,
design for additive and subtractive manufacturing, spatial reasoning
challenges, and textbook problems. Through this structured evaluation, we not
only explore GPT-4V's proficiency in handling complex design and manufacturing
challenges but also identify its limitations in complex engineering design
applications. Our research establishes a foundation for future assessments of
vision language models, emphasizing their immense potential for innovating and
enhancing the engineering design and manufacturing landscape. It also
contributes a set of benchmark testing datasets, with more than 1000 queries,
for ongoing advancements and applications in this field.
Related papers
- DesignQA: A Multimodal Benchmark for Evaluating Large Language Models' Understanding of Engineering Documentation [3.3554851717552387]
This research introduces DesignQA, a novel benchmark aimed at evaluating the proficiency of multimodal large language models (MLLMs) in comprehending and applying engineering requirements in technical documentation.
arXiv Detail & Related papers (2024-04-11T16:59:54Z)
- Design2Code: How Far Are We From Automating Front-End Engineering? [83.06100360864502]
We formalize this as a Design2Code task and conduct comprehensive benchmarking.
Specifically, we manually curate a benchmark of 484 diverse real-world webpages as test cases.
We develop a suite of multimodal prompting methods and show their effectiveness on GPT-4V and Gemini Pro Vision.
Both human evaluation and automatic metrics show that GPT-4V performs the best on this task compared to other models.
arXiv Detail & Related papers (2024-03-05T17:56:27Z)
- Geometric Deep Learning for Computer-Aided Design: A Survey [85.79012726689511]
This survey offers a comprehensive overview of learning-based methods in computer-aided design.
It includes similarity analysis and retrieval, 2D and 3D CAD model synthesis, and CAD generation from point clouds.
It provides a complete list of benchmark datasets and their characteristics, along with open-source codes that have propelled research in this domain.
arXiv Detail & Related papers (2024-02-27T17:11:35Z)
- Gemini vs GPT-4V: A Preliminary Comparison and Combination of Vision-Language Models Through Qualitative Cases [98.35348038111508]
This paper presents an in-depth comparative study of two pioneering models: Google's Gemini and OpenAI's GPT-4V(ision).
The core of our analysis delves into the distinct visual comprehension abilities of each model.
Our findings illuminate the unique strengths and niches of both models.
arXiv Detail & Related papers (2023-12-22T18:59:58Z)
- The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision) [121.42924593374127]
We analyze the latest model, GPT-4V, to deepen the understanding of LMMs.
GPT-4V's unprecedented ability in processing arbitrarily interleaved multimodal inputs makes it a powerful multimodal generalist system.
GPT-4V's unique capability of understanding visual markers drawn on input images can give rise to new human-computer interaction methods.
arXiv Detail & Related papers (2023-09-29T17:34:51Z)
- How Can Large Language Models Help Humans in Design and Manufacturing? [28.28959612862582]
Large Language Models (LLMs), including GPT-4, provide exciting new opportunities for generative design.
We scrutinize the utility of LLMs in tasks such as: converting a text-based prompt into a design specification, transforming a design into manufacturing instructions, producing a design space and design variations, computing the performance of a design, and searching for designs predicated on performance.
By exposing these limitations, we aspire to catalyze the continued improvement and progression of these models.
arXiv Detail & Related papers (2023-07-25T17:30:38Z)
- Review of Large Vision Models and Visual Prompt Engineering [50.63394642549947]
This review aims to summarize the methods employed in the computer vision domain for large vision models and visual prompt engineering.
We present influential large models in the visual domain and a range of prompt engineering methods employed on these models.
arXiv Detail & Related papers (2023-07-03T08:48:49Z)
- Scaling Evidence-based Instructional Design Expertise through Large Language Models [0.0]
This paper explores leveraging Large Language Models (LLMs), specifically GPT-4, in the field of instructional design.
With a focus on scaling evidence-based instructional design expertise, our research aims to bridge the gap between theoretical educational studies and practical implementation.
We discuss the benefits and limitations of AI-driven content generation, emphasizing the necessity of human oversight in ensuring the quality of educational materials.
arXiv Detail & Related papers (2023-05-31T17:54:07Z)
- Design Space Exploration and Explanation via Conditional Variational Autoencoders in Meta-model-based Conceptual Design of Pedestrian Bridges [52.77024349608834]
This paper provides a performance-driven design exploration framework to augment the human designer through a Conditional Variational Autoencoder (CVAE).
The CVAE is trained on 18,000 synthetically generated instances of a pedestrian bridge in Switzerland.
arXiv Detail & Related papers (2022-11-29T17:28:31Z)
- Deep Generative Models in Engineering Design: A Review [1.933681537640272]
We present a review and analysis of Deep Generative Learning models in engineering design.
Recent DGMs have shown promising results in design applications like structural optimization, materials design, and shape synthesis.
arXiv Detail & Related papers (2021-10-21T02:50:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.