DesignQA: A Multimodal Benchmark for Evaluating Large Language Models' Understanding of Engineering Documentation
- URL: http://arxiv.org/abs/2404.07917v1
- Date: Thu, 11 Apr 2024 16:59:54 GMT
- Title: DesignQA: A Multimodal Benchmark for Evaluating Large Language Models' Understanding of Engineering Documentation
- Authors: Anna C. Doris, Daniele Grandi, Ryan Tomich, Md Ferdous Alam, Hyunmin Cheong, Faez Ahmed,
- Abstract summary: This research introduces DesignQA, a novel benchmark aimed at evaluating the proficiency of multimodal large language models (MLLMs) in comprehending and applying engineering requirements in technical documentation.
- Score: 3.3554851717552387
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This research introduces DesignQA, a novel benchmark aimed at evaluating the proficiency of multimodal large language models (MLLMs) in comprehending and applying engineering requirements in technical documentation. Developed with a focus on real-world engineering challenges, DesignQA uniquely combines multimodal data-including textual design requirements, CAD images, and engineering drawings-derived from the Formula SAE student competition. Different from many existing MLLM benchmarks, DesignQA contains document-grounded visual questions where the input image and input document come from different sources. The benchmark features automatic evaluation metrics and is divided into segments-Rule Comprehension, Rule Compliance, and Rule Extraction-based on tasks that engineers perform when designing according to requirements. We evaluate state-of-the-art models like GPT4 and LLaVA against the benchmark, and our study uncovers the existing gaps in MLLMs' abilities to interpret complex engineering documentation. Key findings suggest that while MLLMs demonstrate potential in navigating technical documents, substantial limitations exist, particularly in accurately extracting and applying detailed requirements to engineering designs. This benchmark sets a foundation for future advancements in AI-supported engineering design processes. DesignQA is publicly available at: https://github.com/anniedoris/design_qa/.
Related papers
- Customized Retrieval Augmented Generation and Benchmarking for EDA Tool Documentation QA [5.0108982850526]
Retrieval augmented generation (RAG) enhances the accuracy and reliability of generative AI models by sourcing factual information from external databases.
This paper proposes a customized RAG framework along with three domain-specific techniques for EDA tool documentation QA.
We have developed and released a documentation QA evaluation benchmark, ORD-QA, for OpenROAD, an advanced RTL-to-GDSII design platform.
arXiv Detail & Related papers (2024-07-22T03:44:27Z) - Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph [85.51252685938564]
Uncertainty quantification (UQ) is becoming increasingly recognized as a critical component of applications that rely on machine learning (ML)
As with other ML models, large language models (LLMs) are prone to make incorrect predictions, hallucinate'' by fabricating claims, or simply generate low-quality output for a given input.
We introduce a novel benchmark that implements a collection of state-of-the-art UQ baselines, and provides an environment for controllable and consistent evaluation of novel techniques.
arXiv Detail & Related papers (2024-06-21T20:06:31Z) - PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM [58.67882997399021]
Our research introduces a unified framework for automated graphic layout generation.
Our data-driven method employs structured text (JSON format) and visual instruction tuning to generate layouts.
We conduct extensive experiments and achieved state-of-the-art (SOTA) performance on public multi-modal layout generation benchmarks.
arXiv Detail & Related papers (2024-06-05T03:05:52Z) - Automated User Story Generation with Test Case Specification Using Large Language Model [0.0]
We developed a tool "GeneUS" to automatically create user stories from requirements documents.
The output is provided in format leaving the possibilities open for downstream integration to the popular project management tools.
arXiv Detail & Related papers (2024-04-02T01:45:57Z) - TAT-LLM: A Specialized Language Model for Discrete Reasoning over
Tabular and Textual Data [77.66158066013924]
We consider harnessing the amazing power of language models (LLMs) to solve our task.
We develop a TAT-LLM language model by fine-tuning LLaMA 2 with the training data generated automatically from existing expert-annotated datasets.
arXiv Detail & Related papers (2024-01-24T04:28:50Z) - LLM4EDA: Emerging Progress in Large Language Models for Electronic
Design Automation [74.7163199054881]
Large Language Models (LLMs) have demonstrated their capability in context understanding, logic reasoning and answer generation.
We present a systematic study on the application of LLMs in the EDA field.
We highlight the future research direction, focusing on applying LLMs in logic synthesis, physical design, multi-modal feature extraction and alignment of circuits.
arXiv Detail & Related papers (2023-12-28T15:09:14Z) - From Concept to Manufacturing: Evaluating Vision-Language Models for
Engineering Design [5.268919870502001]
This paper presents a comprehensive evaluation of GPT-4V, a vision language model, across a wide spectrum of engineering design tasks.
Our study assesses GPT-4V's capabilities in design tasks such as sketch similarity analysis, concept selection using Pugh Charts, material selection, engineering drawing analysis, CAD generation, topology optimization, design for additive and subtractive manufacturing, spatial reasoning challenges, and textbook problems.
arXiv Detail & Related papers (2023-11-21T15:20:48Z) - Machine Learning-Enabled Software and System Architecture Frameworks [48.87872564630711]
The stakeholders with data science and Machine Learning related concerns, such as data scientists and data engineers, are yet to be included in existing architecture frameworks.
We surveyed 61 subject matter experts from over 25 organizations in 10 countries.
arXiv Detail & Related papers (2023-08-09T21:54:34Z) - Natural Language Processing for Systems Engineering: Automatic
Generation of Systems Modelling Language Diagrams [0.10312968200748115]
An approach is proposed to assist systems engineers in the automatic generation of systems diagrams from unstructured natural language text.
The intention is to provide the users with a more standardised, comprehensive and automated starting point onto which subsequently refine and adapt the diagrams according to their needs.
arXiv Detail & Related papers (2022-08-09T19:20:33Z) - Towards Complex Document Understanding By Discrete Reasoning [77.91722463958743]
Document Visual Question Answering (VQA) aims to understand visually-rich documents to answer questions in natural language.
We introduce a new Document VQA dataset, named TAT-DQA, which consists of 3,067 document pages and 16,558 question-answer pairs.
We develop a novel model named MHST that takes into account the information in multi-modalities, including text, layout and visual image, to intelligently address different types of questions.
arXiv Detail & Related papers (2022-07-25T01:43:19Z) - Engineering AI Systems: A Research Agenda [9.84673609667263]
We provide a conceptualization of the typical evolution patterns that companies experience when employing machine learning.
The main contribution of the paper is a research agenda for AI engineering that provides an overview of the key engineering challenges surrounding ML solutions.
arXiv Detail & Related papers (2020-01-16T20:29:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.