Assessing Large Language Models in Mechanical Engineering Education: A Study on Mechanics-Focused Conceptual Understanding
- URL: http://arxiv.org/abs/2401.12983v1
- Date: Sat, 13 Jan 2024 19:19:04 GMT
- Title: Assessing Large Language Models in Mechanical Engineering Education: A Study on Mechanics-Focused Conceptual Understanding
- Authors: Jie Tian, Jixin Hou, Zihao Wu, Peng Shu, Zhengliang Liu, Yujie Xiang, Beikang Gu, Nicholas Filla, Yiwei Li, Ning Liu, Xianyan Chen, Keke Tang, Tianming Liu, and Xianqiao Wang
- Abstract summary: This study investigates the capabilities of Large Language Models (LLMs) in addressing conceptual questions within the domain of mechanical engineering with a focus on mechanics.
Three LLMs, ChatGPT (GPT-3.5), ChatGPT (GPT-4), and Claude (Claude-2.1), were evaluated against engineering faculty members and students with and without a mechanical engineering background.
The findings reveal GPT-4's superior performance over the other two LLMs and human cohorts in answering questions across various mechanics topics, except for Continuum Mechanics.
- Score: 25.769293445579816
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This study is a pioneering endeavor to investigate the capabilities of Large
Language Models (LLMs) in addressing conceptual questions within the domain of
mechanical engineering with a focus on mechanics. Our examination involves a
manually crafted exam encompassing 126 multiple-choice questions, spanning
various aspects of mechanics courses, including Fluid Mechanics, Mechanical
Vibration, Engineering Statics and Dynamics, Mechanics of Materials, Theory of
Elasticity, and Continuum Mechanics. Three LLMs, ChatGPT (GPT-3.5), ChatGPT
(GPT-4), and Claude (Claude-2.1), were evaluated against engineering faculty
members and students with and without a mechanical engineering background.
The findings reveal GPT-4's superior performance over the other two LLMs and
the human cohorts in answering questions across various mechanics topics,
except for Continuum Mechanics. This signals room for future improvement of
GPT models in handling symbolic calculations and tensor analysis. The
performance of all three LLMs improved significantly when explanations were
prompted prior to direct responses, underscoring the crucial role of prompt
engineering.
Interestingly, GPT-3.5 demonstrates improved performance with prompts covering
a broader domain, while GPT-4 excels with prompts focusing on specific
subjects. Finally, GPT-4 exhibits notable advancements in mitigating input
bias, as evidenced by guessing preferences for humans. This study unveils the
substantial potential of LLMs as highly knowledgeable assistants in both
mechanical pedagogy and scientific research.
Related papers
- Exploring Boundary of GPT-4V on Marine Analysis: A Preliminary Case Study [31.243696199790413]
Large language models (LLMs) have demonstrated a powerful ability to answer various queries as a general-purpose assistant.
The continued development of multimodal large language models (MLLMs) empowers LLMs with the ability to perceive visual signals.
The launch of GPT-4 (Generative Pre-trained Transformer) has generated significant interest in the research communities.
(2024-01-04T08:53:08Z)
- A Comprehensive Evaluation of GPT-4V on Knowledge-Intensive Visual Question Answering [56.01977227584777]
Multimodal large models (MLMs) have significantly advanced the field of visual understanding.
Yet, the true challenge lies in the domain of knowledge-intensive visual question answering (VQA) tasks.
This study provides an in-depth evaluation of the newly introduced GPT-4V.
(2023-11-13T18:22:32Z)
- Unleashing the potential of prompt engineering in Large Language Models: a comprehensive review [1.6006550105523192]
The paper delves into the pivotal role of prompt engineering in unleashing the capabilities of Large Language Models (LLMs).
This survey elucidates foundational principles of prompt engineering, such as role-prompting, one-shot, and few-shot prompting.
We discuss how to assess the efficacy of prompt methods from different perspectives and using different methods.
(2023-10-23T09:15:18Z)
- The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision) [121.42924593374127]
We analyze the latest model, GPT-4V, to deepen the understanding of LMMs.
GPT-4V's unprecedented ability in processing arbitrarily interleaved multimodal inputs makes it a powerful multimodal generalist system.
GPT-4V's unique capability of understanding visual markers drawn on input images can give rise to new human-computer interaction methods.
(2023-09-29T17:34:51Z)
- ChatGPT & Mechanical Engineering: Examining performance on the FE Mechanical Engineering and Undergraduate Exams [0.0]
This study examines the capabilities of ChatGPT within the discipline of mechanical engineering.
It aims to examine use cases and pitfalls of such a technology in the classroom and professional settings.
(2023-09-26T20:12:26Z)
- AutoML-GPT: Automatic Machine Learning with GPT [74.30699827690596]
We propose developing task-oriented prompts and automatically utilizing large language models (LLMs) to automate the training pipeline.
We present AutoML-GPT, which employs GPT as the bridge to diverse AI models and dynamically trains models with optimized hyperparameters.
This approach achieves remarkable results in computer vision, natural language processing, and other challenging areas.
(2023-05-04T02:09:43Z)
- Performance of ChatGPT on the US Fundamentals of Engineering Exam: Comprehensive Assessment of Proficiency and Potential Implications for Professional Environmental Engineering Practice [0.0]
This study investigates the feasibility and effectiveness of using ChatGPT, a GPT-4 based model, in achieving satisfactory performance on the Fundamentals of Engineering (FE) Environmental Exam.
The findings reflect remarkable improvements in mathematical capabilities across successive iterations of ChatGPT models, showcasing their potential in solving complex engineering problems.
(2023-04-20T16:54:34Z)
- Document-Level Machine Translation with Large Language Models [91.03359121149595]
Large language models (LLMs) can produce coherent, cohesive, relevant, and fluent answers for various natural language processing (NLP) tasks.
This paper provides an in-depth evaluation of LLMs' ability in discourse modeling.
(2023-04-05T03:49:06Z)
- Summary of ChatGPT-Related Research and Perspective Towards the Future of Large Language Models [40.557611946967086]
This paper presents a survey of ChatGPT-related (GPT-3.5 and GPT-4) research, state-of-the-art large language models (LLMs) from the GPT series, and their prospective applications across diverse domains.
We performed an in-depth analysis of 194 relevant papers on arXiv, encompassing trend analysis, word cloud representation, and distribution analysis across various application domains.
(2023-04-04T15:01:06Z)
- Sparks of Artificial General Intelligence: Early experiments with GPT-4 [66.1188263570629]
GPT-4, developed by OpenAI, was trained using an unprecedented scale of compute and data.
We demonstrate that GPT-4 can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more.
We believe GPT-4 could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system.
(2023-03-22T16:51:28Z)
- Understanding Attention in Machine Reading Comprehension [56.72165932439117]
This paper focuses on conducting a series of analytical experiments to examine the relations between the multi-head self-attention and the final performance.
We perform quantitative analyses on SQuAD (English) and CMRC 2018 (Chinese), two span-extraction MRC datasets, on top of BERT, ALBERT, and ELECTRA.
We discover that passage-to-question and passage understanding attentions are the most important ones, showing strong correlations to the final performance.
(2021-08-26T04:23:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.