Assessing Large Language Models in Mechanical Engineering Education: A
Study on Mechanics-Focused Conceptual Understanding
- URL: http://arxiv.org/abs/2401.12983v1
- Date: Sat, 13 Jan 2024 19:19:04 GMT
- Title: Assessing Large Language Models in Mechanical Engineering Education: A
Study on Mechanics-Focused Conceptual Understanding
- Authors: Jie Tian, Jixin Hou, Zihao Wu, Peng Shu, Zhengliang Liu, Yujie Xiang,
Beikang Gu, Nicholas Filla, Yiwei Li, Ning Liu, Xianyan Chen, Keke Tang,
Tianming Liu, and Xianqiao Wang
- Abstract summary: This study investigates the capabilities of Large Language Models (LLMs) in addressing conceptual questions within the domain of mechanical engineering with a focus on mechanics.
Three LLMs, including ChatGPT (GPT-3.5), ChatGPT (GPT-4), and Claude (Claude-2.1) were subjected to evaluation against engineering faculties and students with or without mechanical engineering background.
The findings reveal GPT-4's superior performance over the other two LLMs and human cohorts in answering questions across various mechanics topics, except for Continuum Mechanics.
- Score: 25.769293445579816
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This study is a pioneering endeavor to investigate the capabilities of Large
Language Models (LLMs) in addressing conceptual questions within the domain of
mechanical engineering with a focus on mechanics. Our examination involves a
manually crafted exam encompassing 126 multiple-choice questions, spanning
various aspects of mechanics courses, including Fluid Mechanics, Mechanical
Vibration, Engineering Statics and Dynamics, Mechanics of Materials, Theory of
Elasticity, and Continuum Mechanics. Three LLMs, including ChatGPT (GPT-3.5),
ChatGPT (GPT-4), and Claude (Claude-2.1), were subjected to evaluation against
engineering faculties and students with or without mechanical engineering
background. The findings reveal GPT-4's superior performance over the other two
LLMs and human cohorts in answering questions across various mechanics topics,
except for Continuum Mechanics. This signals the potential future improvements
for GPT models in handling symbolic calculations and tensor analyses. The
performances of LLMs were all significantly improved with explanations prompted
prior to direct responses, underscoring the crucial role of prompt engineering.
Interestingly, GPT-3.5 demonstrates improved performance with prompts covering
a broader domain, while GPT-4 excels with prompts focusing on specific
subjects. Finally, GPT-4 exhibits notable advancements in mitigating input
bias, as evidenced by guessing preferences for humans. This study unveils the
substantial potential of LLMs as highly knowledgeable assistants in both
mechanical pedagogy and scientific research.
Related papers
- VisScience: An Extensive Benchmark for Evaluating K12 Educational Multi-modal Scientific Reasoning [32.811840681428464]
Multi-modal large language models (MLLMs) have demonstrated promising capabilities across various tasks.
We present a detailed evaluation of the performance of 25 representative MLLMs in scientific reasoning.
The best performance observed include a 53.4% accuracy in mathematics by Claude3.5-Sonnet, 38.2% in physics by GPT-4o, and 47.0% in chemistry by Gemini-1.5-Pro.
arXiv Detail & Related papers (2024-09-10T01:20:26Z) - Recent Advances on Machine Learning for Computational Fluid Dynamics: A Survey [51.87875066383221]
This paper introduces fundamental concepts, traditional methods, and benchmark datasets, then examine the various roles Machine Learning plays in improving CFD.
We highlight real-world applications of ML for CFD in critical scientific and engineering disciplines, including aerodynamics, combustion, atmosphere & ocean science, biology fluid, plasma, symbolic regression, and reduced order modeling.
We draw the conclusion that ML is poised to significantly transform CFD research by enhancing simulation accuracy, reducing computational time, and enabling more complex analyses of fluid dynamics.
arXiv Detail & Related papers (2024-08-22T07:33:11Z) - Exploring Boundary of GPT-4V on Marine Analysis: A Preliminary Case
Study [31.243696199790413]
Large language models (LLMs) have demonstrated a powerful ability to answer various queries as a general-purpose assistant.
The continuous multi-modal large language models (MLLM) empower LLMs with the ability to perceive visual signals.
The launch of GPT-4 (Generative Pre-trained Transformers) has generated significant interest in the research communities.
arXiv Detail & Related papers (2024-01-04T08:53:08Z) - The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision) [121.42924593374127]
We analyze the latest model, GPT-4V, to deepen the understanding of LMMs.
GPT-4V's unprecedented ability in processing arbitrarily interleaved multimodal inputs makes it a powerful multimodal generalist system.
GPT-4V's unique capability of understanding visual markers drawn on input images can give rise to new human-computer interaction methods.
arXiv Detail & Related papers (2023-09-29T17:34:51Z) - ChatGPT & Mechanical Engineering: Examining performance on the FE
Mechanical Engineering and Undergraduate Exams [0.0]
This study examines the capabilities of ChatGPT within the discipline of mechanical engineering.
It aims to examine use cases and pitfalls of such a technology in the classroom and professional settings.
arXiv Detail & Related papers (2023-09-26T20:12:26Z) - AutoML-GPT: Automatic Machine Learning with GPT [74.30699827690596]
We propose developing task-oriented prompts and automatically utilizing large language models (LLMs) to automate the training pipeline.
We present the AutoML-GPT, which employs GPT as the bridge to diverse AI models and dynamically trains models with optimized hyper parameters.
This approach achieves remarkable results in computer vision, natural language processing, and other challenging areas.
arXiv Detail & Related papers (2023-05-04T02:09:43Z) - Performance of ChatGPT on the US Fundamentals of Engineering Exam:
Comprehensive Assessment of Proficiency and Potential Implications for
Professional Environmental Engineering Practice [0.0]
This study investigates the feasibility and effectiveness of using ChatGPT, a GPT-4 based model, in achieving satisfactory performance on the Fundamentals of Engineering (FE) Environmental Exam.
The findings reflect remarkable improvements in mathematical capabilities across successive iterations of ChatGPT models, showcasing their potential in solving complex engineering problems.
arXiv Detail & Related papers (2023-04-20T16:54:34Z) - Document-Level Machine Translation with Large Language Models [91.03359121149595]
Large language models (LLMs) can produce coherent, cohesive, relevant, and fluent answers for various natural language processing (NLP) tasks.
This paper provides an in-depth evaluation of LLMs' ability on discourse modeling.
arXiv Detail & Related papers (2023-04-05T03:49:06Z) - Summary of ChatGPT-Related Research and Perspective Towards the Future
of Large Language Models [40.557611946967086]
This paper presents a survey of ChatGPT-related (GPT-3.5 and GPT-4) research, state-of-the-art large language models (LLM) from the GPT series, and their prospective applications across diverse domains.
We performed an in-depth analysis of 194 relevant papers on arXiv, encompassing trend analysis, word cloud representation, and distribution analysis across various application domains.
arXiv Detail & Related papers (2023-04-04T15:01:06Z) - Sparks of Artificial General Intelligence: Early experiments with GPT-4 [66.1188263570629]
GPT-4, developed by OpenAI, was trained using an unprecedented scale of compute and data.
We demonstrate that GPT-4 can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more.
We believe GPT-4 could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system.
arXiv Detail & Related papers (2023-03-22T16:51:28Z) - Multilingual Multi-Aspect Explainability Analyses on Machine Reading Comprehension Models [76.48370548802464]
This paper focuses on conducting a series of analytical experiments to examine the relations between the multi-head self-attention and the final MRC system performance.
We discover that passage-to-question and passage understanding attentions are the most important ones in the question answering process.
Through comprehensive visualizations and case studies, we also observe several general findings on the attention maps, which can be helpful to understand how these models solve the questions.
arXiv Detail & Related papers (2021-08-26T04:23:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.