Multi-modal Machine Learning in Engineering Design: A Review and Future
Directions
- URL: http://arxiv.org/abs/2302.10909v2
- Date: Fri, 28 Jul 2023 15:52:27 GMT
- Title: Multi-modal Machine Learning in Engineering Design: A Review and Future
Directions
- Authors: Binyang Song, Rui Zhou, Faez Ahmed
- Abstract summary: This paper presents a comprehensive overview of the current state, advancements, and challenges of multi-modal machine learning (MMML).
We highlight the inherent challenges in adopting MMML in engineering design, and proffer potential directions for future research.
MMML models, as the next generation of intelligent design tools, hold a promising future to impact how products are designed.
- Score: 9.213020570527451
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the rapidly advancing field of multi-modal machine learning (MMML), the
convergence of multiple data modalities has the potential to reshape various
applications. This paper presents a comprehensive overview of the current
state, advancements, and challenges of MMML within the sphere of engineering
design. The review begins with a deep dive into five fundamental concepts of
MMML: multi-modal information representation, fusion, alignment, translation,
and co-learning. Following this, we explore the cutting-edge applications of
MMML, placing a particular emphasis on tasks pertinent to engineering design,
such as cross-modal synthesis, multi-modal prediction, and cross-modal
information retrieval. Through this comprehensive overview, we highlight the
inherent challenges in adopting MMML in engineering design, and proffer
potential directions for future research. To spur on the continued evolution of
MMML in engineering design, we advocate for concentrated efforts to construct
extensive multi-modal design datasets, develop effective data-driven MMML
techniques tailored to design applications, and enhance the scalability and
interpretability of MMML models. MMML models, as the next generation of
intelligent design tools, hold a promising future to impact how products are
designed.
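
As a concrete illustration of two of the five concepts reviewed in the paper, the sketch below embeds text and images in a shared space (alignment) and ranks items across modalities by similarity (cross-modal information retrieval), in the style of CLIP-like contrastive training. It is a minimal toy example: the encoders, dimensions, temperature, and data are all assumptions for illustration, not models from the paper.

```python
# Toy sketch of cross-modal alignment + retrieval (assumed, illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyTextEncoder(nn.Module):
    """Maps a bag-of-words vector into a shared embedding space."""
    def __init__(self, vocab_size: int = 1000, dim: int = 64):
        super().__init__()
        self.proj = nn.Linear(vocab_size, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.proj(x), dim=-1)  # unit norm -> cosine similarity

class ToyImageEncoder(nn.Module):
    """Maps a flattened image (e.g., a design sketch) into the same space."""
    def __init__(self, pixels: int = 32 * 32, dim: int = 64):
        super().__init__()
        self.proj = nn.Linear(pixels, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.proj(x), dim=-1)

text_enc, img_enc = ToyTextEncoder(), ToyImageEncoder()
texts = torch.rand(8, 1000)      # 8 toy design descriptions
images = torch.rand(8, 32 * 32)  # 8 toy design images, paired with the texts

t, v = text_enc(texts), img_enc(images)
logits = t @ v.T / 0.07          # pairwise similarities, temperature-scaled
labels = torch.arange(8)         # i-th text matches i-th image
# Symmetric contrastive (InfoNCE) loss pulls matched pairs together.
loss = (F.cross_entropy(logits, labels) + F.cross_entropy(logits.T, labels)) / 2

# Cross-modal retrieval: given one text query, rank all images by similarity.
ranking = logits[0].argsort(descending=True)
print(loss.item(), ranking[:3].tolist())
```

In a real design setting, the linear projections would be replaced by pretrained text and shape/image encoders trained on paired design data; the retrieval step is unchanged.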
Related papers
- NaViL: Rethinking Scaling Properties of Native Multimodal Large Language Models under Data Constraints [100.02131897927484]
This paper focuses on the native training of Multimodal Large Language Models (MLLMs) in an end-to-end manner. We propose a native MLLM called NaViL, combined with a simple and cost-effective recipe. Experimental results on 14 multimodal benchmarks confirm the competitive performance of NaViL against existing MLLMs.
arXiv Detail & Related papers (2025-10-09T17:59:37Z)
- Developing a Multi-Modal Machine Learning Model For Predicting Performance of Automotive Hood Frames [0.0]
This paper develops a multimodal machine-learning architecture that learns from different modalities of the same data to predict performance metrics. It also aims to use the MMML architecture to enhance the efficiency of engineering design processes by reducing reliance on computationally expensive simulations.
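
A minimal late-fusion sketch of the kind of architecture this entry describes: two modality-specific encoders whose features are concatenated to regress a scalar performance metric, acting as a surrogate for an expensive simulation. The layer sizes, feature choices, and toy data are assumptions, not the paper's actual model.

```python
# Assumed late-fusion surrogate for performance prediction (illustrative only).
import torch
import torch.nn as nn

class FusionSurrogate(nn.Module):
    def __init__(self, geom_dim: int = 256, param_dim: int = 8):
        super().__init__()
        self.geom_enc = nn.Sequential(nn.Linear(geom_dim, 64), nn.ReLU())    # e.g., rendered-geometry features
        self.param_enc = nn.Sequential(nn.Linear(param_dim, 16), nn.ReLU())  # e.g., tabular design parameters
        self.head = nn.Sequential(nn.Linear(64 + 16, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, geom: torch.Tensor, params: torch.Tensor) -> torch.Tensor:
        # Fusion by concatenation of the two modality embeddings.
        fused = torch.cat([self.geom_enc(geom), self.param_enc(params)], dim=-1)
        return self.head(fused).squeeze(-1)  # predicted performance metric

model = FusionSurrogate()
geom = torch.rand(16, 256)   # toy geometry features for 16 design variants
params = torch.rand(16, 8)   # toy design parameters
target = torch.rand(16)      # toy simulation outputs to regress against
loss = nn.functional.mse_loss(model(geom, params), target)
loss.backward()              # a standard supervised training loop would follow
```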
arXiv Detail & Related papers (2025-08-28T02:15:54Z)
- Empowering Multimodal LLMs with External Tools: A Comprehensive Survey [61.66069828956139]
Multimodal Large Language Models (MLLMs) have achieved great success in various multimodal tasks, pointing toward a promising pathway to artificial general intelligence. A lack of multimodal data, poor performance on many complex downstream tasks, and inadequate evaluation protocols hinder the reliability and broader applicability of MLLMs. Inspired by the human ability to leverage external tools for enhanced reasoning and problem-solving, augmenting MLLMs with external tools offers a promising strategy to overcome these challenges.
arXiv Detail & Related papers (2025-08-14T07:25:45Z)
- Multilingual Multimodal Software Developer for Code Generation [35.33149292210637]
We introduce MM-Coder, a Multilingual Multimodal software developer. MM-Coder integrates visual design inputs such as Unified Modeling Language (UML) diagrams and flowcharts. MMEval is a new benchmark for evaluating multimodal code generation.
arXiv Detail & Related papers (2025-07-11T16:19:53Z)
- Multi-modal Summarization in Model-Based Engineering: Automotive Software Development Case Study [3.6738896410816007]
Multimodal summarization, which integrates information from diverse data modalities, is a promising way to aid the understanding of complex processes.
Yet its application and advantages have received little attention in model-based engineering (MBE), which has become a cornerstone of the design and development of complex systems.
arXiv Detail & Related papers (2025-03-06T14:53:37Z)
- SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding [66.74446220401296]
We propose SynerGen-VL, a simple yet powerful encoder-free MLLM capable of both image understanding and generation.
We introduce the token folding mechanism and the vision-expert-based progressive alignment pretraining strategy, which effectively support high-resolution image understanding.
Our code and models will be released.
arXiv Detail & Related papers (2024-12-12T18:59:26Z)
- VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents [50.12414817737912]
Large Multimodal Models (LMMs) have ushered in a new era in artificial intelligence, merging capabilities in both language and vision to form highly capable Visual Foundation Agents.
Existing benchmarks fail to sufficiently challenge or showcase the full potential of LMMs in complex, real-world environments.
VisualAgentBench (VAB) is a pioneering benchmark specifically designed to train and evaluate LMMs as visual foundation agents.
arXiv Detail & Related papers (2024-08-12T17:44:17Z)
- A Comprehensive Review of Multimodal Large Language Models: Performance and Challenges Across Different Tasks [74.52259252807191]
Multimodal Large Language Models (MLLMs) address the complexities of real-world applications far beyond the capabilities of single-modality systems.
This paper systematically surveys the applications of MLLMs in multimodal tasks spanning natural language, vision, and audio.
arXiv Detail & Related papers (2024-08-02T15:14:53Z)
- From Efficient Multimodal Models to World Models: A Survey [28.780451336834876]
Multimodal Large Models (MLMs) are becoming a significant research focus, combining powerful language models with multimodal learning.
This review explores the latest developments and challenges in MLMs, emphasizing their potential in achieving artificial general intelligence.
arXiv Detail & Related papers (2024-06-27T15:36:43Z)
- A Review of Multi-Modal Large Language and Vision Models [1.9685736810241874]
Large Language Models (LLMs) have emerged as a focal point of research and application.
Recently, LLMs have been extended into multi-modal large language models (MM-LLMs).
This paper provides an extensive review of the current state of those LLMs with multi-modal capabilities as well as the very recent MM-LLMs.
arXiv Detail & Related papers (2024-03-28T15:53:45Z)
- Model Composition for Multimodal Large Language Models [71.5729418523411]
We propose a new paradigm through the model composition of existing MLLMs to create a new model that retains the modal understanding capabilities of each original model.
Our basic implementation, NaiveMC, demonstrates the effectiveness of this paradigm by reusing modality encoders and merging LLM parameters.
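
A minimal sketch of one plausible reading of the parameter-merging step, assuming the two backbones share an architecture: element-wise interpolation of their state dicts, with each original modality encoder reused as-is. This is illustrative only, not NaiveMC's actual implementation.

```python
# Assumed naive parameter merging for model composition (illustrative only).
import torch
import torch.nn as nn

def merge_state_dicts(sd_a: dict, sd_b: dict, alpha: float = 0.5) -> dict:
    """Interpolate two compatible state dicts: alpha * A + (1 - alpha) * B."""
    assert sd_a.keys() == sd_b.keys(), "models must share an architecture"
    return {k: alpha * sd_a[k] + (1 - alpha) * sd_b[k] for k in sd_a}

# Two toy "LLM backbones" fine-tuned separately on different modalities.
model_a = nn.Linear(16, 16)
model_b = nn.Linear(16, 16)

merged = nn.Linear(16, 16)
merged.load_state_dict(merge_state_dicts(model_a.state_dict(), model_b.state_dict()))
# In the composition paradigm described above, each original modality encoder
# would then be plugged into this merged backbone without retraining.
```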
arXiv Detail & Related papers (2024-02-20T06:38:10Z)
- MM-LLMs: Recent Advances in MultiModal Large Language Models [49.06046606933233]
In the past year, MultiModal Large Language Models (MM-LLMs) have undergone substantial advancements.
We introduce a taxonomy encompassing 126 MM-LLMs, each characterized by its specific formulations.
We review the performance of selected MM-LLMs on mainstream benchmarks and summarize key training recipes to enhance the potency of MM-LLMs.
arXiv Detail & Related papers (2024-01-24T17:10:45Z)
- MMICT: Boosting Multi-Modal Fine-Tuning with In-Context Examples [63.78384552789171]
This paper introduces Multi-Modal In-Context Tuning (MMICT), a novel multi-modal fine-tuning paradigm.
We propose the Multi-Modal Hub (M-Hub), a unified module that captures various multi-modal features according to different inputs and objectives.
Based on M-Hub, MMICT enables MM-LLMs to learn from in-context visual-guided textual features and subsequently generate outputs conditioned on the textual-guided visual features.
arXiv Detail & Related papers (2023-12-11T13:11:04Z)
- A Survey on Multimodal Large Language Models [71.63375558033364]
Multimodal Large Language Models (MLLMs), represented by GPT-4V, are a rising research hotspot.
This paper aims to trace and summarize the recent progress of MLLMs.
arXiv Detail & Related papers (2023-06-23T15:21:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.