SDIGLM: Leveraging Large Language Models and Multi-Modal Chain of Thought for Structural Damage Identification
- URL: http://arxiv.org/abs/2504.11477v1
- Date: Sat, 12 Apr 2025 11:37:10 GMT
- Title: SDIGLM: Leveraging Large Language Models and Multi-Modal Chain of Thought for Structural Damage Identification
- Authors: Yunkai Zhang, Shiyin Wei, Yong Huang, Yawu Su, Shanshan Lu, Hui Li,
- Abstract summary: This study introduces SDIGLM, an innovative multi-modal structural damage identification model.<n>By leveraging this multi-modal CoT, SDIGLM surpasses general-purpose LMMs in structural damage identification, achieving an accuracy of 95.24% across various infrastructure types.
- Score: 2.9239817922453333
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing computer vision(CV)-based structural damage identification models demonstrate notable accuracy in categorizing and localizing damage. However, these models present several critical limitations that hinder their practical application in civil engineering(CE). Primarily, their ability to recognize damage types remains constrained, preventing comprehensive analysis of the highly varied and complex conditions encountered in real-world CE structures. Second, these models lack linguistic capabilities, rendering them unable to articulate structural damage characteristics through natural language descriptions. With the continuous advancement of artificial intelligence(AI), large multi-modal models(LMMs) have emerged as a transformative solution, enabling the unified encoding and alignment of textual and visual data. These models can autonomously generate detailed descriptive narratives of structural damage while demonstrating robust generalization across diverse scenarios and tasks. This study introduces SDIGLM, an innovative LMM for structural damage identification, developed based on the open-source VisualGLM-6B architecture. To address the challenge of adapting LMMs to the intricate and varied operating conditions in CE, this work integrates a U-Net-based semantic segmentation module to generate defect segmentation maps as visual Chain of Thought(CoT). Additionally, a multi-round dialogue fine-tuning dataset is constructed to enhance logical reasoning, complemented by a language CoT formed through prompt engineering. By leveraging this multi-modal CoT, SDIGLM surpasses general-purpose LMMs in structural damage identification, achieving an accuracy of 95.24% across various infrastructure types. Moreover, the model effectively describes damage characteristics such as hole size, crack direction, and corrosion severity.
Related papers
- When Prompts Go Wrong: Evaluating Code Model Robustness to Ambiguous, Contradictory, and Incomplete Task Descriptions [23.5858385520752]
Large Language Models (LLMs) have demonstrated impressive performance in code generation tasks under idealized conditions.<n>In practice, task descriptions frequently exhibit ambiguity, incompleteness, or internal contradictions.<n>We present the first empirical study examining the robustness of state-of-the-art code generation models when faced with such unclear task descriptions.
arXiv Detail & Related papers (2025-07-27T23:16:14Z) - Do We Really Need GNNs with Explicit Structural Modeling? MLPs Suffice for Language Model Representations [50.45261187796993]
Graph Neural Networks (GNNs) fail to fully utilize structural information, whereas Multi-Layer Perceptrons (MLPs) exhibit a surprising ability in structure-aware tasks.<n>This paper introduces a comprehensive probing framework from an information-theoretic perspective.
arXiv Detail & Related papers (2025-06-26T18:10:28Z) - Adversarial Attack Classification and Robustness Testing for Large Language Models for Code [19.47426054151291]
This study investigates how adversarial perturbations in natural language inputs affect Large Language Models for Code (LLM4Code)<n>It examines the effects of perturbations at the character, word, and sentence levels to identify the most impactful vulnerabilities.
arXiv Detail & Related papers (2025-06-09T17:02:29Z) - Evaluating Large Language Models for Real-World Engineering Tasks [75.97299249823972]
This paper introduces a curated database comprising over 100 questions derived from authentic, production-oriented engineering scenarios.<n>Using this dataset, we evaluate four state-of-the-art Large Language Models (LLMs)<n>Our results show that LLMs demonstrate strengths in basic temporal and structural reasoning but struggle significantly with abstract reasoning, formal modeling, and context-sensitive engineering logic.
arXiv Detail & Related papers (2025-05-12T14:05:23Z) - Elucidating the Design Space of Multimodal Protein Language Models [69.3650883370033]
Multimodal protein language models (PLMs) integrate sequence and token-based structural information.<n>This paper systematically elucidates the design space of multimodal PLMs to overcome their limitations.<n>Our advancements approach finer-grained supervision, demonstrating that token-based multimodal PLMs can achieve robust structural modeling.
arXiv Detail & Related papers (2025-04-15T17:59:43Z) - Model Hemorrhage and the Robustness Limits of Large Language Models [119.46442117681147]
Large language models (LLMs) demonstrate strong performance across natural language processing tasks, yet undergo significant performance degradation when modified for deployment.<n>We define this phenomenon as model hemorrhage - performance decline caused by parameter alterations and architectural changes.
arXiv Detail & Related papers (2025-03-31T10:16:03Z) - VACT: A Video Automatic Causal Testing System and a Benchmark [55.53300306960048]
VACT is an **automated** framework for modeling, evaluating, and measuring the causal understanding of VGMs in real-world scenarios.<n>We introduce multi-level causal evaluation metrics to provide a detailed analysis of the causal performance of VGMs.
arXiv Detail & Related papers (2025-03-08T10:54:42Z) - Can Multimodal Large Language Models be Guided to Improve Industrial Anomaly Detection? [5.979778557940213]
Traditional industrial anomaly detection models often struggle with flexibility and adaptability.<n>Recent advancements in Multimodal Large Language Models (MLLMs) hold promise for overcoming these limitations.<n>We propose Echo, a novel multi-expert framework designed to enhance MLLM performance for IAD.
arXiv Detail & Related papers (2025-01-27T05:41:10Z) - Interpreting token compositionality in LLMs: A robustness analysis [10.777646083061395]
Constituent-Aware Pooling (CAP) is a methodology designed to analyse how large language models process linguistic structures.<n>CAP intervenes in model activations through constituent-based pooling at various model levels.<n>Our findings highlight fundamental limitations in current transformer architectures regarding compositional semantics processing and model interpretability.
arXiv Detail & Related papers (2024-10-16T18:10:50Z) - M2CVD: Enhancing Vulnerability Semantic through Multi-Model Collaboration for Code Vulnerability Detection [52.4455893010468]
Large Language Models (LLMs) have strong capabilities in code comprehension, but fine-tuning costs and semantic alignment issues limit their project-specific optimization.
Code models such CodeBERT are easy to fine-tune, but it is often difficult to learn vulnerability semantics from complex code languages.
This paper introduces the Multi-Model Collaborative Vulnerability Detection approach (M2CVD) to improve the detection accuracy of code models.
arXiv Detail & Related papers (2024-06-10T00:05:49Z) - Emergent Linguistic Structures in Neural Networks are Fragile [20.692540987792732]
Large Language Models (LLMs) have been reported to have strong performance on natural language processing tasks.
We propose a framework to assess the consistency and robustness of linguistic representations.
arXiv Detail & Related papers (2022-10-31T15:43:57Z) - Autoregressive Structured Prediction with Language Models [73.11519625765301]
We describe an approach to model structures as sequences of actions in an autoregressive manner with PLMs.
Our approach achieves the new state-of-the-art on all the structured prediction tasks we looked at.
arXiv Detail & Related papers (2022-10-26T13:27:26Z) - SIM-Trans: Structure Information Modeling Transformer for Fine-grained
Visual Categorization [59.732036564862796]
We propose the Structure Information Modeling Transformer (SIM-Trans) to incorporate object structure information into transformer for enhancing discriminative representation learning.
The proposed two modules are light-weighted and can be plugged into any transformer network and trained end-to-end easily.
Experiments and analyses demonstrate that the proposed SIM-Trans achieves state-of-the-art performance on fine-grained visual categorization benchmarks.
arXiv Detail & Related papers (2022-08-31T03:00:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.