Automating construction safety inspections using a multi-modal vision-language RAG framework
- URL: http://arxiv.org/abs/2510.04145v1
- Date: Sun, 05 Oct 2025 10:48:54 GMT
- Title: Automating construction safety inspections using a multi-modal vision-language RAG framework
- Authors: Chenxin Wang, Elyas Asadi Shamsabadi, Zhaohui Chen, Luming Shen, Alireza Ahmadian Fard Fini, Daniel Dias-da-Costa
- Abstract summary: This study introduces SiteShield, a framework for automating construction safety inspection reports by integrating visual and audio inputs. Using real-world data, SiteShield outperformed unimodal LLMs with an F1 score of 0.82, Hamming loss of 0.04, precision of 0.76, and recall of 0.96.
- Score: 1.737994603273206
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Conventional construction safety inspection methods are often inefficient as they require navigating through large volumes of information. Recent advances in large vision-language models (LVLMs) provide opportunities to automate safety inspections through enhanced visual and linguistic understanding. However, existing applications face limitations including irrelevant or unspecific responses, restricted modal inputs, and hallucinations. Utilisation of Large Language Models (LLMs) for this purpose is constrained by the availability of training data and frequently lacks real-time adaptability. This study introduces SiteShield, a multi-modal LVLM-based Retrieval-Augmented Generation (RAG) framework for automating construction safety inspection reports by integrating visual and audio inputs. Using real-world data, SiteShield outperformed unimodal LLMs without RAG with an F1 score of 0.82, Hamming loss of 0.04, precision of 0.76, and recall of 0.96. The findings indicate that SiteShield offers a novel pathway to enhance information retrieval and efficiency in generating safety reports.
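The abstract describes SiteShield only at a high level, so the following is a minimal sketch, assuming a standard retrieve-then-generate flow: observations derived from site imagery and inspector audio form a text query, the query is scored against a safety-regulation corpus, and the top passages are packed into the prompt used to draft the report. The regulation snippets, scoring function, and query below are illustrative placeholders, not taken from the paper.

```python
# Minimal sketch (not the authors' implementation) of the retrieval step in a
# multi-modal RAG pipeline such as SiteShield. All names and snippets are hypothetical.

REGULATIONS = [
    "Workers near unprotected edges must use a fall-arrest system.",
    "Hard hats are required in all areas with overhead work.",
    "Scaffolds must be inspected by a competent person before each shift.",
]

def lexical_score(query: str, doc: str) -> float:
    """Toy word-overlap similarity; a real system would use dense text embeddings."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q | d), 1)

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k passages most similar to the query."""
    return sorted(corpus, key=lambda doc: lexical_score(query, doc), reverse=True)[:k]

# In the full pipeline the query would combine an LVLM caption of the site photo with
# a speech-to-text transcript of the inspector's notes; here it is written by hand.
query = "worker on a scaffold near an unprotected edge, no hard hats visible"
context = retrieve(query, REGULATIONS)

prompt = (
    "Observations: " + query + "\n"
    "Relevant regulations:\n- " + "\n- ".join(context) + "\n"
    "Draft the safety inspection report, flagging each violation."
)
print(prompt)  # This prompt would then be passed to the vision-language model.
```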
Related papers
- Omni-SafetyBench: A Benchmark for Safety Evaluation of Audio-Visual Large Language Models [43.88239953205896]
We introduce Omni-SafetyBench, the first comprehensive parallel benchmark for OLLM safety evaluation. Considering OLLMs' comprehension challenges with complex omni-modal inputs, we propose a Safety-score based on Conditional Attack Success Rate (C-ASR) and Refusal Rate (C-RR) to account for comprehension failures. Using Omni-SafetyBench, we evaluated existing safety alignment algorithms and identified key challenges in OLLM safety alignment.
arXiv Detail & Related papers (2025-08-10T04:15:16Z) - Automating Steering for Safe Multimodal Large Language Models [58.36932318051907]
We introduce a modular and adaptive inference-time intervention technology, AutoSteer, without requiring any fine-tuning of the underlying model. AutoSteer incorporates three core components: (1) a novel Safety Awareness Score (SAS) that automatically identifies the most safety-relevant distinctions among the model's internal layers; (2) an adaptive safety prober trained to estimate the likelihood of toxic outputs from intermediate representations; and (3) a lightweight Refusal Head that selectively intervenes to modulate generation when safety risks are detected.
arXiv Detail & Related papers (2025-07-17T16:04:55Z) - Do We Really Need Curated Malicious Data for Safety Alignment in Multi-modal Large Language Models? [83.53005932513155]
Multi-modal large language models (MLLMs) have made significant progress, yet their safety alignment remains limited. We propose finetuning MLLMs on a small set of benign instruction-following data with responses replaced by simple, clear rejection sentences.
arXiv Detail & Related papers (2025-04-14T09:03:51Z) - SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models [63.71984266104757]
We propose SafeAuto, a framework that enhances MLLM-based autonomous driving by incorporating both unstructured and structured knowledge. To explicitly integrate safety knowledge, we develop a reasoning component that translates traffic rules into first-order logic. Our Multimodal Retrieval-Augmented Generation model leverages video, control signals, and environmental attributes to learn from past driving experiences.
arXiv Detail & Related papers (2025-02-28T21:53:47Z) - Bridging the Safety Gap: A Guardrail Pipeline for Trustworthy LLM Inferences [18.36319991890607]
We present Wildflare GuardRail, a guardrail pipeline designed to enhance the safety and reliability of Large Language Model (LLM) inferences. Wildflare GuardRail integrates several core functional modules, including a Safety Detector that identifies unsafe inputs and detects hallucinations in model outputs. The lightweight wrappers can address malicious URLs in model outputs in 1.06 s per query with 100% accuracy without costly model calls.
arXiv Detail & Related papers (2025-02-12T05:48:57Z) - Deploying Privacy Guardrails for LLMs: A Comparative Analysis of Real-World Applications [3.1810537478232406]
OneShield is a framework designed to mitigate privacy risks in user inputs and LLM outputs across enterprise and open-source settings. We analyze two real-world deployments, focusing on enterprise-scale data governance. OneShield achieved a 0.95 F1 score in detecting sensitive entities across 26 languages, outperforming state-of-the-art tools.
arXiv Detail & Related papers (2025-01-21T19:04:53Z) - Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs [60.32717556756674]
This paper introduces a systematic evaluation framework to assess Large Language Models in detecting cryptographic misuses.
Our in-depth analysis of 11,940 LLM-generated reports highlights that the inherent instabilities in LLMs can lead to over half of the reports being false positives.
The optimized approach achieves a remarkable detection rate of nearly 90%, surpassing traditional methods and uncovering previously unknown misuses in established benchmarks.
arXiv Detail & Related papers (2024-07-23T15:31:26Z) - Using Multimodal Large Language Models for Automated Detection of Traffic Safety Critical Events [5.233512464561313]
Multimodal Large Language Models (MLLMs) offer a novel approach by integrating textual, visual, and audio modalities.
Our framework leverages the reasoning power of MLLMs, directing their output through context-specific prompts.
Preliminary results demonstrate the framework's potential in zero-shot learning and accurate scenario analysis.
arXiv Detail & Related papers (2024-06-19T23:50:41Z) - Detectors for Safe and Reliable LLMs: Implementations, Uses, and Limitations [76.19419888353586]
Large language models (LLMs) are susceptible to a variety of risks, from non-faithful output to biased and toxic generations.
We present our efforts to create and deploy a library of detectors: compact and easy-to-build classification models that provide labels for various harms.
arXiv Detail & Related papers (2024-03-09T21:07:16Z) - Understanding the Effectiveness of Large Language Models in Detecting Security Vulnerabilities [12.82645410161464]
We evaluate the effectiveness of 16 pre-trained Large Language Models on 5,000 code samples from five diverse security datasets.
Overall, LLMs show modest effectiveness in detecting vulnerabilities, obtaining an average accuracy of 62.8% and F1 score of 0.71 across datasets.
We find that advanced prompting strategies that involve step-by-step analysis significantly improve the performance of LLMs on real-world datasets in terms of F1 score (by up to 0.18 on average).
arXiv Detail & Related papers (2023-11-16T13:17:20Z) - ReEval: Automatic Hallucination Evaluation for Retrieval-Augmented Large Language Models via Transferable Adversarial Attacks [91.55895047448249]
This paper presents ReEval, an LLM-based framework using prompt chaining to perturb the original evidence for generating new test cases.
We implement ReEval using ChatGPT and evaluate the resulting variants of two popular open-domain QA datasets.
Our generated data is human-readable and useful for triggering hallucinations in large language models.
arXiv Detail & Related papers (2023-10-19T06:37:32Z)
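The report-level figures quoted above for SiteShield (F1, Hamming loss, precision, recall), and the F1 scores cited in several related papers, are standard multi-label classification measures. Below is a minimal sketch of how such metrics can be computed with scikit-learn; the label matrices and violation categories are illustrative placeholders, not data from any of the papers.

```python
# Illustrative only: multi-label metrics of the kind reported for SiteShield.
# Rows are inspected scenes, columns are hypothetical violation categories
# (e.g. missing PPE, unguarded edge, unsafe scaffold).
import numpy as np
from sklearn.metrics import f1_score, hamming_loss, precision_score, recall_score

y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0]])
y_pred = np.array([[1, 0, 1],
                   [0, 1, 1],
                   [1, 1, 0]])

print("F1 (micro):  ", f1_score(y_true, y_pred, average="micro"))
print("Hamming loss:", hamming_loss(y_true, y_pred))
print("Precision:   ", precision_score(y_true, y_pred, average="micro"))
print("Recall:      ", recall_score(y_true, y_pred, average="micro"))
```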