Using Vision Language Models for Safety Hazard Identification in Construction
- URL: http://arxiv.org/abs/2504.09083v1
- Date: Sat, 12 Apr 2025 05:11:23 GMT
- Title: Using Vision Language Models for Safety Hazard Identification in Construction
- Authors: Muhammad Adil, Gaang Lee, Vicente A. Gonzalez, Qipei Mei
- Abstract summary: We propose and experimentally validate a Vision Language Model (VLM)-based framework for the identification of construction hazards. We evaluate state-of-the-art VLMs, including GPT-4o, Gemini, Llama 3.2, and InternVL2, using a custom dataset of 1100 construction site images.
- Score: 1.2343292905447238
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Safety hazard identification and prevention are key elements of proactive safety management. Previous research has extensively explored applications of computer vision to automatically identify hazards from images collected from construction sites. However, these methods struggle to identify context-specific hazards, as they focus on detecting predefined individual entities without understanding their spatial relationships and interactions. Furthermore, their limited adaptability to varying construction site guidelines and conditions hinders their generalization across different projects. These limitations reduce their ability to assess hazards in complex construction environments and to adapt to unseen risks, leading to potential safety gaps. To address these challenges, we propose and experimentally validate a Vision Language Model (VLM)-based framework for the identification of construction hazards. The framework incorporates a prompt engineering module that structures safety guidelines into contextual queries, allowing the VLM to process visual information and generate hazard assessments aligned with regulatory guidelines. Within this framework, we evaluate state-of-the-art VLMs, including GPT-4o, Gemini, Llama 3.2, and InternVL2, using a custom dataset of 1100 construction site images. Experimental results show that GPT-4o and Gemini 1.5 Pro outperformed the alternatives, achieving promising BERTScores of 0.906 and 0.888, respectively, which highlights their ability to identify both general and context-specific hazards. However, processing times remain a significant challenge, impacting real-time feasibility. These findings offer insights into the practical deployment of VLMs for construction site hazard detection, thereby contributing to the enhancement of proactive safety management.
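The abstract only describes the framework at a high level, so the following is a minimal, illustrative Python sketch of the described flow: a prompt-engineering step that folds safety guidelines into a contextual query, a VLM call over a site image, and a BERTScore comparison against a reference assessment. It is not the authors' released code; the guideline text, prompt wording, file names, and helper functions are assumptions, with GPT-4o accessed through the OpenAI chat-completions API and BERTScore taken from the open-source bert-score package as stand-ins for whichever model and metric implementation a reader prefers.

```python
import base64

from openai import OpenAI        # pip install openai
from bert_score import score     # pip install bert-score

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical guideline snippets standing in for a project's safety regulations.
GUIDELINES = [
    "Workers within 6 feet of an unprotected edge must use fall protection.",
    "Hard hats are required wherever overhead work is in progress.",
]


def build_prompt(guidelines: list[str]) -> str:
    """Assumed prompt-engineering step: structure guidelines into a contextual query."""
    rules = "\n".join(f"- {g}" for g in guidelines)
    return (
        "You are a construction safety inspector. Using the guidelines below, "
        "list every hazard visible in the image and cite the rule it violates.\n"
        f"Guidelines:\n{rules}"
    )


def assess_image(image_path: str) -> str:
    """Send one construction-site image plus the structured prompt to the VLM."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": build_prompt(GUIDELINES)},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    # Hypothetical file and reference text; the 1100-image dataset is not public here.
    predictions = [assess_image("site_001.jpg")]
    references = ["Worker near an unprotected edge without fall arrest; no hard hat."]
    # BERTScore F1 against an expert-written reference assessment, analogous to
    # the metric reported in the abstract.
    _, _, f1 = score(predictions, references, lang="en")
    print(f"BERTScore F1: {f1.mean().item():.3f}")
```

In a practical deployment one would also time each query, since the abstract flags processing latency as the main obstacle to real-time use.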
Related papers
- Safe-Construct: Redefining Construction Safety Violation Recognition as 3D Multi-View Engagement Task [2.0811729303868005]
We introduce Safe-Construct, a framework that reformulates violation recognition as a 3D multi-view engagement task.
Safe-Construct achieves a 7.6% improvement over state-of-the-art methods across four violation types.
arXiv Detail & Related papers (2025-04-15T05:21:09Z) - An Approach to Technical AGI Safety and Security [72.83728459135101]
We develop an approach to address risks consequential enough to significantly harm humanity.
We focus on technical approaches to misuse and misalignment.
We briefly outline how these ingredients could be combined to produce safety cases for AGI systems.
arXiv Detail & Related papers (2025-04-02T15:59:31Z) - Graphormer-Guided Task Planning: Beyond Static Rules with LLM Safety Perception [4.424170214926035]
We propose a risk-aware task planning framework that combines large language models with structured safety modeling.
Our approach constructs a dynamic-semantic safety graph, capturing spatial and contextual risk factors.
Unlike existing methods that rely on predefined safety constraints, our framework introduces a context-aware risk perception module.
arXiv Detail & Related papers (2025-03-10T02:43:54Z) - AILuminate: Introducing v1.0 of the AI Risk and Reliability Benchmark from MLCommons [62.50078821423793]
This paper introduces AILuminate v1.0, the first comprehensive industry-standard benchmark for assessing AI-product risk and reliability.
The benchmark evaluates an AI system's resistance to prompts designed to elicit dangerous, illegal, or undesirable behavior in 12 hazard categories.
arXiv Detail & Related papers (2025-02-19T05:58:52Z) - LabSafety Bench: Benchmarking LLMs on Safety Issues in Scientific Labs [75.85283891591678]
Artificial Intelligence (AI) is revolutionizing scientific research, yet its growing integration into laboratory environments presents critical safety challenges.
Large language models (LLMs) increasingly assist in tasks ranging from procedural guidance to autonomous experiment orchestration.
Such overreliance is especially hazardous in high-stakes laboratory settings, where failures in hazard identification or risk assessment can result in severe accidents.
We propose the Laboratory Safety Benchmark (LabSafety Bench), a comprehensive framework that evaluates LLMs and vision language models (VLMs) on their ability to identify potential hazards, assess risks, and predict the consequences of unsafe actions in lab environments.
arXiv Detail & Related papers (2024-10-18T05:21:05Z) - Evaluating Cascaded Methods of Vision-Language Models for Zero-Shot Detection and Association of Hardhats for Increased Construction Safety [0.0]
This paper evaluates the use of vision-language models (VLMs) for zero-shot detection and association of hardhats to enhance construction safety.
We investigate the applicability of foundation models, specifically OWLv2, for detecting hardhats in real-world construction site images.
arXiv Detail & Related papers (2024-10-16T04:42:10Z) - Multimodal Situational Safety [73.63981779844916]
We present the first evaluation and analysis of a novel safety challenge termed Multimodal Situational Safety.
For an MLLM to respond safely, whether through language or action, it often needs to assess the safety implications of a language query within its corresponding visual context.
We develop the Multimodal Situational Safety benchmark (MSSBench) to assess the situational safety performance of current MLLMs.
arXiv Detail & Related papers (2024-10-08T16:16:07Z) - Vision Language Model for Interpretable and Fine-grained Detection of Safety Compliance in Diverse Workplaces [5.993182776695029]
We introduce Clip2Safety, an interpretable detection framework for diverse workplace safety compliance.
The framework comprises four main modules: scene recognition, visual prompting, safety item detection, and fine-grained verification.
The results show that Clip2Safety not only demonstrates an accuracy improvement over state-of-the-art question-answering-based VLMs but also achieves inference times that are two hundred times faster.
arXiv Detail & Related papers (2024-08-13T18:32:06Z) - EARBench: Towards Evaluating Physical Risk Awareness for Task Planning of Foundation Model-based Embodied AI Agents [53.717918131568936]
Embodied artificial intelligence (EAI) integrates advanced AI models into physical entities for real-world interaction.
Foundation models as the "brain" of EAI agents for high-level task planning have shown promising results.
However, the deployment of these agents in physical environments presents significant safety challenges.
This study introduces EARBench, a novel framework for automated physical risk assessment in EAI scenarios.
arXiv Detail & Related papers (2024-08-08T13:19:37Z) - The Need for Guardrails with Large Language Models in Medical Safety-Critical Settings: An Artificial Intelligence Application in the Pharmacovigilance Ecosystem [0.6965384453064829]
Large language models (LLMs) are useful tools capable of performing specific types of knowledge work at an effective scale.
However, deployments in high-risk and safety-critical domains pose unique challenges, notably the issue of hallucinations.
This is particularly concerning in settings such as drug safety, where inaccuracies could lead to patient harm.
We have developed and demonstrated a proof-of-concept suite of guardrails specifically designed to mitigate certain types of hallucinations and errors for drug safety.
arXiv Detail & Related papers (2024-07-01T19:52:41Z) - Prioritizing Safeguarding Over Autonomy: Risks of LLM Agents for Science [65.77763092833348]
Intelligent agents powered by large language models (LLMs) have demonstrated substantial promise in autonomously conducting experiments and facilitating scientific discoveries across various disciplines.
While their capabilities are promising, these agents also introduce novel vulnerabilities that demand careful consideration for safety.
This paper conducts a thorough examination of vulnerabilities in LLM-based agents within scientific domains, shedding light on potential risks associated with their misuse and emphasizing the need for safety measures.
arXiv Detail & Related papers (2024-02-06T18:54:07Z) - Foveate, Attribute, and Rationalize: Towards Physically Safe and Trustworthy AI [76.28956947107372]
Covertly unsafe text is an area of particular interest, as such text may arise from everyday scenarios and is challenging to detect as harmful.
We propose FARM, a novel framework leveraging external knowledge for trustworthy rationale generation in the context of safety.
Our experiments show that FARM obtains state-of-the-art results on the SafeText dataset, with an absolute improvement of 5.9% in safety classification accuracy.
arXiv Detail & Related papers (2022-12-19T17:51:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.