Evaluating Cascaded Methods of Vision-Language Models for Zero-Shot Detection and Association of Hardhats for Increased Construction Safety
- URL: http://arxiv.org/abs/2410.12225v1
- Date: Wed, 16 Oct 2024 04:42:10 GMT
- Title: Evaluating Cascaded Methods of Vision-Language Models for Zero-Shot Detection and Association of Hardhats for Increased Construction Safety
- Authors: Lucas Choi, Ross Greer
- Abstract summary: This paper evaluates the use of vision-language models (VLMs) for zero-shot detection and association of hardhats to enhance construction safety.
We investigate the applicability of foundation models, specifically OWLv2, for detecting hardhats in real-world construction site images.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper evaluates the use of vision-language models (VLMs) for zero-shot detection and association of hardhats to enhance construction safety. Given the significant risk of head injuries in construction, proper enforcement of hardhat use is critical. We investigate the applicability of foundation models, specifically OWLv2, for detecting hardhats in real-world construction site images. Our contributions include the creation of a new benchmark dataset, Hardhat Safety Detection Dataset, by filtering and combining existing datasets and the development of a cascaded detection approach. Experimental results on 5,210 images demonstrate that the OWLv2 model achieves an average precision of 0.6493 for hardhat detection. We further analyze the limitations and potential improvements for real-world applications, highlighting the strengths and weaknesses of current foundation models in safety perception domains.
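The abstract mentions associating detected hardhats with workers but does not say how the association is done. The following is a minimal sketch of one plausible rule, not the authors' method: a hardhat is assigned to a person when its box center falls in the top portion of that person's bounding box. The function names, the `head_fraction` parameter, and the greedy first-match assignment are all assumptions made for illustration; the boxes would come from an upstream detector such as OWLv2.

```python
from typing import List, Optional, Tuple

# Box format (assumed): (x_min, y_min, x_max, y_max) in pixels, y grows downward.
Box = Tuple[float, float, float, float]


def head_region(person: Box, head_fraction: float = 0.3) -> Box:
    """Return the top slice of a person box, used as a rough head area.

    head_fraction is a hypothetical tuning parameter, not a value from the paper.
    """
    x0, y0, x1, y1 = person
    return (x0, y0, x1, y0 + head_fraction * (y1 - y0))


def contains_center(region: Box, box: Box) -> bool:
    """True if the center of `box` lies inside `region`."""
    cx = (box[0] + box[2]) / 2
    cy = (box[1] + box[3]) / 2
    return region[0] <= cx <= region[2] and region[1] <= cy <= region[3]


def associate_hardhats(
    persons: List[Box], hardhats: List[Box], head_fraction: float = 0.3
) -> List[Optional[int]]:
    """For each person, return the index of a matched hardhat or None.

    Greedy first-match: each hardhat is assigned to at most one person.
    """
    assigned = set()
    result: List[Optional[int]] = []
    for person in persons:
        region = head_region(person, head_fraction)
        match = None
        for i, hat in enumerate(hardhats):
            if i not in assigned and contains_center(region, hat):
                match = i
                assigned.add(i)
                break
        result.append(match)
    return result


if __name__ == "__main__":
    persons = [(0, 0, 100, 300), (200, 0, 300, 300)]
    hardhats = [(30, 5, 70, 40)]
    # The first person has a hardhat near the head; the second has none.
    print(associate_hardhats(persons, hardhats))
```

A person mapped to `None` would be flagged as not wearing a hardhat. A real pipeline would also need confidence thresholds and handling for overlapping workers, which this sketch leaves out.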
Related papers
- Advancing Embodied Agent Security: From Safety Benchmarks to Input Moderation [52.83870601473094]
Embodied agents exhibit immense potential across a multitude of domains.
Existing research predominantly concentrates on the security of general large language models.
This paper introduces a novel input moderation framework, meticulously designed to safeguard embodied agents.
arXiv Detail & Related papers (2025-04-22T08:34:35Z)
- Using Vision Language Models for Safety Hazard Identification in Construction [1.2343292905447238]
We propose and experimentally validate a Vision Language Model (VLM)-based framework for the identification of construction hazards.
We evaluate state-of-the-art VLMs, including GPT-4o, Gemini, Llama 3.2, and InternVL2, using a custom dataset of 1100 construction site images.
arXiv Detail & Related papers (2025-04-12T05:11:23Z)
- Benchmarking the Spatial Robustness of DNNs via Natural and Adversarial Localized Corruptions [49.546479320670464]
This paper introduces specialized metrics for benchmarking the spatial robustness of segmentation models.
We propose region-aware multi-attack adversarial analysis, a method that enables a deeper understanding of model robustness.
The results reveal that models respond to these two types of threats differently.
arXiv Detail & Related papers (2025-04-02T11:37:39Z)
- Model Developmental Safety: A Safety-Centric Method and Applications in Vision-Language Models [75.8161094916476]
We study how to develop a pretrained vision-language model (aka the CLIP model) for acquiring new capabilities or improving existing capabilities of image classification.
Our experiments on improving vision perception capabilities on autonomous driving and scene recognition datasets demonstrate the efficacy of the proposed approach.
arXiv Detail & Related papers (2024-10-04T22:34:58Z)
- The BRAVO Semantic Segmentation Challenge Results in UNCV2024 [68.20197719071436]
We define two categories of reliability: (1) semantic reliability, which reflects the model's accuracy and calibration when exposed to various perturbations; and (2) OOD reliability, which measures the model's ability to detect object classes that are unknown during training.
The results reveal interesting insights into the importance of large-scale pre-training and minimal architectural design in developing robust and reliable semantic segmentation models.
arXiv Detail & Related papers (2024-09-23T15:17:30Z)
- Hybrid-Segmentor: A Hybrid Approach to Automated Fine-Grained Crack Segmentation in Civil Infrastructure [52.2025114590481]
We introduce Hybrid-Segmentor, an encoder-decoder based approach that is capable of extracting both fine-grained local and global crack features.
This allows the model to improve its generalization capabilities in distinguishing various types of shapes, surfaces, and sizes of cracks.
The proposed model outperforms existing benchmark models across 5 quantitative metrics (accuracy 0.971, precision 0.804, recall 0.744, F1-score 0.770, and IoU score 0.630), achieving state-of-the-art status.
arXiv Detail & Related papers (2024-09-04T16:47:16Z)
- Open-Vocabulary Object Detectors: Robustness Challenges under Distribution Shifts [6.486569431242123]
Vision-Language Models (VLMs) have recently achieved groundbreaking results.
Investigating OOD robustness in VLM object detection is essential to increase the trustworthiness of these models.
This study presents a comprehensive robustness evaluation of the zero-shot capabilities of three recent open-vocabulary (OV) foundation object detection models.
arXiv Detail & Related papers (2024-04-01T14:18:15Z)
- Innovative Horizons in Aerial Imagery: LSKNet Meets DiffusionDet for Advanced Object Detection [55.2480439325792]
We present an in-depth evaluation of an object detection model that integrates the LSKNet backbone with the DiffusionDet head.
The proposed model achieves a mean average precision (mAP) of approximately 45.7%, which is a significant improvement.
This advancement underscores the effectiveness of the proposed modifications and sets a new benchmark in aerial image analysis.
arXiv Detail & Related papers (2023-11-21T19:49:13Z)
- On the Robustness of Object Detection Models in Aerial Images [37.50307094643692]
We introduce two novel benchmarks based on DOTA-v1.0.
The first benchmark encompasses 19 prevalent corruptions, while the second focuses on cloud-corrupted images.
We find that enhanced model architectures, larger networks, well-crafted modules, and judicious data augmentation strategies collectively enhance the robustness of aerial object detection models.
arXiv Detail & Related papers (2023-08-29T15:16:51Z)
- Large, Complex, and Realistic Safety Clothing and Helmet Detection: Dataset and Method [35.39383667490014]
We construct a large, complex, and realistic safety clothing and helmet detection dataset.
SFCHD is derived from two authentic chemical plants, comprising 12,373 images, 7 categories, and 50,552 annotations.
We design a spatial and channel attention-based low-light enhancement module.
arXiv Detail & Related papers (2023-06-03T12:15:20Z)
- A Survey on the Robustness of Computer Vision Models against Common Corruptions [3.6486148851646063]
Computer vision models are susceptible to changes in input images caused by sensor errors or extreme imaging environments.
These corruptions can significantly hinder the reliability of these models when deployed in real-world scenarios.
We present a comprehensive overview of methods that improve the robustness of computer vision models against common corruptions.
arXiv Detail & Related papers (2023-05-10T10:19:31Z)
- Robo3D: Towards Robust and Reliable 3D Perception against Corruptions [58.306694836881235]
We present Robo3D, the first comprehensive benchmark heading toward probing the robustness of 3D detectors and segmentors under out-of-distribution scenarios.
We consider eight corruption types stemming from severe weather conditions, external disturbances, and internal sensor failure.
We propose a density-insensitive training framework along with a simple flexible voxelization strategy to enhance the model resiliency.
arXiv Detail & Related papers (2023-03-30T17:59:17Z)
- On the Robustness of Quality Measures for GANs [136.18799984346248]
This work evaluates the robustness of quality measures of generative models such as Inception Score (IS) and Fréchet Inception Distance (FID).
We show that such metrics can also be manipulated by additive pixel perturbations.
arXiv Detail & Related papers (2022-01-31T06:43:09Z)
- A Comprehensive Evaluation Framework for Deep Model Robustness [44.20580847861682]
Deep neural networks (DNNs) have achieved remarkable performance across a wide area of applications.
They are vulnerable to adversarial examples, which motivates the adversarial defense.
This paper presents a model evaluation framework containing a comprehensive, rigorous, and coherent set of evaluation metrics.
arXiv Detail & Related papers (2021-01-24T01:04:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.