Safe-Construct: Redefining Construction Safety Violation Recognition as 3D Multi-View Engagement Task
- URL: http://arxiv.org/abs/2504.10880v1
- Date: Tue, 15 Apr 2025 05:21:09 GMT
- Title: Safe-Construct: Redefining Construction Safety Violation Recognition as 3D Multi-View Engagement Task
- Authors: Aviral Chharia, Tianyu Ren, Tomotake Furuhata, Kenji Shimada
- Abstract summary: We introduce Safe-Construct, a framework that reformulates violation recognition as a 3D multi-view engagement task. Safe-Construct achieves a 7.6% improvement over state-of-the-art methods across four violation types.
- Score: 2.0811729303868005
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recognizing safety violations in construction environments is critical yet remains underexplored in computer vision. Existing models predominantly rely on 2D object detection, which fails to capture the complexities of real-world violations due to: (i) an oversimplified task formulation treating violation recognition merely as object detection, (ii) inadequate validation under realistic conditions, (iii) absence of standardized baselines, and (iv) limited scalability from the unavailability of synthetic dataset generators for diverse construction scenarios. To address these challenges, we introduce Safe-Construct, the first framework that reformulates violation recognition as a 3D multi-view engagement task, leveraging scene-level worker-object context and 3D spatial understanding. We also propose the Synthetic Indoor Construction Site Generator (SICSG) to create diverse, scalable training data, overcoming data limitations. Safe-Construct achieves a 7.6% improvement over state-of-the-art methods across four violation types. We rigorously evaluate our approach in near-realistic settings, incorporating four violations, four workers, 14 objects, and challenging conditions like occlusions (worker-object, worker-worker) and variable illumination (back-lighting, overexposure, sunlight). By integrating 3D multi-view spatial understanding and synthetic data generation, Safe-Construct sets a new benchmark for scalable and robust safety monitoring in high-risk industries. Project Website: https://Safe-Construct.github.io/Safe-Construct
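The following is a minimal sketch of what a 3D multi-view engagement formulation involves, not the authors' implementation: worker keypoints detected in two or more calibrated views are triangulated into 3D, and a worker-object engagement test replaces 2D co-occurrence. The `triangulate` helper, the choice of a hand keypoint, and the 0.5 m radius are illustrative assumptions.

```python
import numpy as np

def triangulate(p1, p2, P1, P2):
    """Standard DLT triangulation of one keypoint from two views.
    p1, p2: 2D pixel coordinates; P1, P2: 3x4 camera projection matrices."""
    A = np.stack([
        p1[0] * P1[2] - P1[0],
        p1[1] * P1[2] - P1[1],
        p2[0] * P2[2] - P2[0],
        p2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # homogeneous -> Euclidean 3D point

def is_engaged(hand_3d, object_3d, radius=0.5):
    """Engagement test: hand within `radius` metres of the object centre."""
    return bool(np.linalg.norm(hand_3d - object_3d) < radius)
```

Under this formulation, a violation (e.g., operating a tool without the required gear) fires only when the 3D engagement test passes, rather than whenever worker and object merely co-occur in a 2D frame.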
Related papers
- Using Vision Language Models for Safety Hazard Identification in Construction [1.2343292905447238]
We propose and experimentally validate a Vision Language Model (VLM)-based framework for identifying construction hazards. We evaluate state-of-the-art VLMs, including GPT-4o, Gemini, Llama 3.2, and InternVL2, using a custom dataset of 1100 construction site images.
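A hedged sketch of such an evaluation loop is below; `query_vlm` is a hypothetical stand-in for an API call to any of the listed models, and the subset-based metric is an assumption, not the paper's protocol.

```python
from typing import Callable

PROMPT = "List any construction safety hazards visible in this image."

def evaluate(images: list[str], labels: list[set[str]],
             query_vlm: Callable[[str, str], set[str]]) -> float:
    """Fraction of images where the VLM recovers every labelled hazard."""
    hits = 0
    for path, gold in zip(images, labels):
        predicted = query_vlm(path, PROMPT)  # model returns hazard tags
        hits += gold <= predicted            # subset test: all hazards found
    return hits / len(images)
```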
arXiv Detail & Related papers (2025-04-12T05:11:23Z)
- Generalizing Safety Beyond Collision-Avoidance via Latent-Space Reachability Analysis [6.267574471145217]
Hamilton-Jacobi (HJ) reachability is a rigorous framework that enables robots to simultaneously detect unsafe states and generate actions that prevent future failures.
We propose Latent Safety Filters, a latent-space generalization of HJ reachability that operates directly on raw observation data.
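A minimal sketch of how such a latent safety filter could gate a policy (the encoder, value function, and policies are assumed learned components; this is not the paper's code):

```python
def safety_filter(obs, encoder, value_fn, task_policy, safe_policy, margin=0.0):
    """Run the task policy unless the latent safety value drops below
    the margin, then fall back to the safety-preserving policy."""
    z = encoder(obs)            # raw observation -> latent state
    if value_fn(z) > margin:    # positive value ~ inside the safe set
        return task_policy(z)
    return safe_policy(z)       # override with the safe fallback
```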
arXiv Detail & Related papers (2025-02-02T22:00:20Z)
- Safety Without Semantic Disruptions: Editing-free Safe Image Generation via Context-preserving Dual Latent Reconstruction [88.18235230849554]
Training multimodal generative models on large, uncurated datasets can expose users to harmful, unsafe, controversial, or culturally inappropriate outputs.
We leverage safe embeddings and a modified diffusion process with weighted tunable summation in the latent space to generate safer images.
We identify trade-offs between safety and censorship, which presents a necessary perspective in the development of ethical AI models.
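As a toy illustration of weighted tunable summation in a latent space (shapes, weights, and the random stand-in latents are assumptions, not the paper's pipeline):

```python
import numpy as np

def safe_blend(z, z_safe, w=0.3):
    """Convex combination of a generated latent and a safe embedding;
    larger w pushes the decoded image toward the safe concept."""
    return (1.0 - w) * z + w * z_safe

z      = np.random.randn(4, 64, 64)    # stand-in latent from the sampler
z_safe = np.random.randn(4, 64, 64)    # stand-in safe-concept embedding
z_out  = safe_blend(z, z_safe, w=0.3)  # would be decoded to a safer image
```

The tunable weight is what creates the safety-versus-censorship trade-off noted above: w = 0 leaves generation untouched, while large w suppresses more content.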
arXiv Detail & Related papers (2024-11-21T09:47:13Z)
- Vision Language Model for Interpretable and Fine-grained Detection of Safety Compliance in Diverse Workplaces [5.993182776695029]
We introduce Clip2Safety, an interpretable detection framework for diverse workplace safety compliance.
The framework comprises four main modules: scene recognition, visual prompting, safety-item detection, and fine-grained verification.
The results show that Clip2Safety not only improves accuracy over state-of-the-art question-answering based VLMs but also runs inference two hundred times faster.
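The four-stage structure might look like the following skeleton; every function here is a hypothetical stub standing in for a CLIP-style model call, so only the pipeline shape is meaningful:

```python
def recognize_scene(image):          return "welding bay"             # stub
def build_visual_prompts(scene):     return ["helmet", "face shield"] # stub
def detect_safety_items(image, ps):  return {"helmet"}                # stub
def verify_attributes(image, item):  return True  # e.g. strap fastened; stub

def clip2safety(image):
    """Scene recognition -> visual prompting -> item detection -> verification."""
    scene   = recognize_scene(image)
    prompts = build_visual_prompts(scene)
    items   = detect_safety_items(image, prompts)
    return {item: verify_attributes(image, item) for item in items}
```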
arXiv Detail & Related papers (2024-08-13T18:32:06Z)
- Safe Inputs but Unsafe Output: Benchmarking Cross-modality Safety Alignment of Large Vision-Language Model [73.8765529028288]
We introduce a novel safety alignment challenge called Safe Inputs but Unsafe Output (SIUO) to evaluate cross-modality safety alignment.
To empirically investigate this problem, we developed SIUO, a cross-modality benchmark encompassing 9 critical safety domains, such as self-harm, illegal activities, and privacy violations.
Our findings reveal substantial safety vulnerabilities in both closed- and open-source LVLMs, underscoring the inadequacy of current models to reliably interpret and respond to complex, real-world scenarios.
arXiv Detail & Related papers (2024-06-21T16:14:15Z)
- Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments consistently demonstrates our method's superiority over existing state-of-the-art pre-training approaches.
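A toy sketch of sampling a scene from a small Bayesian network (scene type -> object counts -> object poses); the categories and distributions are invented for illustration, not the paper's model:

```python
import random

SCENE_OBJECTS = {"office": ["desk", "chair"], "warehouse": ["shelf", "pallet"]}

def sample_scene():
    scene = random.choice(list(SCENE_OBJECTS))   # P(scene)
    layout = []
    for obj in SCENE_OBJECTS[scene]:             # objects conditioned on scene
        for _ in range(random.randint(1, 4)):    # P(count | scene)
            x, y = random.uniform(0, 10), random.uniform(0, 10)
            layout.append((obj, x, y))           # P(pose | object)
    return scene, layout
```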
arXiv Detail & Related papers (2024-06-17T07:43:53Z)
- Calib3D: Calibrating Model Preferences for Reliable 3D Scene Understanding [55.32861154245772]
Calib3D is a pioneering effort to benchmark and scrutinize the reliability of 3D scene understanding models.
We comprehensively evaluate 28 state-of-the-art models across 10 diverse 3D datasets.
We introduce DeptS, a novel depth-aware scaling approach aimed at enhancing 3D model calibration.
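The abstract does not spell out the DeptS formula; the sketch below merely illustrates the general idea of depth-aware confidence scaling, with an assumed linear temperature T(d) = a + b·d that softens predictions for distant, less reliable points:

```python
import numpy as np

def depth_aware_scale(logits, depth, a=1.0, b=0.05):
    """logits: (N, C) per-point class scores; depth: (N,) metres."""
    T = a + b * depth                             # per-point temperature
    scaled = logits / T[:, None]
    scaled -= scaled.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(scaled)
    return p / p.sum(axis=1, keepdims=True)       # calibrated probabilities
```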
arXiv Detail & Related papers (2024-03-25T17:59:59Z)
- Towards Unified 3D Object Detection via Algorithm and Data Unification [70.27631528933482]
We build the first unified multi-modal 3D object detection benchmark MM-Omni3D and extend the aforementioned monocular detector to its multi-modal version.
We name the designed monocular and multi-modal detectors as UniMODE and MM-UniMODE, respectively.
arXiv Detail & Related papers (2024-02-28T18:59:31Z)
- Embodied Task Planning with Large Language Models [86.63533340293361]
We propose a TAsk Planning Agent (TaPA) for grounded planning in embodied tasks under physical scene constraints.
During inference, we discover the objects in the scene by extending open-vocabulary object detectors to multi-view RGB images collected at different reachable locations, as sketched below.
Experimental results show that the plans generated by our TaPA framework achieve a higher success rate than those of LLaVA and GPT-3.5 by a sizable margin.
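A minimal sketch of that multi-view discovery step (the `detector` callable and the threshold are assumptions): run an open-vocabulary detector per view and keep the union of confident labels.

```python
def discover_objects(views, detector, threshold=0.5):
    """views: list of RGB images; detector(img) -> [(label, score), ...]"""
    found = set()
    for img in views:
        for label, score in detector(img):
            if score >= threshold:
                found.add(label)   # union across viewpoints
    return sorted(found)           # scene-level object list for the planner
```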
arXiv Detail & Related papers (2023-07-04T17:58:25Z)
- Robo3D: Towards Robust and Reliable 3D Perception against Corruptions [58.306694836881235]
We present Robo3D, the first comprehensive benchmark for probing the robustness of 3D detectors and segmentors under out-of-distribution scenarios.
We consider eight corruption types stemming from severe weather conditions, external disturbances, and internal sensor failure.
We propose a density-insensitive training framework along with a simple yet flexible voxelization strategy to enhance model resiliency.
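In the spirit of the benchmark's corruptions (the exact corruption models and severities are the paper's; these parameter values are illustrative), a point cloud can be perturbed like so:

```python
import numpy as np

def corrupt_cloud(points, noise_std=0.02, drop_ratio=0.2, rng=None):
    """points: (N, 3) array; returns a jittered, randomly subsampled copy.
    Jitter mimics sensor noise; dropout mimics beam loss."""
    if rng is None:
        rng = np.random.default_rng()
    keep = rng.random(len(points)) > drop_ratio   # random point dropout
    noisy = points[keep] + rng.normal(0.0, noise_std, (int(keep.sum()), 3))
    return noisy
```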
arXiv Detail & Related papers (2023-03-30T17:59:17Z)
- USC: Uncompromising Spatial Constraints for Safety-Oriented 3D Object Detectors in Autonomous Driving [7.355977594790584]
We consider the safety-oriented performance of 3D object detectors in autonomous driving contexts. We present uncompromising spatial constraints (USC), which characterize a simple yet important localization requirement. We incorporate the quantitative measures into common loss functions to enable safety-oriented fine-tuning of existing models.
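A hedged one-dimensional sketch of turning such a spatial constraint into a loss term (the real USC measures are defined over full 3D boxes; intervals stand in here): penalize whatever extent of the ground-truth object the prediction fails to cover, since uncovered extent is what matters for collision avoidance.

```python
def coverage_loss(pred, gt):
    """pred, gt: (lo, hi) extents along one axis; zero when gt is covered."""
    lo_gap = max(0.0, pred[0] - gt[0])  # ground truth sticks out below
    hi_gap = max(0.0, gt[1] - pred[1])  # ground truth sticks out above
    return lo_gap + hi_gap              # added to the usual detection loss
```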
arXiv Detail & Related papers (2022-09-21T14:03:08Z)
- Safety-Aware Hardening of 3D Object Detection Neural Network Systems [0.0]
We study how state-of-the-art neural networks for 3D object detection with a single-stage pipeline can be made safety-aware.
The concept is demonstrated by extending the state-of-the-art PIXOR detector, which produces object bounding boxes in bird's-eye view from point cloud inputs.
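As a toy illustration of safety-aware post-processing (the margin value and the inflate-only rule are assumptions, not the paper's method), predicted bird's-eye-view boxes can be inflated so residual localization error is less likely to leave part of an object uncovered:

```python
def harden_box(box, margin=0.3):
    """box: (x_min, y_min, x_max, y_max) in metres, bird's-eye view."""
    x0, y0, x1, y1 = box
    return (x0 - margin, y0 - margin, x1 + margin, y1 + margin)
```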
arXiv Detail & Related papers (2020-03-25T07:06:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.