InspecSafe-V1: A Multimodal Benchmark for Safety Assessment in Industrial Inspection Scenarios
- URL: http://arxiv.org/abs/2601.21173v1
- Date: Thu, 29 Jan 2026 02:18:24 GMT
- Title: InspecSafe-V1: A Multimodal Benchmark for Safety Assessment in Industrial Inspection Scenarios
- Authors: Zeyi Liu, Shuang Liu, Jihai Min, Zhaoheng Zhang, Jun Cen, Pengyu Han, Songqiao Hu, Zihan Meng, Xiao He, Donghua Zhou
- Abstract summary: InspecSafe-V1 is released as the first multimodal benchmark dataset for industrial inspection safety assessment. The dataset is constructed from 41 wheeled and rail-mounted inspection robots operating at 2,239 valid inspection sites. Pixel-level segmentation annotations are provided for key objects in visible-spectrum images, and each instance carries a semantic scene description and a corresponding safety level label defined according to practical inspection tasks.
- Score: 13.487324283362566
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the rapid development of industrial intelligence and unmanned inspection, reliable perception and safety assessment for AI systems in complex and dynamic industrial sites have become a key bottleneck for deploying predictive maintenance and autonomous inspection. Most public datasets remain limited by simulated data sources, single-modality sensing, or the absence of fine-grained object-level annotations, which prevents robust scene understanding and multimodal safety reasoning for industrial foundation models. To address these limitations, InspecSafe-V1 is released as the first multimodal benchmark dataset for industrial inspection safety assessment that is collected from routine operations of real inspection robots in real-world environments. InspecSafe-V1 covers five representative industrial scenarios: tunnels, power facilities, sintering equipment, oil and gas petrochemical plants, and coal conveyor trestles. The dataset is constructed from 41 wheeled and rail-mounted inspection robots operating at 2,239 valid inspection sites, yielding 5,013 inspection instances. For each instance, pixel-level segmentation annotations are provided for key objects in visible-spectrum images. In addition, a semantic scene description and a corresponding safety level label are provided according to practical inspection tasks. Seven synchronized sensing modalities are further included: infrared video, audio, depth point clouds, radar point clouds, gas measurements, temperature, and humidity, supporting multimodal anomaly recognition, cross-modal fusion, and comprehensive safety assessment in industrial environments.
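The abstract describes each inspection instance as a visible-spectrum image with pixel-level segmentation, a scene description, a safety level label, and seven synchronized sensing modalities. A minimal sketch of how one such record might be modeled follows; the field names, types, and example values are illustrative assumptions, not the dataset's published schema.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical record for one InspecSafe-V1 inspection instance.
# All field names are assumptions inferred from the abstract, not
# the dataset's actual format.
@dataclass
class InspectionInstance:
    site_id: str                        # one of the 2,239 inspection sites
    robot_type: str                     # "wheeled" or "rail-mounted"
    scenario: str                       # e.g. "tunnel", "coal_conveyor_trestle"
    rgb_image: str                      # path to visible-spectrum image
    seg_mask: str                       # pixel-level segmentation of key objects
    scene_description: str              # semantic description of the scene
    safety_level: int                   # task-defined safety level label
    # Seven synchronized modalities, optional per instance:
    infrared_video: Optional[str] = None
    audio: Optional[str] = None
    depth_points: Optional[str] = None
    radar_points: Optional[str] = None
    gas_ppm: Optional[float] = None
    temperature_c: Optional[float] = None
    humidity_pct: Optional[float] = None

sample = InspectionInstance(
    site_id="site_0001",
    robot_type="wheeled",
    scenario="coal_conveyor_trestle",
    rgb_image="rgb/site_0001.png",
    seg_mask="masks/site_0001.png",
    scene_description="belt running; dust accumulation near idler roller",
    safety_level=1,
    gas_ppm=3.2,
    temperature_c=41.5,
    humidity_pct=62.0,
)
print(sample.scenario, sample.safety_level)
```

A loader for the real dataset would populate these fields from whatever index files the release provides; the point here is only the shape of a multimodal instance with optional per-sensor channels.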
Related papers
- Automated Safety Benchmarking: A Multi-agent Pipeline for LVLMs [61.01470415470677]
Large vision-language models (LVLMs) exhibit remarkable capabilities in cross-modal tasks but face significant safety challenges. Existing benchmarks are hindered by their labor-intensive construction process, static complexity, and limited discriminative power. We propose VLSafetyBencher, the first automated system for LVLM safety benchmarking.
arXiv Detail & Related papers (2026-01-27T11:51:30Z) - Zero-Shot Multi-Criteria Visual Quality Inspection for Semi-Controlled Industrial Environments via Real-Time 3D Digital Twin Simulation [5.0268543063681195]
We propose a pose-agnostic, zero-shot quality inspection framework that compares real scenes against real-time Digital Twins (DT) in the RGB-D space. Our approach enables efficient real-time DT rendering by semantically describing industrial scenes through object detection and pose estimation. Based on an automotive use case featuring the quality inspection of an axial flux motor, we demonstrate the effectiveness of our framework.
arXiv Detail & Related papers (2025-11-28T14:19:31Z) - IndustryNav: Exploring Spatial Reasoning of Embodied Agents in Dynamic Industrial Navigation [56.43007596544299]
IndustryNav is the first dynamic industrial navigation benchmark for active spatial reasoning. A study of nine state-of-the-art Visual Large Language Models reveals that closed-source models maintain a consistent advantage.
arXiv Detail & Related papers (2025-11-21T16:48:49Z) - OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows [77.95511352806261]
Computer-using agents powered by Vision-Language Models (VLMs) have demonstrated human-like capabilities in operating digital environments like mobile platforms. We propose OS-Sentinel, a novel hybrid safety detection framework that combines a Formal Verifier for detecting explicit system-level violations with a Contextual Judge for assessing contextual risks and agent actions.
arXiv Detail & Related papers (2025-10-28T13:22:39Z) - HomeSafeBench: A Benchmark for Embodied Vision-Language Models in Free-Exploration Home Safety Inspection [45.2338049870908]
Embodied agents can identify and report safety hazards in home environments, but existing benchmarks suffer from two key limitations. HomeSafeBench is a benchmark with 12,900 data points covering five common home safety hazards.
arXiv Detail & Related papers (2025-09-28T07:01:27Z) - iSafetyBench: A video-language benchmark for safety in industrial environment [6.697702130929693]
iSafetyBench is a new video-language benchmark designed to evaluate model performance in industrial environments. iSafetyBench comprises 1,100 video clips sourced from real-world industrial settings. We evaluate eight state-of-the-art video-language models under zero-shot conditions.
arXiv Detail & Related papers (2025-08-01T07:55:53Z) - IndustryEQA: Pushing the Frontiers of Embodied Question Answering in Industrial Scenarios [46.421243185923814]
Existing Embodied Question Answering (EQA) benchmarks primarily focus on household environments. We introduce IndustryEQA, the first benchmark dedicated to evaluating embodied agent capabilities within safety-critical warehouse scenarios. The benchmark includes rich annotations covering six categories: equipment safety, human safety, object recognition, attribute recognition, temporal understanding, and spatial understanding.
arXiv Detail & Related papers (2025-05-27T02:36:17Z) - Safe-Construct: Redefining Construction Safety Violation Recognition as 3D Multi-View Engagement Task [2.0811729303868005]
We introduce Safe-Construct, a framework that reformulates violation recognition as a 3D multi-view engagement task. Safe-Construct achieves a 7.6% improvement over state-of-the-art methods across four violation types.
arXiv Detail & Related papers (2025-04-15T05:21:09Z) - IPAD: Industrial Process Anomaly Detection Dataset [71.39058003212614]
Video anomaly detection (VAD) is a challenging task aiming to recognize anomalies in video frames.
We propose a new dataset, IPAD, specifically designed for VAD in industrial scenarios.
This dataset covers 16 different industrial devices and contains over 6 hours of both synthetic and real-world video footage.
arXiv Detail & Related papers (2024-04-23T13:38:01Z) - Toward an AI-enabled Connected Industry: AGV Communication and Sensor Measurement Datasets [33.89321466798318]
This paper presents two wireless measurement campaigns in industrial testbeds: industrial Vehicle-to-vehicle (iV2V) and industrial Vehicle-to-infrastructure plus Sensor (iV2I+).
iV2V covers sidelink communication scenarios between Automated Guided Vehicles (AGVs), while iV2I+ is conducted in an industrial setting where an autonomous cleaning robot is connected to a private cellular network.
The combination of different communication technologies within a common measurement methodology provides insights that can be exploited by Machine Learning (ML) for tasks such as fingerprinting, line-of-sight detection, prediction of quality of service or
arXiv Detail & Related papers (2022-12-20T15:04:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.