IndustryEQA: Pushing the Frontiers of Embodied Question Answering in Industrial Scenarios
- URL: http://arxiv.org/abs/2505.20640v1
- Date: Tue, 27 May 2025 02:36:17 GMT
- Title: IndustryEQA: Pushing the Frontiers of Embodied Question Answering in Industrial Scenarios
- Authors: Yifan Li, Yuhang Chen, Anh Dao, Lichi Li, Zhongyi Cai, Zhen Tan, Tianlong Chen, Yu Kong,
- Abstract summary: Existing Embodied Question Answering (EQA) benchmarks primarily focus on household environments. We introduce IndustryEQA, the first benchmark dedicated to evaluating embodied agent capabilities within safety-critical warehouse scenarios. The benchmark includes rich annotations covering six categories: equipment safety, human safety, object recognition, attribute recognition, temporal understanding, and spatial understanding.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing Embodied Question Answering (EQA) benchmarks primarily focus on household environments, often overlooking safety-critical aspects and reasoning processes pertinent to industrial settings. This drawback limits the evaluation of agent readiness for real-world industrial applications. To bridge this gap, we introduce IndustryEQA, the first benchmark dedicated to evaluating embodied agent capabilities within safety-critical warehouse scenarios. Built upon the NVIDIA Isaac Sim platform, IndustryEQA provides high-fidelity episodic memory videos featuring diverse industrial assets, dynamic human agents, and carefully designed hazardous situations inspired by real-world safety guidelines. The benchmark includes rich annotations covering six categories: equipment safety, human safety, object recognition, attribute recognition, temporal understanding, and spatial understanding. It also provides an additional reasoning evaluation based on these categories. Specifically, it comprises 971 question-answer pairs generated from small warehouses and 373 pairs from large ones, covering scenarios both with and without humans. We further propose a comprehensive evaluation framework, including various baseline models, to assess their general perception and reasoning abilities in industrial environments. IndustryEQA aims to steer EQA research towards developing more robust, safety-aware, and practically applicable embodied agents for complex industrial environments. The benchmark and code are available.
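The abstract names six annotation categories and two split sizes. As a minimal Python sketch (not the authors' evaluation framework), the snippet encodes those categories, checks the combined size (971 + 373 = 1,344 question-answer pairs), and shows one way per-category accuracy could be tallied. The `QARecord` layout, the split names, and the `exact_match` scorer are hypothetical assumptions; the released benchmark may use a different schema and scoring method.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    """The six annotation categories named in the IndustryEQA abstract."""
    EQUIPMENT_SAFETY = "equipment safety"
    HUMAN_SAFETY = "human safety"
    OBJECT_RECOGNITION = "object recognition"
    ATTRIBUTE_RECOGNITION = "attribute recognition"
    TEMPORAL_UNDERSTANDING = "temporal understanding"
    SPATIAL_UNDERSTANDING = "spatial understanding"


# Split sizes as reported in the abstract; the dictionary keys are hypothetical names.
SPLIT_SIZES = {"small_warehouse": 971, "large_warehouse": 373}
TOTAL_QA_PAIRS = sum(SPLIT_SIZES.values())  # 971 + 373 = 1344


@dataclass
class QARecord:
    """Hypothetical record layout, not the benchmark's released schema."""
    question: str
    answer: str
    category: Category
    split: str


def exact_match(pred: str, gold: str) -> bool:
    """Hypothetical scorer: case- and whitespace-insensitive string equality."""
    return pred.strip().lower() == gold.strip().lower()


def per_category_accuracy(records, predictions, is_correct=exact_match):
    """Return {Category: accuracy} over the categories present in `records`."""
    hits = Counter()
    totals = Counter()
    for rec, pred in zip(records, predictions):
        totals[rec.category] += 1
        if is_correct(pred, rec.answer):
            hits[rec.category] += 1
    return {cat: hits[cat] / totals[cat] for cat in totals}


if __name__ == "__main__":
    # Toy, made-up examples for illustration only.
    records = [
        QARecord("Is the forklift path blocked?", "Yes", Category.EQUIPMENT_SAFETY, "small_warehouse"),
        QARecord("What color is the pallet?", "blue", Category.ATTRIBUTE_RECOGNITION, "small_warehouse"),
        QARecord("Is anyone standing near the rack?", "no", Category.HUMAN_SAFETY, "large_warehouse"),
    ]
    predictions = ["yes", "red", "No "]
    print(TOTAL_QA_PAIRS)
    for cat, acc in per_category_accuracy(records, predictions).items():
        print(f"{cat.value}: {acc:.2f}")
```

Breaking accuracy out per category keeps a weak safety category from being hidden inside a single pooled average.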
Related papers
- InspecSafe-V1: A Multimodal Benchmark for Safety Assessment in Industrial Inspection Scenarios
InspecSafe-V1 is released as the first multimodal benchmark dataset for industrial inspection safety assessment. The dataset is constructed from 41 wheeled and rail-mounted inspection robots operating at 2,239 valid inspection sites. Pixel-level segmentation annotations are provided for key objects in visible-spectrum images. A semantic scene description and a corresponding safety level label are provided according to practical inspection tasks.
arXiv: 2026-01-29T02:18:24Z
- IndustryNav: Exploring Spatial Reasoning of Embodied Agents in Dynamic Industrial Navigation
IndustryNav is the first dynamic industrial navigation benchmark for active spatial reasoning. A study of nine state-of-the-art Visual Large Language Models reveals that closed-source models maintain a consistent advantage.
arXiv: 2025-11-21T16:48:49Z
- SafeRBench: A Comprehensive Benchmark for Safety Assessment in Large Reasoning Models
We present SafeRBench, the first benchmark that assesses large reasoning model (LRM) safety end-to-end. We pioneer the incorporation of risk categories and levels into input design. We introduce a micro-thought chunking mechanism to segment long reasoning traces into semantically coherent units.
arXiv: 2025-11-19T06:46:33Z
- Empowering Real-World: A Survey on the Technology, Practice, and Evaluation of LLM-driven Industry Agents
This paper systematically reviews the technologies, applications, and evaluation methods of industry agents based on large language models (LLMs). We examine the three key technological pillars that support the advancement of agent capabilities: Memory, Planning, and Tool Use. We provide an overview of the application of industry agents in real-world domains such as digital engineering, scientific discovery, embodied intelligence, collaborative business execution, and complex system simulation.
arXiv: 2025-10-20T12:46:55Z
- Retrieval-Augmented Generation in Industry: An Interview Study on Use Cases, Requirements, Challenges, and Evaluation
Retrieval-Augmented Generation (RAG) is a rapidly evolving field within AI. There is a significant lack of research on its practical application in industrial contexts. Our study investigates how companies apply RAG in practice.
arXiv: 2025-08-11T09:40:54Z
- iSafetyBench: A video-language benchmark for safety in industrial environment
iSafetyBench is a new video-language benchmark designed to evaluate model performance in industrial environments. iSafetyBench comprises 1,100 video clips sourced from real-world industrial settings. We evaluate eight state-of-the-art video-language models under zero-shot conditions.
arXiv: 2025-08-01T07:55:53Z
- Reporte de vulnerabilidades en IIoT. Proyecto DEFENDER (Vulnerability report on IIoT. DEFENDER Project)
The main objective of this technical report is to conduct a comprehensive study on devices operating within Industrial Internet of Things (IIoT) environments. The report seeks to identify and examine the main classes of IIoT devices, detailing their characteristics, functionalities, and roles within industrial systems. It analyses the vulnerabilities affecting IIoT devices, outlining their vectors, targets, impact, and consequences. The report presents a compilation of some of the most recent and effective security countermeasures as potential solutions to the security challenges faced by industrial systems.
arXiv: 2025-07-14T21:37:02Z
- Understanding and Mitigating Risks of Generative AI in Financial Services
We aim to highlight AI content safety considerations specific to the financial services domain and outline an associated AI content risk taxonomy. We evaluate how existing open-source technical guardrail solutions cover this taxonomy by assessing them on data collected via red-teaming activities.
arXiv: 2025-04-25T16:55:51Z
- AILuminate: Introducing v1.0 of the AI Risk and Reliability Benchmark from MLCommons
This paper introduces AILuminate v1.0, the first comprehensive industry-standard benchmark for assessing AI-product risk and reliability. The benchmark evaluates an AI system's resistance to prompts designed to elicit dangerous, illegal, or undesirable behavior in 12 hazard categories.
arXiv: 2025-02-19T05:58:52Z
- LabSafety Bench: Benchmarking LLMs on Safety Issues in Scientific Labs
Large language models (LLMs) increasingly assist in tasks ranging from procedural guidance to autonomous experiment orchestration. Overreliance on them is particularly dangerous in high-stakes laboratory settings, where failures in hazard identification or risk assessment can result in severe accidents. We propose the Laboratory Safety Benchmark (LabSafety Bench) to evaluate models on their ability to identify potential hazards, assess risks, and predict the consequences of unsafe actions in lab environments.
arXiv: 2024-10-18T05:21:05Z
- EARBench: Towards Evaluating Physical Risk Awareness for Task Planning of Foundation Model-based Embodied AI Agents
Embodied artificial intelligence (EAI) integrates advanced AI models into physical entities for real-world interaction. Foundation models serving as the "brain" of EAI agents for high-level task planning have shown promising results. However, the deployment of these agents in physical environments presents significant safety challenges. This study introduces EARBench, a novel framework for automated physical risk assessment in EAI scenarios.
arXiv: 2024-08-08T13:19:37Z
- Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?
We propose an empirical foundation for developing more meaningful safety metrics and define AI safety in a machine learning research context. We aim to provide a more rigorous framework for AI safety research, advancing the science of safety evaluations and clarifying the path towards measurable progress.
arXiv: 2024-07-31T17:59:24Z
- IPAD: Industrial Process Anomaly Detection Dataset
Video anomaly detection (VAD) is a challenging task aiming to recognize anomalies in video frames.
We propose a new dataset, IPAD, specifically designed for VAD in industrial scenarios.
This dataset covers 16 different industrial devices and contains over 6 hours of both synthetic and real-world video footage.
arXiv: 2024-04-23T13:38:01Z
- Modeling and mitigation of occupational safety risks in dynamic industrial environments
This article proposes a method to enable continuous and quantitative assessment of safety risks in a data-driven manner.
A fully Bayesian approach is developed to calibrate the risk model from safety data in an online fashion.
The proposed model can be leveraged for automated decision making.
arXiv: 2022-05-02T13:04:25Z
- Sustainability Through Cognition Aware Safety Systems -- Next Level Human-Machine-Interaction
Industrial safety deals with the physical integrity of humans, machines, and the environment when they interact during production scenarios.
The concept of a Cognition Aware Safety System (CASS) is to integrate AI-based reasoning about human load, stress, and attention with AI-based selection of actions to avoid triggering safety stops.
arXiv: 2021-10-13T19:36:06Z