"Don't forget to put the milk back!" Dataset for Enabling Embodied Agents to Detect Anomalous Situations
- URL: http://arxiv.org/abs/2404.08827v1
- Date: Fri, 12 Apr 2024 21:56:21 GMT
- Title: "Don't forget to put the milk back!" Dataset for Enabling Embodied Agents to Detect Anomalous Situations
- Authors: James F. Mullen Jr, Prasoon Goyal, Robinson Piramuthu, Michael Johnston, Dinesh Manocha, Reza Ghanadan
- Abstract summary: We have created a new dataset, which we call SafetyDetect.
The SafetyDetect dataset consists of 1000 anomalous home scenes.
Our approach utilizes large language models (LLMs) alongside both a graph representation of the scene and the relationships between the objects in the scene.
- Score: 49.66220439673356
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Home robots are intended to make their users' lives easier. Our work assists in this goal by enabling robots to inform their users of dangerous or unsanitary anomalies in their home. Some examples of these anomalies include the user leaving their milk out, forgetting to turn off the stove, or leaving poison accessible to children. To move towards enabling home robots with these abilities, we have created a new dataset, which we call SafetyDetect. The SafetyDetect dataset consists of 1000 anomalous home scenes, each of which contains unsafe or unsanitary situations for an agent to detect. Our approach utilizes large language models (LLMs) alongside both a graph representation of the scene and the relationships between the objects in the scene. Our key insight is that this connected scene graph and the object relationships it encodes enable the LLM to better reason about the scene -- especially as it relates to detecting dangerous or unsanitary situations. Our most promising approach utilizes GPT-4 and pursues a categorization technique where object relations from the scene graph are classified as normal, dangerous, unsanitary, or dangerous for children. This method is able to correctly identify over 90% of anomalous scenarios in the SafetyDetect dataset. Additionally, we conduct real-world experiments on a ClearPath TurtleBot where we generate a scene graph from visuals of the real-world scene and run our approach with no modification. This setup resulted in little performance loss. The SafetyDetect dataset and code will be released to the public upon this paper's publication.
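The categorization technique described in the abstract can be sketched as follows: serialize the scene graph's object relations into a prompt and ask an LLM to label each relation as normal, dangerous, unsanitary, or dangerous for children. This is a minimal, hypothetical sketch: the paper prompts GPT-4, but here a trivial keyword stub (`stub_classify`) stands in for the LLM call so the example is self-contained; the relation triples, function names, and category strings are illustrative assumptions, not the paper's actual interface.

```python
# Sketch of scene-graph relation classification for anomaly detection.
# A keyword stub replaces the GPT-4 call described in the paper.

CATEGORIES = ("normal", "dangerous", "unsanitary", "dangerous for children")

def serialize_relations(scene_graph):
    """Turn (subject, relation, object) triples into flat text lines."""
    return [f"{s} {r} {o}" for s, r, o in scene_graph]

def build_prompt(scene_graph):
    """Assemble a classification prompt over all relations in the graph."""
    lines = "\n".join(f"- {rel}" for rel in serialize_relations(scene_graph))
    return (
        "Classify each object relation as one of: "
        + ", ".join(CATEGORIES)
        + ".\n"
        + lines
    )

def stub_classify(relation):
    """Placeholder for the LLM call: naive keyword rules for illustration."""
    words = relation.split()
    if "stove" in words and "on" in words:
        return "dangerous"
    if "milk" in words and "counter" in words:
        return "unsanitary"
    if "poison" in words and "floor" in words:
        return "dangerous for children"
    return "normal"

# Example scene graph: three object relations, two of them anomalous.
scene = [
    ("milk", "is on", "counter"),
    ("stove", "is turned", "on"),
    ("bleach", "is inside", "cabinet"),
]
labels = {" ".join(t): stub_classify(" ".join(t)) for t in scene}
```

In the paper's setup the prompt built here would be sent to GPT-4 and the model's per-relation labels parsed from the response; any relation not labeled "normal" would be surfaced to the user as an anomaly.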
Related papers
- Hazards in Daily Life? Enabling Robots to Proactively Detect and Resolve Anomalies [26.79399508110069]
We argue that household robots should proactively detect such hazards or anomalies within the home.
We leverage foundational models instead of relying on manually labeled data to build simulated environments.
We demonstrate that our generated environment outperforms others in terms of task description and scene diversity.
arXiv Detail & Related papers (2024-10-16T19:29:14Z)
- LLM-enhanced Scene Graph Learning for Household Rearrangement [28.375701371003107]
The household rearrangement task involves spotting misplaced objects in a scene and placing them in appropriate locations.
We propose to mine object functionality with user preference alignment directly from the scene itself.
Our method achieves state-of-the-art performance on misplacement detection and the following rearrangement planning.
arXiv Detail & Related papers (2024-08-22T03:03:04Z)
- Semi-supervised Open-World Object Detection [74.95267079505145]
We introduce a more realistic formulation, named semi-supervised open-world detection (SS-OWOD)
We demonstrate that the performance of the state-of-the-art OWOD detector dramatically deteriorates in the proposed SS-OWOD setting.
Our experiments on 4 datasets including MS COCO, PASCAL, Objects365 and DOTA demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-02-25T07:12:51Z)
- SG-Bot: Object Rearrangement via Coarse-to-Fine Robotic Imagination on Scene Graphs [81.15889805560333]
We present SG-Bot, a novel rearrangement framework.
SG-Bot exemplifies lightweight, real-time, and user-controllable characteristics.
Experimental results demonstrate that SG-Bot outperforms competitors by a large margin.
arXiv Detail & Related papers (2023-09-21T15:54:33Z)
- On the Exploitability of Instruction Tuning [103.8077787502381]
In this work, we investigate how an adversary can exploit instruction tuning to change a model's behavior.
We propose AutoPoison, an automated data poisoning pipeline.
Our results show that AutoPoison allows an adversary to change a model's behavior by poisoning only a small fraction of data.
arXiv Detail & Related papers (2023-06-28T17:54:04Z)
- Challenges in Visual Anomaly Detection for Mobile Robots [65.53820325712455]
We consider the task of detecting anomalies for autonomous mobile robots based on vision.
We categorize relevant types of visual anomalies and discuss how they can be detected by unsupervised deep learning methods.
arXiv Detail & Related papers (2022-09-22T13:26:46Z)
- Sensing Anomalies as Potential Hazards: Datasets and Benchmarks [43.55994393060723]
We consider the problem of detecting, in the visual sensing data stream of an autonomous mobile robot, semantic patterns that are unusual.
We contribute three novel image-based datasets acquired in robot exploration scenarios.
We study the performance of an anomaly detection approach based on autoencoders operating at different scales.
arXiv Detail & Related papers (2021-10-27T18:47:06Z)
- Vision based Pedestrian Potential Risk Analysis based on Automated Behavior Feature Extraction for Smart and Safe City [5.759189800028578]
We propose a comprehensive analytical model for pedestrian potential risk using video footage gathered by road security cameras deployed at such crossings.
The proposed system automatically detects vehicles and pedestrians, calculates trajectories by frames, and extracts behavioral features affecting the likelihood of potentially dangerous scenes between these objects.
We validated feasibility and applicability by applying it in multiple crosswalks in Osan city, Korea.
arXiv Detail & Related papers (2021-05-06T11:03:10Z)
- Be Careful about Poisoned Word Embeddings: Exploring the Vulnerability of the Embedding Layers in NLP Models [27.100909068228813]
Recent studies have revealed a security threat to natural language processing (NLP) models, called the Backdoor Attack.
In this paper, we find that it is possible to hack the model in a data-free way by modifying one single word embedding vector.
Experimental results on sentiment analysis and sentence-pair classification tasks show that our method is more efficient and stealthier.
arXiv Detail & Related papers (2021-03-29T12:19:45Z)
- A Flow Base Bi-path Network for Cross-scene Video Crowd Understanding in Aerial View [93.23947591795897]
In this paper, we strive to tackle the challenges and automatically understand the crowd from the visual data collected from drones.
To alleviate the background noise generated in cross-scene testing, a double-stream crowd counting model is proposed.
To tackle the crowd density estimation problem in extremely dark environments, we introduce synthetic data generated with the game Grand Theft Auto V (GTA V).
arXiv Detail & Related papers (2020-09-29T01:48:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.