"Don't forget to put the milk back!" Dataset for Enabling Embodied Agents to Detect Anomalous Situations
- URL: http://arxiv.org/abs/2404.08827v1
- Date: Fri, 12 Apr 2024 21:56:21 GMT
- Title: "Don't forget to put the milk back!" Dataset for Enabling Embodied Agents to Detect Anomalous Situations
- Authors: James F. Mullen Jr, Prasoon Goyal, Robinson Piramuthu, Michael Johnston, Dinesh Manocha, Reza Ghanadan
- Abstract summary: We have created a new dataset, which we call SafetyDetect.
The SafetyDetect dataset consists of 1000 anomalous home scenes.
Our approach utilizes large language models (LLMs) alongside both a graph representation of the scene and the relationships between the objects in the scene.
- Score: 49.66220439673356
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Home robots intend to make their users' lives easier. Our work assists in this goal by enabling robots to inform their users of dangerous or unsanitary anomalies in their home. Some examples of these anomalies include the user leaving their milk out, forgetting to turn off the stove, or leaving poison accessible to children. To move towards enabling home robots with these abilities, we have created a new dataset, which we call SafetyDetect. The SafetyDetect dataset consists of 1000 anomalous home scenes, each of which contains unsafe or unsanitary situations for an agent to detect. Our approach utilizes large language models (LLMs) alongside both a graph representation of the scene and the relationships between the objects in the scene. Our key insight is that this connected scene graph and the object relationships it encodes enable the LLM to better reason about the scene, especially as it relates to detecting dangerous or unsanitary situations. Our most promising approach utilizes GPT-4 and pursues a categorization technique where object relations from the scene graph are classified as normal, dangerous, unsanitary, or dangerous for children. This method is able to correctly identify over 90% of anomalous scenarios in the SafetyDetect Dataset. Additionally, we conduct real-world experiments on a ClearPath TurtleBot where we generate a scene graph from visuals of the real-world scene, and run our approach with no modification. This setup resulted in little performance loss. The SafetyDetect Dataset and code will be released to the public upon this paper's publication.
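The categorization idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the `Relation` class, the prompt wording, the `find_anomalies` helper, and the keyword-based `toy_llm` stand-in are all hypothetical names chosen for illustration. The sketch only assumes what the abstract states, namely that object relations from a scene graph are each classified as normal, dangerous, unsanitary, or dangerous for children.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

# The four categories named in the abstract.
LABELS = ("normal", "dangerous", "unsanitary", "dangerous for children")


@dataclass(frozen=True)
class Relation:
    """One edge of a scene graph, e.g. (milk, on, kitchen counter). Hypothetical type."""
    subject: str
    predicate: str
    obj: str

    def as_text(self) -> str:
        return f"{self.subject} {self.predicate} {self.obj}"


def build_prompt(relation: Relation) -> str:
    """Format one relation as a classification prompt (wording is an assumption)."""
    options = ", ".join(LABELS)
    return (
        "You are inspecting a home scene for a household robot.\n"
        f"Object relation: {relation.as_text()}\n"
        f"Classify this relation as exactly one of: {options}.\n"
        "Answer with the label only."
    )


def parse_label(response: str) -> str:
    """Map a free-text model reply onto one of LABELS, or raise ValueError."""
    cleaned = response.strip().lower().rstrip(".")
    # Try longer labels first so "dangerous for children" is not read as "dangerous".
    for label in sorted(LABELS, key=len, reverse=True):
        if cleaned.startswith(label):
            return label
    raise ValueError(f"unrecognized label: {response!r}")


def find_anomalies(
    relations: Iterable[Relation], ask_llm: Callable[[str], str]
) -> List[Tuple[Relation, str]]:
    """Return every relation whose predicted label is not 'normal'."""
    flagged = []
    for relation in relations:
        label = parse_label(ask_llm(build_prompt(relation)))
        if label != "normal":
            flagged.append((relation, label))
    return flagged


def toy_llm(prompt: str) -> str:
    """Keyword-based stand-in so the sketch runs offline; not a real model."""
    relation_line = prompt.splitlines()[1]
    if "milk" in relation_line and "counter" in relation_line:
        return "Unsanitary."
    if "poison" in relation_line:
        return "Dangerous for children"
    return "normal"


if __name__ == "__main__":
    scene = [
        Relation("milk", "on", "kitchen counter"),
        Relation("book", "on", "shelf"),
        Relation("poison", "on", "floor"),
    ]
    for relation, label in find_anomalies(scene, toy_llm):
        print(f"{relation.as_text()}: {label}")
```

In practice `ask_llm` would wrap a call to an actual language model; passing it in as a plain callable keeps the sketch independent of any particular client library.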
Related papers
- LLM-enhanced Scene Graph Learning for Household Rearrangement [28.375701371003107]
The household rearrangement task involves spotting misplaced objects in a scene and moving them to proper places.
We propose to mine object functionality with user preference alignment directly from the scene itself.
Our method achieves state-of-the-art performance on misplacement detection and the following rearrangement planning.
arXiv Detail & Related papers (2024-08-22T03:03:04Z)
- Semi-supervised Open-World Object Detection [74.95267079505145]
We introduce a more realistic formulation, named semi-supervised open-world detection (SS-OWOD).
We demonstrate that the performance of the state-of-the-art OWOD detector dramatically deteriorates in the proposed SS-OWOD setting.
Our experiments on 4 datasets including MS COCO, PASCAL, Objects365 and DOTA demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-02-25T07:12:51Z)
- SG-Bot: Object Rearrangement via Coarse-to-Fine Robotic Imagination on Scene Graphs [81.15889805560333]
We present SG-Bot, a novel rearrangement framework.
SG-Bot exemplifies lightweight, real-time, and user-controllable characteristics.
Experimental results demonstrate that SG-Bot outperforms competitors by a large margin.
arXiv Detail & Related papers (2023-09-21T15:54:33Z)
- On the Exploitability of Instruction Tuning [103.8077787502381]
In this work, we investigate how an adversary can exploit instruction tuning to change a model's behavior.
We propose AutoPoison, an automated data poisoning pipeline.
Our results show that AutoPoison allows an adversary to change a model's behavior by poisoning only a small fraction of data.
arXiv Detail & Related papers (2023-06-28T17:54:04Z)
- Challenges in Visual Anomaly Detection for Mobile Robots [65.53820325712455]
We consider the task of detecting anomalies for autonomous mobile robots based on vision.
We categorize relevant types of visual anomalies and discuss how they can be detected by unsupervised deep learning methods.
arXiv Detail & Related papers (2022-09-22T13:26:46Z)
- Sensing Anomalies as Potential Hazards: Datasets and Benchmarks [43.55994393060723]
We consider the problem of detecting, in the visual sensing data stream of an autonomous mobile robot, semantic patterns that are unusual.
We contribute three novel image-based datasets acquired in robot exploration scenarios.
We study the performance of an anomaly detection approach based on autoencoders operating at different scales.
arXiv Detail & Related papers (2021-10-27T18:47:06Z)
- Vision based Pedestrian Potential Risk Analysis based on Automated Behavior Feature Extraction for Smart and Safe City [5.759189800028578]
We propose a comprehensive analytical model for pedestrian potential risk using video footage gathered by road security cameras deployed at such crossings.
The proposed system automatically detects vehicles and pedestrians, calculates trajectories by frames, and extracts behavioral features affecting the likelihood of potentially dangerous scenes between these objects.
We validated feasibility and applicability by applying it in multiple crosswalks in Osan city, Korea.
arXiv Detail & Related papers (2021-05-06T11:03:10Z)
- Be Careful about Poisoned Word Embeddings: Exploring the Vulnerability of the Embedding Layers in NLP Models [27.100909068228813]
Recent studies have revealed a security threat to natural language processing (NLP) models, called the Backdoor Attack.
In this paper, we find that it is possible to hack the model in a data-free way by modifying one single word embedding vector.
Experimental results on sentiment analysis and sentence-pair classification tasks show that our method is more efficient and stealthier.
arXiv Detail & Related papers (2021-03-29T12:19:45Z)
- A Flow Base Bi-path Network for Cross-scene Video Crowd Understanding in Aerial View [93.23947591795897]
In this paper, we strive to tackle the challenges and automatically understand the crowd from the visual data collected from drones.
To alleviate the background noise generated in cross-scene testing, a double-stream crowd counting model is proposed.
To tackle the crowd density estimation problem under extreme dark environments, we introduce synthetic data generated by the game Grand Theft Auto V (GTAV).
arXiv Detail & Related papers (2020-09-29T01:48:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.