Visual Detection of Diver Attentiveness for Underwater Human-Robot
Interaction
- URL: http://arxiv.org/abs/2209.14447v1
- Date: Wed, 28 Sep 2022 22:08:41 GMT
- Title: Visual Detection of Diver Attentiveness for Underwater Human-Robot
Interaction
- Authors: Sadman Sakib Enan and Junaed Sattar
- Abstract summary: We present a diver attention estimation framework for autonomous underwater vehicles (AUVs)
The core element of the framework is a deep neural network (called DATT-Net) which exploits the geometric relation among 10 facial keypoints of the divers to determine their head orientation.
Our experiments demonstrate that the proposed DATT-Net architecture can determine the attentiveness of human divers with promising accuracy.
- Score: 15.64806176508126
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many underwater tasks, such as cable and wreckage inspection and
search-and-rescue, benefit from robust human-robot interaction (HRI)
capabilities. With the recent advancements in vision-based underwater HRI
methods, autonomous underwater vehicles (AUVs) can communicate with their human
partners even during a mission. However, these interactions usually require
active participation, especially from humans (e.g., one must keep looking at the
robot during an interaction). Therefore, an AUV must know when to start
interacting with a human partner, i.e., if the human is paying attention to the
AUV or not. In this paper, we present a diver attention estimation framework
for AUVs to autonomously detect the attentiveness of a diver and then navigate
and reorient itself, if required, with respect to the diver to initiate an
interaction. The core element of the framework is a deep neural network (called
DATT-Net) which exploits the geometric relation among 10 facial keypoints of
the divers to determine their head orientation. Our on-the-bench experimental
evaluations (using unseen data) demonstrate that the proposed DATT-Net
architecture can determine the attentiveness of human divers with promising
accuracy. Our real-world experiments also confirm the efficacy of DATT-Net
which enables real-time inference and allows the AUV to position itself for an
AUV-diver interaction.
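The abstract's core idea, inferring head orientation from the geometry of facial keypoints, can be illustrated with a hand-crafted toy heuristic. The keypoint layout (nose tip first, then four symmetric left/right pairs, then chin), the asymmetry score, and the threshold below are all illustrative assumptions; the paper learns this mapping with a deep network (DATT-Net) rather than a fixed rule.

```python
def head_yaw_proxy(keypoints):
    """Return a left/right asymmetry score from 10 (x, y) facial keypoints.

    Assumed layout (hypothetical): keypoints[0] is the nose tip,
    keypoints[1:9] are four symmetric (left, right) pairs such as eye and
    mouth corners, and keypoints[9] is the chin (unused here).
    """
    nose_x = keypoints[0][0]
    score = 0.0
    for (lx, _), (rx, _) in zip(keypoints[1:9:2], keypoints[2:9:2]):
        left_span = abs(nose_x - lx)
        right_span = abs(rx - nose_x)
        # A frontal face has roughly equal spans on both sides of the nose;
        # the normalized difference grows as the head turns away.
        score += (right_span - left_span) / max(left_span + right_span, 1e-6)
    return score / 4.0  # average over the four symmetric pairs


def is_attentive(keypoints, threshold=0.15):
    """Classify the diver as attentive when the face is close to frontal."""
    return abs(head_yaw_proxy(keypoints)) < threshold
```

A frontal, symmetric keypoint configuration yields a score near zero (attentive), while a strongly skewed one does not; a learned model like DATT-Net replaces this brittle geometry rule with features robust to underwater distortion.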
Related papers
- CoNav: A Benchmark for Human-Centered Collaborative Navigation [66.6268966718022]
We propose a collaborative navigation (CoNav) benchmark.
Our CoNav tackles the critical challenge of constructing a 3D navigation environment with realistic and diverse human activities.
We propose an intention-aware agent for reasoning both long-term and short-term human intention.
arXiv Detail & Related papers (2024-06-04T15:44:25Z) - Learning from Observer Gaze: Zero-Shot Attention Prediction Oriented by Human-Object Interaction Recognition [13.956664101032006]
We first collect a novel gaze fixation dataset named IG, comprising 530,000 fixation points across 740 diverse interaction categories.
We then introduce the zero-shot interaction-oriented attention prediction task ZeroIA, which challenges models to predict visual cues for interactions not encountered during training.
Thirdly, we present the Interactive Attention model IA, designed to emulate human observers' cognitive processes to tackle the ZeroIA problem.
arXiv Detail & Related papers (2024-05-16T09:34:57Z) - Disentangled Interaction Representation for One-Stage Human-Object
Interaction Detection [70.96299509159981]
Human-Object Interaction (HOI) detection is a core task for human-centric image understanding.
Recent one-stage methods adopt a transformer decoder to collect image-wide cues that are useful for interaction prediction.
Traditional two-stage methods benefit significantly from their ability to compose interaction features in a disentangled and explainable manner.
arXiv Detail & Related papers (2023-12-04T08:02:59Z) - FollowMe: a Robust Person Following Framework Based on Re-Identification
and Gestures [12.850149165791551]
Human-robot interaction (HRI) has become a crucial enabler in houses and industries for facilitating operational flexibility.
We developed a unified perception and navigation framework, which enables the robot to identify and follow a target person.
The Re-ID module can autonomously learn the features of a target person and use the acquired knowledge to visually re-identify the target.
arXiv Detail & Related papers (2023-11-21T20:59:27Z) - HODN: Disentangling Human-Object Feature for HOI Detection [51.48164941412871]
We propose a Human and Object Disentangling Network (HODN) to model the Human-Object Interaction (HOI) relationships explicitly.
Considering that human features are more contributive to interaction, we propose a Human-Guide Linking method to make sure the interaction decoder focuses on the human-centric regions.
Our proposed method achieves competitive performance on both the V-COCO and HICO-Det datasets.
arXiv Detail & Related papers (2023-08-20T04:12:50Z) - Co-Located Human-Human Interaction Analysis using Nonverbal Cues: A
Survey [71.43956423427397]
We aim to identify the nonverbal cues and computational methodologies resulting in effective performance.
This survey differs from its counterparts by involving the widest spectrum of social phenomena and interaction settings.
Some major observations are: the most often used nonverbal cue is speaking activity, the most common computational method is support vector machines, the typical interaction environment is meetings of 3-4 persons, and the dominant sensing approach is microphones and cameras.
arXiv Detail & Related papers (2022-07-20T13:37:57Z) - Robotic Detection of a Human-Comprehensible Gestural Language for
Underwater Multi-Human-Robot Collaboration [16.823029377470363]
We present a motion-based robotic communication framework that enables non-verbal communication among autonomous underwater vehicles (AUVs) and human divers.
We design a gestural language for AUV-to-AUV communication which can be easily understood by divers observing the conversation.
To allow AUVs to visually understand a gesture from another AUV, we propose a deep network (RRCommNet) which exploits a self-attention mechanism to learn to recognize each message by extracting discriminative spatio-temporal features.
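The self-attention mechanism mentioned in this summary can be sketched generically as scaled dot-product attention over a sequence of per-frame feature vectors. This is a textbook formulation with the learned query/key/value projections omitted (queries, keys, and values are all the raw inputs), not RRCommNet's actual architecture.

```python
import math

def self_attention(x):
    """Scaled dot-product self-attention over a sequence.

    x: list of T feature vectors (equal-length lists of floats).
    Returns T attended vectors, each a weighted mix of all frames.
    Learned Q/K/V projections are omitted for simplicity.
    """
    T, d = len(x), len(x[0])
    scale = math.sqrt(d)
    out = []
    for i in range(T):
        # Similarity of frame i to every frame, softmax-normalized.
        scores = [sum(a * b for a, b in zip(x[i], x[j])) / scale
                  for j in range(T)]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Attended output: weighted sum of all frame features.
        out.append([sum(w * x[j][k] for j, w in enumerate(weights))
                    for k in range(d)])
    return out
```

In a gesture-recognition setting, such attended features let every frame incorporate context from the whole motion sequence before classification.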
arXiv Detail & Related papers (2022-07-12T06:04:12Z) - TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art specifically designed for each of the trajectory and pose forecasting tasks.
arXiv Detail & Related papers (2021-04-08T20:01:00Z) - Visual Diver Face Recognition for Underwater Human-Robot Interaction [14.96844256049975]
The proposed method is able to recognize divers underwater with faces heavily obscured by scuba masks and breathing apparatus.
With the ability to correctly recognize divers, autonomous underwater vehicles (AUV) will be able to engage in collaborative tasks with the correct person.
arXiv Detail & Related papers (2020-11-18T21:57:09Z) - Learning Human-Object Interaction Detection using Interaction Points [140.0200950601552]
We propose a novel fully-convolutional approach that directly detects the interactions between human-object pairs.
Our network predicts interaction points, which directly localize and classify the interaction.
Experiments are performed on two popular benchmarks: V-COCO and HICO-DET.
arXiv Detail & Related papers (2020-03-31T08:42:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.