Visual Detection of Diver Attentiveness for Underwater Human-Robot
Interaction
- URL: http://arxiv.org/abs/2209.14447v1
- Date: Wed, 28 Sep 2022 22:08:41 GMT
- Title: Visual Detection of Diver Attentiveness for Underwater Human-Robot
Interaction
- Authors: Sadman Sakib Enan and Junaed Sattar
- Abstract summary: We present a diver attention estimation framework for autonomous underwater vehicles (AUVs)
The core element of the framework is a deep neural network (called DATT-Net) which exploits the geometric relation among 10 facial keypoints of the divers to determine their head orientation.
Our experiments demonstrate that the proposed DATT-Net architecture can determine the attentiveness of human divers with promising accuracy.
- Score: 15.64806176508126
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many underwater tasks, such as cable and wreckage inspection and
search-and-rescue, benefit from robust human-robot interaction (HRI)
capabilities. With the recent advancements in vision-based underwater HRI
methods, autonomous underwater vehicles (AUVs) can communicate with their human
partners even during a mission. However, these interactions usually require
active participation, especially from humans (e.g., one must keep looking at the
robot during an interaction). Therefore, an AUV must know when to start
interacting with a human partner, i.e., if the human is paying attention to the
AUV or not. In this paper, we present a diver attention estimation framework
for AUVs to autonomously detect the attentiveness of a diver and then navigate
and reorient itself, if required, with respect to the diver to initiate an
interaction. The core element of the framework is a deep neural network (called
DATT-Net) which exploits the geometric relation among 10 facial keypoints of
the divers to determine their head orientation. Our on-the-bench experimental
evaluations (using unseen data) demonstrate that the proposed DATT-Net
architecture can determine the attentiveness of human divers with promising
accuracy. Our real-world experiments also confirm the efficacy of DATT-Net
which enables real-time inference and allows the AUV to position itself for an
AUV-diver interaction.
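The abstract's core idea, inferring head orientation from the geometry of facial keypoints, can be illustrated with a hand-crafted toy heuristic. The keypoint layout (nose tip first, then four symmetric left/right pairs, then chin), the asymmetry score, and the threshold below are all illustrative assumptions; the paper learns this mapping with a deep network (DATT-Net) rather than a fixed rule.

```python
def head_yaw_proxy(keypoints):
    """Return a left/right asymmetry score from 10 (x, y) facial keypoints.

    Assumed layout (hypothetical): keypoints[0] is the nose tip,
    keypoints[1:9] are four symmetric (left, right) pairs such as eye and
    mouth corners, and keypoints[9] is the chin (unused here).
    """
    nose_x = keypoints[0][0]
    score = 0.0
    for (lx, _), (rx, _) in zip(keypoints[1:9:2], keypoints[2:9:2]):
        left_span = abs(nose_x - lx)
        right_span = abs(rx - nose_x)
        # A frontal face has roughly equal spans on both sides of the nose;
        # the normalized difference grows as the head turns away.
        score += (right_span - left_span) / max(left_span + right_span, 1e-6)
    return score / 4.0  # average over the four symmetric pairs


def is_attentive(keypoints, threshold=0.15):
    """Classify the diver as attentive when the face is close to frontal."""
    return abs(head_yaw_proxy(keypoints)) < threshold
```

A frontal, symmetric keypoint configuration yields a score near zero (attentive), while a strongly skewed one does not; a learned model like DATT-Net replaces this brittle geometry rule with features robust to underwater distortion.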
Related papers
- CoNav: A Benchmark for Human-Centered Collaborative Navigation [66.6268966718022]
We propose a collaborative navigation (CoNav) benchmark.
Our CoNav tackles the critical challenge of constructing a 3D navigation environment with realistic and diverse human activities.
We propose an intention-aware agent for reasoning both long-term and short-term human intention.
arXiv Detail & Related papers (2024-06-04T15:44:25Z) - Learning from Observer Gaze: Zero-Shot Attention Prediction Oriented by Human-Object Interaction Recognition [13.956664101032006]
We first collect a novel gaze fixation dataset named IG, comprising 530,000 fixation points across 740 diverse interaction categories.
We then introduce the zero-shot interaction-oriented attention prediction task ZeroIA, which challenges models to predict visual cues for interactions not encountered during training.
Thirdly, we present the Interactive Attention model IA, designed to emulate human observers' cognitive processes to tackle the ZeroIA problem.
arXiv Detail & Related papers (2024-05-16T09:34:57Z) - Disentangled Interaction Representation for One-Stage Human-Object
Interaction Detection [70.96299509159981]
Human-Object Interaction (HOI) detection is a core task for human-centric image understanding.
Recent one-stage methods adopt a transformer decoder to collect image-wide cues that are useful for interaction prediction.
Traditional two-stage methods benefit significantly from their ability to compose interaction features in a disentangled and explainable manner.
arXiv Detail & Related papers (2023-12-04T08:02:59Z) - FollowMe: a Robust Person Following Framework Based on Re-Identification
and Gestures [12.850149165791551]
Human-robot interaction (HRI) has become a crucial enabler in houses and industries for facilitating operational flexibility.
We developed a unified perception and navigation framework, which enables the robot to identify and follow a target person.
The Re-ID module can autonomously learn the features of a target person and use the acquired knowledge to visually re-identify the target.
arXiv Detail & Related papers (2023-11-21T20:59:27Z) - HODN: Disentangling Human-Object Feature for HOI Detection [51.48164941412871]
We propose a Human and Object Disentangling Network (HODN) to model the Human-Object Interaction (HOI) relationships explicitly.
Considering that human features are more contributive to interaction, we propose a Human-Guide Linking method to make sure the interaction decoder focuses on the human-centric regions.
Our proposed method achieves competitive performance on both the V-COCO and HICO-Det datasets.
arXiv Detail & Related papers (2023-08-20T04:12:50Z) - Co-Located Human-Human Interaction Analysis using Nonverbal Cues: A
Survey [71.43956423427397]
We aim to identify the nonverbal cues and computational methodologies resulting in effective performance.
This survey differs from its counterparts by involving the widest spectrum of social phenomena and interaction settings.
Some major observations are: the most often used nonverbal cue is speaking activity, the most common computational method is support vector machines, the typical interaction environment is meetings of 3-4 persons, and the dominant sensing approach is microphones and cameras.
arXiv Detail & Related papers (2022-07-20T13:37:57Z) - Robotic Detection of a Human-Comprehensible Gestural Language for
Underwater Multi-Human-Robot Collaboration [16.823029377470363]
We present a motion-based robotic communication framework that enables non-verbal communication among autonomous underwater vehicles (AUVs) and human divers.
We design a gestural language for AUV-to-AUV communication which can be easily understood by divers observing the conversation.
To allow AUVs to visually understand a gesture from another AUV, we propose a deep network (RRCommNet) which exploits a self-attention mechanism to learn to recognize each message by extracting discriminative spatio-temporal features.
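The self-attention mechanism mentioned in this summary can be sketched generically as scaled dot-product attention over a sequence of per-frame feature vectors. This is a textbook formulation with the learned query/key/value projections omitted (queries, keys, and values are all the raw inputs), not RRCommNet's actual architecture.

```python
import math

def self_attention(x):
    """Scaled dot-product self-attention over a sequence.

    x: list of T feature vectors (equal-length lists of floats).
    Returns T attended vectors, each a weighted mix of all frames.
    Learned Q/K/V projections are omitted for simplicity.
    """
    T, d = len(x), len(x[0])
    scale = math.sqrt(d)
    out = []
    for i in range(T):
        # Similarity of frame i to every frame, softmax-normalized.
        scores = [sum(a * b for a, b in zip(x[i], x[j])) / scale
                  for j in range(T)]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Attended output: weighted sum of all frame features.
        out.append([sum(w * x[j][k] for j, w in enumerate(weights))
                    for k in range(d)])
    return out
```

In a gesture-recognition setting, such attended features let every frame incorporate context from the whole motion sequence before classification.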
arXiv Detail & Related papers (2022-07-12T06:04:12Z) - TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art specifically designed for each of the trajectory and pose forecasting tasks.
arXiv Detail & Related papers (2021-04-08T20:01:00Z) - Visual Diver Face Recognition for Underwater Human-Robot Interaction [14.96844256049975]
The proposed method is able to recognize divers underwater with faces heavily obscured by scuba masks and breathing apparatus.
With the ability to correctly recognize divers, autonomous underwater vehicles (AUV) will be able to engage in collaborative tasks with the correct person.
arXiv Detail & Related papers (2020-11-18T21:57:09Z) - Learning Human-Object Interaction Detection using Interaction Points [140.0200950601552]
We propose a novel fully-convolutional approach that directly detects the interactions between human-object pairs.
Our network predicts interaction points, which directly localize and classify the interaction.
Experiments are performed on two popular benchmarks: V-COCO and HICO-DET.
arXiv Detail & Related papers (2020-03-31T08:42:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.