Head and eye egocentric gesture recognition for human-robot interaction using eyewear cameras
- URL: http://arxiv.org/abs/2201.11500v1
- Date: Thu, 27 Jan 2022 13:26:05 GMT
- Title: Head and eye egocentric gesture recognition for human-robot interaction using eyewear cameras
- Authors: Javier Marina-Miranda, V. Javier Traver
- Abstract summary: This work addresses the problem of human gesture recognition.
In particular, we focus on head and eye gestures, and adopt an egocentric (first-person) perspective using eyewear cameras.
A motion-based recognition approach is proposed, which operates at two temporal granularities.
- Score: 4.344337854565144
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Non-verbal communication plays a particularly important role in a wide range
of scenarios in Human-Robot Interaction (HRI). Accordingly, this work addresses
the problem of human gesture recognition. In particular, we focus on head and
eye gestures, and adopt an egocentric (first-person) perspective using eyewear
cameras. We argue that this egocentric view offers a number of conceptual and
technical benefits over scene- or robot-centric perspectives.
A motion-based recognition approach is proposed, which operates at two
temporal granularities. Locally, frame-to-frame homographies are estimated with
a convolutional neural network (CNN). The output of this CNN is input to a long
short-term memory (LSTM) to capture longer-term temporal visual relationships,
which are relevant to characterize gestures.
Regarding the configuration of the network architecture, one particularly
interesting finding is that using the output of an internal layer of the
homography CNN increases the recognition rate with respect to using the
homography matrix itself. While this work focuses on action recognition, and no
robot or user study has been conducted yet, the system has been designed to
meet real-time constraints. The encouraging results suggest that the proposed
egocentric perspective is viable, and this proof-of-concept work provides novel
and useful contributions to the exciting area of HRI.
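As an illustration of the two-granularity pipeline described in the abstract, a minimal PyTorch sketch follows. It is not the authors' code: the module names (HomographyNet, GestureLSTM), the layer sizes, and the 8-parameter homography head are all assumptions made for the example.

```python
import torch
import torch.nn as nn

class HomographyNet(nn.Module):
    """Local granularity (hypothetical): regress a frame-to-frame homography
    from a pair of consecutive grayscale frames stacked on the channel axis."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(2, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # 8 free parameters of a 3x3 homography (bottom-right entry fixed to 1).
        self.head = nn.Linear(64, 8)

    def forward(self, frame_pairs):
        feats = self.features(frame_pairs)  # internal-layer representation
        homogs = self.head(feats)           # 8-dim homography parametrization
        return homogs, feats

class GestureLSTM(nn.Module):
    """Longer-term granularity: classify a gesture from a sequence of
    per-frame-pair motion descriptors."""
    def __init__(self, in_dim, num_gestures, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True)
        self.cls = nn.Linear(hidden, num_gestures)

    def forward(self, seq):                 # seq: (batch, time, in_dim)
        out, _ = self.lstm(seq)
        return self.cls(out[:, -1])         # logits from the last time step
```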
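The abstract's most interesting architectural finding, that an internal layer of the homography CNN is a more useful LSTM input than the homography matrix itself, amounts to changing which tensor is passed downstream. A usage sketch, reusing the hypothetical modules above with made-up shapes and class counts:

```python
# A gesture clip of T+1 frames yields T consecutive frame pairs.
hnet = HomographyNet()
T = 16
pairs = torch.randn(T, 2, 128, 128)    # hypothetical 128x128 grayscale pairs
homogs, feats = hnet(pairs)            # (T, 8) and (T, 64)

# Variant A: the homography parameters feed the LSTM.
clf_h = GestureLSTM(in_dim=8, num_gestures=5)
logits_h = clf_h(homogs.unsqueeze(0))  # (1, 5)

# Variant B: the internal-layer features feed the LSTM
# (the configuration the abstract reports to recognize gestures better).
clf_f = GestureLSTM(in_dim=64, num_gestures=5)
logits_f = clf_f(feats.unsqueeze(0))   # (1, 5)
```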
Related papers
- Apprenticeship-Inspired Elegance: Synergistic Knowledge Distillation Empowers Spiking Neural Networks for Efficient Single-Eye Emotion Recognition [53.359383163184425]
We introduce a novel multimodality synergistic knowledge distillation scheme tailored for efficient single-eye emotion recognition tasks.
This method allows a lightweight, unimodal student spiking neural network (SNN) to extract rich knowledge from an event-frame multimodal teacher network.
arXiv Detail & Related papers (2024-06-20T07:24:47Z)
- Exploring Explainability in Video Action Recognition [5.7782784592048575]
Video Action Recognition and Image Classification are foundational tasks in computer vision.
Video-TCAV aims to quantify the importance of specific concepts in the decision-making process of Video Action Recognition models.
We propose a machine-assisted approach to generate spatial and temporal concepts relevant to Video Action Recognition for testing Video-TCAV.
arXiv Detail & Related papers (2024-04-13T19:34:14Z)
- Emotion Recognition from the perspective of Activity Recognition [0.0]
Appraising human emotional states, behaviors, and reactions displayed in real-world settings can be accomplished using latent continuous dimensions.
For emotion recognition systems to be deployed and integrated into real-world mobile and computing devices, we need to consider data collected in the wild.
We propose a novel three-stream end-to-end deep learning regression pipeline with an attention mechanism.
arXiv Detail & Related papers (2024-03-24T18:53:57Z)
- Disentangled Interaction Representation for One-Stage Human-Object Interaction Detection [70.96299509159981]
Human-Object Interaction (HOI) detection is a core task for human-centric image understanding.
Recent one-stage methods adopt a transformer decoder to collect image-wide cues that are useful for interaction prediction.
Traditional two-stage methods benefit significantly from their ability to compose interaction features in a disentangled and explainable manner.
arXiv Detail & Related papers (2023-12-04T08:02:59Z)
- A domain adaptive deep learning solution for scanpath prediction of paintings [66.46953851227454]
This paper focuses on the eye-movement analysis of viewers during the visual experience of a certain number of paintings.
We introduce a new approach to predicting human visual attention, which influences several human cognitive functions.
The proposed new architecture ingests images and returns scanpaths, a sequence of points featuring a high likelihood of catching viewers' attention.
arXiv Detail & Related papers (2022-09-22T22:27:08Z)
- Context-Aware Sequence Alignment using 4D Skeletal Augmentation [67.05537307224525]
Temporal alignment of fine-grained human actions in videos is important for numerous applications in computer vision, robotics, and mixed reality.
We propose a novel context-aware self-supervised learning architecture to align sequences of actions.
Specifically, CASA employs self-attention and cross-attention mechanisms to incorporate the spatial and temporal context of human actions.
arXiv Detail & Related papers (2022-04-26T10:59:29Z)
- Coarse Temporal Attention Network (CTA-Net) for Driver's Activity Recognition [14.07119502083967]
Drivers' activities are difficult to distinguish since they are executed by the same subject with similar body-part movements, resulting in only subtle changes.
Our model is named Coarse Temporal Attention Network (CTA-Net), in which coarse temporal branches are introduced within a trainable glimpse network.
The model then uses an innovative attention mechanism to generate high-level action specific contextual information for activity recognition.
arXiv Detail & Related papers (2021-01-17T10:15:37Z)
- Regional Attention Network (RAN) for Head Pose and Fine-grained Gesture Recognition [9.131161856493486]
We propose a novel end-to-end Regional Attention Network (RAN), which is a fully convolutional neural network (CNN).
Our regions consist of one or more consecutive cells and are adapted from the strategies used in computing HOG (Histogram of Oriented Gradient) descriptor.
The proposed approach outperforms the state-of-the-art by a considerable margin in different metrics.
arXiv Detail & Related papers (2021-01-17T10:14:28Z)
- Continuous Emotion Recognition with Spatiotemporal Convolutional Neural Networks [82.54695985117783]
We investigate the suitability of state-of-the-art deep learning architectures for continuous emotion recognition using long video sequences captured in-the-wild.
We have developed and evaluated convolutional recurrent neural networks combining 2D-CNNs and long short-term memory (LSTM) units, as well as inflated 3D-CNN models, which are built by inflating the weights of a pre-trained 2D-CNN model during fine-tuning.
arXiv Detail & Related papers (2020-11-18T13:42:05Z)
- Attention-Oriented Action Recognition for Real-Time Human-Robot Interaction [11.285529781751984]
We propose an attention-oriented multi-level network framework to meet the need for real-time interaction.
Specifically, a Pre-Attention network is employed to roughly focus on the interactor in the scene at low resolution.
A second compact CNN then receives the extracted skeleton sequence as input for action recognition.
arXiv Detail & Related papers (2020-07-02T12:41:28Z)
- Continuous Emotion Recognition via Deep Convolutional Autoencoder and Support Vector Regressor [70.2226417364135]
It is crucial that the machine be able to recognize the user's emotional state with high accuracy.
Deep neural networks have been used with great success in recognizing emotions.
We present a new model for continuous emotion recognition based on facial expression recognition.
arXiv Detail & Related papers (2020-01-31T17:47:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.