Deep self-supervised learning with visualisation for automatic gesture recognition
- URL: http://arxiv.org/abs/2406.12440v1
- Date: Tue, 18 Jun 2024 09:44:55 GMT
- Title: Deep self-supervised learning with visualisation for automatic gesture recognition
- Authors: Fabien Allemand, Alessio Mazzela, Jun Villette, Decky Aspandi, Titus Zaharia
- Abstract summary: Gesture is an important means of non-verbal communication; its visual modality allows humans to convey information during interaction, facilitating both human-human and human-machine interactions.
In this work, we explore three different means to recognise hand signs using deep learning: supervised learning based methods, self-supervised methods and visualisation based techniques applied to 3D moving skeleton data.
- Score: 1.6647755388646919
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Gesture is an important means of non-verbal communication; its visual modality allows humans to convey information during interaction, facilitating both human-human and human-machine interactions. However, automatically recognising gestures is considered difficult. In this work, we explore three different means to recognise hand signs using deep learning: supervised learning based methods, self-supervised methods and visualisation based techniques applied to 3D moving skeleton data. Self-supervised learning is used to train fully connected, CNN and LSTM models. Then, a reconstruction method is applied to unlabelled data in simulated settings, using a CNN as a backbone, and the learnt features are used to perform prediction on the remaining labelled data. Lastly, Grad-CAM is applied to discover the focus of the models. Our experimental results show that the supervised learning method recognises gestures accurately, with self-supervised learning increasing accuracy in simulated settings. Finally, Grad-CAM visualisation shows that the models indeed focus on the skeleton joints relevant to each gesture.
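Since the abstract walks through a concrete pipeline (reconstruction-based pretraining of a CNN backbone on unlabelled skeletons, supervised fine-tuning on labelled data, Grad-CAM inspection), a compact sketch may help. The following is a minimal illustration assuming PyTorch; every layer size, joint count and variable name is hypothetical rather than taken from the paper:

```python
# Minimal sketch (PyTorch assumed; all names and sizes are illustrative, not
# the authors' code): a 1D CNN is pretrained by reconstructing unlabelled 3D
# skeleton sequences, its encoder is reused for supervised gesture
# classification, and a Grad-CAM style map is read off the conv features.
import torch
import torch.nn as nn

N_JOINTS, N_FRAMES, N_CLASSES = 21, 64, 10  # hypothetical dataset sizes
IN_CH = N_JOINTS * 3                        # (x, y, z) per joint, per frame

encoder = nn.Sequential(                    # 1D CNN over the time axis
    nn.Conv1d(IN_CH, 128, kernel_size=5, padding=2), nn.ReLU(),
    nn.Conv1d(128, 64, kernel_size=5, padding=2), nn.ReLU(),
)
decoder = nn.Conv1d(64, IN_CH, kernel_size=5, padding=2)

# --- Stage 1: self-supervised reconstruction on unlabelled sequences ---
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()))
unlabelled = torch.randn(32, IN_CH, N_FRAMES)   # stand-in for real data
for _ in range(10):
    recon = decoder(encoder(unlabelled))
    loss = nn.functional.mse_loss(recon, unlabelled)
    opt.zero_grad(); loss.backward(); opt.step()

# --- Stage 2: supervised fine-tuning of a classifier on labelled data ---
head = nn.Sequential(nn.AdaptiveAvgPool1d(1), nn.Flatten(),
                     nn.Linear(64, N_CLASSES))
clf_opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()))
x, y = torch.randn(32, IN_CH, N_FRAMES), torch.randint(0, N_CLASSES, (32,))
for _ in range(10):
    logits = head(encoder(x))
    loss = nn.functional.cross_entropy(logits, y)
    clf_opt.zero_grad(); loss.backward(); clf_opt.step()

# --- Grad-CAM over the conv features: which frames drive the prediction? ---
feats = encoder(x[:1])                      # (1, 64, N_FRAMES), non-leaf
feats.retain_grad()
head(feats)[0].max().backward()             # gradient of the top class score
weights = feats.grad.mean(dim=2, keepdim=True)    # per-channel importance
cam = torch.relu((weights * feats).sum(dim=1))    # (1, N_FRAMES) heat map
```

The point of the design is that the encoder weights learnt from unlabelled reconstruction are carried into the supervised stage, which is how the unlabelled data can raise accuracy on the labelled task.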
Related papers
- Zero-Shot Underwater Gesture Recognition [3.4078654008228924]
Hand gesture recognition allows humans to interact with machines non-verbally, with significant applications in underwater exploration using autonomous underwater vehicles.
Recently, a new gesture-based language called CADDIAN has been devised for divers, and supervised learning methods have been applied to recognize the gestures with high accuracy.
In this work, we advocate the need for zero-shot underwater gesture recognition (ZSUGR), where the objective is to train a model with visual samples of gestures from a few "seen" classes only and transfer the gained knowledge at test time to recognize semantically-similar unseen gesture classes as well.
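To make the ZSUGR objective concrete, here is a generic zero-shot classification sketch (not the paper's model; names and dimensions are invented): a gesture's projected visual feature is matched to the nearest semantic class embedding among the unseen classes.

```python
# Generic zero-shot recognition sketch (not the paper's architecture): the
# visual encoder is trained on "seen" classes only, and a test gesture is
# assigned to the unseen class whose semantic embedding (e.g. an attribute
# or text vector) is closest to the projected visual feature.
import torch

def zero_shot_predict(visual_feat, class_embeds):
    """visual_feat: (D,) projected gesture feature; class_embeds: (C, D)."""
    sims = torch.nn.functional.cosine_similarity(
        visual_feat.unsqueeze(0), class_embeds, dim=1)
    return sims.argmax().item()   # index of the most similar unseen class

# Toy usage with random stand-ins for real embeddings:
feat = torch.randn(128)
unseen_class_embeds = torch.randn(5, 128)   # e.g. 5 unseen CADDIAN gestures
print(zero_shot_predict(feat, unseen_class_embeds))
```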
arXiv Detail & Related papers (2024-07-19T08:16:46Z)
- Enhancing 2D Representation Learning with a 3D Prior [21.523007105586217]
Learning robust and effective representations of visual data is a fundamental task in computer vision.
Traditionally, this is achieved by training models with labeled data which can be expensive to obtain.
We propose a new approach for strengthening existing self-supervised methods by explicitly enforcing a strong 3D structural prior.
arXiv Detail & Related papers (2024-06-04T17:55:22Z)
- Towards Open-World Gesture Recognition [19.019579924491847]
In real-world applications involving gesture recognition, such as gesture recognition based on wrist-worn devices, the data distribution may change over time.
We propose the use of continual learning to enable machine learning models to be adaptive to new tasks.
We provide design guidelines to enhance the development of an open-world wrist-worn gesture recognition process.
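As one concrete (and generic) way to realise such adaptivity, the sketch below shows replay-based continual updating; the model, buffer size and all names are illustrative assumptions, not details from the paper.

```python
# Generic replay-based continual learning sketch (the paper's actual method
# is not specified in this summary): a small buffer of past gesture samples
# is mixed into each update so the recogniser can take on new gestures
# without forgetting previously learnt ones.
import random
import torch
import torch.nn as nn

model = nn.Linear(64, 12)                    # stand-in wrist-sensor classifier
opt = torch.optim.SGD(model.parameters(), lr=0.01)
replay = []                                  # buffer of (feature, label) pairs

def continual_update(x, y, buffer_size=200):
    replay.extend(zip(x, y))                 # remember the new gesture samples
    del replay[:-buffer_size]                # keep the buffer bounded
    xs, ys = zip(*random.sample(replay, min(8, len(replay))))
    x = torch.cat([x, torch.stack(xs)])      # rehearse a few stored samples
    y = torch.cat([y, torch.stack(ys)])
    loss = nn.functional.cross_entropy(model(x), y)
    opt.zero_grad(); loss.backward(); opt.step()

continual_update(torch.randn(4, 64), torch.randint(0, 12, (4,)))
```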
arXiv Detail & Related papers (2024-01-20T06:45:16Z)
- Joint-bone Fusion Graph Convolutional Network for Semi-supervised Skeleton Action Recognition [65.78703941973183]
We propose a novel correlation-driven joint-bone fusion graph convolutional network (CD-JBF-GCN) as an encoder and use a pose prediction head as a decoder.
Specifically, the CD-JBF-GCN can explore the motion transmission between the joint stream and the bone stream.
The pose prediction based auto-encoder in the self-supervised training stage allows the network to learn motion representation from unlabeled data.
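The encoder-decoder idea in this summary can be sketched compactly. Below is a heavily simplified, hypothetical illustration (plain linear layers stand in for the paper's graph convolutions; the skeleton, fusion rule and sizes are invented): joint and bone streams are encoded separately, fused, and trained to predict the next-frame pose from unlabelled skeletons.

```python
# Simplified sketch of the idea (not the paper's CD-JBF-GCN): joint
# coordinates and bone vectors (differences between connected joints) form
# two streams that are fused, and a pose-prediction head reconstructs the
# next frame so the network can train on unlabelled skeleton data.
import torch
import torch.nn as nn

EDGES = [(0, 1), (1, 2), (2, 3)]              # hypothetical 4-joint skeleton

def bones(joints):                            # joints: (B, J, 3) coordinates
    return torch.stack([joints[:, c] - joints[:, p] for p, c in EDGES], dim=1)

joint_net = nn.Linear(3, 16)                  # per-joint encoder (joint stream)
bone_net = nn.Linear(3, 16)                   # per-bone encoder (bone stream)
pose_head = nn.Linear(16, 3)                  # decoder: predict next-frame pose

frames = torch.randn(8, 2, 4, 3)              # (batch, time, joints, xyz), unlabelled
cur, nxt = frames[:, 0], frames[:, 1]

jf = joint_net(cur)                           # (8, 4, 16) joint-stream features
bf = bone_net(bones(cur))                     # (8, 3, 16) bone-stream features
child = torch.tensor([c for _, c in EDGES])   # each bone's child joint
fused = jf.index_add(1, child, bf)            # crude stand-in for the fusion

loss = nn.functional.mse_loss(pose_head(fused), nxt)  # self-supervised target
loss.backward()
```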
arXiv Detail & Related papers (2022-02-08T16:03:15Z)
- Skeleton-Based Mutually Assisted Interacted Object Localization and Human Action Recognition [111.87412719773889]
We propose a joint learning framework for "interacted object localization" and "human action recognition" based on skeleton data.
Our method achieves the best or competitive performance with the state-of-the-art methods for human action recognition.
arXiv Detail & Related papers (2021-10-28T10:09:34Z)
- Dynamic Modeling of Hand-Object Interactions via Tactile Sensing [133.52375730875696]
In this work, we employ a high-resolution tactile glove to perform four different interactive activities on a diversified set of objects.
We build our model on a cross-modal learning framework and generate the labels using a visual processing pipeline to supervise the tactile model.
This work takes a step toward dynamics modeling of hand-object interactions using dense tactile sensing.
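A minimal sketch of the cross-modal supervision pattern described above (all shapes and names are assumptions, and the visual pipeline is faked with random labels here): predictions from a visual model act as training targets for a network that only sees the tactile signal.

```python
# Cross-modal supervision sketch (details assumed, not the paper's pipeline):
# a visual processing model produces pseudo-labels for each frame, and those
# labels supervise a model that sees only the tactile glove signal.
import torch
import torch.nn as nn

tactile_net = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 6))  # 32x32 taxels
opt = torch.optim.Adam(tactile_net.parameters())

tactile = torch.rand(16, 1, 32, 32)          # stand-in glove pressure maps
with torch.no_grad():
    pseudo = torch.randint(0, 6, (16,))      # stand-in for visual pipeline output

loss = nn.functional.cross_entropy(tactile_net(tactile), pseudo)
opt.zero_grad(); loss.backward(); opt.step()
```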
arXiv Detail & Related papers (2021-09-09T16:04:14Z)
- Crop-Transform-Paste: Self-Supervised Learning for Visual Tracking [137.26381337333552]
In this work, we develop the Crop-Transform-Paste operation, which is able to synthesize sufficient training data.
Since the object state is known in all synthesized data, existing deep trackers can be trained in routine ways without human annotation.
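The operation lends itself to a short sketch. The following is an illustrative, simplified version (the transform choice, sizes and function name are hypothetical): crop the target, apply a transform, paste it at a sampled location, and keep the resulting box as free ground truth.

```python
# Sketch of a crop-transform-paste style operation (sizes and transform are
# illustrative): the target patch is cropped, lightly transformed, and pasted
# back at a sampled location, so the synthesized frame comes with a known
# ground-truth box and needs no human annotation.
import torch

def crop_transform_paste(frame, patch, max_xy):
    """frame: (3, H, W); patch: (3, h, w); returns (new_frame, (x, y, w, h))."""
    patch = torch.flip(patch, dims=[2])               # toy "transform": h-flip
    h, w = patch.shape[1:]
    x = torch.randint(0, max_xy[0] - w, (1,)).item()  # sampled paste location
    y = torch.randint(0, max_xy[1] - h, (1,)).item()
    out = frame.clone()
    out[:, y:y + h, x:x + w] = patch                  # paste => known state
    return out, (x, y, w, h)

frame = torch.rand(3, 240, 320)
patch = frame[:, 100:140, 150:200].clone()            # crop the target
synth, box = crop_transform_paste(frame, patch, (320, 240))
```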
arXiv Detail & Related papers (2021-06-21T07:40:34Z)
- Understanding the hand-gestures using Convolutional Neural Networks and Generative Adversarial Networks [0.0]
The system consists of three modules: real-time hand tracking, gesture training, and gesture recognition using Convolutional Neural Networks.
It has been tested on a vocabulary of 36 gestures, including alphabets and digits, and the results show the effectiveness of the approach.
arXiv Detail & Related papers (2020-11-10T02:20:43Z)
- Relational Graph Learning on Visual and Kinematics Embeddings for Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose a novel online approach of multi-modal graph network (i.e., MRG-Net) to dynamically integrate visual and kinematics information.
The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset.
arXiv Detail & Related papers (2020-11-03T11:00:10Z)
- Adversarial Self-Supervised Learning for Semi-Supervised 3D Action Recognition [123.62183172631443]
We present Adversarial Self-Supervised Learning (ASSL), a novel framework that tightly couples SSL and the semi-supervised scheme.
Specifically, we design an effective SSL scheme to improve the discrimination capability of learned representations for 3D action recognition.
arXiv Detail & Related papers (2020-07-12T08:01:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.