Semantic Labeling of Human Action For Visually Impaired And Blind People Scene Interaction
- URL: http://arxiv.org/abs/2201.04706v1
- Date: Wed, 12 Jan 2022 21:21:05 GMT
- Title: Semantic Labeling of Human Action For Visually Impaired And Blind People Scene Interaction
- Authors: Leyla Benhamida, Slimane Larabi
- Abstract summary: The aim of this work is to contribute to the development of a tactile device for visually impaired and blind persons.
We use the skeleton information provided by Kinect, with the disentangled and unified multi-scale Graph Convolutional (MS-G3D) model to recognize the performed actions.
The recognized actions are labeled semantically and will be mapped onto an output device perceivable through the sense of touch.
- Score: 1.52292571922932
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The aim of this work is to contribute to the development of a tactile device
for visually impaired and blind persons in order to let them understand the
actions of the surrounding people and to interact with them. First, based on
state-of-the-art methods for human action recognition from RGB-D sequences,
we use the skeleton information provided by Kinect with the disentangled and
unified multi-scale Graph Convolutional (MS-G3D) model to recognize the
performed actions. We tested this model on real scenes and identified some
constraints and limitations. Next, we fuse the skeleton modality processed by
MS-G3D with the depth modality processed by a CNN in order to bypass these
limitations. Finally, the recognized actions are labeled semantically and will be
mapped onto an output device perceivable through the sense of touch.
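The paper does not include code with this abstract; as a minimal illustration of the score-level fusion idea described above (skeleton scores from an MS-G3D-style model combined with depth scores from a CNN, then mapped to a semantic label), here is a hedged Python sketch. The model classes, the fixed fusion weight `alpha`, and the label table are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of score-level fusion between a skeleton model and a depth
# model, followed by semantic labeling of the predicted action. The model
# objects, the fixed fusion weight, and the label table are illustrative
# assumptions, not the implementation described in the paper.
import torch
import torch.nn.functional as F

ACTION_LABELS = {0: "a person is waving at you",       # hypothetical labels
                 1: "a person is sitting down",
                 2: "a person is walking towards you"}

@torch.no_grad()
def recognize_action(skeleton_model, depth_model,
                     skeleton_seq, depth_seq, alpha=0.6):
    """Fuse class scores from the two modalities and return a semantic label.

    skeleton_seq: tensor shaped for the skeleton model (e.g. N, C, T, V, M)
    depth_seq:    tensor shaped for the depth CNN (e.g. N, C, T, H, W)
    alpha:        weight given to the skeleton stream (assumed value)
    """
    skel_scores = F.softmax(skeleton_model(skeleton_seq), dim=-1)
    depth_scores = F.softmax(depth_model(depth_seq), dim=-1)
    fused = alpha * skel_scores + (1.0 - alpha) * depth_scores
    action_id = int(fused.argmax(dim=-1))   # assumes a single sequence (N = 1)
    return ACTION_LABELS.get(action_id, "unknown action")
```

A weighted sum of softmax scores is only one possible fusion rule; the paper's actual combination of the two streams may differ.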
Related papers
- Deep self-supervised learning with visualisation for automatic gesture recognition [1.6647755388646919]
Gesture is an important means of non-verbal communication; its visual modality allows humans to convey information during interaction, facilitating both human-human and human-machine interactions.
In this work, we explore three different means to recognise hand signs using deep learning: supervised learning based methods, self-supervised methods and visualisation based techniques applied to 3D moving skeleton data.
arXiv Detail & Related papers (2024-06-18T09:44:55Z) - SMART: Scene-motion-aware human action recognition framework for mental disorder group [16.60713558596286]
We propose to build a vision-based Human Action Recognition dataset including abnormal actions often occurring in the mental disorder group.
We then introduce a novel Scene-Motion-aware Action Recognition framework, named SMART, consisting of two technical modules.
The effectiveness of our proposed method has been validated on our self-collected HAR dataset (HAD), achieving 94.9% and 93.1% accuracy on unseen subjects and scenes, and outperforming state-of-the-art approaches by 6.5% and 13.2%, respectively.
arXiv Detail & Related papers (2024-06-07T05:29:42Z) - Disentangled Interaction Representation for One-Stage Human-Object
Interaction Detection [70.96299509159981]
Human-Object Interaction (HOI) detection is a core task for human-centric image understanding.
Recent one-stage methods adopt a transformer decoder to collect image-wide cues that are useful for interaction prediction.
Traditional two-stage methods benefit significantly from their ability to compose interaction features in a disentangled and explainable manner.
arXiv Detail & Related papers (2023-12-04T08:02:59Z) - CaSAR: Contact-aware Skeletal Action Recognition [47.249908147135855]
We present a new framework called Contact-aware Skeletal Action Recognition (CaSAR).
CaSAR uses novel representations of hand-object interaction that encompass spatial information.
Our framework is able to learn how the hands touch or stay away from the objects for each frame of the action sequence, and use this information to predict the action class.
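The abstract does not detail CaSAR's exact representation; the sketch below shows one generic way to encode, per frame, how close each hand joint is to the object and whether it is in contact. The function name, the array shapes, and the 1 cm threshold are illustrative assumptions, not the paper's formulation.

```python
# Illustrative per-frame hand-object contact feature: distance from each hand
# joint to the nearest object point, plus a binary "in contact" flag.
# Shapes, names and threshold are assumptions, not the CaSAR representation.
import numpy as np

def contact_features(hand_joints, object_points, contact_thresh=0.01):
    """hand_joints:   (T, J, 3) hand joint positions per frame (metres)
    object_points: (T, P, 3) sampled object surface points per frame
    Returns (T, J) nearest distances and (T, J) contact flags."""
    # Pairwise distances between every joint and every object point, per frame.
    diff = hand_joints[:, :, None, :] - object_points[:, None, :, :]  # (T, J, P, 3)
    dists = np.linalg.norm(diff, axis=-1)                             # (T, J, P)
    nearest = dists.min(axis=-1)                                      # (T, J)
    in_contact = (nearest < contact_thresh).astype(np.float32)
    return nearest, in_contact
```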
arXiv Detail & Related papers (2023-09-17T09:42:40Z) - GRIP: Generating Interaction Poses Using Spatial Cues and Latent Consistency [57.9920824261925]
Hands are dexterous and highly versatile manipulators that are central to how humans interact with objects and their environment.
Modeling realistic hand-object interactions is critical for applications in computer graphics, computer vision, and mixed reality.
GRIP is a learning-based method that takes as input the 3D motion of the body and the object, and synthesizes realistic motion for both hands before, during, and after object interaction.
arXiv Detail & Related papers (2023-08-22T17:59:51Z) - ScanERU: Interactive 3D Visual Grounding based on Embodied Reference Understanding [67.21613160846299]
Embodied Reference Understanding (ERU) is first designed to address this concern.
A new dataset called ScanERU is constructed to evaluate the effectiveness of this idea.
arXiv Detail & Related papers (2023-03-23T11:36:14Z) - Human keypoint detection for close proximity human-robot interaction [29.99153271571971]
We study the performance of state-of-the-art human keypoint detectors in the context of close proximity human-robot interaction.
The best performing whole-body keypoint detectors in close proximity were MMPose and AlphaPose, but both had difficulty with finger detection.
We propose a combination of MMPose or AlphaPose for the body and MediaPipe for the hands in a single framework providing the most accurate and robust detection.
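As a rough sketch of how such a combination could be wired together, the snippet below runs MediaPipe Hands on an image and merges the result with body keypoints from a separate detector. The `detect_body_keypoints` helper is a hypothetical stand-in for an MMPose or AlphaPose call, and the simple merging of outputs is an assumption, not the framework proposed in the paper.

```python
# Sketch of combining a body keypoint detector with MediaPipe hand landmarks.
# `detect_body_keypoints` is a hypothetical placeholder for an MMPose or
# AlphaPose inference call; the merging rule is an illustrative assumption.
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=2)

def detect_body_keypoints(image_bgr):
    """Placeholder: run MMPose or AlphaPose here and return a dict of
    named body keypoints in pixel coordinates."""
    raise NotImplementedError

def detect_person(image_bgr):
    body = detect_body_keypoints(image_bgr)           # body joints (hypothetical API)
    rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)  # MediaPipe expects RGB input
    result = mp_hands.process(rgb)
    hands = []
    if result.multi_hand_landmarks:
        h, w = image_bgr.shape[:2]
        for hand in result.multi_hand_landmarks:
            # Convert normalized landmarks to pixel coordinates.
            hands.append([(lm.x * w, lm.y * h) for lm in hand.landmark])
    return {"body": body, "hands": hands}
```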
arXiv Detail & Related papers (2022-07-15T20:33:29Z) - Joint-bone Fusion Graph Convolutional Network for Semi-supervised Skeleton Action Recognition [65.78703941973183]
We propose a novel correlation-driven joint-bone fusion graph convolutional network (CD-JBF-GCN) as an encoder and use a pose prediction head as a decoder.
Specifically, the CD-JBF-GCN can explore the motion transmission between the joint stream and the bone stream.
The pose prediction based auto-encoder in the self-supervised training stage allows the network to learn motion representation from unlabeled data.
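For readers unfamiliar with the bone stream used in joint-bone fusion methods, the sketch below derives bone vectors as the difference between each joint and its parent in the skeleton graph; the parent table and the (N, C, T, V, M) tensor layout are assumptions, not the CD-JBF-GCN code.

```python
# Illustrative derivation of a bone stream from a joint stream: each bone is
# the vector from a joint's parent to the joint itself. The parent table and
# the (N, C, T, V, M) layout are assumptions, not the paper's implementation.
import torch

# Hypothetical parent index for each of V joints (root points to itself).
PARENTS = [0, 0, 1, 2, 3, 1, 5, 6, 1, 8, 9, 0, 11, 12, 0, 14, 15]

def joints_to_bones(joints, parents=PARENTS):
    """joints: (N, C, T, V, M) joint coordinates.
    Returns bones of the same shape, bone[v] = joint[v] - joint[parent[v]]."""
    idx = torch.tensor(parents, device=joints.device)
    return joints - joints.index_select(dim=3, index=idx)
```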
arXiv Detail & Related papers (2022-02-08T16:03:15Z) - Skeleton-Based Mutually Assisted Interacted Object Localization and Human Action Recognition [111.87412719773889]
We propose a joint learning framework for "interacted object localization" and "human action recognition" based on skeleton data.
Our method achieves the best or competitive performance with the state-of-the-art methods for human action recognition.
arXiv Detail & Related papers (2021-10-28T10:09:34Z) - Dynamic Modeling of Hand-Object Interactions via Tactile Sensing [133.52375730875696]
In this work, we employ a high-resolution tactile glove to perform four different interactive activities on a diversified set of objects.
We build our model on a cross-modal learning framework and generate the labels using a visual processing pipeline to supervise the tactile model.
This work takes a step toward dynamics modeling of hand-object interactions from dense tactile sensing.
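As a minimal sketch of the cross-modal supervision described here, the snippet below trains a tactile model on pseudo-labels produced by a frozen visual pipeline; both models, the data layout, and the plain cross-entropy objective are hypothetical stand-ins rather than the paper's pipeline.

```python
# Minimal sketch of cross-modal supervision: pseudo-labels from a (frozen)
# visual pipeline supervise a tactile model. Model classes and data layout
# are illustrative assumptions, not the paper's pipeline.
import torch
import torch.nn.functional as F

def train_step(tactile_model, visual_pipeline, tactile_frames, video_frames, optimizer):
    with torch.no_grad():                       # visual pipeline only supplies labels
        pseudo_labels = visual_pipeline(video_frames).argmax(dim=-1)
    logits = tactile_model(tactile_frames)      # predictions from tactile glove data
    loss = F.cross_entropy(logits, pseudo_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```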
arXiv Detail & Related papers (2021-09-09T16:04:14Z) - "What's This?" -- Learning to Segment Unknown Objects from Manipulation Sequences [27.915309216800125]
We present a novel framework for self-supervised grasped object segmentation with a robotic manipulator.
We propose a single, end-to-end trainable architecture which jointly incorporates motion cues and semantic knowledge.
Our method neither depends on any visual registration of a kinematic robot or 3D object models, nor on precise hand-eye calibration or any additional sensor data.
arXiv Detail & Related papers (2020-11-06T10:55:28Z)