EHWGesture -- A dataset for multimodal understanding of clinical gestures
- URL: http://arxiv.org/abs/2509.07525v1
- Date: Tue, 09 Sep 2025 09:00:03 GMT
- Title: EHWGesture -- A dataset for multimodal understanding of clinical gestures
- Authors: Gianluca Amprimo, Alberto Ancilotto, Alessandro Savino, Fabio Quazzolo, Claudia Ferraris, Gabriella Olmo, Elisabetta Farella, Stefano Di Carlo,
- Abstract summary: EHWGesture is a multimodal video dataset for gesture understanding featuring clinically relevant gestures.<n>It includes over 1,100 recordings (6 hours), captured from 25 healthy subjects using two high-resolution RGB-Depth cameras and an event camera.
- Score: 40.20497299469609
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hand gesture understanding is essential for several applications in human-computer interaction, including automatic clinical assessment of hand dexterity. While deep learning has advanced static gesture recognition, dynamic gesture understanding remains challenging due to complex spatiotemporal variations. Moreover, existing datasets often lack multimodal and multi-view diversity, precise ground-truth tracking, and an action quality component embedded within gestures. This paper introduces EHWGesture, a multimodal video dataset for gesture understanding featuring five clinically relevant gestures. It includes over 1,100 recordings (6 hours), captured from 25 healthy subjects using two high-resolution RGB-Depth cameras and an event camera. A motion capture system provides precise ground-truth hand landmark tracking, and all devices are spatially calibrated and synchronized to ensure cross-modal alignment. Moreover, to embed an action quality task within gesture understanding, collected recordings are organized in classes of execution speed that mirror clinical evaluations of hand dexterity. Baseline experiments highlight the dataset's potential for gesture classification, gesture trigger detection, and action quality assessment. Thus, EHWGesture can serve as a comprehensive benchmark for advancing multimodal clinical gesture understanding.
Related papers
- OPENTOUCH: Bringing Full-Hand Touch to Real-World Interaction [93.88239833545623]
We present OpenTouch, the first in-the-wild egocentric full-hand tactile dataset.<n>We show that tactile signals provide a compact yet powerful cue for grasp understanding.<n>We aim to advance multimodal egocentric perception, embodied learning, and contact-rich robotic manipulation.
arXiv Detail & Related papers (2025-12-18T18:18:17Z) - Multi-Modal Gesture Recognition from Video and Surgical Tool Pose Information via Motion Invariants [9.77463802740227]
Recognizing surgical gestures in real-time is a stepping stone towards automated activity recognition, skill assessment, intra-operative assistance, and eventually surgical automation.<n>While some recent works in multi-modal neural networks learn the relationships between vision and kinematics data, current approaches treat kinematics information as independent signals, with no underlying relation between tool-tip poses.<n>We show that gesture recognition improves when combining invariant signals with tool position, achieving 90.3% frame-wise accuracy on the JIGSAWS suturing dataset.
arXiv Detail & Related papers (2025-03-19T19:02:58Z) - Multi-view Video-Pose Pretraining for Operating Room Surgical Activity Recognition [5.787586057526269]
Surgical activity recognition is a key computer vision task that detects activities or phases from multi-view camera recordings.<n>Existing SAR models often fail to account for fine-grained clinician movements and multi-view knowledge.<n>We propose a novel calibration-free multi-view multi-modal pretraining framework called Multiview Pretraining for Video-Pose Surgical Activity Recognition PreViPS.
arXiv Detail & Related papers (2025-02-19T17:08:04Z) - Capturing complex hand movements and object interactions using machine learning-powered stretchable smart textile gloves [9.838013581109681]
Real-time tracking of dexterous hand movements has numerous applications in human-computer interaction, metaverse, robotics, and tele-health.
Here, we report accurate and dynamic tracking of articulated hand and finger movements using stretchable, washable smart gloves with embedded helical sensor yarns and inertial measurement units.
The sensor yarns have a high dynamic range, responding to low 0.005 % to high 155 % strains, and show stability during extensive use and washing cycles.
arXiv Detail & Related papers (2024-10-03T05:32:16Z) - Learning Visuotactile Skills with Two Multifingered Hands [80.99370364907278]
We explore learning from human demonstrations using a bimanual system with multifingered hands and visuotactile data.
Our results mark a promising step forward in bimanual multifingered manipulation from visuotactile data.
arXiv Detail & Related papers (2024-04-25T17:59:41Z) - Two-stream Multi-level Dynamic Point Transformer for Two-person Interaction Recognition [45.0131792009999]
We propose a point cloud-based network named Two-stream Multi-level Dynamic Point Transformer for two-person interaction recognition.
Our model addresses the challenge of recognizing two-person interactions by incorporating local-region spatial information, appearance information, and motion information.
Our network outperforms state-of-the-art approaches in most standard evaluation settings.
arXiv Detail & Related papers (2023-07-22T03:51:32Z) - Agile gesture recognition for capacitive sensing devices: adapting
on-the-job [55.40855017016652]
We demonstrate a hand gesture recognition system that uses signals from capacitive sensors embedded into the etee hand controller.
The controller generates real-time signals from each of the wearer five fingers.
We use a machine learning technique to analyse the time series signals and identify three features that can represent 5 fingers within 500 ms.
arXiv Detail & Related papers (2023-05-12T17:24:02Z) - Reconfigurable Data Glove for Reconstructing Physical and Virtual Grasps [100.72245315180433]
We present a reconfigurable data glove design to capture different modes of human hand-object interactions.
The glove operates in three modes for various downstream tasks with distinct features.
We evaluate the system's three modes by (i) recording hand gestures and associated forces, (ii) improving manipulation fluency in VR, and (iii) producing realistic simulation effects of various tool uses.
arXiv Detail & Related papers (2023-01-14T05:35:50Z) - Relational Graph Learning on Visual and Kinematics Embeddings for
Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose a novel online approach of multi-modal graph network (i.e., MRG-Net) to dynamically integrate visual and kinematics information.
The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset.
arXiv Detail & Related papers (2020-11-03T11:00:10Z) - Multi-Task Recurrent Neural Network for Surgical Gesture Recognition and
Progress Prediction [17.63619129438996]
We propose a multi-task recurrent neural network for simultaneous recognition of surgical gestures and estimation of a novel formulation of surgical task progress.
We demonstrate that recognition performance improves in multi-task frameworks with progress estimation without any additional manual labelling and training.
arXiv Detail & Related papers (2020-03-10T14:28:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.