A Comparative Study of Human Activity Recognition: Motion, Tactile, and multi-modal Approaches
- URL: http://arxiv.org/abs/2505.08657v1
- Date: Tue, 13 May 2025 15:20:21 GMT
- Title: A Comparative Study of Human Activity Recognition: Motion, Tactile, and multi-modal Approaches
- Authors: Valerio Belcamino, Nhat Minh Dinh Le, Quan Khanh Luu, Alessandro Carfì, Van Anh Ho, Fulvio Mastrogiovanni
- Abstract summary: This study evaluates the ability of a vision-based tactile sensor to classify 15 activities. We propose a multi-modal framework combining tactile and motion data to leverage their complementary strengths.
- Score: 43.97520291340696
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Human activity recognition (HAR) is essential for effective Human-Robot Collaboration (HRC), enabling robots to interpret and respond to human actions. This study evaluates the ability of a vision-based tactile sensor to classify 15 activities, comparing its performance to an IMU-based data glove. Additionally, we propose a multi-modal framework combining tactile and motion data to leverage their complementary strengths. We examined three approaches: motion-based classification (MBC) using IMU data, tactile-based classification (TBC) with single or dual video streams, and multi-modal classification (MMC) integrating both. Offline validation on segmented datasets assessed each configuration's accuracy under controlled conditions, while online validation on continuous action sequences tested real-time recognition performance. Results showed the multi-modal approach consistently outperformed single-modality methods, highlighting the potential of integrating tactile and motion sensing to enhance HAR systems for collaborative robotics.
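The abstract does not describe the fusion architecture in detail, so the following is only a minimal sketch of the multi-modal classification (MMC) idea: an IMU motion encoder and a tactile encoder combined by late fusion into a 15-class activity head. All module names, input shapes, dimensions, and the concatenation-based fusion strategy are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical late-fusion multi-modal HAR classifier (illustrative only, not the paper's model).
# Assumed inputs: IMU sequences of shape (batch, time, channels) and pooled tactile-video
# feature vectors of shape (batch, feat_dim); 15 activity classes as stated in the abstract.
import torch
import torch.nn as nn


class MotionEncoder(nn.Module):
    """Encodes an IMU sequence (batch, time, channels) into a fixed-size embedding."""
    def __init__(self, in_channels: int = 12, hidden: int = 64):
        super().__init__()
        self.gru = nn.GRU(in_channels, hidden, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        _, h = self.gru(x)          # h: (1, batch, hidden)
        return h.squeeze(0)         # (batch, hidden)


class TactileEncoder(nn.Module):
    """Encodes pooled tactile-video features (batch, feat_dim) into an embedding."""
    def __init__(self, feat_dim: int = 256, hidden: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.mlp(x)          # (batch, hidden)


class MultiModalClassifier(nn.Module):
    """Late fusion: concatenate modality embeddings, then classify into 15 activities."""
    def __init__(self, num_classes: int = 15, hidden: int = 64):
        super().__init__()
        self.motion = MotionEncoder(hidden=hidden)
        self.tactile = TactileEncoder(hidden=hidden)
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, imu_seq: torch.Tensor, tactile_feat: torch.Tensor) -> torch.Tensor:
        z = torch.cat([self.motion(imu_seq), self.tactile(tactile_feat)], dim=-1)
        return self.head(z)         # unnormalised class logits


if __name__ == "__main__":
    model = MultiModalClassifier()
    imu = torch.randn(4, 100, 12)     # 4 clips, 100 timesteps, 12 IMU channels (assumed)
    tactile = torch.randn(4, 256)     # 4 clips, 256-d pooled tactile features (assumed)
    print(model(imu, tactile).shape)  # torch.Size([4, 15])
```

Dropping either encoder branch from this sketch recovers the single-modality MBC or TBC baselines, which is the comparison the abstract reports.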
Related papers
- Confidence-driven Gradient Modulation for Multimodal Human Activity Recognition: A Dynamic Contrastive Dual-Path Learning Approach [3.0868241505670198]
We propose a novel framework called the Dynamic Contrastive Dual-Path Network (D-HAR). The framework comprises three key components. First, a dual-path feature extraction architecture is employed, where ResNet and DenseNet branches collaboratively process multimodal sensor data. Second, a multi-stage contrastive learning mechanism is introduced to achieve progressive alignment from local perception to semantic abstraction. Third, we present a confidence-driven gradient modulation strategy that dynamically monitors and adjusts the learning intensity of each modality branch during backpropagation.
arXiv Detail & Related papers (2025-07-03T17:37:46Z)
- A Comprehensive Methodological Survey of Human Activity Recognition Across Diverse Data Modalities [2.916558661202724]
Human Activity Recognition (HAR) systems aim to understand human behaviour and assign a label to each action.
HAR can leverage various data modalities, such as RGB images and video, skeleton, depth, infrared, point cloud, event stream, audio, acceleration, and radar signals.
This paper presents a comprehensive survey of the latest advancements in HAR from 2014 to 2024.
arXiv Detail & Related papers (2024-09-15T10:04:44Z)
- Unified Framework with Consistency across Modalities for Human Activity Recognition [14.639249548669756]
We propose a comprehensive framework for robust video-based human activity recognition.
A key contribution is the introduction of a novel query machine called COMPUTER.
Our approach demonstrates superior performance when compared with state-of-the-art methods.
arXiv Detail & Related papers (2024-09-04T02:25:10Z)
- MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild [81.32127423981426]
Multimodal emotion recognition based on audio and video data is important for real-world applications.
Recent methods have focused on exploiting advances of self-supervised learning (SSL) for pre-training of strong multimodal encoders.
We propose a different perspective on the problem and investigate the advancement of multimodal DFER performance by adapting SSL-pre-trained disjoint unimodal encoders.
arXiv Detail & Related papers (2024-04-13T13:39:26Z)
- Joint Multimodal Transformer for Emotion Recognition in the Wild [49.735299182004404]
Multimodal emotion recognition (MMER) systems typically outperform unimodal systems.
This paper proposes an MMER method that relies on a joint multimodal transformer (JMT) for fusion with key-based cross-attention.
arXiv Detail & Related papers (2024-03-15T17:23:38Z)
- Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z)
- Contrastive Learning with Cross-Modal Knowledge Mining for Multimodal Human Activity Recognition [1.869225486385596]
We explore the hypothesis that leveraging multiple modalities can lead to better recognition.
We extend a number of recent contrastive self-supervised approaches for the task of Human Activity Recognition.
We propose a flexible, general-purpose framework for performing multimodal self-supervised learning.
arXiv Detail & Related papers (2022-05-20T10:39:16Z)
- Attentive Cross-modal Connections for Deep Multimodal Wearable-based Emotion Recognition [7.559720049837459]
We present a novel attentive cross-modal connection to share information between convolutional neural networks.
Specifically, these connections improve emotion classification by sharing intermediate representations among EDA and ECG.
Our experiments show that the proposed approach is capable of learning strong multimodal representations and outperforms a number of baseline methods.
arXiv Detail & Related papers (2021-08-04T18:40:32Z)
- Domain Adaptive Robotic Gesture Recognition with Unsupervised Kinematic-Visual Data Alignment [60.31418655784291]
We propose a novel unsupervised domain adaptation framework which can simultaneously transfer multi-modality knowledge, i.e., both kinematic and visual data, from simulator to real robot.
It remedies the domain gap with enhanced transferable features by exploiting temporal cues in videos and inherent correlations across modalities for gesture recognition.
Results show that our approach recovers performance with substantial gains, up to 12.91% in accuracy and 20.16% in F1-score, without using any annotations on the real robot.
arXiv Detail & Related papers (2021-03-06T09:10:03Z)
- Relational Graph Learning on Visual and Kinematics Embeddings for Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose a novel online approach of multi-modal graph network (i.e., MRG-Net) to dynamically integrate visual and kinematics information.
The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset.
arXiv Detail & Related papers (2020-11-03T11:00:10Z)
- HAMLET: A Hierarchical Multimodal Attention-based Human Activity Recognition Algorithm [5.276937617129594]
Human activity recognition (HAR) is a challenging task for robots due to difficulties related to multimodal data fusion.
In this work, we introduce a neural network-based multimodal algorithm, HAMLET.
We develop a novel multimodal attention mechanism for disentangling and fusing the salient unimodal features to compute the multimodal features in the upper layer; a generic cross-attention fusion sketch is given after this list.
arXiv Detail & Related papers (2020-08-03T19:34:48Z)
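Several entries above (e.g., the joint multimodal transformer with key-based cross-attention and the HAMLET attention mechanism) fuse unimodal features with attention. The sketch below is only a rough, hypothetical illustration of that general pattern, where tokens from one modality attend to tokens from another via standard multi-head cross-attention; the dimensions, pooling, and layer choices are assumptions and do not reproduce any of the cited papers' implementations.

```python
# Illustrative cross-attention fusion between two modalities (not any specific paper's code).
import torch
import torch.nn as nn


class CrossAttentionFusion(nn.Module):
    """Tokens from modality A attend to tokens from modality B, then the result is pooled."""
    def __init__(self, dim: int = 64, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, tokens_a: torch.Tensor, tokens_b: torch.Tensor) -> torch.Tensor:
        # Queries come from modality A; keys and values come from modality B.
        fused, _ = self.attn(query=tokens_a, key=tokens_b, value=tokens_b)
        fused = self.norm(tokens_a + fused)   # residual connection around the attention block
        return fused.mean(dim=1)              # pool over tokens -> (batch, dim)


if __name__ == "__main__":
    a = torch.randn(2, 10, 64)   # e.g., 10 motion tokens per sample (assumed shapes)
    b = torch.randn(2, 20, 64)   # e.g., 20 tactile or video tokens per sample
    print(CrossAttentionFusion()(a, b).shape)   # torch.Size([2, 64])
```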