MaskFi: Unsupervised Learning of WiFi and Vision Representations for
Multimodal Human Activity Recognition
- URL: http://arxiv.org/abs/2402.19258v1
- Date: Thu, 29 Feb 2024 15:27:55 GMT
- Title: MaskFi: Unsupervised Learning of WiFi and Vision Representations for
Multimodal Human Activity Recognition
- Authors: Jianfei Yang, Shijie Tang, Yuecong Xu, Yunjiao Zhou, Lihua Xie
- Abstract summary: We propose a novel unsupervised multimodal HAR solution, MaskFi, that leverages only unlabeled video and WiFi activity data for model training.
Benefiting from our unsupervised learning procedure, the network requires only a small amount of annotated data for finetuning and can adapt to new environments with better performance.
- Score: 32.89577715124546
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human activity recognition (HAR) plays an increasingly important role in domains such as healthcare, security monitoring, and metaverse gaming. Although numerous computer-vision-based HAR methods show prominent performance, they still suffer from poor robustness in adverse visual conditions, in particular low illumination, which motivates WiFi-based HAR as a complementary modality. Existing solutions using WiFi and vision modalities rely on massive amounts of labeled data that are cumbersome to collect. In this paper, we propose a novel unsupervised multimodal HAR solution, MaskFi, that leverages only unlabeled video and WiFi activity data for model training. We propose a new algorithm, masked WiFi-vision modeling (MI2M), that enables the model to learn cross-modal and single-modal features by predicting masked sections of the input during representation learning. Benefiting from this unsupervised learning procedure, the network requires only a small amount of annotated data for finetuning and can adapt to new environments with better performance. We conduct extensive experiments on two WiFi-vision datasets collected in-house; our method performs both human activity recognition and human identification robustly and accurately.
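The abstract describes the MI2M objective only at a high level. Below is a minimal sketch of what masked WiFi-vision pretraining of this kind can look like, assuming a joint transformer encoder over CSI and video-frame tokens; the class name, dimensions, and mask ratio are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of masked WiFi-vision pretraining in the spirit of MI2M.
# All names, dimensions, and the mask ratio are illustrative assumptions,
# not the authors' implementation.
import torch
import torch.nn as nn


class MaskedWiFiVisionModel(nn.Module):
    def __init__(self, wifi_dim=270, video_dim=768, d_model=256,
                 n_layers=4, n_heads=8, mask_ratio=0.5):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.wifi_proj = nn.Linear(wifi_dim, d_model)    # CSI frames -> tokens
        self.video_proj = nn.Linear(video_dim, d_model)  # frame features -> tokens
        self.mask_token = nn.Parameter(torch.zeros(1, 1, d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.wifi_head = nn.Linear(d_model, wifi_dim)    # reconstruct CSI
        self.video_head = nn.Linear(d_model, video_dim)  # reconstruct frames

    def forward(self, wifi, video):
        # wifi: (B, Tw, wifi_dim); video: (B, Tv, video_dim)
        tokens = torch.cat([self.wifi_proj(wifi), self.video_proj(video)], dim=1)
        B, T, _ = tokens.shape
        masked = torch.rand(B, T, device=tokens.device) < self.mask_ratio
        tokens = torch.where(masked.unsqueeze(-1),
                             self.mask_token.expand(B, T, -1), tokens)
        h = self.encoder(tokens)                         # joint cross-modal encoding
        Tw = wifi.shape[1]
        # Reconstruction loss is taken only at masked positions, per modality.
        loss_w = ((self.wifi_head(h[:, :Tw]) - wifi) ** 2)[masked[:, :Tw]].mean()
        loss_v = ((self.video_head(h[:, Tw:]) - video) ** 2)[masked[:, Tw:]].mean()
        return loss_w + loss_v
```

After pretraining, the reconstruction heads would be discarded and a small classification head finetuned on the limited annotated data the abstract mentions.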
Related papers
- ViFi-ReID: A Two-Stream Vision-WiFi Multimodal Approach for Person Re-identification [3.3743041904085125]
Person re-identification (ReID) plays a vital role in safety inspections, personnel counting, and more.
Most current ReID approaches primarily extract features from images, which are easily affected by environmental conditions.
We leverage widely available routers as sensing devices by capturing gait information from pedestrians through the Channel State Information (CSI) in WiFi signals.
arXiv Detail & Related papers (2024-10-13T15:34:11Z)
- MuJo: Multimodal Joint Feature Space Learning for Human Activity Recognition [2.7532797256542403]
Human Activity Recognition (HAR) is a longstanding problem in AI with applications in a broad range of areas, including healthcare, sports and fitness, security, and more.
We introduce our comprehensive Fitness Multimodal Activity dataset (FiMAD) to enhance HAR performance across various modalities.
We show that classifiers pre-trained on FiMAD can increase the performance on real HAR datasets such as MM-Fit, MyoGym, MotionSense, and MHEALTH.
arXiv Detail & Related papers (2024-06-06T08:42:36Z)
- An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training [79.78201886156513]
We present a model that can perform multiple vision tasks and can be adapted to other downstream tasks efficiently.
Our approach achieves comparable results to single-task state-of-the-art models and demonstrates strong generalization on downstream tasks.
arXiv Detail & Related papers (2023-06-29T17:59:57Z)
- GaitFi: Robust Device-Free Human Identification via WiFi and Vision Multimodal Learning [33.89340087471202]
We propose a novel multimodal gait recognition method, namely GaitFi, which leverages WiFi signals and videos for human identification.
In GaitFi, Channel State Information (CSI) that reflects the multi-path propagation of WiFi is collected to capture human gaits, while videos are captured by cameras.
To learn robust gait information, we propose a Lightweight Residual Convolution Network (LRCN) as the backbone network, and further propose the two-stream GaitFi.
Experiments are conducted in the real world, which demonstrate that GaitFi outperforms state-of-the-art gait recognition methods.
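The summary names an LRCN backbone and a two-stream design without further detail; the following is a minimal sketch of a generic two-stream WiFi-vision identification model under those assumptions (the encoder modules and dimensions are placeholders, not the paper's LRCN).

```python
# Illustrative two-stream fusion in the spirit of GaitFi; the encoders and
# fusion below are placeholder assumptions, not the paper's LRCN backbone.
import torch
import torch.nn as nn


class TwoStreamGait(nn.Module):
    def __init__(self, wifi_encoder: nn.Module, video_encoder: nn.Module,
                 feat_dim=128, n_subjects=12):
        super().__init__()
        self.wifi_encoder = wifi_encoder    # CSI sequence -> (B, feat_dim)
        self.video_encoder = video_encoder  # frame stack -> (B, feat_dim)
        self.classifier = nn.Linear(2 * feat_dim, n_subjects)

    def forward(self, csi, frames):
        # Concatenate the two modality embeddings and classify the identity.
        z = torch.cat([self.wifi_encoder(csi), self.video_encoder(frames)], dim=-1)
        return self.classifier(z)           # per-subject identity logits
```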
arXiv Detail & Related papers (2022-08-30T15:07:43Z)
- WiFi-based Spatiotemporal Human Action Perception [53.41825941088989]
An end-to-end WiFi signal neural network (SNN) is proposed to enable WiFi-only sensing in both line-of-sight and non-line-of-sight scenarios.
In particular, the 3D convolution module explores the temporal continuity of WiFi signals, and the feature self-attention module explicitly maintains dominant features.
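As a rough illustration of why 3D convolution captures temporal continuity: the kernel slides along the time axis of stacked CSI frames as well as their spatial axes, so adjacent frames are mixed into each feature. The tensor shapes below are assumptions, not taken from the paper.

```python
# Toy example: a 3D convolution over a stack of CSI "frames" mixes the time
# axis with the (subcarrier x antenna) axes. Shapes are illustrative assumptions.
import torch
import torch.nn as nn

# (batch, channels, time, subcarriers, antennas)
csi_clip = torch.randn(8, 1, 16, 30, 3)
conv3d = nn.Conv3d(in_channels=1, out_channels=32, kernel_size=3, padding=1)
features = conv3d(csi_clip)   # -> (8, 32, 16, 30, 3): time dimension preserved
print(features.shape)
```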
arXiv Detail & Related papers (2022-06-20T16:03:45Z)
- A Wireless-Vision Dataset for Privacy Preserving Human Activity Recognition [53.41825941088989]
A new WiFi-based and video-based neural network (WiNN) is proposed to improve the robustness of activity recognition.
Our results show that the WiVi dataset satisfies the primary demand, and all three branches of the proposed pipeline maintain more than 80% activity recognition accuracy.
arXiv Detail & Related papers (2022-05-24T10:49:11Z)
- Unsupervised Person Re-Identification with Wireless Positioning under Weak Scene Labeling [131.18390399368997]
We propose to explore unsupervised person re-identification with both visual data and wireless positioning trajectories under weak scene labeling.
Specifically, we propose a novel unsupervised multimodal training framework (UMTF), which models the complementarity of visual data and wireless information.
Our UMTF contains a multimodal data association strategy (MMDA) and a multimodal graph neural network (MMGN).
arXiv Detail & Related papers (2021-10-29T08:25:44Z)
- Visual Adversarial Imitation Learning using Variational Models [60.69745540036375]
Reward function specification remains a major impediment for learning behaviors through deep reinforcement learning.
Visual demonstrations of desired behaviors often present an easier and more natural way to teach agents.
We develop a variational model-based adversarial imitation learning algorithm.
arXiv Detail & Related papers (2021-07-16T00:15:18Z)
- Self-Supervised WiFi-Based Activity Recognition [3.4473723375416188]
We extract fine-grained physical layer information from WiFi devices for passive activity recognition in indoor environments.
We propose the use of self-supervised contrastive learning to improve activity recognition performance.
We observe a 17.7% increase in macro-averaged F1 score on the task of WiFi-based activity recognition.
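The summary does not specify the contrastive objective; a common choice for this kind of self-supervised pretraining is a SimCLR-style NT-Xent loss over two augmented views of the same CSI window, sketched below (temperature and batch handling are assumptions, not taken from the paper).

```python
# SimCLR-style NT-Xent loss over two augmented views of the same CSI window.
# Temperature and shapes are illustrative assumptions.
import torch
import torch.nn.functional as F


def nt_xent(z1, z2, temperature=0.1):
    # z1, z2: (B, D) embeddings of two augmented views of the same windows.
    B = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2B, D)
    sim = z @ z.t() / temperature                       # pairwise cosine similarity
    sim.fill_diagonal_(float('-inf'))                   # a sample is not its own positive
    # The positive for row i is the other view of the same window.
    targets = torch.cat([torch.arange(B, 2 * B), torch.arange(B)]).to(z.device)
    return F.cross_entropy(sim, targets)
```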
arXiv Detail & Related papers (2021-04-19T06:40:21Z)
- Semantics-aware Adaptive Knowledge Distillation for Sensor-to-Vision Action Recognition [131.6328804788164]
We propose a framework, named Semantics-aware Adaptive Knowledge Distillation Networks (SAKDN), to enhance action recognition in the vision-sensor modality (videos).
SAKDN uses multiple wearable sensors as teacher modalities and RGB videos as the student modality.
arXiv Detail & Related papers (2020-09-01T03:38:31Z)
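At its core, such sensor-to-vision distillation trains the video student against the soft predictions of the sensor teachers. A generic soft-label distillation loss is sketched below; SAKDN's semantics-aware adaptive weighting is not reproduced here, and the temperature and mixing weight are assumptions.

```python
# Generic soft-label knowledge distillation from a sensor "teacher" to a video
# "student" classifier; SAKDN's semantics-aware weighting is not reproduced.
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    # Soft targets from the (frozen) teacher, softened by temperature T.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits.detach() / T, dim=1),
        reduction='batchmean',
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)  # ordinary supervised term
    return alpha * soft + (1 - alpha) * hard
```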