MHAD: Multimodal Home Activity Dataset with Multi-Angle Videos and Synchronized Physiological Signals
- URL: http://arxiv.org/abs/2409.09366v1
- Date: Sat, 14 Sep 2024 08:42:39 GMT
- Title: MHAD: Multimodal Home Activity Dataset with Multi-Angle Videos and Synchronized Physiological Signals
- Authors: Lei Yu, Jintao Fei, Xinyi Liu, Yang Yao, Jun Zhao, Guoxin Wang, Xin Li
- Abstract summary: Video-based physiology extracts physiological signals by analyzing subtle changes in video recordings.
There is currently no dataset specifically designed for passive home monitoring.
The MHAD dataset comprises 1,440 videos from 40 subjects, capturing 6 typical activities from 3 angles in a real home environment.
- Score: 20.113892246512776
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video-based physiology, exemplified by remote photoplethysmography (rPPG), extracts physiological signals such as pulse and respiration by analyzing subtle changes in video recordings. This non-contact, real-time monitoring method holds great potential for home settings. Despite the valuable contributions of public benchmark datasets to this technology, there is currently no dataset specifically designed for passive home monitoring. Existing datasets are often limited to close-up, static, frontal recordings and typically include only 1-2 physiological signals. To advance video-based physiology in real home settings, we introduce the MHAD dataset. It comprises 1,440 videos from 40 subjects, capturing 6 typical activities from 3 angles in a real home environment. Additionally, 5 physiological signals were recorded, making it a comprehensive video-based physiology dataset. MHAD is compatible with the rPPG-toolbox and has been validated using several unsupervised and supervised methods. Our dataset is publicly available at https://github.com/jdh-algo/MHAD-Dataset.
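As a point of reference for how an unsupervised baseline recovers pulse from such recordings, below is a minimal green-channel sketch. It is an illustrative toy pipeline only, not the rPPG-toolbox API or the dataset's official code; the face region and frame rate are assumed to be given by upstream components.

```python
import numpy as np
from scipy.signal import butter, filtfilt, periodogram

def green_channel_rppg(frames, face_box, fps):
    """Toy unsupervised rPPG baseline (illustrative only): average the green
    channel over a fixed face region, band-pass to plausible heart-rate
    frequencies, and read the rate from the dominant spectral peak."""
    top, bottom, left, right = face_box          # assumed to come from any face detector
    trace = np.array([f[top:bottom, left:right, 1].mean() for f in frames])
    trace = (trace - trace.mean()) / (trace.std() + 1e-8)

    # Keep 0.7-3.0 Hz, i.e. roughly 42-180 beats per minute
    b, a = butter(3, [0.7 / (fps / 2), 3.0 / (fps / 2)], btype="band")
    pulse = filtfilt(b, a, trace)

    freqs, power = periodogram(pulse, fs=fps)
    hr_bpm = 60.0 * freqs[np.argmax(power)]      # dominant frequency -> beats per minute
    return pulse, hr_bpm
```

On MHAD-style recordings, an estimate like this would be compared against the synchronized reference physiological signals; the supervised methods validated on the dataset replace the fixed region and hand-crafted filter with learned models.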
Related papers
- Editing Physiological Signals in Videos Using Latent Representations [1.1688456044134343]
Estimating Heart Rate (HR) from facial videos is a non-contact means to monitor the health of an individual.
The presence of vital signals in facial videos raises significant privacy concerns.
We propose a method that edits physiological signals in videos while preserving visual fidelity.
Our design's controllable HR editing is useful for applications such as anonymizing biometric signals in real videos or synthesizing realistic videos with vital signs.
arXiv Detail & Related papers (2025-09-29T18:02:50Z) - Gaze into the Heart: A Multi-View Video Dataset for rPPG and Health Biomarkers Estimation [36.002060195915526]
The paper introduces a novel large-scale multi-view video dataset for rPPG and health biomarkers estimation.
Our dataset comprises synchronized video recordings from 600 subjects, captured under varied conditions.
The public release of our dataset and model should significantly speed up progress in the development of AI medical assistants.
arXiv Detail & Related papers (2025-08-25T11:46:40Z) - UL-DD: A Multimodal Drowsiness Dataset Using Video, Biometric Signals, and Behavioral Data [11.879350713051698]
This dataset includes 3D facial video using a depth camera, IR camera footage, posterior videos, and biometric signals such as heart rate, electrodermal activity, blood oxygen saturation, skin temperature, and accelerometer data.
Drowsiness levels were self-reported every four minutes using the Karolinska Sleepiness Scale (KSS).
This study aims to create a comprehensive multimodal dataset of driver drowsiness that captures a wider range of physiological, behavioral, and driving-related signals.
arXiv Detail & Related papers (2025-07-16T21:44:25Z) - emg2qwerty: A Large Dataset with Baselines for Touch Typing using Surface Electromyography [47.160223334501126]
emg2qwerty is a large-scale dataset of non-invasive electromyographic signals recorded at the wrists while touch typing on a QWERTY keyboard.
With 1,135 sessions spanning 108 users and 346 hours of recording, this is the largest such public dataset to date.
We show strong baseline performance on predicting key-presses using sEMG signals alone.
arXiv Detail & Related papers (2024-10-26T05:18:48Z) - Scaling Wearable Foundation Models [54.93979158708164]
We investigate the scaling properties of sensor foundation models across compute, data, and model size.
Using a dataset of up to 40 million hours of in-situ heart rate, heart rate variability, electrodermal activity, accelerometer, skin temperature, and altimeter per-minute data from over 165,000 people, we create LSM.
Our results establish the scaling laws of LSM for tasks such as imputation and extrapolation, both across time and sensor modalities.
arXiv Detail & Related papers (2024-10-17T15:08:21Z) - Learning Physics From Video: Unsupervised Physical Parameter Estimation for Continuous Dynamical Systems [49.11170948406405]
We propose an unsupervised method to estimate the physical parameters of known, continuous governing equations from single videos.
We take the field closer to reality by recording Delfys75: our own real-world dataset of 75 videos for five different types of dynamical systems.
arXiv Detail & Related papers (2024-10-02T09:44:54Z) - Convolutional Monge Mapping Normalization for learning on sleep data [63.22081662149488]
We propose a new method called Convolutional Monge Mapping Normalization (CMMN).
CMMN consists of filtering the signals to adapt their power spectral density (PSD) to a Wasserstein barycenter estimated on training data.
Numerical experiments on sleep EEG data show that CMMN leads to significant and consistent performance gains independent of the neural network architecture.
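Conceptually, this normalization amounts to a per-recording linear filter whose frequency response maps the recording's PSD onto the barycenter PSD. The numpy sketch below illustrates that reading under the assumption that the barycenter PSD is the squared mean of the square-root training PSDs; it is a hedged illustration, not the authors' released implementation.

```python
import numpy as np
from scipy.signal import welch

def fit_barycenter_psd(train_signals, fs, nperseg=256):
    """Estimate a barycenter PSD from training recordings.
    Assumption: the barycenter PSD is the squared mean of the square-root
    PSDs, a hedged reading of the CMMN construction for Gaussian signals."""
    sqrt_psds = [np.sqrt(welch(x, fs=fs, nperseg=nperseg)[1]) for x in train_signals]
    return np.mean(sqrt_psds, axis=0) ** 2

def cmmn_filter(x, fs, bary_psd, nperseg=256, eps=1e-12):
    """Filter one signal so that its PSD is mapped toward the barycenter PSD."""
    freqs, psd = welch(x, fs=fs, nperseg=nperseg)
    gain = np.sqrt(bary_psd / (psd + eps))            # frequency response of the mapping filter
    grid = np.fft.rfftfreq(len(x), d=1.0 / fs)
    g = np.interp(grid, freqs, gain)                  # resample the gain onto the signal's grid
    return np.fft.irfft(np.fft.rfft(x) * g, n=len(x))
```

In practice the barycenter would be fitted once on the training recordings and the filter applied to every signal before it enters the network.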
arXiv Detail & Related papers (2023-05-30T08:24:01Z) - Multimodal video and IMU kinematic dataset on daily life activities using affordable devices (VIDIMU) [0.0]
The objective of the dataset is to pave the way towards affordable patient gross motor tracking solutions for daily life activities recognition and kinematic analysis.
The novelty of the dataset lies in: (i) the clinical relevance of the chosen movements, (ii) the combined utilization of affordable video and custom sensors, and (iii) the implementation of state-of-the-art tools for multimodal data processing of 3D body pose tracking and motion reconstruction.
arXiv Detail & Related papers (2023-03-27T14:05:49Z) - Motion Matters: Neural Motion Transfer for Better Camera Physiological Measurement [25.27559386977351]
Body motion is one of the most significant sources of noise when attempting to recover the subtle cardiac pulse from a video.
We adapt a neural video synthesis approach to augment videos for the task of remote photoplethysmography (rPPG).
We demonstrate a 47% improvement over existing inter-dataset results using various state-of-the-art methods.
arXiv Detail & Related papers (2023-03-21T17:51:23Z) - MMPD: Multi-Domain Mobile Video Physiology Dataset [23.810333638829302]
The dataset is designed to capture videos with greater representation across skin tone, body motion, and lighting conditions.
The reliability of the dataset is verified by mainstream unsupervised methods and neural methods.
arXiv Detail & Related papers (2023-02-08T02:20:01Z) - Differentiable Frequency-based Disentanglement for Aerial Video Action Recognition [56.91538445510214]
We present a learning algorithm for human activity recognition in videos.
Our approach is designed for UAV videos, which are mainly acquired from obliquely placed dynamic cameras.
We conduct extensive experiments on the UAV Human dataset and the NEC Drone dataset.
arXiv Detail & Related papers (2022-09-15T22:16:52Z) - Contrast-Phys: Unsupervised Video-based Remote Physiological Measurement via Spatiotemporal Contrast [17.691683039742323]
Video-based remote physiological measurement uses face videos to measure the blood volume change signal, which is also called remote photoplethysmography (rPPG).
We use a 3DCNN model to generate multiple rPPG signals from each video at different spatiotemporal locations and train the model with a contrastive loss, where rPPG signals from the same video are pulled together while those from different videos are pushed away.
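To make that training signal concrete, the sketch below gives a minimal, PyTorch-style contrastive objective over predicted rPPG waveforms. The cosine-similarity, InfoNCE-style form and the function name are assumptions for illustration; the paper's exact loss compares signals differently (e.g., in the frequency domain).

```python
import torch
import torch.nn.functional as F

def rppg_contrastive_loss(signals, video_ids, temperature=0.1):
    """Illustrative InfoNCE-style loss over predicted rPPG waveforms.
    signals:   (N, T) tensor, one waveform per spatiotemporal block
    video_ids: (N,)  tensor, index of the source video of each waveform
    Waveforms from the same video are pulled together, those from
    different videos pushed apart, here via cosine similarity."""
    z = F.normalize(signals - signals.mean(dim=1, keepdim=True), dim=1)
    sim = (z @ z.t()) / temperature
    self_mask = torch.eye(len(signals), dtype=torch.bool, device=signals.device)
    pos_mask = (video_ids.unsqueeze(0) == video_ids.unsqueeze(1)) & ~self_mask
    logits = sim.masked_fill(self_mask, float("-inf"))   # drop self-pairs
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    return -log_prob[pos_mask].mean()                    # average over positive pairs
```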
arXiv Detail & Related papers (2022-08-08T19:30:57Z) - PhysFormer: Facial Video-based Physiological Measurement with Temporal Difference Transformer [55.936527926778695]
Recent deep learning approaches focus on mining subtle rPPG clues using convolutional neural networks with limited spatio-temporal receptive fields.
In this paper, we propose the PhysFormer, an end-to-end video transformer based architecture.
arXiv Detail & Related papers (2021-11-23T18:57:11Z) - Video-based Remote Physiological Measurement via Cross-verified Feature Disentangling [121.50704279659253]
We propose a cross-verified feature disentangling strategy to disentangle the physiological features with non-physiological representations.
We then use the distilled physiological features for robust multi-task physiological measurements.
The disentangled features are finally used for the joint prediction of multiple physiological signals like average HR values and rPPG signals.
arXiv Detail & Related papers (2020-07-16T09:39:17Z)