MMPD: Multi-Domain Mobile Video Physiology Dataset
- URL: http://arxiv.org/abs/2302.03840v2
- Date: Mon, 1 May 2023 01:43:36 GMT
- Title: MMPD: Multi-Domain Mobile Video Physiology Dataset
- Authors: Jiankai Tang, Kequan Chen, Yuntao Wang, Yuanchun Shi, Shwetak Patel,
Daniel McDuff, Xin Liu
- Abstract summary: The dataset is designed to capture videos with greater representation across skin tone, body motion, and lighting conditions.
The reliability of the dataset is verified by mainstream unsupervised methods and neural methods.
- Score: 23.810333638829302
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Remote photoplethysmography (rPPG) is an attractive method for noninvasive,
convenient and concomitant measurement of physiological vital signals. Public
benchmark datasets have served a valuable role in the development of this
technology and improvements in accuracy over recent years. However, there
remain gaps in the public datasets. First, despite the ubiquity of cameras on
mobile devices, there are few datasets recorded specifically with mobile phone
cameras. Second, most datasets are relatively small and therefore limited in
diversity of appearance (e.g., skin tone), behavior (e.g., motion), and
environment (e.g., lighting conditions). In an effort to help the field
advance, we present the Multi-domain Mobile Video Physiology Dataset (MMPD),
comprising 11 hours of recordings from mobile phones of 33 subjects. The
dataset is designed to capture videos with greater representation across skin
tone, body motion, and lighting conditions. MMPD is comprehensive with eight
descriptive labels and can be used in conjunction with the rPPG-toolbox. The
reliability of the dataset is verified by mainstream unsupervised methods and
neural methods. The GitHub repository of our dataset:
https://github.com/THU-CS-PI/MMPD_rPPG_dataset.
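As context for the "mainstream unsupervised methods" mentioned above, the classic green-channel baseline is a useful reference point. Below is a minimal illustrative sketch, not the rPPG-toolbox implementation; the face-cropped input and the 0.7-3.0 Hz band limits are assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def green_channel_hr(frames: np.ndarray, fps: float) -> float:
    """Estimate heart rate (BPM) from face-cropped RGB frames.

    frames: (T, H, W, 3) array, assumed to be a face crop.
    fps: video sampling rate in Hz.
    """
    # Spatially average the green channel into a 1-D trace.
    g = frames[:, :, :, 1].mean(axis=(1, 2))
    g = g - g.mean()

    # Band-pass to plausible heart rates (0.7-3.0 Hz ~ 42-180 BPM).
    b, a = butter(3, [0.7, 3.0], btype="bandpass", fs=fps)
    pulse = filtfilt(b, a, g)

    # Read the heart rate off the dominant spectral peak in the passband.
    freqs = np.fft.rfftfreq(len(pulse), d=1.0 / fps)
    power = np.abs(np.fft.rfft(pulse)) ** 2
    band = (freqs >= 0.7) & (freqs <= 3.0)
    return 60.0 * freqs[band][np.argmax(power[band])]
```

Stronger unsupervised baselines (e.g., CHROM, POS) follow the same pipeline but replace the green trace with a chrominance projection of all three color channels.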
Related papers
- Neural Field Representations of Mobile Computational Photography [4.459996749171579]
I show how carefully designed neural field models can compactly represent complex geometry and lighting effects. I enable applications such as depth estimation, layer separation, and image stitching directly from collected in-the-wild mobile photography data.
arXiv Detail & Related papers (2025-08-08T00:03:46Z) - Exploring Remote Physiological Signal Measurement under Dynamic Lighting Conditions at Night: Dataset, Experiment, and Analysis [7.679574342546723]
We present and release a large-scale rPPG dataset collected under dynamic lighting conditions at night, named DLCN. The dataset comprises approximately 13 hours of video data and corresponding physiological signals from 98 participants, covering four representative nighttime lighting scenarios. We provide a comprehensive analysis of the challenges faced by state-of-the-art rPPG methods when applied to DLCN.
arXiv Detail & Related papers (2025-07-06T09:16:08Z) - EarthView: A Large Scale Remote Sensing Dataset for Self-Supervision [72.84868704100595]
This paper presents a dataset specifically designed for self-supervision on remote sensing data, intended to enhance deep learning applications on Earth monitoring tasks.
The dataset spans 15 terapixels of global remote-sensing data, combining imagery from a diverse range of sources, including NEON, Sentinel, and a novel release of 1m spatial resolution data from Satellogic.
Accompanying the dataset is EarthMAE, a tailored Masked Autoencoder developed to tackle the distinct challenges of remote sensing data.
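As background, a Masked Autoencoder trains by hiding most input patches and reconstructing them from the visible remainder. The sketch below shows only that generic random-masking step, not EarthMAE's actual architecture; the patch size and mask ratio are assumed values.

```python
import numpy as np

def random_patch_mask(image: np.ndarray, patch: int = 16, mask_ratio: float = 0.75):
    """Split an image into patch tokens and randomly mask most of them (MAE-style).

    image: (H, W, C) array with H and W divisible by `patch`.
    Returns the visible tokens plus the visible/masked index sets.
    """
    H, W, C = image.shape
    # Flatten into a sequence of (patch * patch * C)-dimensional tokens.
    tokens = (image.reshape(H // patch, patch, W // patch, patch, C)
                   .transpose(0, 2, 1, 3, 4)
                   .reshape(-1, patch * patch * C))
    perm = np.random.permutation(tokens.shape[0])
    n_keep = int(tokens.shape[0] * (1 - mask_ratio))
    visible_idx, masked_idx = perm[:n_keep], perm[n_keep:]
    # Only visible tokens go to the encoder; masked ones are reconstruction targets.
    return tokens[visible_idx], visible_idx, masked_idx
```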
arXiv Detail & Related papers (2025-01-14T13:42:22Z) - MHAD: Multimodal Home Activity Dataset with Multi-Angle Videos and Synchronized Physiological Signals [20.113892246512776]
Video-based methods extract physiological signals by analyzing subtle changes in video recordings.
There is currently no dataset specifically designed for passive home monitoring.
The MHAD dataset comprises 1,440 videos from 40 subjects, capturing 6 typical activities from 3 angles in a real home environment.
arXiv Detail & Related papers (2024-09-14T08:42:39Z) - MSSPlace: Multi-Sensor Place Recognition with Visual and Text Semantics [41.94295877935867]
We study the impact of leveraging a multi-camera setup and integrating diverse data sources for multimodal place recognition.
Our proposed method named MSSPlace utilizes images from multiple cameras, LiDAR point clouds, semantic segmentation masks, and text annotations to generate comprehensive place descriptors.
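At a high level, such descriptors are often built by encoding each modality separately and fusing the embeddings. The following is a generic late-fusion sketch; the modality names and normalization scheme are placeholders, not MSSPlace's actual architecture.

```python
import numpy as np

def fuse_place_descriptor(embeddings: dict) -> np.ndarray:
    """Fuse per-modality embeddings into a single place descriptor.

    embeddings: maps a modality name (e.g., 'camera', 'lidar', 'semantics',
    'text') to a 1-D feature vector produced by that modality's encoder.
    """
    parts = []
    for name in sorted(embeddings):  # fixed order keeps descriptors comparable
        v = np.asarray(embeddings[name], dtype=np.float64)
        parts.append(v / (np.linalg.norm(v) + 1e-12))  # per-modality L2 norm
    descriptor = np.concatenate(parts)
    # Final normalization so retrieval can use cosine or Euclidean distance.
    return descriptor / (np.linalg.norm(descriptor) + 1e-12)
```

Place recognition then reduces to nearest-neighbor search over the stored descriptors.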
arXiv Detail & Related papers (2024-07-22T14:24:56Z) - MTMMC: A Large-Scale Real-World Multi-Modal Camera Tracking Benchmark [63.878793340338035]
Multi-target multi-camera tracking is a crucial task that involves identifying and tracking individuals over time using video streams from multiple cameras.
Existing datasets for this task are either synthetically generated or artificially constructed within a controlled camera network setting.
We present MTMMC, a real-world, large-scale dataset that includes long video sequences captured by 16 multi-modal cameras in two different environments.
arXiv Detail & Related papers (2024-03-29T15:08:37Z) - Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery [78.43828998065071]
Recent advances in unsupervised learning have demonstrated the ability of large vision models to achieve promising results on downstream tasks.
Such pre-training techniques have also been explored recently in the remote sensing domain due to the availability of large amount of unlabelled data.
In this paper, we revisit transformer pre-training and leverage multi-scale information that is effectively utilized across multiple modalities.
arXiv Detail & Related papers (2024-03-08T16:18:04Z) - PointOdyssey: A Large-Scale Synthetic Dataset for Long-Term Point Tracking [90.29143475328506]
We introduce PointOdyssey, a large-scale synthetic dataset, and data generation framework.
Our goal is to advance the state-of-the-art by placing emphasis on long videos with naturalistic motion.
We animate deformable characters using real-world motion capture data, build 3D scenes to match the motion-capture environments, and render camera viewpoints using trajectories mined via structure-from-motion on real videos.
arXiv Detail & Related papers (2023-07-27T17:58:11Z) - Convolutional Monge Mapping Normalization for learning on sleep data [63.22081662149488]
We propose a new method called Convolutional Monge Mapping Normalization (CMMN).
CMMN filters the signals so that their power spectral density (PSD) matches a Wasserstein barycenter estimated on the training data.
Numerical experiments on sleep EEG data show that CMMN leads to significant and consistent performance gains independent from the neural network architecture.
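Under the stationary zero-mean Gaussian assumption used in this line of work, the Wasserstein barycenter of PSDs is the squared mean of their square roots, and the Monge mapping is a filter whose frequency response is the ratio of square-root PSDs. A minimal NumPy/SciPy sketch of that idea, not the authors' released implementation:

```python
import numpy as np
from scipy.signal import welch

def cmmn_fit(signals, fs, nperseg=256):
    """Wasserstein barycenter PSD of a set of training signals.

    Assumes stationary, zero-mean Gaussian signals, for which the
    barycenter PSD is the squared mean of the square-root PSDs.
    """
    sqrt_psds = [np.sqrt(welch(x, fs=fs, nperseg=nperseg)[1]) for x in signals]
    return np.mean(sqrt_psds, axis=0) ** 2

def cmmn_map(x, barycenter_psd, fs, nperseg=256):
    """Filter one signal so its PSD is pushed toward the barycenter."""
    psd = welch(x, fs=fs, nperseg=nperseg)[1]
    # Monge mapping filter: square root of the PSD ratio.
    h = np.sqrt(barycenter_psd / np.maximum(psd, 1e-12))
    # Apply in the frequency domain (illustrative; the paper learns a
    # finite convolutional filter instead).
    grid = np.fft.rfftfreq(2 * (len(psd) - 1), 1.0 / fs)
    H = np.interp(np.fft.rfftfreq(len(x), 1.0 / fs), grid, h)
    return np.fft.irfft(np.fft.rfft(x) * H, n=len(x))
```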
arXiv Detail & Related papers (2023-05-30T08:24:01Z) - HabitatDyn Dataset: Dynamic Object Detection to Kinematics Estimation [16.36110033895749]
We propose the dataset HabitatDyn, which contains synthetic RGB videos, semantic labels, and depth information, as well as kinematics information.
HabitatDyn was created from the perspective of a mobile robot with a moving camera, and contains 30 scenes featuring six different types of moving objects with varying velocities.
arXiv Detail & Related papers (2023-04-21T09:57:35Z) - SCAMPS: Synthetics for Camera Measurement of Physiological Signals [17.023803380199492]
We present SCAMPS, a dataset of synthetics containing 2,800 videos (1.68M frames) with aligned cardiac and respiratory signals and facial action intensities.
We provide descriptive statistics about the underlying waveforms, including inter-beat interval, heart rate variability, and pulse arrival time.
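For reference, these statistics are simple functions of detected beat times. The sketch below computes inter-beat intervals and SDNN, a basic HRV measure; the peak detector and its minimum-distance threshold are illustrative assumptions, not the SCAMPS pipeline (pulse arrival time additionally needs a synchronized ECG reference, so it is omitted).

```python
import numpy as np
from scipy.signal import find_peaks

def waveform_stats(ppg: np.ndarray, fs: float):
    """Inter-beat intervals (IBI), SDNN, and mean heart rate from a pulse wave.

    ppg: 1-D pulse waveform; fs: sampling rate in Hz.
    The minimum peak distance assumes heart rate <= 180 BPM.
    """
    peaks, _ = find_peaks(ppg, distance=int(fs * 60 / 180))
    beat_times = peaks / fs
    ibi = np.diff(beat_times)        # inter-beat intervals, seconds
    sdnn = np.std(ibi) * 1000.0      # HRV: std of IBIs, milliseconds
    hr = 60.0 / ibi.mean()           # mean heart rate, BPM
    return ibi, sdnn, hr
```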
arXiv Detail & Related papers (2022-06-08T23:48:41Z) - Unsupervised Person Re-Identification with Wireless Positioning under Weak Scene Labeling [131.18390399368997]
We propose to explore unsupervised person re-identification with both visual data and wireless positioning trajectories under weak scene labeling.
Specifically, we propose a novel unsupervised multimodal training framework (UMTF), which models the complementarity of visual data and wireless information.
Our UMTF contains a multimodal data association strategy (MMDA) and a multimodal graph neural network (MMGN).
arXiv Detail & Related papers (2021-10-29T08:25:44Z) - HighlightMe: Detecting Highlights from Human-Centric Videos [52.84233165201391]
We present a domain- and user-preference-agnostic approach to detect highlightable excerpts from human-centric videos.
We use an autoencoder network equipped with spatial-temporal graph convolutions to detect human activities and interactions.
We observe a 4-12% improvement in the mean average precision of matching the human-annotated highlights over state-of-the-art methods.
arXiv Detail & Related papers (2021-10-05T01:18:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the generated summaries (including all information) and is not responsible for any consequences arising from their use.