Self-Supervised Learning Framework for Remote Heart Rate Estimation
Using Spatiotemporal Augmentation
- URL: http://arxiv.org/abs/2107.07695v1
- Date: Fri, 16 Jul 2021 04:00:13 GMT
- Title: Self-Supervised Learning Framework for Remote Heart Rate Estimation
Using Spatiotemporal Augmentation
- Authors: Hao Wang, Euijoon Ahn, Jinman Kim
- Abstract summary: Recent deep learning methods have shown that heart rate can be measured remotely using facial videos.
We present a 3D self-supervised learning framework for remote HR estimation on facial videos.
- Score: 12.783744603679942
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent supervised deep learning methods have shown that heart rate can be
measured remotely using facial videos. However, the performance of these
supervised methods is dependent on the availability of large-scale labelled
data, and they have been limited to 2D deep learning architectures that do not
fully exploit 3D spatiotemporal information. To solve this problem, we
present a novel 3D self-supervised spatiotemporal learning framework for remote
HR estimation from facial videos. Concretely, we propose a landmark-based spatial
augmentation, which splits the face into several informative parts based on
Shafer's dichromatic reflection model, and a novel sparsity-based temporal
augmentation exploiting the Nyquist-Shannon sampling theorem to enhance the
signal modelling ability. We evaluated our method on three public datasets; it
outperformed other self-supervised methods and achieved accuracy competitive
with state-of-the-art supervised methods.
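The two augmentations can be sketched roughly as follows. This is an illustrative example only, not the authors' released code; the frame rate, heart-rate frequency band, and landmark grouping below are assumptions made for the sketch.

```python
# Illustrative sketch of the two augmentations described in the abstract.
# The frame rate, heart-rate band, and landmark grouping are assumed values
# and are not taken from the paper.
import numpy as np

FPS = 30.0               # assumed camera frame rate
HR_BAND_HZ = (0.7, 4.0)  # roughly 42-240 bpm, a common rPPG frequency band


def landmark_spatial_split(frames, landmarks, regions):
    """Crop a face clip into informative parts (e.g. forehead, cheeks)
    using bounding boxes around groups of facial landmarks.

    frames:    (T, H, W, 3) video clip
    landmarks: (T, K, 2) per-frame landmark coordinates as (x, y)
    regions:   list of landmark-index arrays, one per facial region
    """
    parts = []
    for idx in regions:
        pts = landmarks[:, idx, :]                    # (T, len(idx), 2)
        x0, y0 = pts.min(axis=(0, 1)).astype(int)
        x1, y1 = pts.max(axis=(0, 1)).astype(int)
        parts.append(frames[:, y0:y1, x0:x1])         # crop per region
    return parts


def sparsity_temporal_augment(frames, stride):
    """Temporally subsample a clip. By the Nyquist-Shannon theorem, the pulse
    signal is preserved as long as the effective sampling rate (FPS / stride)
    stays above twice the highest heart-rate frequency of interest."""
    max_stride = int(FPS // (2.0 * HR_BAND_HZ[1]))    # here: 30 // 8 = 3
    stride = max(1, min(stride, max_stride))
    return frames[::stride]
```

A positive pair for self-supervised training could then be, for example, two facial regions of the same clip sampled at different valid temporal strides.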
Related papers
- STGFormer: Spatio-Temporal GraphFormer for 3D Human Pose Estimation in Video [7.345621536750547]
This paper presents a graph-based framework for 3D human pose estimation in video.
Specifically, we develop a graph-based attention mechanism, integrating graph information directly into the respective attention layers.
We demonstrate that our method achieves state-of-the-art performance in 3D human pose estimation.
arXiv Detail & Related papers (2024-07-14T06:45:27Z) - RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering
Assisted Distillation [50.35403070279804]
3D occupancy prediction is an emerging task that aims to estimate the occupancy states and semantics of 3D scenes using multi-view images.
We propose RadOcc, a Rendering assisted distillation paradigm for 3D Occupancy prediction.
arXiv Detail & Related papers (2023-12-19T03:39:56Z) - Remote Heart Rate Monitoring in Smart Environments from Videos with
Self-supervised Pre-training [28.404118669462772]
We introduce a solution that utilizes self-supervised contrastive learning for remote photoplethysmography (rPPG) estimation and heart rate monitoring.
We propose the use of 3 spatial and 3 temporal augmentations for training an encoder through a contrastive framework, followed by utilizing the late-intermediate embeddings of the encoder for remote PPG and heart rate estimation (a generic sketch of such a contrastive loss appears after this list).
arXiv Detail & Related papers (2023-10-23T22:41:04Z) - Promoting Generalization in Cross-Dataset Remote Photoplethysmography [1.422288795020666]
Remote Photoplethysmography, or the remote monitoring of a subject's heart rate using a camera, has seen a shift from handcrafted techniques to deep learning models.
We show that these models tend to learn a bias to pulse wave features inherent to the training dataset.
We develop augmentations that counteract this learned bias by expanding both the range and variability of heart rates that the model sees while training, resulting in improved model convergence.
arXiv Detail & Related papers (2023-05-24T14:35:54Z) - On Triangulation as a Form of Self-Supervision for 3D Human Pose
Estimation [57.766049538913926]
Supervised approaches to 3D pose estimation from single images are remarkably effective when labeled data is abundant.
Much of the recent attention has shifted towards semi- and weakly-supervised learning.
We propose to impose multi-view geometrical constraints by means of a differentiable triangulation and to use it as a form of self-supervision during training when no labels are available.
arXiv Detail & Related papers (2022-03-29T19:11:54Z) - Self-Attentive 3D Human Pose and Shape Estimation from Videos [82.63503361008607]
We present a video-based learning algorithm for 3D human pose and shape estimation.
We exploit temporal information in videos and propose a self-attention module.
We evaluate our method on the 3DPW, MPI-INF-3DHP, and Human3.6M datasets.
arXiv Detail & Related papers (2021-03-26T00:02:19Z) - Enhanced 3D Human Pose Estimation from Videos by using Attention-Based
Neural Network with Dilated Convolutions [12.900524511984798]
We show a systematic design for how conventional networks and other forms of constraints can be incorporated into the attention framework.
We achieve this by adapting temporal receptive field via a multi-scale structure of dilated convolutions.
Our method achieves state-of-the-art performance and outperforms existing methods, reducing the mean per-joint position error to 33.4 mm on the Human3.6M dataset.
arXiv Detail & Related papers (2021-03-04T17:26:51Z) - Unsupervised Domain Adaptation with Temporal-Consistent Self-Training
for 3D Hand-Object Joint Reconstruction [131.34795312667026]
We introduce an effective approach to addressing this challenge by exploiting 3D geometric constraints within a cycle generative adversarial network (CycleGAN).
In contrast to most existing works, we propose to enforce short- and long-term temporal consistency to fine-tune the domain-adapted model in a self-supervised fashion.
We demonstrate that our approach outperforms state-of-the-art 3D hand-object joint reconstruction methods on three widely-used benchmarks.
arXiv Detail & Related papers (2020-12-21T11:27:56Z) - Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks [87.50632573601283]
We present a novel method for multi-view depth estimation from a single video.
Our method achieves temporally coherent depth estimation results by using a novel Epipolar Spatio-Temporal (EST) transformer.
To reduce the computational cost, inspired by recent Mixture-of-Experts models, we design a compact hybrid network.
arXiv Detail & Related papers (2020-11-26T04:04:21Z) - Self-Supervised Human Depth Estimation from Monocular Videos [99.39414134919117]
Previous methods on estimating detailed human depth often require supervised training with ground-truth depth data.
This paper presents a self-supervised method that can be trained on YouTube videos without known depth.
Experiments demonstrate that our method enjoys better generalization and performs much better on data in the wild.
arXiv Detail & Related papers (2020-05-07T09:45:11Z) - A Graph Attention Spatio-temporal Convolutional Network for 3D Human
Pose Estimation in Video [7.647599484103065]
We improve the learning of constraints in the human skeleton by modeling local and global spatial information via attention mechanisms.
Our approach effectively mitigates depth ambiguity and self-occlusion, generalizes to half upper body estimation, and achieves competitive performance on 2D-to-3D video pose estimation.
arXiv Detail & Related papers (2020-03-11T14:54:40Z)
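As referenced in the "Remote Heart Rate Monitoring in Smart Environments" entry above, a contrastive framework trains an encoder so that embeddings of two augmented views of the same clip agree. The following is a generic NT-Xent (InfoNCE) loss sketch, not the cited paper's code; the embedding shapes and temperature are assumptions.

```python
# Generic NT-Xent (InfoNCE) contrastive loss sketch in PyTorch.
# Not taken from any of the papers above; shapes and temperature are assumed.
import torch
import torch.nn.functional as F


def nt_xent_loss(z1, z2, temperature=0.1):
    """z1, z2: (N, D) embeddings of two augmented views of the same N clips.
    Each view is pulled towards its counterpart and pushed away from the
    other 2N - 2 embeddings in the batch."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)                      # (2N, D)
    sim = z @ z.t() / temperature                       # cosine similarities
    n = z1.size(0)
    self_mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float('-inf'))     # exclude self pairs
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)
```

In such a setup, intermediate embeddings of the trained encoder can then be reused for downstream remote PPG and heart-rate estimation, as described in that entry.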