PhySU-Net: Long Temporal Context Transformer for rPPG with
Self-Supervised Pre-training
- URL: http://arxiv.org/abs/2402.11913v1
- Date: Mon, 19 Feb 2024 07:59:16 GMT
- Title: PhySU-Net: Long Temporal Context Transformer for rPPG with
Self-Supervised Pre-training
- Authors: Marko Savic, Guoying Zhao
- Abstract summary: We propose Phy-Net, the first long-temporal spatial map r transformer network and a self-supervised pre-training strategy.
Our model is tested on two public datasets (OBF and VIPL-HR) and shows superior performance in supervised training.
- Score: 21.521146237660766
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Remote photoplethysmography (rPPG) is a promising technology that consists of
contactless measuring of cardiac activity from facial videos. Most recent
approaches utilize convolutional networks with limited temporal modeling
capability or ignore long temporal context. Supervised rPPG methods are also
severely limited by scarce data availability. In this work, we propose
PhySU-Net, the first long spatial-temporal map rPPG transformer network and a
self-supervised pre-training strategy that exploits unlabeled data to improve
our model. Our strategy leverages traditional methods and image masking to
provide pseudo-labels for self-supervised pre-training. Our model is tested on
two public datasets (OBF and VIPL-HR) and shows superior performance in
supervised training. Furthermore, we demonstrate that our self-supervised
pre-training strategy further improves our model's performance by leveraging
representations learned from unlabeled data.
Related papers
- Continual Learning for Remote Physiological Measurement: Minimize Forgetting and Simplify Inference [4.913049603343811]
Existing r measurement methods often overlook the incremental learning scenario.
Most existing class incremental learning approaches are unsuitable for r measurement.
We present a novel method named ADDP to tackle continual learning for r measurement.
arXiv Detail & Related papers (2024-07-19T01:49:09Z) - GPT-ST: Generative Pre-Training of Spatio-Temporal Graph Neural Networks [24.323017830938394]
This work aims to address challenges by introducing a pre-training framework that seamlessly integrates with baselines and enhances their performance.
The framework is built upon two key designs: (i) We propose a.
apple-to-apple mask autoencoder as a pre-training model for learning-temporal dependencies.
These modules are specifically designed to capture intra-temporal customized representations and semantic- and inter-cluster relationships.
arXiv Detail & Related papers (2023-11-07T02:36:24Z) - Temporal DINO: A Self-supervised Video Strategy to Enhance Action
Prediction [15.696593695918844]
This paper introduces a novel self-supervised video strategy for enhancing action prediction inspired by DINO (self-distillation with no labels)
The experimental results showcase significant improvements in prediction performance across 3D-ResNet, Transformer, and LSTM architectures.
These findings highlight the potential of our approach in diverse video-based tasks such as activity recognition, motion planning, and scene understanding.
arXiv Detail & Related papers (2023-08-08T21:18:23Z) - In-Domain Self-Supervised Learning Improves Remote Sensing Image Scene
Classification [5.323049242720532]
Self-supervised learning has emerged as a promising approach for remote sensing image classification.
We present a study of different self-supervised pre-training strategies and evaluate their effect across 14 downstream datasets.
arXiv Detail & Related papers (2023-07-04T10:57:52Z) - Cluster-level pseudo-labelling for source-free cross-domain facial
expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER)
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z) - PhysFormer: Facial Video-based Physiological Measurement with Temporal
Difference Transformer [55.936527926778695]
Recent deep learning approaches focus on mining subtle r clues using convolutional neural networks with limited-temporal receptive fields.
In this paper, we propose the PhysFormer, an end-to-end video transformer based architecture.
arXiv Detail & Related papers (2021-11-23T18:57:11Z) - Regularizing Generative Adversarial Networks under Limited Data [88.57330330305535]
This work proposes a regularization approach for training robust GAN models on limited data.
We show a connection between the regularized loss and an f-divergence called LeCam-divergence, which we find is more robust under limited training data.
arXiv Detail & Related papers (2021-04-07T17:59:06Z) - Semi-supervised Facial Action Unit Intensity Estimation with Contrastive
Learning [54.90704746573636]
Our method does not require to manually select key frames, and produces state-of-the-art results with as little as $2%$ of annotated frames.
We experimentally validate that our method outperforms existing methods when working with as little as $2%$ of randomly chosen data.
arXiv Detail & Related papers (2020-11-03T17:35:57Z) - Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
To tackle this, we propose to apply a dataset distillation strategy to compress the created dataset into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z) - Improving Semantic Segmentation via Self-Training [75.07114899941095]
We show that we can obtain state-of-the-art results using a semi-supervised approach, specifically a self-training paradigm.
We first train a teacher model on labeled data, and then generate pseudo labels on a large set of unlabeled data.
Our robust training framework can digest human-annotated and pseudo labels jointly and achieve top performances on Cityscapes, CamVid and KITTI datasets.
arXiv Detail & Related papers (2020-04-30T17:09:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.