Joint Spatial-Temporal Modeling and Contrastive Learning for Self-supervised Heart Rate Measurement
- URL: http://arxiv.org/abs/2406.04942v1
- Date: Fri, 7 Jun 2024 13:53:02 GMT
- Title: Joint Spatial-Temporal Modeling and Contrastive Learning for Self-supervised Heart Rate Measurement
- Authors: Wei Qian, Qi Li, Kun Li, Xinke Wang, Xiao Sun, Meng Wang, Dan Guo
- Abstract summary: This paper briefly introduces the solutions developed by our team, HFUT-VUT, for Track 1 of self-supervised heart rate measurement.
The goal is to develop a self-supervised learning algorithm for heart rate (HR) estimation using unlabeled facial videos.
Our solutions achieved an RMSE score of 8.85277 on the test dataset, securing 2nd place in Track 1 of the challenge.
- Score: 28.370473108391426
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper briefly introduces the solutions developed by our team, HFUT-VUT, for Track 1 of self-supervised heart rate measurement in the 3rd Vision-based Remote Physiological Signal Sensing (RePSS) Challenge hosted at IJCAI 2024. The goal is to develop a self-supervised learning algorithm for heart rate (HR) estimation using unlabeled facial videos. To tackle this task, we present two self-supervised HR estimation solutions that integrate spatial-temporal modeling and contrastive learning, respectively. Specifically, we first propose a non-end-to-end self-supervised HR measurement framework based on spatial-temporal modeling, which can effectively capture subtle rPPG clues and leverage the inherent bandwidth and periodicity characteristics of rPPG to constrain the model. Meanwhile, we employ an excellent end-to-end solution based on contrastive learning, aiming to generalize across different scenarios from complementary perspectives. Finally, we combine the strengths of the above solutions through an ensemble strategy to generate the final predictions, leading to a more accurate HR estimation. As a result, our solutions achieved a remarkable RMSE score of 8.85277 on the test dataset, securing 2nd place in Track 1 of the challenge.
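The abstract refers to a bandwidth prior on rPPG, an ensemble of two predictors, and an RMSE score. The following is a minimal illustrative sketch of those three ideas, not the authors' implementation: the function names, the 0.7 to 3.0 Hz band limits, and the use of a plain (optionally weighted) average for the ensemble are all assumptions made here for illustration.

```python
import numpy as np


def estimate_hr_bpm(signal, fs, low_hz=0.7, high_hz=3.0):
    """Return the dominant frequency of `signal` inside [low_hz, high_hz], in BPM.

    The band limits are an assumed plausible heart-rate range (42 to 180 BPM),
    not values taken from the paper.
    """
    signal = np.asarray(signal, dtype=float)
    signal = signal - signal.mean()  # remove the DC offset
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    band = (freqs >= low_hz) & (freqs <= high_hz)  # bandwidth constraint
    peak_freq = freqs[band][np.argmax(power[band])]
    return 60.0 * peak_freq


def ensemble_hr(predictions, weights=None):
    """Combine HR predictions from several models by (weighted) averaging."""
    return float(np.average(np.asarray(predictions, dtype=float), weights=weights))


def rmse(pred, target):
    """Root-mean-square error between predicted and reference HR values."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.sqrt(np.mean((pred - target) ** 2)))


# Synthetic example: a 1.2 Hz pulse (72 BPM) plus a stronger 0.2 Hz drift,
# sampled at 30 frames per second for 10 seconds.
fs = 30.0
t = np.arange(300) / fs
pulse = np.sin(2 * np.pi * 1.2 * t) + 3.0 * np.sin(2 * np.pi * 0.2 * t)
hr = estimate_hr_bpm(pulse, fs)
```

In this synthetic case the drift component has more power than the pulse, yet it lies outside the assumed band, so the in-band peak still corresponds to the pulse frequency. That is the intuition behind using bandwidth as a constraint.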
Related papers
- A Time-Aware Approach to Early Detection of Anorexia: UNSL at eRisk 2024 [0.9208007322096532]
The eRisk laboratory aims to address issues related to early risk detection on the Web.
Our research group solved Task 2 by defining a CPI+DMC approach, addressing both objectives independently, and a time-aware approach.
We achieved outstanding results for the ERDE50 metric and ranking-based metrics, demonstrating consistency in solving ERD problems.
arXiv Detail & Related papers (2024-10-23T15:30:37Z) - ACTRESS: Active Retraining for Semi-supervised Visual Grounding [52.08834188447851]
A previous study, RefTeacher, makes the first attempt to tackle this task by adopting the teacher-student framework to provide pseudo confidence supervision and attention-based supervision.
This approach is incompatible with current state-of-the-art visual grounding models, which follow the Transformer-based pipeline.
Our paper proposes the ACTive REtraining approach for Semi-Supervised Visual Grounding, abbreviated as ACTRESS.
arXiv Detail & Related papers (2024-07-03T16:33:31Z) - Inter-slice Super-resolution of Magnetic Resonance Images by Pre-training and Self-supervised Fine-tuning [49.197385954021456]
In clinical practice, 2D magnetic resonance (MR) sequences are widely adopted. While individual 2D slices can be stacked to form a 3D volume, the relatively large slice spacing can pose challenges for visualization and subsequent analysis tasks.
To reduce slice spacing, deep-learning-based super-resolution techniques are widely investigated.
Most current solutions require a substantial number of paired high-resolution and low-resolution images for supervised training, which are typically unavailable in real-world scenarios.
arXiv Detail & Related papers (2024-06-10T02:20:26Z) - Supervised Contrastive Learning based Dual-Mixer Model for Remaining
Useful Life Prediction [3.081898819471624]
The Remaining Useful Life (RUL) prediction aims at providing an accurate estimate of the remaining time from the current predicting moment to the complete failure of the device.
To overcome the shortcomings of rigid combination for temporal and spatial features in most existing RUL prediction approaches, a spatial-temporal homogeneous feature extractor, named Dual-Mixer model, is proposed.
The effectiveness of the proposed method is validated through comparisons with other latest research works on the C-MAPSS dataset.
arXiv Detail & Related papers (2024-01-29T14:38:44Z) - Predictive Experience Replay for Continual Visual Control and
Forecasting [62.06183102362871]
We present a new continual learning approach for visual dynamics modeling and explore its efficacy in visual control and forecasting.
We first propose the mixture world model that learns task-specific dynamics priors with a mixture of Gaussians, and then introduce a new training strategy to overcome catastrophic forgetting.
Our model remarkably outperforms the naive combinations of existing continual learning and visual RL algorithms on DeepMind Control and Meta-World benchmarks with continual visual control tasks.
arXiv Detail & Related papers (2023-03-12T05:08:03Z) - Embedding Temporal Convolutional Networks for Energy-Efficient PPG-Based
Heart Rate Monitoring [17.155316991045765]
Photoplethysmography (PPG) sensors allow for non-invasive and comfortable heart-rate (HR) monitoring.
Motion Artifacts (MAs) severely impact the monitoring accuracy, causing high variability in the skin-to-sensor interface.
We propose a computationally lightweight yet robust deep learning-based approach for PPG-based HR estimation.
We validate our approaches on two benchmark datasets, achieving as low as 3.84 Beats per Minute (BPM) of Mean Absolute Error (MAE) on PPGDalia.
arXiv Detail & Related papers (2022-03-01T17:04:28Z) - Self-Supervised Learning Framework for Remote Heart Rate Estimation
Using Spatiotemporal Augmentation [12.783744603679942]
Recent deep learning methods have shown that heart rate can be measured remotely using facial videos.
We present a 3D self-supervised learning framework for remote HR estimation on facial videos.
arXiv Detail & Related papers (2021-07-16T04:00:13Z) - Two-Stream Consensus Network: Submission to HACS Challenge 2021
Weakly-Supervised Learning Track [78.64815984927425]
The goal of weakly-supervised temporal action localization is to temporally locate and classify action of interest in untrimmed videos.
We adopt the two-stream consensus network (TSCN) as the main framework in this challenge.
Our solution ranked 2nd in this challenge, and we hope our method can serve as a baseline for future academic research.
arXiv Detail & Related papers (2021-06-21T03:36:36Z) - Semi-supervised Facial Action Unit Intensity Estimation with Contrastive
Learning [54.90704746573636]
Our method does not require manually selecting key frames, and produces state-of-the-art results with as little as 2% of annotated frames.
We experimentally validate that our method outperforms existing methods when working with as little as 2% of randomly chosen data.
arXiv Detail & Related papers (2020-11-03T17:35:57Z) - AutoHR: A Strong End-to-end Baseline for Remote Heart Rate Measurement
with Neural Searching [76.4844593082362]
We investigate the reason why existing end-to-end networks perform poorly in challenging conditions and establish a strong baseline for remote HR measurement with neural architecture search (NAS).
Comprehensive experiments are performed on three benchmark datasets on both intra-dataset and cross-dataset testing.
arXiv Detail & Related papers (2020-04-26T05:43:21Z)