TYrPPG: Uncomplicated and Enhanced Learning Capability rPPG for Remote Heart Rate Estimation
- URL: http://arxiv.org/abs/2511.05833v1
- Date: Sat, 08 Nov 2025 03:46:58 GMT
- Title: TYrPPG: Uncomplicated and Enhanced Learning Capability rPPG for Remote Heart Rate Estimation
- Authors: Taixi Chen, Yiu-ming Cheung
- Abstract summary: This paper introduces an innovative gated video understanding block (GVB) designed for efficient analysis of RGB videos. Based on the Mambaout structure, this block integrates 2D-CNN and 3D-CNN to enhance video understanding for analysis. Experiments show that our TYrPPG can achieve state-of-the-art performance on commonly used datasets.
- Score: 51.56484100374058
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Remote photoplethysmography (rPPG) can remotely extract physiological signals from RGB video, which has many advantages for heart rate detection, such as low cost and being non-invasive to patients. Existing rPPG models are usually based on transformer modules, which have low computational efficiency. Recently, the Mamba model has garnered increasing attention due to its efficient performance in natural language processing tasks, demonstrating potential as a substitute for transformer-based algorithms. However, the Mambaout model and its variants show that the SSM module, the core component of the Mamba model, is unnecessary for vision tasks. Therefore, we aim to demonstrate the feasibility of using a Mambaout-based module to remotely learn the heart rate. Specifically, we propose a novel rPPG algorithm called uncomplicated and enhanced learning capability rPPG (TYrPPG). This paper introduces an innovative gated video understanding block (GVB) designed for efficient analysis of RGB videos. Based on the Mambaout structure, this block integrates 2D-CNN and 3D-CNN to enhance video understanding for analysis. In addition, we propose a comprehensive supervised loss function (CSL) to improve the model's learning capability, along with its weakly supervised variants. Experiments show that our TYrPPG achieves state-of-the-art performance on commonly used datasets, indicating its promise and superiority in remote heart rate estimation. The source code is available at https://github.com/Taixi-CHEN/TYrPPG.
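The abstract states only that the GVB follows the Mambaout-style gated structure and fuses 2D-CNN and 3D-CNN paths; the actual architecture lives in the linked source code. As a rough illustration only, a gated fusion of a per-frame 2D path and a spatio-temporal 3D path might look like the sketch below (the class name, channel layout, and gating choice are all assumptions, not the authors' design):

```python
import torch
import torch.nn as nn

class GatedVideoBlock(nn.Module):
    """Hypothetical sketch of a gated video understanding block (GVB).

    The exact layout is an assumption: a 2D conv path applied per frame,
    a 3D conv path over the clip, and a Mambaout-style sigmoid gate that
    modulates the fused features, with a residual connection.
    """
    def __init__(self, channels: int):
        super().__init__()
        # 2D path: per-frame spatial features.
        self.spatial = nn.Conv2d(channels, channels, 3, padding=1)
        # 3D path: spatio-temporal features over the whole clip.
        self.temporal = nn.Conv3d(channels, channels, 3, padding=1)
        # Gate: pointwise projection squashed to (0, 1) via sigmoid.
        self.gate = nn.Conv3d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time, height, width)
        b, c, t, h, w = x.shape
        # Fold time into the batch so the 2D conv runs frame by frame.
        frames = x.permute(0, 2, 1, 3, 4).reshape(b * t, c, h, w)
        feat2d = (self.spatial(frames)
                  .reshape(b, t, c, h, w)
                  .permute(0, 2, 1, 3, 4))
        feat3d = self.temporal(x)
        # Gated fusion plus residual connection.
        return x + torch.sigmoid(self.gate(x)) * (feat2d + feat3d)
```

The sigmoid gate mirrors the gated-CNN idea behind Mambaout: the block learns which spatio-temporal features to pass through without any SSM component.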
Related papers
- Trajectory-aware Shifted State Space Models for Online Video Super-Resolution [57.87099307245989]
This paper presents a novel online VSR method based on Trajectory-aware Shifted SSMs (TS-Mamba). TS-Mamba first constructs trajectories within a video to select the most similar tokens from previous frames. Our TS-Mamba achieves state-of-the-art performance in most cases, with over a 22.7% complexity reduction (in MACs).
arXiv Detail & Related papers (2025-08-14T08:42:15Z) - Robust and Generalizable Heart Rate Estimation via Deep Learning for Remote Photoplethysmography in Complex Scenarios [7.2297623319815845]
Remote photoplethysmography (rPPG) technology enables heart rate measurement from facial videos. Existing network models face challenges in accuracy, robustness, and generalization capability. This paper proposes an end-to-end rPPG extraction network that employs 3D convolutional neural networks.
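Independently of any particular network above, rPPG pipelines typically convert the predicted waveform into a heart rate by locating the dominant spectral peak inside the physiologically plausible band. A minimal sketch of this standard post-processing step (the function name and band limits are illustrative assumptions, not taken from any of the listed papers):

```python
import numpy as np

def estimate_heart_rate(signal: np.ndarray, fs: float) -> float:
    """Estimate heart rate (BPM) from a 1-D rPPG waveform sampled at fs Hz.

    Takes the FFT of the signal and reads off the dominant frequency
    inside a plausible heart-rate band (0.7-3.0 Hz, i.e. 42-180 BPM).
    """
    signal = signal - signal.mean()          # remove the DC component
    spectrum = np.abs(np.fft.rfft(signal))   # one-sided magnitude spectrum
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    band = (freqs >= 0.7) & (freqs <= 3.0)   # restrict to heart-rate band
    peak_hz = freqs[band][np.argmax(spectrum[band])]
    return peak_hz * 60.0                    # Hz -> beats per minute

# Usage: a clean 1.2 Hz sinusoid at 30 fps should map to about 72 BPM.
t = np.arange(300) / 30.0
bpm = estimate_heart_rate(np.sin(2 * np.pi * 1.2 * t), fs=30.0)
```

Frequency resolution here is fs / N (0.1 Hz for a 10-second clip at 30 fps), so real pipelines often use longer windows or peak interpolation for finer BPM estimates.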
arXiv Detail & Related papers (2025-07-10T14:23:11Z) - BHViT: Binarized Hybrid Vision Transformer [53.38894971164072]
Model binarization has made significant progress in enabling real-time and energy-efficient computation for convolutional neural networks (CNNs). We propose BHViT, a binarization-friendly hybrid ViT architecture, and its fully binarized model, guided by three important observations. Our proposed algorithm achieves SOTA performance among binary ViT methods.
arXiv Detail & Related papers (2025-03-04T08:35:01Z) - VidFormer: A novel end-to-end framework fused by 3DCNN and Transformer for Video-based Remote Physiological Measurement [9.605944796068046]
We introduce VidFormer, a novel framework that integrates convolutional neural networks (CNNs) and Transformer models for rPPG tasks. Our evaluation on five publicly available datasets demonstrates that VidFormer outperforms current state-of-the-art (SOTA) methods.
arXiv Detail & Related papers (2025-01-03T08:18:08Z) - Continual Learning for Remote Physiological Measurement: Minimize Forgetting and Simplify Inference [4.913049603343811]
Existing rPPG measurement methods often overlook the incremental learning scenario.
Most existing class-incremental learning approaches are unsuitable for rPPG measurement.
We present a novel method named ADDP to tackle continual learning for rPPG measurement.
arXiv Detail & Related papers (2024-07-19T01:49:09Z) - RhythmMamba: Fast, Lightweight, and Accurate Remote Physiological Measurement [10.132660483466239]
Remote photoplethysmography is a method for non-contact measurement of physiological signals from videos. We introduce RhythmMamba, a state space model-based method that captures long-range transitions. Experiments show that RhythmMamba achieves state-of-the-art performance with 319% throughput and 23% peak GPU memory.
arXiv Detail & Related papers (2024-04-09T17:34:19Z) - Towards Optimal Patch Size in Vision Transformers for Tumor Segmentation [2.4540404783565433]
Detection of tumors in metastatic colorectal cancer (mCRC) plays an essential role in the early diagnosis and treatment of liver cancer.
Deep learning models backboned by fully convolutional neural networks (FCNNs) have become the dominant model for segmenting 3D computerized tomography (CT) scans.
Vision transformers have been introduced to address the limited locality of FCNNs' receptive fields.
This paper proposes a technique to select the vision transformer's optimal input multi-resolution image patch size based on the average volume size of metastasis lesions.
arXiv Detail & Related papers (2023-08-31T09:57:27Z) - Progressive Fourier Neural Representation for Sequential Video Compilation [75.43041679717376]
Motivated by continual learning, this work investigates how to accumulate and transfer neural implicit representations for multiple complex video data over sequential encoding sessions.
We propose a novel method, Progressive Fourier Neural Representation (PFNR), that aims to find an adaptive and compact sub-module in Fourier space to encode videos in each training session.
We validate our PFNR method on the UVG8/17 and DAVIS50 video sequence benchmarks and achieve impressive performance gains over strong continual learning baselines.
arXiv Detail & Related papers (2023-06-20T06:02:19Z) - PhysFormer: Facial Video-based Physiological Measurement with Temporal Difference Transformer [55.936527926778695]
Recent deep learning approaches focus on mining subtle rPPG clues using convolutional neural networks with limited spatio-temporal receptive fields.
In this paper, we propose the PhysFormer, an end-to-end video transformer based architecture.
arXiv Detail & Related papers (2021-11-23T18:57:11Z) - PnP-DETR: Towards Efficient Visual Analysis with Transformers [146.55679348493587]
Recently, DETR pioneered the solution of vision tasks with transformers: it directly translates the image feature map into the object detection result.
Recent transformer-based image recognition models show consistent efficiency gains.
arXiv Detail & Related papers (2021-09-15T01:10:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.