RhythmMamba: Fast, Lightweight, and Accurate Remote Physiological Measurement
- URL: http://arxiv.org/abs/2404.06483v2
- Date: Mon, 24 Feb 2025 04:52:32 GMT
- Title: RhythmMamba: Fast, Lightweight, and Accurate Remote Physiological Measurement
- Authors: Bochao Zou, Zizheng Guo, Xiaocheng Hu, Huimin Ma
- Abstract summary: Remote photoplethysmography (rPPG) is a method for non-contact measurement of physiological signals from facial videos. We introduce RhythmMamba, a state space model-based method that captures long-range dependencies while maintaining linear complexity. Experiments show that RhythmMamba achieves state-of-the-art performance with 319% throughput and 23% peak GPU memory.
- Score: 10.132660483466239
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Remote photoplethysmography (rPPG) is a method for non-contact measurement of physiological signals from facial videos, holding great potential in various applications such as healthcare, affective computing, and anti-spoofing. Existing deep learning methods struggle to address two core issues of rPPG simultaneously: understanding the periodic pattern of rPPG among long contexts and addressing large spatiotemporal redundancy in video segments. These represent a trade-off between computational complexity and the ability to capture long-range dependencies. In this paper, we introduce RhythmMamba, a state space model-based method that captures long-range dependencies while maintaining linear complexity. By viewing rPPG as a time series task through the proposed frame stem, the periodic variations in pulse waves are modeled as state transitions. Additionally, we design multi-temporal constraint and frequency domain feed-forward, both aligned with the characteristics of rPPG time series, to improve the learning capacity of Mamba for rPPG signals. Extensive experiments show that RhythmMamba achieves state-of-the-art performance with 319% throughput and 23% peak GPU memory. The codes are available at https://github.com/zizheng-guo/RhythmMamba.
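The abstract's core idea, modeling periodic pulse variations as state transitions with linear complexity, can be illustrated with a minimal linear state-space scan. This is an illustrative sketch only, not the RhythmMamba implementation: the dynamics matrices, signal, and function names below are invented for demonstration.

```python
import numpy as np

def ssm_scan(u, A, B, C):
    """Minimal linear state-space scan: x[t] = A x[t-1] + B u[t], y[t] = C x[t].
    One pass over the sequence, so cost is linear in length (unlike quadratic
    self-attention)."""
    x = np.zeros(A.shape[0])
    ys = []
    for ut in u:
        x = A @ x + B * ut       # state transition driven by the input sample
        ys.append(C @ x)         # readout of the hidden state
    return np.array(ys)

# Toy pulse-like input: ~1.2 Hz sinusoid sampled at 30 fps,
# standing in for the rPPG time series produced by a frame stem.
t = np.arange(200)
u = np.sin(2 * np.pi * 1.2 * t / 30.0)

A = np.array([[0.9, -0.2],       # stable, rotation-like dynamics so the
              [0.2,  0.9]])      # state can carry periodic structure
B = np.array([1.0, 0.0])
C = np.array([1.0, 1.0])
y = ssm_scan(u, A, B, C)         # filtered signal, shape (200,)
```

Selective state-space models like Mamba make A, B, and C input-dependent, but the linear-time recurrence above is the structural reason long contexts stay cheap.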
Related papers
- Reperio-rPPG: Relational Temporal Graph Neural Networks for Periodicity Learning in Remote Physiological Measurement [0.0]
Remote photoplethysmography (rPPG) is an emerging physiological sensing technique that leverages subtle color variations in facial videos to estimate vital signs such as heart rate and respiratory rate. This non-invasive technique has gained traction across diverse domains, but its ability to capture fine-grained temporal dynamics under real-world conditions has been underexplored. We propose Reperio-rPPG, a novel framework that strategically integrates relational temporal graph neural networks to effectively capture the periodic structure.
arXiv Detail & Related papers (2025-11-08T09:41:34Z) - TYrPPG: Uncomplicated and Enhanced Learning Capability rPPG for Remote Heart Rate Estimation [51.56484100374058]
This paper introduces an innovative video understanding block (GVB) designed for efficient analysis of RGB videos. Based on the Mamba structure, this block integrates 2D-CNN and 3D-CNN to enhance video understanding. Experiments show that TYrPPG can achieve state-of-the-art performance on commonly used datasets.
arXiv Detail & Related papers (2025-11-08T03:46:58Z) - Uniform Discrete Diffusion with Metric Path for Video Generation [103.86033350602908]
Continuous-space video generation has advanced rapidly, while discrete approaches lag behind due to error accumulation and long-duration inconsistency. We present URSA, a uniform discrete diffusion framework with a metric path that bridges the gap with continuous approaches for scalable video generation. URSA consistently outperforms existing discrete methods and achieves performance comparable to state-of-the-art continuous diffusion methods.
arXiv Detail & Related papers (2025-10-28T17:59:57Z) - PPMStereo: Pick-and-Play Memory Construction for Consistent Dynamic Stereo Matching [51.98089287914147]
Inspired by the two-stage decision-making process in humans, we propose a Pick-and-Play Memory (PPM) construction module for dynamic stereo matching, dubbed PPMStereo.
arXiv Detail & Related papers (2025-10-23T03:52:39Z) - BeatFormer: Efficient motion-robust remote heart rate estimation through unsupervised spectral zoomed attention filters [3.069335774032178]
Remote photoplethysmography (rPPG) captures cardiac signals from facial videos and is gaining attention for its diverse applications. While deep learning has advanced rPPG estimation, it relies on large, diverse datasets for effective generalization. We present BeatFormer, a lightweight spectral attention model for rPPG estimation, which integrates orthonormal complex attention and frequency-domain energy measurement. We also introduce Spectral Contrastive Learning (SCL), which allows BeatFormer to be trained without any PPG or HR labels.
arXiv Detail & Related papers (2025-07-20T10:00:31Z) - Adaptive Gate-Aware Mamba Networks for Magnetic Resonance Fingerprinting [8.673589204148115]
Magnetic Resonance Fingerprinting (MRF) enables fast quantitative imaging by matching signal evolutions to a predefined dictionary. Recent work has explored deep learning-based approaches as alternatives to dictionary matching (DM). We propose GAST-Mamba, an end-to-end framework that combines a dual Mamba-based encoder with a Gate-Aware Spatial-Temporal (GAST) processor.
arXiv Detail & Related papers (2025-07-04T08:04:01Z) - Periodic-MAE: Periodic Video Masked Autoencoder for rPPG Estimation [6.32655874508904]
We propose a method that learns a general representation of periodic signals from unlabeled facial videos by capturing subtle changes in skin tone over time. We evaluate the proposed method on the PURE, UBFC-rPPG, MMPD, and V4V datasets. Our results demonstrate significant performance improvements, particularly in challenging cross-dataset evaluations.
arXiv Detail & Related papers (2025-06-27T02:18:10Z) - DynSTG-Mamba: Dynamic Spatio-Temporal Graph Mamba with Cross-Graph Knowledge Distillation for Gait Disorders Recognition [1.7519167857253402]
DynSTG-Mamba is a novel framework that combines DF-STGNN and STG-Mamba to enhance motion modeling.
DF-STGNN incorporates a dynamic spatial filter that adaptively adjusts between skeletal joints and temporal interactions.
STG-Mamba, an extension of Mamba, ensures a continuous propagation of states while reducing computational costs.
arXiv Detail & Related papers (2025-03-17T13:26:47Z) - PhysMamba: Efficient Remote Physiological Measurement with SlowFast Temporal Difference Mamba [20.435381963248787]
Previous deep learning based rPPG measurement methods are primarily based on CNNs and Transformers.
We propose PhysMamba, a Mamba-based framework, to efficiently represent long-range physiological dependencies from facial videos.
Extensive experiments are conducted on three benchmark datasets to demonstrate the superiority and efficiency of PhysMamba.
arXiv Detail & Related papers (2024-09-18T14:48:50Z) - PPMamba: A Pyramid Pooling Local Auxiliary SSM-Based Model for Remote Sensing Image Semantic Segmentation [1.5136939451642137]
This paper proposes a novel network called Pyramid Pooling Mamba (PPMamba), which integrates CNN and Mamba for semantic segmentation tasks.
PPMamba achieves competitive performance compared to state-of-the-art models.
arXiv Detail & Related papers (2024-09-10T08:08:50Z) - Bidirectional Gated Mamba for Sequential Recommendation [56.85338055215429]
Mamba, a recent advancement, has exhibited exceptional performance in time series prediction.
We introduce a new framework named Selective Gated Mamba ( SIGMA) for Sequential Recommendation.
Our results indicate that SIGMA outperforms current models on five real-world datasets.
arXiv Detail & Related papers (2024-08-21T09:12:59Z) - FMamba: Mamba based on Fast-attention for Multivariate Time-series Forecasting [6.152779144421304]
We introduce a novel framework named FMamba for multivariate time-series forecasting (MTSF).
Technically, we first extract the temporal features of the input variables through an embedding layer, then compute the dependencies among input variables via the fast-attention module.
We use Mamba to selectively deal with the input features and further extract the temporal dependencies of the variables through the multi-layer perceptron block (MLP-block).
Finally, FMamba obtains the predictive results through the projector, a linear layer.
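The FMamba pipeline described above (embedding, cross-variable attention, sequence mixing, MLP block, linear projector) can be sketched end to end. This is a hedged toy version: all weights are random placeholders, a plain softmax attention stands in for the fast-attention module, and an MLP stands in for the Mamba layer, so none of the names or shapes below come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def fmamba_like_forecast(x, d_model=16, horizon=8):
    """Illustrative forecasting pipeline in the spirit of the description
    above: embed -> attention over variables -> feature mixing -> projector.
    Not the FMamba implementation; weights are random stand-ins."""
    T, V = x.shape                                 # time steps, variables
    W_emb = rng.normal(size=(T, d_model))
    h = x.T @ W_emb                                # (V, d_model) per-variable embedding
    # Softmax attention across variables (stand-in for fast-attention).
    scores = h @ h.T / np.sqrt(d_model)
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)
    h = attn @ h
    # MLP block (stand-in for the Mamba + MLP temporal feature extraction).
    W1 = rng.normal(size=(d_model, d_model))
    W2 = rng.normal(size=(d_model, d_model))
    h = np.maximum(h @ W1, 0.0) @ W2               # ReLU MLP
    # Linear projector mapping features to the forecast horizon.
    W_out = rng.normal(size=(d_model, horizon))
    return h @ W_out                               # (V, horizon) predictions

x = rng.normal(size=(96, 4))                       # 96 steps, 4 variables
y_hat = fmamba_like_forecast(x)                    # shape (4, 8)
```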
arXiv Detail & Related papers (2024-07-20T09:14:05Z) - DeciMamba: Exploring the Length Extrapolation Potential of Mamba [89.07242846058023]
We introduce DeciMamba, a context-extension method specifically designed for Mamba.
We show that DeciMamba can extrapolate context lengths 25x longer than the ones seen during training, and does so without utilizing additional computational resources.
arXiv Detail & Related papers (2024-06-20T17:40:18Z) - TSCMamba: Mamba Meets Multi-View Learning for Time Series Classification [13.110156202816112]
We propose a novel multi-view approach to capture patterns with properties like shift equivariance.
Our method integrates diverse features, including spectral, temporal, local, and global features, to obtain rich, complementary contexts for TSC.
Our approach achieves average accuracy improvements of 4.01-6.45% and 7.93% respectively, over leading TSC models.
arXiv Detail & Related papers (2024-06-06T18:05:10Z) - SpectralMamba: Efficient Mamba for Hyperspectral Image Classification [39.18999103115206]
Recurrent neural networks and Transformers have dominated most applications in hyperspectral (HS) imaging.
We propose SpectralMamba -- a novel efficient deep learning framework that incorporates state space models for HS image classification.
We show that SpectralMamba surprisingly creates promising win-wins from both performance and efficiency perspectives.
arXiv Detail & Related papers (2024-04-12T14:12:03Z) - SPMamba: State-space model is all you need in speech separation [20.168153319805665]
CNN-based speech separation models face local receptive field limitations and cannot effectively capture long-range temporal dependencies.
We introduce an innovative speech separation method called SPMamba.
This model builds upon the robust TF-GridNet architecture, replacing its traditional BLSTM modules with bidirectional Mamba modules.
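Replacing bidirectional LSTMs with forward and backward Mamba passes, as described above, amounts to running a causal sequence model twice and concatenating the feature streams. A minimal sketch, using an exponential moving average as a stand-in for a Mamba block (the function names and smoothing factor are invented for illustration):

```python
import numpy as np

def ema_ssm(u, alpha=0.9):
    """Toy causal sequence model (exponential moving average) standing in
    for a unidirectional Mamba block; linear in sequence length."""
    y, s = [], 0.0
    for ut in u:
        s = alpha * s + (1 - alpha) * ut
        y.append(s)
    return np.array(y)

def bidirectional(u):
    """Bidirectional wrapper in the BLSTM-replacement spirit: run the block
    on the sequence and on its reversal, then stack the two streams."""
    fwd = ema_ssm(u)
    bwd = ema_ssm(u[::-1])[::-1]     # backward pass, re-aligned in time
    return np.stack([fwd, bwd], axis=-1)   # (T, 2) features per step

u = np.sin(np.linspace(0, 6 * np.pi, 128))
feats = bidirectional(u)
```

Each time step then sees context from both directions, which a single causal pass cannot provide.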
arXiv Detail & Related papers (2024-04-02T16:04:31Z) - RhythmFormer: Extracting rPPG Signals Based on Hierarchical Temporal Periodic Transformer [17.751885452773983]
We propose a fully end-to-end transformer-based method for extracting rPPG signals by explicitly leveraging the quasi-periodic nature of rPPG.
A fusion stem is proposed to guide self-attention to rPPG features effectively, and it can be easily transferred to existing methods to significantly enhance their performance.
arXiv Detail & Related papers (2024-02-20T07:56:02Z) - Vivim: a Video Vision Mamba for Medical Video Segmentation [52.11785024350253]
This paper presents a Video Vision Mamba-based framework, dubbed as Vivim, for medical video segmentation tasks.
Our Vivim can effectively compress the long-term representation into sequences at varying scales.
Experiments on thyroid segmentation, breast lesion segmentation in ultrasound videos, and polyp segmentation in colonoscopy videos demonstrate the effectiveness and efficiency of our Vivim.
arXiv Detail & Related papers (2024-01-25T13:27:03Z) - No-frills Temporal Video Grounding: Multi-Scale Neighboring Attention and Zoom-in Boundary Detection [52.03562682785128]
Temporal video grounding aims to retrieve the time interval of a language query from an untrimmed video.
A significant challenge in TVG is the low "Semantic Noise Ratio (SNR)" of untrimmed videos, with performance degrading as SNR decreases.
We propose a no-frills TVG model that consists of two core modules, namely multi-scale neighboring attention and zoom-in boundary detection.
arXiv Detail & Related papers (2023-07-20T04:12:10Z) - PhysFormer++: Facial Video-based Physiological Measurement with SlowFast Temporal Difference Transformer [76.40106756572644]
Recent deep learning approaches focus on mining subtle rPPG clues using convolutional neural networks with limited spatio-temporal receptive fields.
In this paper, we propose two end-to-end video transformer based architectures, PhysFormer and PhysFormer++, to adaptively aggregate both local and global features for rPPG representation enhancement.
Comprehensive experiments are performed on four benchmark datasets to show our superior performance on both intra-dataset and cross-dataset testing.
arXiv Detail & Related papers (2023-02-07T15:56:03Z) - HyRSM++: Hybrid Relation Guided Temporal Set Matching for Few-shot Action Recognition [51.2715005161475]
We propose a novel Hybrid Relation guided temporal Set Matching approach for few-shot action recognition.
The core idea of HyRSM++ is to integrate all videos within the task to learn discriminative representations.
We show that our method achieves state-of-the-art performance under various few-shot settings.
arXiv Detail & Related papers (2023-01-09T13:32:50Z) - Slow-Fast Visual Tempo Learning for Video-based Action Recognition [78.3820439082979]
Action visual tempo characterizes the dynamics and the temporal scale of an action.
Previous methods capture the visual tempo either by sampling raw videos with multiple rates, or by hierarchically sampling backbone features.
We propose a Temporal Correlation Module (TCM) to extract action visual tempo from low-level backbone features at single-layer remarkably.
arXiv Detail & Related papers (2022-02-24T14:20:04Z) - PhysFormer: Facial Video-based Physiological Measurement with Temporal Difference Transformer [55.936527926778695]
Recent deep learning approaches focus on mining subtle rPPG clues using convolutional neural networks with limited spatio-temporal receptive fields.
In this paper, we propose the PhysFormer, an end-to-end video transformer based architecture.
arXiv Detail & Related papers (2021-11-23T18:57:11Z) - Learning Multi-Granular Hypergraphs for Video-Based Person Re-Identification [110.52328716130022]
Video-based person re-identification (re-ID) is an important research topic in computer vision.
We propose a novel graph-based framework, namely Multi-Granular Hypergraph (MGH), to pursue better representational capabilities.
MGH achieves 90.0% top-1 accuracy on MARS, outperforming the state-of-the-art schemes.
arXiv Detail & Related papers (2021-04-30T11:20:02Z) - Multi-Temporal Convolutions for Human Action Recognition in Videos [83.43682368129072]
We present a novel multi-temporal convolution block that is capable of extracting temporal features at multiple resolutions.
The proposed blocks are lightweight and can be integrated into any 3D-CNN architecture.
arXiv Detail & Related papers (2020-11-08T10:40:26Z) - Temporal Pyramid Network for Action Recognition [129.12076009042622]
We propose a generic Temporal Pyramid Network (TPN) at the feature-level, which can be flexibly integrated into 2D or 3D backbone networks.
TPN shows consistent improvements over other challenging baselines on several action recognition datasets.
arXiv Detail & Related papers (2020-04-07T17:17:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.