RhythmMamba: Fast Remote Physiological Measurement with Arbitrary Length Videos
- URL: http://arxiv.org/abs/2404.06483v1
- Date: Tue, 9 Apr 2024 17:34:19 GMT
- Title: RhythmMamba: Fast Remote Physiological Measurement with Arbitrary Length Videos
- Authors: Bochao Zou, Zizheng Guo, Xiaocheng Hu, Huimin Ma,
- Abstract summary: This paper introduces RhythmMamba, an end-to-end method that employs multi-temporal Mamba to constrain both periodic patterns and short-term trends.
Extensive experiments show that RhythmMamba state-the-art performance with reduced parameters and lower computational complexity.
- Score: 10.132660483466239
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Remote photoplethysmography (rPPG) is a non-contact method for detecting physiological signals from facial videos, holding great potential in various applications such as healthcare, affective computing, and anti-spoofing. Existing deep learning methods struggle to address two core issues of rPPG simultaneously: extracting weak rPPG signals from video segments with large spatiotemporal redundancy and understanding the periodic patterns of rPPG among long contexts. This represents a trade-off between computational complexity and the ability to capture long-range dependencies, posing a challenge for rPPG that is suitable for deployment on mobile devices. Based on the in-depth exploration of Mamba's comprehension of spatial and temporal information, this paper introduces RhythmMamba, an end-to-end Mamba-based method that employs multi-temporal Mamba to constrain both periodic patterns and short-term trends, coupled with frequency domain feed-forward to enable Mamba to robustly understand the quasi-periodic patterns of rPPG. Extensive experiments show that RhythmMamba achieves state-of-the-art performance with reduced parameters and lower computational complexity. The proposed RhythmMamba can be applied to video segments of any length without performance degradation. The codes are available at https://github.com/zizheng-guo/RhythmMamba.
Related papers
- PhysMamba: Efficient Remote Physiological Measurement with SlowFast Temporal Difference Mamba [20.435381963248787]
Previous deep learning based r measurement are primarily based on CNNs and Transformers.
We propose PhysMamba, a Mamba-based framework, to efficiently represent long-range physiological dependencies from facial videos.
Extensive experiments are conducted on three benchmark datasets to demonstrate the superiority and efficiency of PhysMamba.
arXiv Detail & Related papers (2024-09-18T14:48:50Z) - PPMamba: A Pyramid Pooling Local Auxiliary SSM-Based Model for Remote Sensing Image Semantic Segmentation [1.5136939451642137]
This paper proposes a novel network called Pyramid Pooling Mamba (PPMamba), which integrates CNN and Mamba for semantic segmentation tasks.
PPMamba achieves competitive performance compared to state-of-the-art models.
arXiv Detail & Related papers (2024-09-10T08:08:50Z) - Bidirectional Gated Mamba for Sequential Recommendation [56.85338055215429]
Mamba, a recent advancement, has exhibited exceptional performance in time series prediction.
We introduce a new framework named Selective Gated Mamba ( SIGMA) for Sequential Recommendation.
Our results indicate that SIGMA outperforms current models on five real-world datasets.
arXiv Detail & Related papers (2024-08-21T09:12:59Z) - FMamba: Mamba based on Fast-attention for Multivariate Time-series Forecasting [6.152779144421304]
We introduce a novel framework named FMamba for multivariate time-series forecasting (MTSF)
Technically, we first extract the temporal features of the input variables through an embedding layer, then compute the dependencies among input variables via the fast-attention module.
We use Mamba to selectively deal with the input features and further extract the temporal dependencies of the variables through the multi-layer perceptron block (MLP-block)
Finally, FMamba obtains the predictive results through the projector, a linear layer.
arXiv Detail & Related papers (2024-07-20T09:14:05Z) - DeciMamba: Exploring the Length Extrapolation Potential of Mamba [89.07242846058023]
We introduce DeciMamba, a context-extension method specifically designed for Mamba.
We show that DeciMamba can extrapolate context lengths 25x longer than the ones seen during training, and does so without utilizing additional computational resources.
arXiv Detail & Related papers (2024-06-20T17:40:18Z) - SPMamba: State-space model is all you need in speech separation [20.168153319805665]
CNN-based speech separation models face local receptive field limitations and cannot effectively capture long time dependencies.
We introduce an innovative speech separation method called SPMamba.
This model builds upon the robust TF-GridNet architecture, replacing its traditional BLSTM modules with bidirectional Mamba modules.
arXiv Detail & Related papers (2024-04-02T16:04:31Z) - RhythmFormer: Extracting rPPG Signals Based on Hierarchical Temporal
Periodic Transformer [17.751885452773983]
We propose a fully end-to-end transformer-based method for extracting r signals by explicitly leveraging the quasi-periodic nature of r periodicity.
A fusion stem is proposed to guide self-attention to r features effectively, and it can be easily transferred to existing methods to enhance their performance significantly.
arXiv Detail & Related papers (2024-02-20T07:56:02Z) - Vivim: a Video Vision Mamba for Medical Video Segmentation [52.11785024350253]
This paper presents a Video Vision Mamba-based framework, dubbed as Vivim, for medical video segmentation tasks.
Our Vivim can effectively compress the long-term representation into sequences at varying scales.
Experiments on thyroid segmentation, breast lesion segmentation in ultrasound videos, and polyp segmentation in colonoscopy videos demonstrate the effectiveness and efficiency of our Vivim.
arXiv Detail & Related papers (2024-01-25T13:27:03Z) - No-frills Temporal Video Grounding: Multi-Scale Neighboring Attention
and Zoom-in Boundary Detection [52.03562682785128]
Temporal video grounding aims to retrieve the time interval of a language query from an untrimmed video.
A significant challenge in TVG is the low "Semantic Noise Ratio (SNR)", which results in worse performance with lower SNR.
We propose a no-frills TVG model that consists of two core modules, namely multi-scale neighboring attention and zoom-in boundary detection.
arXiv Detail & Related papers (2023-07-20T04:12:10Z) - HyRSM++: Hybrid Relation Guided Temporal Set Matching for Few-shot
Action Recognition [51.2715005161475]
We propose a novel Hybrid Relation guided temporal Set Matching approach for few-shot action recognition.
The core idea of HyRSM++ is to integrate all videos within the task to learn discriminative representations.
We show that our method achieves state-of-the-art performance under various few-shot settings.
arXiv Detail & Related papers (2023-01-09T13:32:50Z) - Multi-Temporal Convolutions for Human Action Recognition in Videos [83.43682368129072]
We present a novel temporal-temporal convolution block that is capable of extracting at multiple resolutions.
The proposed blocks are lightweight and can be integrated into any 3D-CNN architecture.
arXiv Detail & Related papers (2020-11-08T10:40:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.