Related papers: RhythmMamba: Fast Remote Physiological Measurement with Arbitrary Length Videos

RhythmMamba: Fast Remote Physiological Measurement with Arbitrary Length Videos

URL: http://arxiv.org/abs/2404.06483v1
Date: Tue, 9 Apr 2024 17:34:19 GMT
Title: RhythmMamba: Fast Remote Physiological Measurement with Arbitrary Length Videos
Authors: Bochao Zou, Zizheng Guo, Xiaocheng Hu, Huimin Ma,
Abstract summary: This paper introduces RhythmMamba, an end-to-end method that employs multi-temporal Mamba to constrain both periodic patterns and short-term trends. Extensive experiments show that RhythmMamba state-the-art performance with reduced parameters and lower computational complexity.
Score: 10.132660483466239
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Remote photoplethysmography (rPPG) is a non-contact method for detecting physiological signals from facial videos, holding great potential in various applications such as healthcare, affective computing, and anti-spoofing. Existing deep learning methods struggle to address two core issues of rPPG simultaneously: extracting weak rPPG signals from video segments with large spatiotemporal redundancy and understanding the periodic patterns of rPPG among long contexts. This represents a trade-off between computational complexity and the ability to capture long-range dependencies, posing a challenge for rPPG that is suitable for deployment on mobile devices. Based on the in-depth exploration of Mamba's comprehension of spatial and temporal information, this paper introduces RhythmMamba, an end-to-end Mamba-based method that employs multi-temporal Mamba to constrain both periodic patterns and short-term trends, coupled with frequency domain feed-forward to enable Mamba to robustly understand the quasi-periodic patterns of rPPG. Extensive experiments show that RhythmMamba achieves state-of-the-art performance with reduced parameters and lower computational complexity. The proposed RhythmMamba can be applied to video segments of any length without performance degradation. The codes are available at https://github.com/zizheng-guo/RhythmMamba.

Related papers

DynSTG-Mamba: Dynamic Spatio-Temporal Graph Mamba with Cross-Graph Knowledge Distillation for Gait Disorders Recognition [1.7519167857253402]
DynTG-Mamba is a novel framework that combines DF-STGNN and STG-Mamba to enhance motion modeling. DF-STGNN incorporates a dynamic spatial filter that adaptively adjusts between skeletal joints and temporal interactions. STG-Mamba, an extension of Mamba, ensures a continuous propagation of states while reducing computational costs.
arXiv Detail & Related papers (2025-03-17T13:26:47Z)
PhysMamba: Efficient Remote Physiological Measurement with SlowFast Temporal Difference Mamba [20.435381963248787]
Previous deep learning based r measurement are primarily based on CNNs and Transformers. We propose PhysMamba, a Mamba-based framework, to efficiently represent long-range physiological dependencies from facial videos. Extensive experiments are conducted on three benchmark datasets to demonstrate the superiority and efficiency of PhysMamba.
arXiv Detail & Related papers (2024-09-18T14:48:50Z)
PPMamba: A Pyramid Pooling Local Auxiliary SSM-Based Model for Remote Sensing Image Semantic Segmentation [1.5136939451642137]
This paper proposes a novel network called Pyramid Pooling Mamba (PPMamba), which integrates CNN and Mamba for semantic segmentation tasks. PPMamba achieves competitive performance compared to state-of-the-art models.
arXiv Detail & Related papers (2024-09-10T08:08:50Z)
Bidirectional Gated Mamba for Sequential Recommendation [56.85338055215429]
Mamba, a recent advancement, has exhibited exceptional performance in time series prediction. We introduce a new framework named Selective Gated Mamba ( SIGMA) for Sequential Recommendation. Our results indicate that SIGMA outperforms current models on five real-world datasets.
arXiv Detail & Related papers (2024-08-21T09:12:59Z)
FMamba: Mamba based on Fast-attention for Multivariate Time-series Forecasting [6.152779144421304]
We introduce a novel framework named FMamba for multivariate time-series forecasting (MTSF) Technically, we first extract the temporal features of the input variables through an embedding layer, then compute the dependencies among input variables via the fast-attention module. We use Mamba to selectively deal with the input features and further extract the temporal dependencies of the variables through the multi-layer perceptron block (MLP-block) Finally, FMamba obtains the predictive results through the projector, a linear layer.
arXiv Detail & Related papers (2024-07-20T09:14:05Z)
DeciMamba: Exploring the Length Extrapolation Potential of Mamba [89.07242846058023]
We introduce DeciMamba, a context-extension method specifically designed for Mamba. We show that DeciMamba can extrapolate context lengths 25x longer than the ones seen during training, and does so without utilizing additional computational resources.
arXiv Detail & Related papers (2024-06-20T17:40:18Z)
TSCMamba: Mamba Meets Multi-View Learning for Time Series Classification [13.110156202816112]
We propose a novel multi-view approach to capture patterns with properties like shift equivariance. Our method integrates diverse features, including spectral, temporal, local, and global features, to obtain rich, complementary contexts for TSC. Our approach achieves average accuracy improvements of 4.01-6.45% and 7.93% respectively, over leading TSC models.
arXiv Detail & Related papers (2024-06-06T18:05:10Z)
SpectralMamba: Efficient Mamba for Hyperspectral Image Classification [39.18999103115206]
Recurrent neural networks and Transformers have dominated most applications in hyperspectral (HS) imaging. We propose SpectralMamba -- a novel state space model incorporated efficient deep learning framework for HS image classification. We show that SpectralMamba surprisingly creates promising win-wins from both performance and efficiency perspectives.
arXiv Detail & Related papers (2024-04-12T14:12:03Z)
SPMamba: State-space model is all you need in speech separation [20.168153319805665]
CNN-based speech separation models face local receptive field limitations and cannot effectively capture long time dependencies. We introduce an innovative speech separation method called SPMamba. This model builds upon the robust TF-GridNet architecture, replacing its traditional BLSTM modules with bidirectional Mamba modules.
arXiv Detail & Related papers (2024-04-02T16:04:31Z)
RhythmFormer: Extracting rPPG Signals Based on Hierarchical Temporal Periodic Transformer [17.751885452773983]
We propose a fully end-to-end transformer-based method for extracting r signals by explicitly leveraging the quasi-periodic nature of r periodicity. A fusion stem is proposed to guide self-attention to r features effectively, and it can be easily transferred to existing methods to enhance their performance significantly.
arXiv Detail & Related papers (2024-02-20T07:56:02Z)
Vivim: a Video Vision Mamba for Medical Video Segmentation [52.11785024350253]
This paper presents a Video Vision Mamba-based framework, dubbed as Vivim, for medical video segmentation tasks. Our Vivim can effectively compress the long-term representation into sequences at varying scales. Experiments on thyroid segmentation, breast lesion segmentation in ultrasound videos, and polyp segmentation in colonoscopy videos demonstrate the effectiveness and efficiency of our Vivim.
arXiv Detail & Related papers (2024-01-25T13:27:03Z)
No-frills Temporal Video Grounding: Multi-Scale Neighboring Attention and Zoom-in Boundary Detection [52.03562682785128]
Temporal video grounding aims to retrieve the time interval of a language query from an untrimmed video. A significant challenge in TVG is the low "Semantic Noise Ratio (SNR)", which results in worse performance with lower SNR. We propose a no-frills TVG model that consists of two core modules, namely multi-scale neighboring attention and zoom-in boundary detection.
arXiv Detail & Related papers (2023-07-20T04:12:10Z)
PhysFormer++: Facial Video-based Physiological Measurement with SlowFast Temporal Difference Transformer [76.40106756572644]
Recent deep learning approaches focus on mining subtle clues using convolutional neural networks with limited-temporal receptive fields. In this paper, we propose two end-to-end video transformer based on PhysFormer and Phys++++, to adaptively aggregate both local and global features for r representation enhancement. Comprehensive experiments are performed on four benchmark datasets to show our superior performance on both intra-temporal and cross-dataset testing.
arXiv Detail & Related papers (2023-02-07T15:56:03Z)
HyRSM++: Hybrid Relation Guided Temporal Set Matching for Few-shot Action Recognition [51.2715005161475]
We propose a novel Hybrid Relation guided temporal Set Matching approach for few-shot action recognition. The core idea of HyRSM++ is to integrate all videos within the task to learn discriminative representations. We show that our method achieves state-of-the-art performance under various few-shot settings.
arXiv Detail & Related papers (2023-01-09T13:32:50Z)
Slow-Fast Visual Tempo Learning for Video-based Action Recognition [78.3820439082979]
Action visual tempo characterizes the dynamics and the temporal scale of an action. Previous methods capture the visual tempo either by sampling raw videos with multiple rates, or by hierarchically sampling backbone features. We propose a Temporal Correlation Module (TCM) to extract action visual tempo from low-level backbone features at single-layer remarkably.
arXiv Detail & Related papers (2022-02-24T14:20:04Z)
PhysFormer: Facial Video-based Physiological Measurement with Temporal Difference Transformer [55.936527926778695]
Recent deep learning approaches focus on mining subtle r clues using convolutional neural networks with limited-temporal receptive fields. In this paper, we propose the PhysFormer, an end-to-end video transformer based architecture.
arXiv Detail & Related papers (2021-11-23T18:57:11Z)
Learning Multi-Granular Hypergraphs for Video-Based Person Re-Identification [110.52328716130022]
Video-based person re-identification (re-ID) is an important research topic in computer vision. We propose a novel graph-based framework, namely Multi-Granular Hypergraph (MGH) to better representational capabilities. 90.0% top-1 accuracy on MARS is achieved using MGH, outperforming the state-of-the-arts schemes.
arXiv Detail & Related papers (2021-04-30T11:20:02Z)
Multi-Temporal Convolutions for Human Action Recognition in Videos [83.43682368129072]
We present a novel temporal-temporal convolution block that is capable of extracting at multiple resolutions. The proposed blocks are lightweight and can be integrated into any 3D-CNN architecture.
arXiv Detail & Related papers (2020-11-08T10:40:26Z)
Temporal Pyramid Network for Action Recognition [129.12076009042622]
We propose a generic Temporal Pyramid Network (TPN) at the feature-level, which can be flexibly integrated into 2D or 3D backbone networks. TPN shows consistent improvements over other challenging baselines on several action recognition datasets.
arXiv Detail & Related papers (2020-04-07T17:17:23Z)

This list is automatically generated from the titles and abstracts of the papers in this site.