Efficient and Robust Multidimensional Attention in Remote Physiological Sensing through Target Signal Constrained Factorization
- URL: http://arxiv.org/abs/2505.07013v1
- Date: Sun, 11 May 2025 15:20:45 GMT
- Title: Efficient and Robust Multidimensional Attention in Remote Physiological Sensing through Target Signal Constrained Factorization
- Authors: Jitesh Joshi, Youngjun Cho
- Abstract summary: We present MMRPhys, an efficient dual-branch 3D-CNN architecture designed for simultaneous estimation of photoplethysmography (rPPG) and respiratory (rRSP) signals from multimodal video inputs. We demonstrate that MMRPhys with TSFM significantly outperforms state-of-the-art methods in generalization across domain shifts for rPPG and rRSP estimation, while maintaining a minimal inference latency suitable for real-time applications.
- Score: 7.947387272047604
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Remote physiological sensing using camera-based technologies offers transformative potential for non-invasive vital sign monitoring across healthcare and human-computer interaction domains. Although deep learning approaches have advanced the extraction of physiological signals from video data, existing methods have not been sufficiently assessed for their robustness to domain shifts. These shifts in remote physiological sensing include variations in ambient conditions, camera specifications, head movements, facial poses, and physiological states which often impact real-world performance significantly. Cross-dataset evaluation provides an objective measure to assess generalization capabilities across these domain shifts. We introduce Target Signal Constrained Factorization module (TSFM), a novel multidimensional attention mechanism that explicitly incorporates physiological signal characteristics as factorization constraints, allowing more precise feature extraction. Building on this innovation, we present MMRPhys, an efficient dual-branch 3D-CNN architecture designed for simultaneous multitask estimation of photoplethysmography (rPPG) and respiratory (rRSP) signals from multimodal RGB and thermal video inputs. Through comprehensive cross-dataset evaluation on five benchmark datasets, we demonstrate that MMRPhys with TSFM significantly outperforms state-of-the-art methods in generalization across domain shifts for rPPG and rRSP estimation, while maintaining a minimal inference latency suitable for real-time applications. Our approach establishes new benchmarks for robust multitask and multimodal physiological sensing and offers a computationally efficient framework for practical deployment in unconstrained environments. The web browser-based application featuring on-device real-time inference of MMRPhys model is available at https://physiologicailab.github.io/mmrphys-live
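The abstract describes attention computed through factorization, and the related FactorizePhys entry names nonnegative matrix factorization of voxel embeddings. As a rough illustration only, the Python sketch below factorizes a small nonnegative matrix into a rank-1 product and uses the reconstruction to reweight the input. The matrix layout (rows as time steps, columns as flattened spatial and channel features), the function names `rank1_nmf` and `factorized_attention`, the rank-1 choice, and the peak normalization are all assumptions made for this sketch. It does not reproduce TSFM or its target-signal constraints, whose details are not given on this page.

```python
# Hypothetical sketch of factorization-based attention; not the authors' TSFM.

def rank1_nmf(V, iters=50, eps=1e-9):
    """Approximate a nonnegative matrix V (list of rows) as outer(w, h).

    For rank 1, the multiplicative NMF updates reduce to the alternating
    updates below, which keep w and h nonnegative when V is nonnegative.
    """
    m, n = len(V), len(V[0])
    w = [1.0] * m  # one weight per row (assumed: time step)
    h = [1.0] * n  # one weight per column (assumed: flattened feature)
    for _ in range(iters):
        ww = sum(x * x for x in w) + eps
        h = [sum(V[i][j] * w[i] for i in range(m)) / ww for j in range(n)]
        hh = sum(x * x for x in h) + eps
        w = [sum(V[i][j] * h[j] for j in range(n)) / hh for i in range(m)]
    return w, h


def factorized_attention(V):
    """Reweight V by its rank-1 reconstruction, scaled so the peak is 1."""
    w, h = rank1_nmf(V)
    recon = [[wi * hj for hj in h] for wi in w]
    peak = max(max(row) for row in recon) or 1.0  # guard against all-zero input
    return [
        [V[i][j] * recon[i][j] / peak for j in range(len(h))]
        for i in range(len(w))
    ]


# Usage: a toy 3x2 nonnegative "embedding" matrix.
V = [[4.0, 5.0], [8.0, 10.0], [12.0, 15.0]]
attended = factorized_attention(V)
```

In this toy form, entries that agree with the dominant low-rank structure keep more of their magnitude, while entries that deviate from it are scaled down. A constrained variant would restrict the factors further, which this sketch does not attempt.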
Related papers
- PhysLLM: Harnessing Large Language Models for Cross-Modal Remote Physiological Sensing [49.243031514520794]
Large Language Models (LLMs) excel at capturing long-range signals due to their text-centric design.
PhysLLM achieves state-of-the-art accuracy and robustness, demonstrating superior generalization across lighting variations and motion scenarios.
arXiv Detail & Related papers (2025-05-06T15:18:38Z) - Event-Driven Implementation of a Physical Reservoir Computing Framework for superficial EMG-based Gesture Recognition [2.222098162797332]
This paper explores a novel neuromorphic implementation approach for gesture recognition by extracting spiking information from surface electromyography (sEMG) data in an event-driven manner.
The network was designed by implementing a simple-structured and hardware-friendly Physical Reservoir Computing framework called Rotating Neuron Reservoir (RNR) within the domain of spiking neural networks (SNN).
The proposed system was validated by an open-access large-scale sEMG database and achieved an average classification accuracy of 74.6% and 80.3%.
arXiv Detail & Related papers (2025-03-10T17:18:14Z) - FE-UNet: Frequency Domain Enhanced U-Net for Low-Frequency Information-Rich Image Segmentation [48.034848981295525]
We address the differences in frequency band sensitivity between CNNs and the human visual system.
We propose a wavelet adaptive spectrum fusion (WASF) method inspired by biological vision mechanisms to balance cross-frequency image features.
We develop the FE-UNet model, which employs a SAM2 backbone network and incorporates fine-tuned Hiera-Large modules to ensure segmentation accuracy.
arXiv Detail & Related papers (2025-02-06T07:24:34Z) - Efficient Unsupervised Domain Adaptation Regression for Spatial-Temporal Sensor Fusion [6.963971634605796]
Low-cost, distributed sensor networks in environmental and biomedical domains have enabled continuous, large-scale health monitoring.
These systems often face challenges related to degraded data quality caused by sensor drift, noise, and insufficient calibration.
Traditional machine learning methods for sensor fusion and calibration rely on extensive feature engineering.
We propose a novel unsupervised domain adaptation (UDA) method tailored for regression tasks.
arXiv Detail & Related papers (2024-11-11T12:20:57Z) - FactorizePhys: Matrix Factorization for Multidimensional Attention in Remote Physiological Sensing [10.81951503398909]
Factorized Self-Attention Module (FSAM) computes multidimensional attention from voxel embeddings using nonnegative matrix factorization.
Our approach adeptly factorizes voxel embeddings to achieve comprehensive spatial, temporal, and channel attention, enhancing performance of generic signal extraction.
FactorizePhys is an end-to-end 3D-CNN architecture for estimating blood volume pulse signals from raw video frames.
arXiv Detail & Related papers (2024-11-03T12:22:58Z) - PhysMamba: State Space Duality Model for Remote Physiological Measurement [18.423806804725032]
Remote photoplethysmography (rPPG) enables non-contact physiological signal extraction from facial videos.
This work lays a strong foundation for practical applications in non-contact health monitoring, including real-time remote patient care.
arXiv Detail & Related papers (2024-08-02T07:52:28Z) - REST: Efficient and Accelerated EEG Seizure Analysis through Residual State Updates [54.96885726053036]
This paper introduces a novel graph-based residual state update mechanism (REST) for real-time EEG signal analysis.
By leveraging a combination of graph neural networks and recurrent structures, REST efficiently captures both non-Euclidean geometry and temporal dependencies within EEG data.
Our model demonstrates high accuracy in both seizure detection and classification tasks.
arXiv Detail & Related papers (2024-06-03T16:30:19Z) - Convolutional Monge Mapping Normalization for learning on sleep data [63.22081662149488]
We propose a new method called Convolutional Monge Mapping Normalization (CMMN).
CMMN consists in filtering the signals in order to adapt their power spectrum density (PSD) to a Wasserstein barycenter estimated on training data.
Numerical experiments on sleep EEG data show that CMMN leads to significant and consistent performance gains independent from the neural network architecture.
arXiv Detail & Related papers (2023-05-30T08:24:01Z) - PhysFormer++: Facial Video-based Physiological Measurement with SlowFast Temporal Difference Transformer [76.40106756572644]
Recent deep learning approaches focus on mining subtle clues using convolutional neural networks with limited-temporal receptive fields.
In this paper, we propose two end-to-end video transformer-based architectures, PhysFormer and PhysFormer++, to adaptively aggregate both local and global features for rPPG representation enhancement.
Comprehensive experiments are performed on four benchmark datasets to show our superior performance on both intra-dataset and cross-dataset testing.
arXiv Detail & Related papers (2023-02-07T15:56:03Z) - PhysFormer: Facial Video-based Physiological Measurement with Temporal Difference Transformer [55.936527926778695]
Recent deep learning approaches focus on mining subtle rPPG clues using convolutional neural networks with limited-temporal receptive fields.
In this paper, we propose PhysFormer, an end-to-end video transformer-based architecture.
arXiv Detail & Related papers (2021-11-23T18:57:11Z) - Video-based Remote Physiological Measurement via Cross-verified Feature Disentangling [121.50704279659253]
We propose a cross-verified feature disentangling strategy to disentangle the physiological features from non-physiological representations.
We then use the distilled physiological features for robust multi-task physiological measurements.
The disentangled features are finally used for the joint prediction of multiple physiological signals like average HR values and rPPG signals.
arXiv Detail & Related papers (2020-07-16T09:39:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.