FactorizePhys: Matrix Factorization for Multidimensional Attention in Remote Physiological Sensing
- URL: http://arxiv.org/abs/2411.01542v1
- Date: Sun, 03 Nov 2024 12:22:58 GMT
- Title: FactorizePhys: Matrix Factorization for Multidimensional Attention in Remote Physiological Sensing
- Authors: Jitesh Joshi, Sos S. Agaian, Youngjun Cho
- Abstract summary: Factorized Self-Attention Module (FSAM) computes multidimensional attention from voxel embeddings using nonnegative matrix factorization.
Our approach adeptly factorizes voxel embeddings to achieve comprehensive spatial, temporal, and channel attention, enhancing the performance of generic signal extraction tasks.
FactorizePhys is an end-to-end 3D-CNN architecture for estimating blood volume pulse signals from raw video frames.
- Score: 10.81951503398909
- Abstract: Remote photoplethysmography (rPPG) enables non-invasive extraction of blood volume pulse signals through imaging, transforming spatial-temporal data into time series signals. Advances in end-to-end rPPG approaches have focused on this transformation where attention mechanisms are crucial for feature extraction. However, existing methods compute attention disjointly across spatial, temporal, and channel dimensions. Here, we propose the Factorized Self-Attention Module (FSAM), which jointly computes multidimensional attention from voxel embeddings using nonnegative matrix factorization. To demonstrate FSAM's effectiveness, we developed FactorizePhys, an end-to-end 3D-CNN architecture for estimating blood volume pulse signals from raw video frames. Our approach adeptly factorizes voxel embeddings to achieve comprehensive spatial, temporal, and channel attention, enhancing performance of generic signal extraction tasks. Furthermore, we deploy FSAM within an existing 2D-CNN-based rPPG architecture to illustrate its versatility. FSAM and FactorizePhys are thoroughly evaluated against state-of-the-art rPPG methods, each representing different types of architecture and attention mechanism. We perform ablation studies to investigate the architectural decisions and hyperparameters of FSAM. Experiments on four publicly available datasets and intuitive visualization of learned spatial-temporal features substantiate the effectiveness of FSAM and enhanced cross-dataset generalization in estimating rPPG signals, suggesting its broader potential as a multidimensional attention mechanism. The code is accessible at https://github.com/PhysiologicAILab/FactorizePhys.
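The abstract describes FSAM only at a high level. As a rough illustration of "computing multidimensional attention from voxel embeddings using nonnegative matrix factorization", the NumPy sketch below factorizes a (C, T, H, W) embedding tensor and uses the low-rank reconstruction as a joint spatial-temporal-channel attention map. This is not the authors' implementation. Every detail is an assumption: the function names `nmf` and `factorized_attention`, the unfolding into a T x (C·H·W) matrix, clipping to nonnegative values, the rank, the Lee-Seung multiplicative updates, and the residual `x + x * attn` combination. The official code is in the GitHub repository linked in the abstract.

```python
import numpy as np


def nmf(V, rank, n_iter=200, eps=1e-8, seed=0):
    """Approximate a nonnegative matrix V (m x n) as W @ H with W, H >= 0.

    Uses Lee-Seung multiplicative updates for the Frobenius-norm objective.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def factorized_attention(x, rank=1, n_iter=200):
    """Hypothetical FSAM-style attention for voxel embeddings x of shape (C, T, H, W)."""
    channels, frames, height, width = x.shape
    # Assumption: NMF needs nonnegative input, so clip as a ReLU would.
    V = np.maximum(x, 0.0)
    # Assumption: frames become rows; channels and pixels are flattened into
    # columns, so a single factorization spans time, space, and channels.
    V = V.transpose(1, 0, 2, 3).reshape(frames, channels * height * width)
    W, H = nmf(V, rank, n_iter)
    # Low-rank reconstruction, folded back to (C, T, H, W).
    recon = (W @ H).reshape(frames, channels, height, width).transpose(1, 0, 2, 3)
    # Scale to [0, 1) and use as a joint spatial-temporal-channel attention map.
    attn = recon / (recon.max() + 1e-8)
    # Assumption: residual combination keeps the original features.
    return x + x * attn, attn
```

Usage: `out, attn = factorized_attention(np.random.default_rng(0).random((16, 160, 9, 9)))`. Because every entry of the reconstruction comes from one shared rank-r factorization, the resulting map couples all three dimensions at once, which is one way to read the abstract's contrast with attention computed disjointly per dimension. The sketch is a forward pass only; in an end-to-end model such as FactorizePhys the factorization would sit inside a trainable 3D-CNN.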
Related papers
- Toward Motion Robustness: A masked attention regularization framework in remote photoplethysmography [5.743550396843244]
MAR-rPPG is a framework that addresses the impact of ROI localization and complex motion artifacts.
MAR-rPPG employs a masked attention regularization mechanism in the rPPG field to capture the semantic consistency of facial clips.
It also employs a masking technique to prevent the model from overfitting on inaccurate ROIs and subsequently degrading its performance.
(2024-07-09T08:25:30Z)
- ASPS: Augmented Segment Anything Model for Polyp Segmentation [77.25557224490075]
The Segment Anything Model (SAM) has introduced unprecedented potential for polyp segmentation.
SAM's Transformer-based structure prioritizes global and low-frequency information.
CFA integrates a trainable CNN encoder branch with a frozen ViT encoder, enabling the integration of domain-specific knowledge.
(2024-06-30T14:55:32Z)
- MDFL: Multi-domain Diffusion-driven Feature Learning [19.298491870280213]
We present a multi-domain diffusion-driven feature learning network (MDFL).
MDFL redefines the effective information domain that the model really focuses on.
We demonstrate that MDFL significantly improves the feature extraction performance of high-dimensional data.
(2023-11-16T02:55:21Z)
- Rethinking Superpixel Segmentation from Biologically Inspired Mechanisms [8.24963839394421]
We propose a network architecture comprising an Enhanced Screening Module (ESM) and a novel Boundary-Aware Label (BAL) for superpixel segmentation.
The ESM enhances semantic information by simulating the interactive projection mechanisms of the visual cortex.
The BAL emulates the spatial frequency characteristics of visual cortical cells to facilitate the generation of superpixels with strong boundary adherence.
(2023-09-23T17:29:38Z)
- Fuzzy Attention Neural Network to Tackle Discontinuity in Airway Segmentation [67.19443246236048]
Airway segmentation is crucial for the examination, diagnosis, and prognosis of lung diseases.
Some small-sized airway branches (e.g., bronchi and terminal bronchioles) significantly aggravate the difficulty of automatic segmentation.
This paper presents an efficient method for airway segmentation, comprising a novel fuzzy attention neural network and a comprehensive loss function.
(2022-09-05T16:38:13Z)
- PhysFormer: Facial Video-based Physiological Measurement with Temporal Difference Transformer [55.936527926778695]
Recent deep learning approaches focus on mining subtle rPPG clues using convolutional neural networks with limited spatio-temporal receptive fields.
In this paper, we propose the PhysFormer, an end-to-end video transformer based architecture.
(2021-11-23T18:57:11Z)
- Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction [127.20208645280438]
Hyperspectral image (HSI) reconstruction aims to recover the 3D spatial-spectral signal from a 2D measurement.
Modeling the inter-spectra interactions is beneficial for HSI reconstruction.
Mask-guided Spectral-wise Transformer (MST) proposes a novel framework for HSI reconstruction.
(2021-11-15T16:59:48Z)
- Spatial-Temporal Correlation and Topology Learning for Person Re-Identification in Videos [78.45050529204701]
We propose a novel framework, CTL, to pursue discriminative and robust representations by modeling cross-scale spatial-temporal correlation.
CTL utilizes a CNN backbone and a key-point estimator to extract semantic local features from the human body.
It explores a context-reinforced topology to construct multi-scale graphs by considering both the global contextual information and the physical connections of the human body.
(2021-04-15T14:32:12Z)
- Non-contact PPG Signal and Heart Rate Estimation with Multi-hierarchical Convolutional Network [12.119293125608976]
Heart rate (HR) is an important physiological parameter of the human body.
This study presents an efficient multi-hierarchical convolutional network that can estimate HR from face video clips.
(2021-04-06T03:04:27Z)
- Unsupervised Instance Segmentation in Microscopy Images via Panoptic Domain Adaptation and Task Re-weighting [86.33696045574692]
We propose a Cycle Consistency Panoptic Domain Adaptive Mask R-CNN (CyC-PDAM) architecture for unsupervised nuclei segmentation in histopathology images.
We first propose a nuclei inpainting mechanism to remove the auxiliary generated objects in the synthesized images.
Secondly, a semantic branch with a domain discriminator is designed to achieve panoptic-level domain adaptation.
(2020-05-05T11:08:26Z)
- Salient Object Detection Combining a Self-attention Module and a Feature Pyramid Network [10.81245352773775]
We propose a novel pyramid self-attention module (PSAM) and adopt an independent feature-complementing strategy.
In PSAM, self-attention layers are placed after the multi-scale pyramid features to capture richer high-level features and give the model larger receptive fields.
(2020-04-30T03:08:34Z)