Related papers: PhysLLM: Harnessing Large Language Models for Cross-Modal Remote Physiological Sensing

PhysLLM: Harnessing Large Language Models for Cross-Modal Remote Physiological Sensing

URL: http://arxiv.org/abs/2505.03621v1
Date: Tue, 06 May 2025 15:18:38 GMT
Title: PhysLLM: Harnessing Large Language Models for Cross-Modal Remote Physiological Sensing
Authors: Yiping Xie, Bo Zhao, Mingtong Dai, Jian-Ping Zhou, Yue Sun, Tao Tan, Weicheng Xie, Linlin Shen, Zitong Yu,
Abstract summary: Large Language Models (LLMs) excel at capturing long-range signals due to their text-centric design.<n>PhysLLM achieves state-the-art accuracy and robustness, demonstrating superior generalization across lighting variations and motion scenarios.
Score: 49.243031514520794
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Remote photoplethysmography (rPPG) enables non-contact physiological measurement but remains highly susceptible to illumination changes, motion artifacts, and limited temporal modeling. Large Language Models (LLMs) excel at capturing long-range dependencies, offering a potential solution but struggle with the continuous, noise-sensitive nature of rPPG signals due to their text-centric design. To bridge this gap, we introduce PhysLLM, a collaborative optimization framework that synergizes LLMs with domain-specific rPPG components. Specifically, the Text Prototype Guidance (TPG) strategy is proposed to establish cross-modal alignment by projecting hemodynamic features into LLM-interpretable semantic space, effectively bridging the representational gap between physiological signals and linguistic tokens. Besides, a novel Dual-Domain Stationary (DDS) Algorithm is proposed for resolving signal instability through adaptive time-frequency domain feature re-weighting. Finally, rPPG task-specific cues systematically inject physiological priors through physiological statistics, environmental contextual answering, and task description, leveraging cross-modal learning to integrate both visual and textual information, enabling dynamic adaptation to challenging scenarios like variable illumination and subject movements. Evaluation on four benchmark datasets, PhysLLM achieves state-of-the-art accuracy and robustness, demonstrating superior generalization across lighting variations and motion scenarios.

Related papers

Topology-Assisted Spatio-Temporal Pattern Disentangling for Scalable MARL in Large-scale Autonomous Traffic Control [14.929720580977152]
This paper introduces a novel MARL framework that integrates Dynamic Graph Neural Networks (DGNNs) and Topological Data Analysis (TDA)<n>Inspired by the Mixture of Experts (MoE) architecture in Large Language Models (LLMs), a topology-assisted spatial pattern disentangling (TSD)-enhanced MoE is proposed.<n> Extensive experiments conducted on real-world traffic scenarios, together with comprehensive theoretical analysis, validate the superior performance of the proposed framework.
arXiv Detail & Related papers (2025-06-14T11:18:12Z)
PhysioWave: A Multi-Scale Wavelet-Transformer for Physiological Signal Representation [18.978031999678507]
A novel wavelet-based approach for physiological signal analysis is presented, aiming to capture multi-scale time-frequency features in various physiological signals.<n>Two large-scale pretrained models specific to EMG and ECG are introduced for the first time, achieving superior performance and setting new baselines in downstream tasks.<n>A unified multi-modal framework is constructed by integrating pretrained EEG model, where each modality is guided through its dedicated branch and fused via learnable weighted fusion.
arXiv Detail & Related papers (2025-06-12T05:11:41Z)
Efficient and Robust Multidimensional Attention in Remote Physiological Sensing through Target Signal Constrained Factorization [7.947387272047604]
We present MMRPhys, an efficient dual-branch 3D-CNN architecture designed for simultaneous estimation of photoplethysmography (rRSP) and respiratory (rRSP) signals from multimodal video inputs.<n>We demonstrate that MMRPhys with TSFM significantly outperforms state-of-the-art methods in generalization across domain shifts for rRSP estimation, while maintaining a minimal inference latency suitable for real-time applications.
arXiv Detail & Related papers (2025-05-11T15:20:45Z)
Towards Robust Multimodal Physiological Foundation Models: Handling Arbitrary Missing Modalities [9.785262633953794]
Physio Omni is a foundation model for multimodal physiological signal analysis.<n>It trains a decoupled multimodal tokenizer, enabling masked signal pre-training.<n>It achieves state-of-the-art performance while maintaining strong robustness to missing modalities.
arXiv Detail & Related papers (2025-04-28T09:00:04Z)
PRAGA: Prototype-aware Graph Adaptive Aggregation for Spatial Multi-modal Omics Analysis [1.1619559582563954]
We propose PRototype-Aware Graph Adaptative Aggregation for Spatial Multi-modal Omics Analysis (PRAGA)<n> PRAGA constructs a dynamic graph to capture latent semantic relations and comprehensively integrate spatial information and feature semantics.<n>The learnable graph structure can also denoise perturbations by learning cross-modal knowledge.
arXiv Detail & Related papers (2024-09-19T12:53:29Z)
Spatial Adaptation Layer: Interpretable Domain Adaptation For Biosignal Sensor Array Applications [0.7499722271664147]
We propose the Spatial Adaptation Layer (SAL), which can be applied to any biosignal array model.<n>We also introduce learnable baseline normalization (LBN) to reduce baseline fluctuations.<n>Tested on two HD-sEMG gesture recognition datasets, SAL and LBN outperformed standard fine-tuning on regular arrays.
arXiv Detail & Related papers (2024-09-12T14:06:12Z)
Synthetic location trajectory generation using categorical diffusion models [50.809683239937584]
Diffusion models (DPMs) have rapidly evolved to be one of the predominant generative models for the simulation of synthetic data. We propose using DPMs for the generation of synthetic individual location trajectories (ILTs) which are sequences of variables representing physical locations visited by individuals.
arXiv Detail & Related papers (2024-02-19T15:57:39Z)
Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks. We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level. We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
arXiv Detail & Related papers (2024-02-09T07:45:26Z)
PhysFormer++: Facial Video-based Physiological Measurement with SlowFast Temporal Difference Transformer [76.40106756572644]
Recent deep learning approaches focus on mining subtle clues using convolutional neural networks with limited-temporal receptive fields. In this paper, we propose two end-to-end video transformer based on PhysFormer and Phys++++, to adaptively aggregate both local and global features for r representation enhancement. Comprehensive experiments are performed on four benchmark datasets to show our superior performance on both intra-temporal and cross-dataset testing.
arXiv Detail & Related papers (2023-02-07T15:56:03Z)
PhysFormer: Facial Video-based Physiological Measurement with Temporal Difference Transformer [55.936527926778695]
Recent deep learning approaches focus on mining subtle r clues using convolutional neural networks with limited-temporal receptive fields. In this paper, we propose the PhysFormer, an end-to-end video transformer based architecture.
arXiv Detail & Related papers (2021-11-23T18:57:11Z)
Video-based Remote Physiological Measurement via Cross-verified Feature Disentangling [121.50704279659253]
We propose a cross-verified feature disentangling strategy to disentangle the physiological features with non-physiological representations. We then use the distilled physiological features for robust multi-task physiological measurements. The disentangled features are finally used for the joint prediction of multiple physiological signals like average HR values and r signals.
arXiv Detail & Related papers (2020-07-16T09:39:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.