Editing Physiological Signals in Videos Using Latent Representations
- URL: http://arxiv.org/abs/2509.25348v2
- Date: Wed, 01 Oct 2025 01:16:13 GMT
- Title: Editing Physiological Signals in Videos Using Latent Representations
- Authors: Tianwen Zhou, Akshay Paruchuri, Josef Spjut, Kaan Akşit
- Abstract summary: Camera-based physiological signal estimation provides a non-contact means to monitor Heart Rate (HR). The presence of vital signals in facial videos raises significant privacy concerns. We propose a framework that edits physiological signals in videos while preserving visual fidelity. Our design's controllable HR editing is useful for applications such as anonymizing biometric signals in real videos or synthesizing realistic videos with desired vital signs.
- Score: 1.1688456044134343
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Camera-based physiological signal estimation provides a non-contact and convenient means to monitor Heart Rate (HR). However, the presence of vital signals in facial videos raises significant privacy concerns, as they can reveal sensitive personal information related to the health and emotional states of an individual. To address this, we propose a learned framework that edits physiological signals in videos while preserving visual fidelity. First, we encode an input video into a latent space via a pretrained 3D Variational Autoencoder (3D VAE), while a target HR prompt is embedded through a frozen text encoder. We fuse them using a set of trainable spatio-temporal layers with Adaptive Layer Normalizations (AdaLN) to capture the strong temporal coherence of remote Photoplethysmography (rPPG) signals. We apply Feature-wise Linear Modulation (FiLM) in the decoder with a fine-tuned output layer to avoid the degradation of physiological signals during reconstruction, enabling accurate physiological modulation in the reconstructed video. Empirical results show that our method preserves visual quality with an average PSNR of 38.96 dB and SSIM of 0.98 on selected datasets, while achieving an average HR modulation error of 10.00 bpm MAE and 10.09% MAPE using a state-of-the-art rPPG estimator. Our design's controllable HR editing is useful for applications such as anonymizing biometric signals in real videos or synthesizing realistic videos with desired vital signs.
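The abstract describes conditioning the decoder with Feature-wise Linear Modulation (FiLM), where per-channel scale and shift parameters are predicted from the target-HR embedding. As a minimal sketch of the FiLM operation itself (the shapes, names, and directly supplied parameters here are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def film(features, gamma, beta):
    """Feature-wise Linear Modulation: scale and shift each channel.

    features: (C, T, H, W) video feature map.
    gamma, beta: (C,) per-channel parameters; in the paper's design they
    would be predicted from the target-HR embedding, here they are
    supplied directly for illustration.
    """
    return gamma[:, None, None, None] * features + beta[:, None, None, None]

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 16, 16))  # hypothetical latent block
gamma = np.full(8, 2.0)                     # per-channel scale
beta = np.zeros(8)                          # per-channel shift
out = film(feat, gamma, beta)
```

Because the modulation is affine per channel, it can steer the reconstructed signal without disturbing the spatial structure of the features, which is consistent with the paper's goal of preserving visual fidelity.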
Related papers
- Label-free Motion-Conditioned Diffusion Model for Cardiac Ultrasound Synthesis [13.306765004903118]
We propose the Motion Conditioned Diffusion Model (MCDM), a label-free latent diffusion framework that synthesises realistic echocardiography videos conditioned on self-supervised motion features. MCDM achieves competitive video generation performance, producing temporally coherent and clinically realistic sequences without reliance on manual labels.
arXiv Detail & Related papers (2025-12-10T08:32:34Z)
- Periodic-MAE: Periodic Video Masked Autoencoder for rPPG Estimation [6.32655874508904]
We propose a method that learns a general representation of periodic signals from unlabeled facial videos by capturing subtle changes in skin tone over time. We evaluate the proposed method on the PURE, UBFC-rPPG, MMPD, and V4V datasets. Our results demonstrate significant performance improvements, particularly in challenging cross-dataset evaluations.
arXiv Detail & Related papers (2025-06-27T02:18:10Z)
- PhysLLM: Harnessing Large Language Models for Cross-Modal Remote Physiological Sensing [49.243031514520794]
Large Language Models (LLMs) excel at capturing long-range signals due to their text-centric design. PhysLLM achieves state-of-the-art accuracy and robustness, demonstrating superior generalization across lighting variations and motion scenarios.
arXiv Detail & Related papers (2025-05-06T15:18:38Z)
- CodePhys: Robust Video-based Remote Physiological Measurement through Latent Codebook Querying [26.97093819822487]
Remote photoplethysmography (rPPG) aims to measure non-contact physiological signals from facial videos. Most existing methods directly extract video-based rPPG features by designing neural networks for heart rate estimation. Recent methods are easily affected by interference and degradation, resulting in noisy rPPG signals. We propose a novel method named CodePhys, which innovatively treats rPPG measurement as a code query task in a noise-free proxy space.
arXiv Detail & Related papers (2025-02-11T13:05:42Z)
- HuGDiffusion: Generalizable Single-Image Human Rendering via 3D Gaussian Diffusion [50.02316409061741]
HuGDiffusion is a learning pipeline to achieve novel view synthesis (NVS) of human characters from single-view input images. We aim to generate the set of 3DGS attributes via a diffusion-based framework conditioned on human priors extracted from a single image. Our HuGDiffusion shows significant performance improvements over the state-of-the-art methods.
arXiv Detail & Related papers (2025-01-25T01:00:33Z)
- Contrast-Phys+: Unsupervised and Weakly-supervised Video-based Remote Physiological Measurement via Spatiotemporal Contrast [22.742875409103164]
We propose Contrast-Phys+, a method that can be trained in both unsupervised and weakly-supervised settings.
We employ a 3DCNN model to generate multiple spatiotemporal rPPG signals and incorporate prior knowledge of rPPG into a contrastive loss function.
Contrast-Phys+ outperforms the state-of-the-art supervised methods, even when using partially available or misaligned GT signals.
arXiv Detail & Related papers (2023-09-13T12:50:21Z)
- Facial Video-based Remote Physiological Measurement via Self-supervised Learning [9.99375728024877]
We introduce a novel framework that learns to estimate rPPG signals from facial videos without the need for ground truth signals.
Negative samples are generated via a learnable frequency module, which performs nonlinear signal frequency transformation.
Next, we introduce a local rPPG expert aggregation module to estimate rPPG signals from augmented samples.
It encodes complementary pulsation information from different face regions and aggregates them into one rPPG prediction.
arXiv Detail & Related papers (2022-10-27T13:03:23Z)
- Attentive Symmetric Autoencoder for Brain MRI Segmentation [56.02577247523737]
We propose a novel Attentive Symmetric Auto-encoder based on Vision Transformer (ViT) for 3D brain MRI segmentation tasks.
In the pre-training stage, the proposed auto-encoder pays more attention to reconstructing the informative patches according to the gradient metrics.
Experimental results show that our proposed attentive symmetric auto-encoder outperforms the state-of-the-art self-supervised learning methods and medical image segmentation models.
arXiv Detail & Related papers (2022-09-19T09:43:19Z)
- Differentiable Frequency-based Disentanglement for Aerial Video Action Recognition [56.91538445510214]
We present a learning algorithm for human activity recognition in videos.
Our approach is designed for UAV videos, which are mainly acquired from obliquely placed dynamic cameras.
We conduct extensive experiments on the UAV Human dataset and the NEC Drone dataset.
arXiv Detail & Related papers (2022-09-15T22:16:52Z)
- PhysFormer: Facial Video-based Physiological Measurement with Temporal Difference Transformer [55.936527926778695]
Recent deep learning approaches focus on mining subtle rPPG clues using convolutional neural networks with limited spatio-temporal receptive fields.
In this paper, we propose the PhysFormer, an end-to-end video transformer based architecture.
arXiv Detail & Related papers (2021-11-23T18:57:11Z)
- Video-based Remote Physiological Measurement via Cross-verified Feature Disentangling [121.50704279659253]
We propose a cross-verified feature disentangling strategy to disentangle the physiological features from non-physiological representations.
We then use the distilled physiological features for robust multi-task physiological measurements.
The disentangled features are finally used for the joint prediction of multiple physiological signals like average HR values and rPPG signals.
arXiv Detail & Related papers (2020-07-16T09:39:17Z)
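Several of the papers above report heart-rate errors (e.g. MAE in bpm) derived from an estimated rPPG trace. As a minimal sketch of how HR is commonly recovered from such a trace (a generic dominant-frequency approach, not any specific paper's code; the frame rate and signal are synthetic assumptions):

```python
import numpy as np

def hr_from_rppg(signal, fs):
    """Estimate heart rate (bpm) as the dominant frequency of an rPPG trace."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
    # Restrict the search to a plausible HR band: 0.7-4 Hz (42-240 bpm).
    band = (freqs >= 0.7) & (freqs <= 4.0)
    peak = freqs[band][np.argmax(spectrum[band])]
    return peak * 60.0

fs = 30.0                           # assumed camera frame rate
t = np.arange(0, 10, 1 / fs)        # 10 s clip (300 frames)
sig = np.sin(2 * np.pi * 1.2 * t)   # synthetic pulse at 1.2 Hz = 72 bpm
hr = hr_from_rppg(sig, fs)
```

With a 10 s window the frequency resolution is 0.1 Hz (6 bpm), which is one reason longer clips or interpolated spectra are typically used when finer HR accuracy is needed.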
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.