The First Vision For Vitals (V4V) Challenge for Non-Contact Video-Based
Physiological Estimation
- URL: http://arxiv.org/abs/2109.10471v1
- Date: Wed, 22 Sep 2021 01:21:33 GMT
- Title: The First Vision For Vitals (V4V) Challenge for Non-Contact Video-Based
Physiological Estimation
- Authors: Ambareesh Revanur, Zhihua Li, Umur A. Ciftci, Lijun Yin, Laszlo A.
Jeni
- Abstract summary: Remote Photoplethysmography (rPPG) is the problem of non-invasively estimating blood volume variations in the microvascular tissue from video.
The 1st Vision for Vitals Challenge (V4V) presented a novel dataset containing high-resolution videos timelocked with varied physiological signals from a diverse population.
- Score: 12.720067414341383
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Telehealth has the potential to offset the high demand for help during public
health emergencies, such as the COVID-19 pandemic. Remote Photoplethysmography
(rPPG) - the problem of non-invasively estimating blood volume variations in
the microvascular tissue from video - would be well suited for these
situations. Over the past few years a number of research groups have made rapid
advances in remote PPG methods for estimating heart rate from digital video and
obtained impressive results. How these various methods compare in naturalistic
conditions, where spontaneous behavior, facial expressions, and illumination
changes are present, is relatively unknown. To enable comparisons among
alternative methods, the 1st Vision for Vitals Challenge (V4V) presented a
novel dataset containing high-resolution videos time-locked with varied
physiological signals from a diverse population. In this paper, we outline the
evaluation protocol, the data used, and the results. V4V is to be held in
conjunction with the 2021 International Conference on Computer Vision.
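As a rough illustration of the rPPG problem described in the abstract, the simplest baselines track the mean green-channel intensity of a facial region over time and read the pulse rate off the dominant frequency of that trace. The sketch below is a minimal, hypothetical example of this idea (the function name, parameters, and synthetic signal are illustrative and not taken from the challenge or any of the listed papers):

```python
import numpy as np

def estimate_heart_rate(green_means, fps, lo_bpm=40.0, hi_bpm=180.0):
    """Estimate heart rate (BPM) from per-frame mean green-channel values.

    green_means: 1-D array, mean green intensity of a face ROI per frame.
    fps: video frame rate in Hz.
    """
    x = np.asarray(green_means, dtype=float)
    x = x - x.mean()                    # remove the DC (average brightness) component
    x = x * np.hanning(len(x))          # taper to reduce spectral leakage
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fps)
    # restrict the search to the plausible human heart-rate band
    band = (freqs >= lo_bpm / 60.0) & (freqs <= hi_bpm / 60.0)
    peak_freq = freqs[band][np.argmax(spectrum[band])]
    return peak_freq * 60.0             # Hz -> beats per minute

# Synthetic check: a 72 BPM pulse buried in noise, sampled at 30 fps for 20 s.
fps, bpm = 30.0, 72.0
t = np.arange(0, 20, 1.0 / fps)
trace = 0.5 * np.sin(2 * np.pi * (bpm / 60.0) * t)
trace += np.random.default_rng(0).normal(0.0, 0.3, t.size)
print(round(estimate_heart_rate(trace, fps)))  # ~72
```

Real pipelines add face tracking, illumination normalization, and band-pass filtering; the naturalistic conditions the challenge targets (spontaneous behavior, expressions, lighting changes) are precisely where this naive spectral estimate breaks down.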
Related papers
- UniSurg: A Video-Native Foundation Model for Universal Understanding of Surgical Videos [81.9180187964947]
We present UniSurg, a foundation model that shifts the learning paradigm from pixel-level reconstruction to latent motion prediction. To enable large-scale pretraining, we curate the largest surgical video dataset to date, comprising 3,658 hours of video from 50 sources across 13 anatomical regions. These results establish UniSurg as a new standard for universal, motion-oriented surgical video understanding.
arXiv Detail & Related papers (2026-02-05T13:18:33Z) - TemMed-Bench: Evaluating Temporal Medical Image Reasoning in Vision-Language Models [54.48710348910535]
Existing medical reasoning benchmarks primarily focus on analyzing a patient's condition based on an image from a single visit. We introduce TemMed-Bench, the first benchmark designed for analyzing changes in patients' conditions between different clinical visits.
arXiv Detail & Related papers (2025-09-29T17:51:26Z) - Gaze into the Heart: A Multi-View Video Dataset for rPPG and Health Biomarkers Estimation [36.002060195915526]
The paper introduces a novel large-scale multi-view video dataset for rPPG and health biomarkers estimation. Our dataset comprises synchronized video recordings from 600 subjects, captured under varied conditions. The public release of our dataset and model should significantly speed up progress in the development of AI medical assistants.
arXiv Detail & Related papers (2025-08-25T11:46:40Z) - Non-Contact Health Monitoring During Daily Personal Care Routines [33.93756501373886]
Remote photoplethysmography (rPPG) enables non-contact, continuous monitoring of physiological signals. We present the first long-term rPPG learning dataset, containing 240 synchronized RGB and infrared (IR) facial videos from 21 participants. Experiments demonstrate that combining RGB and IR video inputs improves the accuracy and robustness of non-contact physiological monitoring.
arXiv Detail & Related papers (2025-06-11T13:29:21Z) - Generalization of Video-Based Heart Rate Estimation Methods To Low Illumination and Elevated Heart Rates [3.8886059978578595]
Heart rate is a physiological signal that provides information about an individual's health and affective state.
We evaluate representative state-of-the-art methods for estimation of heart rate using remote photoplethysmography (rPPG).
Our experimental results indicate that classical methods are not significantly impacted by low-light conditions.
Some deep learning methods were found to be more robust to changes in lighting conditions but encountered challenges in estimating high heart rates.
arXiv Detail & Related papers (2025-03-11T18:29:10Z) - FFA Sora, video generation as fundus fluorescein angiography simulator [23.08083653969291]
Fundus fluorescein angiography (FFA) is critical for diagnosing retinal vascular diseases.
This study develops FFA Sora, a text-to-video model that converts FFA reports into dynamic videos.
arXiv Detail & Related papers (2024-12-23T07:18:13Z) - FedMedICL: Towards Holistic Evaluation of Distribution Shifts in Federated Medical Imaging [68.6715007665896]
FedMedICL is a unified framework and benchmark to holistically evaluate federated medical imaging challenges.
We comprehensively evaluate several popular methods on six diverse medical imaging datasets.
We find that a simple batch balancing technique surpasses advanced methods in average performance across FedMedICL experiments.
arXiv Detail & Related papers (2024-07-11T19:12:23Z) - Bootstrapping Vision-language Models for Self-supervised Remote Physiological Measurement [26.480515954528848]
We propose a novel framework that successfully integrates popular vision-language models into a remote physiological measurement task.
We develop a series of generative and contrastive learning mechanisms to optimize the framework.
Our method for the first time adapts VLMs to digest and align the frequency-related knowledge in vision and text modalities.
arXiv Detail & Related papers (2024-07-11T13:45:50Z) - Efficient Multi-View Fusion and Flexible Adaptation to View Missing in Cardiovascular System Signals [4.519437028632205]
Deep learning has facilitated automatic multi-view fusion (MVF) of cardiovascular system (CVS) signals.
An MVF architecture typically amalgamates CVS signals from the same temporal step but different views into a unified representation.
We introduce prompt techniques to aid pretrained MVF models in flexibly adapting to various missing-view scenarios.
arXiv Detail & Related papers (2024-06-13T08:58:59Z) - Harnessing the power of longitudinal medical imaging for eye disease prognosis using Transformer-based sequence modeling [49.52787013516891]
Our proposed Longitudinal Transformer for Survival Analysis (LTSA) enables dynamic disease prognosis from longitudinal medical imaging.
A temporal attention analysis also suggested that, while the most recent image is typically the most influential, prior imaging still provides additional prognostic value.
arXiv Detail & Related papers (2024-05-14T17:15:28Z) - Can GPT-4V(ision) Serve Medical Applications? Case Studies on GPT-4V for
Multimodal Medical Diagnosis [59.35504779947686]
GPT-4V is OpenAI's newest multimodal model; we evaluate its potential for medical diagnosis.
Our evaluation encompasses 17 human body systems.
GPT-4V demonstrates proficiency in distinguishing between medical image modalities and anatomy.
It faces significant challenges in disease diagnosis and generating comprehensive reports.
arXiv Detail & Related papers (2023-10-15T18:32:27Z) - Remote Bio-Sensing: Open Source Benchmark Framework for Fair Evaluation
of rPPG [2.82697733014759]
rPPG (remote photoplethysmography) is a technology that measures and analyzes BVP (Blood Volume Pulse) using the light absorption characteristics of hemoglobin captured through a camera.
This study provides a framework for benchmarking various rPPG techniques across a wide range of datasets, enabling fair evaluation and comparison.
arXiv Detail & Related papers (2023-07-24T09:35:47Z) - Camera-Based HRV Prediction for Remote Learning Environments [4.074837550066978]
Restoring blood volume pulse signals from facial videos is a challenging task that involves a series of preprocessing, image algorithms, and postprocessing to restore waveforms.
The challenge in obtaining HRV indices through rPPG is that algorithms must precisely predict the BVP peak positions.
In this paper, we collected the Remote Learning Affect and Physiology (RLAP) dataset, which includes over 32 hours of highly synchronized video and labels from 58 subjects.
Using the RLAP dataset, we trained a new model called Seq-rPPG, a model based on one-dimensional convolution.
arXiv Detail & Related papers (2023-05-07T02:26:00Z) - Data-Efficient Vision Transformers for Multi-Label Disease
Classification on Chest Radiographs [55.78588835407174]
Vision Transformers (ViTs) have not been applied to this task despite their high classification performance on generic images.
ViTs rely on patch-based self-attention rather than convolutions and, in contrast to CNNs, encode no prior knowledge of local connectivity.
Our results show that while the performance between ViTs and CNNs is on par with a small benefit for ViTs, DeiTs outperform the former if a reasonably large data set is available for training.
arXiv Detail & Related papers (2022-08-17T09:07:45Z) - Remote Medication Status Prediction for Individuals with Parkinson's
Disease using Time-series Data from Smartphones [75.23250968928578]
We present a method for predicting the medication status of Parkinson's disease patients using the public mPower dataset.
The proposed method shows promising results in predicting three medication statuses objectively.
arXiv Detail & Related papers (2022-07-26T02:08:08Z) - FetReg2021: A Challenge on Placental Vessel Segmentation and
Registration in Fetoscopy [52.3219875147181]
Fetoscopic laser photocoagulation is a widely adopted procedure for treating Twin-to-Twin Transfusion Syndrome (TTTS).
The procedure is particularly challenging due to the limited field of view, poor manoeuvrability of the fetoscope, poor visibility, and variability in illumination.
Computer-assisted intervention (CAI) can provide surgeons with decision support and context awareness by identifying key structures in the scene and expanding the fetoscopic field of view through video mosaicking.
Seven teams participated in this challenge and their model performance was assessed on an unseen test dataset of 658 pixel-annotated images from 6 fetoscopy procedures.
arXiv Detail & Related papers (2022-06-24T23:44:42Z) - Visual Acuity Prediction on Real-Life Patient Data Using a Machine Learning Based Multistage System [0.40151799356083057]
The prediction of the visual acuity (VA) and the earliest possible detection of deterioration under real-life conditions is challenging due to heterogeneous and incomplete data.
We present a workflow for the development of a research-compatible data corpus fusing different IT systems of the department of ophthalmology of a German maximum care hospital.
We achieve a final prediction accuracy of 69% macro-average F1-score, in the same range as the ophthalmologists with 57.8% and 50 ± 10.7% F1-score.
arXiv Detail & Related papers (2022-04-25T21:20:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.