PainFormer: a Vision Foundation Model for Automatic Pain Assessment
- URL: http://arxiv.org/abs/2505.01571v2
- Date: Sun, 18 May 2025 22:29:18 GMT
- Title: PainFormer: a Vision Foundation Model for Automatic Pain Assessment
- Authors: Stefanos Gkikas, Raul Fernandez Rojas, Manolis Tsiknakis
- Abstract summary: Pain is a manifold condition that impacts a significant percentage of the population. This study introduces PainFormer, a vision foundation model based on multi-task learning principles. PainFormer effectively extracts high-quality embeddings from diverse input modalities.
- Score: 2.8028723950211476
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pain is a manifold condition that impacts a significant percentage of the population. Accurate and reliable pain evaluation for those suffering is crucial to developing effective and advanced pain management protocols. Automatic pain assessment systems provide continuous monitoring and support decision-making processes, ultimately aiming to alleviate distress and prevent functionality decline. This study introduces PainFormer, a vision foundation model based on multi-task learning principles, trained simultaneously on 14 tasks/datasets with a total of 10.9 million samples. Functioning as an embedding extractor for various input modalities, the foundation model provides feature representations to the Embedding-Mixer, a transformer-based module that performs the final pain assessment. Extensive experiments employing behavioral modalities (including RGB, synthetic thermal, and estimated depth videos) and physiological modalities (such as ECG, EMG, GSR, and fNIRS) revealed that PainFormer effectively extracts high-quality embeddings from diverse input modalities. The proposed framework is evaluated on two pain datasets, BioVid and AI4Pain, and directly compared to 75 different methodologies documented in the literature. Experiments conducted in unimodal and multimodal settings demonstrate state-of-the-art performances across modalities and pave the way toward general-purpose models for automatic pain assessment.
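The framework described above is two-stage: a foundation model embeds each input modality, and a transformer-based Embedding-Mixer fuses the embeddings for the final prediction. Below is a minimal PyTorch sketch of that design; all module names, dimensions, the pooling choice, and the classification head are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of an embedding-extractor + Embedding-Mixer pipeline.
# Sizes, pooling, and the head are assumptions for illustration only.
import torch
import torch.nn as nn

class EmbeddingMixer(nn.Module):
    """Transformer encoder that fuses per-modality embeddings and
    predicts a pain level (hypothetical head)."""
    def __init__(self, dim=256, heads=4, layers=2, num_classes=5):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, embeddings):           # (batch, n_modalities, dim)
        fused = self.encoder(embeddings)     # contextualize across modalities
        return self.head(fused.mean(dim=1))  # pool, then classify

# Stand-in for the foundation model: any network mapping a
# modality-specific input to a fixed-size embedding would fit here.
foundation = nn.Sequential(nn.Flatten(), nn.LazyLinear(256))

rgb = torch.randn(8, 3, 32, 32)   # e.g. a video frame per sample
ecg = torch.randn(8, 3, 32, 32)   # e.g. an ECG signal rendered as an image
emb = torch.stack([foundation(rgb), foundation(ecg)], dim=1)
logits = EmbeddingMixer()(emb)    # (8, num_classes)
```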
Related papers
- Tiny-BioMoE: a Lightweight Embedding Model for Biosignal Analysis [0.8602553195689511]
This study has been submitted to the Second Multimodal Sensing Grand Challenge for Next-Gen Pain Assessment (AI4PAIN). The proposed approach introduces Tiny-BioMoE, a lightweight pretrained embedding model for biosignal analysis.
arXiv Detail & Related papers (2025-07-29T14:46:39Z)
- SemiHVision: Enhancing Medical Multimodal Models with a Semi-Human Annotated Dataset and Fine-Tuned Instruction Generation [13.672776832197918]
Multimodal large language models (MLLMs) have made significant strides, yet they face challenges in the medical domain due to limited specialized knowledge.
We seek to address this gap at various stages of the end-to-end learning pipeline, including data collection, model fine-tuning, and evaluation.
arXiv Detail & Related papers (2024-10-19T02:35:35Z)
- Twins-PainViT: Towards a Modality-Agnostic Vision Transformer Framework for Multimodal Automatic Pain Assessment using Facial Videos and fNIRS [0.9668407688201359]
This study has been submitted to the First Multimodal Sensing Grand Challenge for Next-Gen Pain Assessment (AI4PAIN).
The proposed multimodal framework utilizes facial videos and fNIRS and presents a modality-agnostic approach, alleviating the need for domain-specific models.
arXiv Detail & Related papers (2024-07-29T09:02:43Z)
- Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation [113.5002649181103]
We train open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology.
For training, we assemble a large dataset of over 697 thousand radiology image-text pairs.
For evaluation, we propose CheXprompt, a GPT-4-based metric for factuality evaluation, and demonstrate its parity with expert evaluation.
Inference with LLaVA-Rad is fast and can be performed on a single V100 GPU in private settings, offering a promising state-of-the-art tool for real-world clinical applications.
arXiv Detail & Related papers (2024-03-12T18:12:02Z)
- Optimizing Skin Lesion Classification via Multimodal Data and Auxiliary Task Integration [54.76511683427566]
This research introduces a novel multimodal method for classifying skin lesions, integrating smartphone-captured images with essential clinical and demographic information.
A distinctive aspect of this method is the integration of an auxiliary task focused on super-resolution image prediction.
The experimental evaluations have been conducted using the PAD-UFES20 dataset, applying various deep-learning architectures.
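A rough sketch of how a classifier might combine image and tabular inputs while training an auxiliary super-resolution head, in the spirit of the method above; the architecture, layer sizes, and loss weighting are assumptions for illustration only.

```python
# Sketch of a multimodal classifier with an auxiliary super-resolution
# task. All sizes and the 0.1 loss weight are illustrative assumptions.
import torch
import torch.nn as nn

class LesionNet(nn.Module):
    def __init__(self, n_tabular=10, n_classes=6):
        super().__init__()
        self.backbone = nn.Sequential(           # tiny CNN stand-in
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(8))
        self.tab = nn.Sequential(nn.Linear(n_tabular, 16), nn.ReLU())
        self.cls = nn.Linear(16 * 8 * 8 + 16, n_classes)
        # Auxiliary head: decode features back to a 2x-resolution image.
        self.sr = nn.Sequential(
            nn.Upsample(scale_factor=16), nn.Conv2d(16, 3, 3, padding=1))

    def forward(self, img, tabular):
        f = self.backbone(img)
        logits = self.cls(torch.cat([f.flatten(1), self.tab(tabular)], 1))
        return logits, self.sr(f)                # classification + SR output

model = LesionNet()
img, tab = torch.randn(4, 3, 64, 64), torch.randn(4, 10)
hires = torch.randn(4, 3, 128, 128)              # 2x target for the aux task
logits, sr_out = model(img, tab)
loss = nn.functional.cross_entropy(logits, torch.randint(0, 6, (4,))) \
     + 0.1 * nn.functional.mse_loss(sr_out, hires)
```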
arXiv Detail & Related papers (2024-02-16T05:16:20Z)
- Pain Analysis using Adaptive Hierarchical Spatiotemporal Dynamic Imaging [16.146223377936035]
We introduce the Adaptive Hierarchical Spatiotemporal Dynamic Image (AHDI) technique.
AHDI encodes spatiotemporal changes in facial videos into a single RGB image, permitting the application of simpler 2D models for video representation.
Within this framework, we employ a residual network to derive generalized facial representations.
These representations are optimized for two tasks: estimating pain intensity and differentiating between genuine and simulated pain expressions.
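One established way to collapse a video into a single RGB image is approximate rank pooling (the "dynamic image" of Bilen et al.); whether AHDI uses this exact weighting is an assumption, but the sketch below illustrates the general idea of encoding temporal change into one image that a 2D model can consume.

```python
# Sketch of a "dynamic image": a whole video is collapsed into one RGB
# image via approximate rank pooling. The weighting is the standard
# rank-pooling form, assumed here for illustration.
import numpy as np

def dynamic_image(frames):
    """frames: (T, H, W, 3) float array -> single (H, W, 3) image."""
    T = frames.shape[0]
    t = np.arange(1, T + 1)
    alpha = 2 * t - T - 1                  # rank-pooling frame weights
    di = np.tensordot(alpha, frames, axes=1)
    di -= di.min()                         # normalize to displayable range
    return di / max(di.max(), 1e-8)

video = np.random.rand(30, 64, 64, 3)      # e.g. 30 facial-video frames
img = dynamic_image(video)                 # feed this to any 2D CNN
```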
arXiv Detail & Related papers (2023-12-12T01:23:05Z)
- Wearable-based Fair and Accurate Pain Assessment Using Multi-Attribute Fairness Loss in Convolutional Neural Networks [4.451479907610764]
The adoption of AI in clinical pain evaluation is hindered by challenges like personalization and fairness. Many AI models, including machine and deep learning models, exhibit biases, discriminating against specific groups based on gender or ethnicity. This paper proposes a Multi-attribute Fairness Loss (MAFL) based Convolutional Neural Network (CNN) model designed to account for protected attributes in the data.
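The exact form of the paper's MAFL is not given above, so the following is a hedged sketch of one plausible multi-attribute fairness penalty: the task loss plus the spread of per-group losses for each protected attribute. The mafl_loss name and the disparity-style penalty are assumptions.

```python
# Hedged sketch of a multi-attribute fairness penalty: per-sample losses
# are averaged per protected group, and deviations between groups are
# penalized for every attribute. Illustrative, not the paper's MAFL.
import torch
import torch.nn.functional as F

def mafl_loss(logits, labels, attrs, lam=0.5):
    """attrs: dict mapping attribute name -> group id per sample."""
    per_sample = F.cross_entropy(logits, labels, reduction="none")
    loss = per_sample.mean()
    penalty = 0.0
    for groups in attrs.values():
        gm = torch.stack([per_sample[groups == g].mean()
                          for g in torch.unique(groups)])
        penalty = penalty + (gm - gm.mean()).abs().mean()  # group disparity
    return loss + lam * penalty

logits = torch.randn(16, 3)
labels = torch.randint(0, 3, (16,))
attrs = {"gender": torch.randint(0, 2, (16,)),
         "ethnicity": torch.randint(0, 3, (16,))}
print(mafl_loss(logits, labels, attrs))
```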
arXiv Detail & Related papers (2023-07-03T09:21:36Z)
- A Transformer-based representation-learning model with unified processing of multimodal input for clinical diagnostics [63.106382317917344]
We report a Transformer-based representation-learning model as a clinical diagnostic aid that processes multimodal input in a unified manner.
The unified model outperformed an image-only model and non-unified multimodal diagnosis models in the identification of pulmonary diseases.
arXiv Detail & Related papers (2023-06-01T16:23:47Z)
- Pain level and pain-related behaviour classification using GRU-based sparsely-connected RNNs [61.080598804629375]
People with chronic pain unconsciously adapt specific body movements to protect themselves from injury or additional pain.
Because there is no dedicated benchmark database to analyse this correlation, we considered one of the specific circumstances that potentially influence a person's biometrics during daily activities.
We propose an ensemble of sparsely-connected recurrent neural networks (s-RNNs) with gated recurrent units (GRUs), incorporating multiple autoencoders.
We conducted several experiments which indicate that the proposed method outperforms the state-of-the-art approaches in classifying both pain level and pain-related behaviour.
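A minimal sketch of an ensemble of GRU autoencoders whose latent codes feed a shared classifier, which is one plausible reading of the s-RNN ensemble above; the sparse connectivity is omitted and all sizes are assumptions.

```python
# Sketch of a GRU-autoencoder ensemble for sequence classification.
# Sizes, the number of members, and the shared head are assumptions.
import torch
import torch.nn as nn

class GRUAutoencoder(nn.Module):
    def __init__(self, n_feat=12, hidden=32):
        super().__init__()
        self.enc = nn.GRU(n_feat, hidden, batch_first=True)
        self.dec = nn.GRU(hidden, n_feat, batch_first=True)

    def forward(self, x):                        # x: (batch, T, n_feat)
        z, _ = self.enc(x)
        recon, _ = self.dec(z)
        return recon, z[:, -1]                   # reconstruction + last code

class Ensemble(nn.Module):
    def __init__(self, n_models=3, hidden=32, n_classes=4):
        super().__init__()
        self.aes = nn.ModuleList(GRUAutoencoder(hidden=hidden)
                                 for _ in range(n_models))
        self.cls = nn.Linear(n_models * hidden, n_classes)

    def forward(self, x):
        codes = [ae(x)[1] for ae in self.aes]    # one code per autoencoder
        return self.cls(torch.cat(codes, dim=1))

x = torch.randn(8, 50, 12)        # 50 timesteps of body-movement features
print(Ensemble()(x).shape)        # torch.Size([8, 4])
```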
arXiv Detail & Related papers (2022-12-20T12:56:28Z)
- Benchmarking Heterogeneous Treatment Effect Models through the Lens of Interpretability [82.29775890542967]
Estimating personalized effects of treatments is a complex, yet pervasive problem.
Recent developments in the machine learning literature on heterogeneous treatment effect estimation gave rise to many sophisticated, but opaque, tools.
We use post-hoc feature importance methods to identify features that influence the model's predictions.
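As a concrete illustration of post-hoc feature importance applied to a treatment-effect model, the sketch below fits a simple T-learner to estimate conditional effects and then scores each feature by permutation; the learners and synthetic data are assumptions, not the benchmark's setup.

```python
# Sketch: T-learner CATE estimation + permutation feature importance.
# RandomForest learners and the synthetic data are illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
t = rng.integers(0, 2, 500)                        # treatment assignment
y = X[:, 0] * t + X[:, 1] + rng.normal(size=500)   # effect driven by x0

m1 = RandomForestRegressor().fit(X[t == 1], y[t == 1])
m0 = RandomForestRegressor().fit(X[t == 0], y[t == 0])
cate = m1.predict(X) - m0.predict(X)               # estimated effects

for j in range(X.shape[1]):                        # permutation importance
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])
    shift = np.mean(np.abs((m1.predict(Xp) - m0.predict(Xp)) - cate))
    print(f"feature {j}: {shift:.3f}")             # x0 should dominate
```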
arXiv Detail & Related papers (2022-06-16T17:59:05Z)
- Towards dynamic multi-modal phenotyping using chest radiographs and physiological data [3.11179491890629]
We propose a dynamic training approach to learn modality-specific data representations and to integrate auxiliary features.
We present preliminary experimental results for a patient phenotyping task using physiological data from MIMIC-IV and chest radiographs from the MIMIC-CXR dataset.
This illustrates the benefit of leveraging the chest imaging modality in the phenotyping task and highlights the potential of multi-modal learning in medical applications.
arXiv Detail & Related papers (2021-11-04T09:41:00Z)
- Adversarial Sample Enhanced Domain Adaptation: A Case Study on Predictive Modeling with Electronic Health Records [57.75125067744978]
We propose a data augmentation method to facilitate domain adaptation.
Specifically, adversarially generated samples are used during domain adaptation.
Results confirm the effectiveness of our method and its generality across different tasks.
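A hedged sketch of the idea: examples are perturbed along the loss gradient (FGSM-style, an assumption; the paper may generate samples differently) and mixed into training as augmentation.

```python
# Sketch: FGSM-style adversarial samples used as training augmentation.
# Model, data, and the 0.05 step size are illustrative assumptions.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 16), nn.ReLU(), nn.Linear(16, 2))
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.Adam(model.parameters())

x = torch.randn(32, 20)                      # source-domain EHR features
y = torch.randint(0, 2, (32,))

x_adv = x.clone().requires_grad_(True)       # perturb along loss gradient
loss_fn(model(x_adv), y).backward()
x_adv = (x + 0.05 * x_adv.grad.sign()).detach()

# Train on original + adversarial samples to harden the representation.
opt.zero_grad()
loss = loss_fn(model(torch.cat([x, x_adv])), torch.cat([y, y]))
loss.backward()
opt.step()
```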
arXiv Detail & Related papers (2021-01-13T03:20:20Z)