Twins-PainViT: Towards a Modality-Agnostic Vision Transformer Framework for Multimodal Automatic Pain Assessment using Facial Videos and fNIRS
- URL: http://arxiv.org/abs/2407.19809v1
- Date: Mon, 29 Jul 2024 09:02:43 GMT
- Title: Twins-PainViT: Towards a Modality-Agnostic Vision Transformer Framework for Multimodal Automatic Pain Assessment using Facial Videos and fNIRS
- Authors: Stefanos Gkikas, Manolis Tsiknakis
- Abstract summary: This study has been submitted to the First Multimodal Sensing Grand Challenge for Next-Gen Pain Assessment (AI4PAIN).
The proposed multimodal framework utilizes facial videos and fNIRS and presents a modality-agnostic approach, alleviating the need for domain-specific models.
- Score: 0.9668407688201359
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Automatic pain assessment plays a critical role in advancing healthcare and optimizing pain management strategies. This study has been submitted to the First Multimodal Sensing Grand Challenge for Next-Gen Pain Assessment (AI4PAIN). The proposed multimodal framework utilizes facial videos and fNIRS and presents a modality-agnostic approach, alleviating the need for domain-specific models. Employing a dual ViT configuration and adopting waveform representations for the fNIRS signals, as well as for the embeddings extracted from the two modalities, demonstrates the efficacy of the proposed method, which achieves an accuracy of 46.76% in the multilevel pain assessment task.
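As a rough illustration of the modality-agnostic design, the sketch below renders a 1-D fNIRS signal as a waveform image so that the same ViT backbone can encode both modalities; the backbone (torchvision's vit_b_16), the rendering details, and the three pain levels are assumptions for illustration, not the authors' exact configuration.

```python
# Minimal sketch: fNIRS channels rendered as waveform images so both
# modalities can be encoded by the same, modality-agnostic ViT backbone.
import io
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.nn as nn
from PIL import Image
from torchvision.models import vit_b_16

def signal_to_waveform_image(signal: np.ndarray, size: int = 224) -> torch.Tensor:
    """Render a 1-D fNIRS signal as an RGB waveform image tensor."""
    fig, ax = plt.subplots(figsize=(2.24, 2.24), dpi=100)
    ax.plot(signal, linewidth=0.8)
    ax.axis("off")
    buf = io.BytesIO()
    fig.savefig(buf, format="png", bbox_inches="tight", pad_inches=0)
    plt.close(fig)
    buf.seek(0)
    img = Image.open(buf).convert("RGB").resize((size, size))
    return torch.from_numpy(np.asarray(img)).permute(2, 0, 1).float() / 255.0

# One ViT encodes face frames, the other encodes waveform images; their
# embeddings are fused by a small classifier (head sizes are assumed).
face_vit, wave_vit = vit_b_16(), vit_b_16()
face_vit.heads, wave_vit.heads = nn.Identity(), nn.Identity()  # 768-d features
classifier = nn.Linear(768 * 2, 3)  # e.g. no/low/high pain -- assumed levels

face = torch.rand(1, 3, 224, 224)                               # a face frame
wave = signal_to_waveform_image(np.sin(np.linspace(0, 20, 500)))[None]
logits = classifier(torch.cat([face_vit(face), wave_vit(wave)], dim=-1))
```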
Related papers
- PathVLM-R1: A Reinforcement Learning-Driven Reasoning Model for Pathology Visual-Language Tasks [15.497221591506625]
We propose PathVLM-R1, a vision-language model designed specifically for pathological images.
We base our model on Qwen2.5-VL-7B-Instruct and enhance its performance on pathological tasks through meticulously designed post-training strategies.
arXiv Detail & Related papers (2025-04-12T15:32:16Z)
- Improving Pain Classification using Spatio-Temporal Deep Learning Approaches with Facial Expressions [0.27309692684728604]
Pain management and severity detection are crucial for effective treatment.
Traditional self-reporting methods are subjective and may be unsuitable for non-verbal individuals.
We explore automated pain detection using facial expressions.
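A minimal sketch of this spatio-temporal recipe, assuming a per-frame CNN followed by an LSTM over time; the ResNet-18 backbone, hidden size, and binary pain labels are illustrative choices, not the paper's architecture.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class FramePainClassifier(nn.Module):
    def __init__(self, num_levels: int = 2, hidden: int = 256):
        super().__init__()
        backbone = resnet18(weights=None)
        backbone.fc = nn.Identity()          # 512-d per-frame features
        self.backbone = backbone
        self.temporal = nn.LSTM(512, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_levels)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        b, t, c, h, w = clips.shape          # (batch, time, C, H, W)
        feats = self.backbone(clips.reshape(b * t, c, h, w)).reshape(b, t, -1)
        _, (h_n, _) = self.temporal(feats)   # last hidden state summarizes the clip
        return self.head(h_n[-1])

logits = FramePainClassifier()(torch.rand(2, 16, 3, 112, 112))
```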
arXiv Detail & Related papers (2025-01-12T11:54:46Z)
- MoRE: Multi-Modal Contrastive Pre-training with Transformers on X-Rays, ECGs, and Diagnostic Report [4.340464264725625]
We introduce a novel Multi-Modal Contrastive Pre-training Framework that synergistically combines X-rays, electrocardiograms (ECGs), and radiology/cardiology reports.
We utilize LoRA-PEFT to significantly reduce the trainable parameters of the LLM and incorporate a recent linear attention-dropping strategy in the Vision Transformer (ViT) for smoother attention.
To the best of our knowledge, we are the first to propose an integrated model that combines X-rays, ECGs, and radiology/cardiology reports with this approach.
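The heart of such contrastive pre-training is typically a symmetric InfoNCE objective over paired modality embeddings; the sketch below is a generic CLIP-style version (the temperature and embedding size are assumptions, and MoRE's exact loss may differ).

```python
import torch
import torch.nn.functional as F

def info_nce(xray_emb: torch.Tensor, ecg_emb: torch.Tensor,
             temperature: float = 0.07) -> torch.Tensor:
    """Pull paired X-ray/ECG embeddings together, push mismatched pairs apart."""
    x = F.normalize(xray_emb, dim=-1)
    e = F.normalize(ecg_emb, dim=-1)
    logits = x @ e.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(x.size(0))         # i-th X-ray matches i-th ECG
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

loss = info_nce(torch.randn(8, 256), torch.randn(8, 256))
```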
arXiv Detail & Related papers (2024-10-21T17:42:41Z)
- Transformer with Leveraged Masked Autoencoder for video-based Pain Assessment [11.016004057765185]
We enhance pain recognition by employing facial video analysis within a Transformer-based deep learning model.
By combining a powerful Masked Autoencoder with a Transformer-based classifier, our model effectively captures pain-level indicators through both expressions and micro-expressions.
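A hedged sketch of the "pre-trained encoder plus Transformer classifier" recipe, with a torchvision ViT standing in for the Masked Autoencoder encoder; the frozen-encoder setup, head sizes, and three pain levels are assumptions, not the paper's checkpoint or configuration.

```python
import torch
import torch.nn as nn
from torchvision.models import vit_b_16

encoder = vit_b_16()
encoder.heads = nn.Identity()                 # 768-d embedding per frame
for p in encoder.parameters():
    p.requires_grad = False                   # keep the pre-trained encoder frozen

temporal = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=768, nhead=8, batch_first=True),
    num_layers=2,
)
head = nn.Linear(768, 3)                      # assumed number of pain levels

frames = torch.rand(1, 8, 3, 224, 224)        # (batch, time, C, H, W)
b, t = frames.shape[:2]
tokens = encoder(frames.flatten(0, 1)).reshape(b, t, -1)
logits = head(temporal(tokens).mean(dim=1))   # pool over time, then classify
```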
arXiv Detail & Related papers (2024-09-08T13:14:03Z)
- Dual-Domain CLIP-Assisted Residual Optimization Perception Model for Metal Artifact Reduction [9.028901322902913]
Metal artifacts in computed tomography (CT) imaging pose significant challenges to accurate clinical diagnosis.
Deep learning-based approaches, particularly generative models, have been proposed for metal artifact reduction (MAR).
arXiv Detail & Related papers (2024-08-14T02:37:26Z)
- CriDiff: Criss-cross Injection Diffusion Framework via Generative Pre-train for Prostate Segmentation [60.61972883059688]
CriDiff is a two-stage feature injecting framework with a Crisscross Injection Strategy (CIS) and a Generative Pre-train (GP) approach for prostate segmentation.
To effectively learn multi-level edge and non-edge features, we propose two parallel conditioners in the CIS.
The GP approach eases the inconsistency between the image features and the diffusion model without adding additional parameters.
arXiv Detail & Related papers (2024-06-20T10:46:50Z)
- MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild [81.32127423981426]
Multimodal emotion recognition based on audio and video data is important for real-world applications.
Recent methods have focused on exploiting advances of self-supervised learning (SSL) for pre-training of strong multimodal encoders.
We propose a different perspective on the problem and investigate the advancement of multimodal DFER performance by adapting SSL-pre-trained disjoint unimodal encoders.
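One common way to adapt frozen SSL-pre-trained unimodal encoders is to insert small trainable bottleneck adapters; the sketch below illustrates that idea with assumed dimensions and a simple concatenation fusion, which may differ from MMA-DFER's design.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Residual down-project / up-project adapter placed after a frozen encoder."""
    def __init__(self, dim: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))  # residual keeps SSL features

# Frozen audio/video encoders (placeholders) feed trainable adapters,
# and the adapted embeddings are fused for expression classification.
audio_adapter, video_adapter = BottleneckAdapter(), BottleneckAdapter()
fusion = nn.Linear(768 * 2, 7)  # e.g. 7 basic expressions -- assumed classes
a, v = torch.randn(4, 768), torch.randn(4, 768)  # stand-in encoder outputs
logits = fusion(torch.cat([audio_adapter(a), video_adapter(v)], dim=-1))
```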
arXiv Detail & Related papers (2024-04-13T13:39:26Z)
- Joint Multimodal Transformer for Emotion Recognition in the Wild [49.735299182004404]
Multimodal emotion recognition (MMER) systems typically outperform unimodal systems.
This paper proposes an MMER method that relies on a joint multimodal transformer (JMT) for fusion with key-based cross-attention.
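A minimal sketch of cross-attention fusion in this spirit, where one modality supplies queries and the other keys/values; the dimensions and emotion classes are assumptions, and the paper's "key-based" variant may differ from this standard formulation.

```python
import torch
import torch.nn as nn

cross_attn = nn.MultiheadAttention(embed_dim=256, num_heads=4, batch_first=True)
video_tokens = torch.randn(2, 10, 256)   # (batch, video steps, dim)
audio_tokens = torch.randn(2, 20, 256)   # (batch, audio steps, dim)

# Video queries attend over audio keys/values, yielding audio-aware video tokens.
fused, attn_weights = cross_attn(query=video_tokens,
                                 key=audio_tokens, value=audio_tokens)
emotion_logits = nn.Linear(256, 6)(fused.mean(dim=1))  # assumed 6 emotions
```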
arXiv Detail & Related papers (2024-03-15T17:23:38Z)
- A Dual Branch Network for Emotional Reaction Intensity Estimation [12.677143408225167]
We propose a solution to the ERI challenge of the fifth Affective Behavior Analysis in-the-wild (ABAW) competition: a dual-branch, multi-output regression model.
Spatial attention is used to better extract visual features, while Mel-Frequency Cepstral Coefficients (MFCCs) provide the acoustic features.
Our method achieves excellent results on the official validation set.
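The acoustic branch can be sketched directly with librosa; the MFCC parameters and summary statistics below are illustrative assumptions, not the paper's exact settings.

```python
import librosa
import numpy as np

y, sr = librosa.load(librosa.ex("trumpet"))        # bundled example clip
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13) # (13, frames)
features = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])
# `features` (26-d here) would feed the audio branch of the regressor.
```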
arXiv Detail & Related papers (2023-03-16T10:31:40Z)
- MedSegDiff-V2: Diffusion based Medical Image Segmentation with Transformer [53.575573940055335]
We propose a novel Transformer-based Diffusion framework, called MedSegDiff-V2.
We verify its effectiveness on 20 medical image segmentation tasks with different image modalities.
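At the heart of diffusion-based segmentation is the forward noising process; the sketch below shows the standard DDPM forward step on a segmentation mask (the schedule values are illustrative, and MedSegDiff-V2's contribution lies in the Transformer-conditioned denoiser, which is not reproduced here).

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)              # standard linear schedule
alpha_bars = torch.cumprod(1.0 - betas, dim=0)

def q_sample(mask: torch.Tensor, t: int) -> torch.Tensor:
    """Diffuse a ground-truth segmentation mask to timestep t:
    x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps."""
    eps = torch.randn_like(mask)
    return alpha_bars[t].sqrt() * mask + (1 - alpha_bars[t]).sqrt() * eps

noisy = q_sample(torch.rand(1, 1, 64, 64), t=500)
# A denoiser conditioned on the medical image is then trained to predict eps.
```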
arXiv Detail & Related papers (2023-01-19T03:42:36Z)
- MIST GAN: Modality Imputation Using Style Transfer for MRI [0.49172272348627766]
We formulate generating the missing MR modality from existing MR modalities as an imputation problem using style transfer.
With a multiple-to-one mapping, we design a network that accommodates domain-specific styles when generating the target image.
Our model is tested on the BraTS'18 dataset and the results are observed to be on par with the state-of-the-art in terms of visual metrics.
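One common mechanism for injecting a target modality's "style" into content features is adaptive instance normalization (AdaIN); the sketch below shows that generic operation (MIST GAN's actual generator is more elaborate, and the feature shapes here are assumptions).

```python
import torch

def adain(content: torch.Tensor, style: torch.Tensor,
          eps: float = 1e-5) -> torch.Tensor:
    """Re-normalize content feature maps to match the style's channel statistics."""
    c_mean = content.mean(dim=(2, 3), keepdim=True)
    c_std = content.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style.mean(dim=(2, 3), keepdim=True)
    s_std = style.std(dim=(2, 3), keepdim=True) + eps
    return s_std * (content - c_mean) / c_std + s_mean

# e.g. T1-derived content features re-styled toward FLAIR statistics
out = adain(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
```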
arXiv Detail & Related papers (2022-02-21T17:50:40Z)
- Modality Completion via Gaussian Process Prior Variational Autoencoders for Multi-Modal Glioma Segmentation [75.58395328700821]
We propose a novel model, Multi-modal Gaussian Process Prior Variational Autoencoder (MGP-VAE), to impute one or more missing sub-modalities for a patient scan.
MGP-VAE leverages a Gaussian Process (GP) prior on the Variational Autoencoder (VAE) to exploit correlations across subjects/patients and sub-modalities.
We show the applicability of MGP-VAE on brain tumor segmentation, where one, two, or three of the four sub-modalities may be missing.
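A minimal sketch of VAE-based sub-modality imputation; the GP prior over subjects that defines MGP-VAE is omitted here, with a standard-normal latent standing in, and all dimensions are illustrative.

```python
import torch
import torch.nn as nn

class ImputationVAE(nn.Module):
    def __init__(self, in_dim: int = 1024, latent: int = 64):
        super().__init__()
        self.enc = nn.Linear(in_dim, 2 * latent)  # -> (mu, log_var)
        self.dec = nn.Linear(latent, in_dim)      # reconstructs missing modality

    def forward(self, observed: torch.Tensor):
        mu, log_var = self.enc(observed).chunk(2, dim=-1)
        z = mu + (0.5 * log_var).exp() * torch.randn_like(mu)  # reparameterize
        return self.dec(z), mu, log_var

recon, mu, log_var = ImputationVAE()(torch.randn(2, 1024))
kl = -0.5 * (1 + log_var - mu.pow(2) - log_var.exp()).sum(dim=-1).mean()
```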
arXiv Detail & Related papers (2021-07-07T19:06:34Z)
- Act Like a Radiologist: Towards Reliable Multi-view Correspondence Reasoning for Mammogram Mass Detection [49.14070210387509]
We propose an Anatomy-aware Graph convolutional Network (AGN) for mammogram mass detection.
AGN is tailored for mammogram mass detection and endows existing detection methods with multi-view reasoning ability.
Experiments on two standard benchmarks reveal that AGN significantly exceeds the state-of-the-art performance.
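A hedged sketch of one graph-convolution step of the kind such models build on (symmetric-normalized adjacency aggregation); AGN's anatomy-aware graph construction is the paper's contribution and is not reproduced here.

```python
import torch
import torch.nn as nn

def gcn_layer(x: torch.Tensor, adj: torch.Tensor,
              weight: nn.Linear) -> torch.Tensor:
    """x: (N, F) node features; adj: (N, N) adjacency with self-loops."""
    deg = adj.sum(dim=-1)
    d_inv_sqrt = deg.clamp(min=1e-12).pow(-0.5)
    norm_adj = d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    return torch.relu(weight(norm_adj @ x))   # aggregate neighbors, transform

x = torch.randn(16, 32)                                # 16 nodes, 32 features
adj = (torch.rand(16, 16) > 0.7).float()
adj = ((adj + adj.t() + torch.eye(16)) > 0).float()    # symmetric + self-loops
out = gcn_layer(x, adj, nn.Linear(32, 64))
```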
arXiv Detail & Related papers (2021-05-21T06:48:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.