Transformer with Leveraged Masked Autoencoder for video-based Pain Assessment
- URL: http://arxiv.org/abs/2409.05088v3
- Date: Mon, 30 Sep 2024 04:35:19 GMT
- Title: Transformer with Leveraged Masked Autoencoder for video-based Pain Assessment
- Authors: Minh-Duc Nguyen, Hyung-Jeong Yang, Soo-Hyung Kim, Ji-Eun Shin, Seung-Won Kim
- Abstract summary: We enhance pain recognition by employing facial video analysis within a Transformer-based deep learning model.
By combining a powerful Masked Autoencoder with a Transformers-based classifier, our model effectively captures pain level indicators through both expressions and micro-expressions.
- Score: 11.016004057765185
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Accurate pain assessment is crucial in healthcare for effective diagnosis and treatment; however, traditional methods relying on self-reporting are inadequate for populations unable to communicate their pain. Cutting-edge AI is promising for supporting clinicians in pain recognition using facial video data. In this paper, we enhance pain recognition by employing facial video analysis within a Transformer-based deep learning model. By combining a powerful Masked Autoencoder with a Transformers-based classifier, our model effectively captures pain level indicators through both expressions and micro-expressions. We conducted our experiment on the AI4Pain dataset, which produced promising results that pave the way for innovative healthcare solutions that are both comprehensive and objective.
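The abstract does not say how the Masked Autoencoder is configured. As a rough illustration of the general masked-autoencoder idea only (the encoder sees a random subset of patch tokens and a decoder reconstructs the hidden ones), the Python sketch below splits patch indices into visible and masked sets. The function name `random_patch_mask`, the 75% mask ratio, and the 8 x 14 x 14 token grid are illustrative assumptions, not values taken from the paper.

```python
import random


def random_patch_mask(num_patches, mask_ratio=0.75, seed=None):
    """Randomly split patch indices into (visible, masked) lists.

    Illustrative helper, not the authors' implementation.
    """
    if num_patches < 0:
        raise ValueError("num_patches must be non-negative")
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    indices = list(range(num_patches))
    rng.shuffle(indices)
    # The first num_masked shuffled indices are hidden from the encoder.
    num_masked = int(num_patches * mask_ratio)
    masked = sorted(indices[:num_masked])
    visible = sorted(indices[num_masked:])
    return visible, masked


# Hypothetical token grid for a face clip: 8 temporal x 14 x 14 spatial patches.
num_tokens = 8 * 14 * 14
visible, masked = random_patch_mask(num_tokens, mask_ratio=0.75, seed=0)
print(len(visible), len(masked))
```

In a setup like this, only the `visible` tokens would be passed to the encoder during pre-training, and a downstream classifier would later consume the encoder's features for the full, unmasked clip.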
Related papers
- Towards Synthetic Data Generation for Improved Pain Recognition in Videos under Patient Constraints [11.515273901289472]
This study introduces a novel approach that leverages synthetic data to enhance video-based pain recognition models.
We present a pipeline that synthesizes realistic 3D facial models by capturing nuanced facial movements from a small participant pool.
This process generates 8,600 synthetic faces, accurately reflecting genuine pain expressions from varied angles and perspectives.
arXiv Detail & Related papers (2024-09-24T18:33:57Z) - MedFLIP: Medical Vision-and-Language Self-supervised Fast Pre-Training with Masked Autoencoder [26.830574964308962]
We introduce MedFLIP, a Fast Language-Image Pre-training method for Medical analysis.
We explore MAEs for zero-shot learning with crossed domains, which enhances the model's ability to learn from limited data.
Lastly, we validate that using language improves zero-shot performance for medical image analysis.
arXiv Detail & Related papers (2024-03-07T16:11:43Z) - Pain Analysis using Adaptive Hierarchical Spatiotemporal Dynamic Imaging [16.146223377936035]
We introduce the Adaptive Hierarchical Spatiotemporal Dynamic Image (AHDI) technique.
AHDI encodes deep changes in facial videos into a single RGB image, permitting the application of simpler 2D models for video representation.
Within this framework, we employ a residual network to derive generalized facial representations.
These representations are optimized for two tasks: estimating pain intensity and differentiating between genuine and simulated pain expressions.
arXiv Detail & Related papers (2023-12-12T01:23:05Z) - Automatic diagnosis of knee osteoarthritis severity using Swin transformer [55.01037422579516]
Knee osteoarthritis (KOA) is a widespread condition that can cause chronic pain and stiffness in the knee joint.
We propose an automated approach that employs the Swin Transformer to predict the severity of KOA.
arXiv Detail & Related papers (2023-07-10T09:49:30Z) - A Transformer-based representation-learning model with unified processing of multimodal input for clinical diagnostics [63.106382317917344]
We report a Transformer-based representation-learning model as a clinical diagnostic aid that processes multimodal input in a unified manner.
The unified model outperformed an image-only model and non-unified multimodal diagnosis models in the identification of pulmonary diseases.
arXiv Detail & Related papers (2023-06-01T16:23:47Z) - Transformer Encoder with Multiscale Deep Learning for Pain Classification Using Physiological Signals [0.0]
Pain is a subjective sensation-driven experience.
Traditional techniques for measuring pain intensity are susceptible to bias and unreliable in some instances.
We develop PainAttnNet, a novel transformer-encoder deep-learning framework for classifying pain intensities with physiological signals as input.
arXiv Detail & Related papers (2023-03-13T04:21:33Z) - Pain Detection in Masked Faces during Procedural Sedation [0.0]
Pain monitoring is essential to the quality of care for patients undergoing a medical procedure with sedation.
Previous studies have shown the viability of computer vision methods in detecting pain in unoccluded faces.
This study has collected video data from masked faces of 14 patients undergoing procedures in an interventional radiology department.
arXiv Detail & Related papers (2022-11-12T15:55:33Z) - Cross-modal Clinical Graph Transformer for Ophthalmic Report Generation [116.87918100031153]
We propose a Cross-modal clinical Graph Transformer (CGT) for ophthalmic report generation (ORG).
CGT injects clinical relation triples into the visual features as prior knowledge to drive the decoding procedure.
Experiments on the large-scale FFA-IR benchmark demonstrate that the proposed CGT is able to outperform previous benchmark methods.
arXiv Detail & Related papers (2022-06-04T13:16:30Z) - Non-contact Pain Recognition from Video Sequences with Remote Physiological Measurements Prediction [53.03469655641418]
We present a novel multi-task learning framework which encodes both appearance changes and physiological cues in a non-contact manner for pain recognition.
We establish the state-of-the-art performance of non-contact pain recognition on publicly available pain databases.
arXiv Detail & Related papers (2021-05-18T20:47:45Z) - One-shot action recognition towards novel assistive therapies [63.23654147345168]
This work is motivated by the automated analysis of medical therapies that involve action imitation games.
The presented approach incorporates a pre-processing step that standardizes heterogeneous motion data conditions.
We evaluate the approach on a real use-case of automated video analysis for therapy support with autistic people.
arXiv Detail & Related papers (2021-02-17T19:41:37Z) - Continuous Emotion Recognition via Deep Convolutional Autoencoder and Support Vector Regressor [70.2226417364135]
It is crucial that the machine should be able to recognize the emotional state of the user with high accuracy.
Deep neural networks have been used with great success in recognizing emotions.
We present a new model for continuous emotion recognition based on facial expression recognition.
arXiv Detail & Related papers (2020-01-31T17:47:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.