Transformer with Leveraged Masked Autoencoder for video-based Pain Assessment
- URL: http://arxiv.org/abs/2409.05088v3
- Date: Mon, 30 Sep 2024 04:35:19 GMT
- Title: Transformer with Leveraged Masked Autoencoder for video-based Pain Assessment
- Authors: Minh-Duc Nguyen, Hyung-Jeong Yang, Soo-Hyung Kim, Ji-Eun Shin, Seung-Won Kim
- Abstract summary: We enhance pain recognition by employing facial video analysis within a Transformer-based deep learning model.
By combining a powerful Masked Autoencoder with a Transformer-based classifier, our model effectively captures pain level indicators through both expressions and micro-expressions.
- Score: 11.016004057765185
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Accurate pain assessment is crucial in healthcare for effective diagnosis and treatment; however, traditional methods relying on self-reporting are inadequate for populations unable to communicate their pain. Cutting-edge AI is promising for supporting clinicians in pain recognition using facial video data. In this paper, we enhance pain recognition by employing facial video analysis within a Transformer-based deep learning model. By combining a powerful Masked Autoencoder with a Transformer-based classifier, our model effectively captures pain level indicators through both expressions and micro-expressions. We conducted our experiment on the AI4Pain dataset, which produced promising results that pave the way for innovative healthcare solutions that are both comprehensive and objective.
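The pipeline described in the abstract — a pretrained Masked Autoencoder extracting per-frame features, followed by a Transformer-style classifier that attends over the frame sequence — can be sketched as below. This is a minimal NumPy illustration, not the paper's implementation: the encoder stub, feature dimension (64), number of pain levels (5), and single-block attention are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def mae_encode(frames):
    """Stand-in for a pretrained Masked Autoencoder encoder: maps each
    (H, W, 3) frame to a 64-dim embedding (random projection here)."""
    T = frames.shape[0]
    W = rng.standard_normal((frames.shape[1] * frames.shape[2] * 3, 64))
    return frames.reshape(T, -1) @ W / 100.0      # (T, 64) frame tokens

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def transformer_classify(tokens, n_classes=5):
    """One self-attention block over frame tokens, mean-pooled into a
    clip representation, then a linear head over pain levels."""
    D = tokens.shape[1]
    Wq, Wk, Wv = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    attn = softmax(q @ k.T / np.sqrt(D))          # (T, T) temporal attention
    pooled = (attn @ v).mean(axis=0)              # aggregate over frames
    Wout = rng.standard_normal((D, n_classes)) / np.sqrt(D)
    return softmax(pooled @ Wout)                 # pain-level probabilities

video = rng.random((16, 8, 8, 3))                 # 16 toy frames
probs = transformer_classify(mae_encode(video))
```

Temporal attention is what lets such a model pick up brief micro-expressions: frames carrying transient pain cues can dominate the pooled representation rather than being averaged away.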
Related papers
- Improving Pain Classification using Spatio-Temporal Deep Learning Approaches with Facial Expressions [0.27309692684728604]
Pain management and severity detection are crucial for effective treatment.
Traditional self-reporting methods are subjective and may be unsuitable for non-verbal individuals.
We explore automated pain detection using facial expressions.
arXiv Detail & Related papers (2025-01-12T11:54:46Z) - A Full Transformer-based Framework for Automatic Pain Estimation using Videos [0.9668407688201359]
We present a novel full transformer-based framework consisting of a Transformer in Transformer (TNT) model and a Transformer leveraging cross-attention and self-attention blocks.
We demonstrate state-of-the-art performances, showing the efficacy, efficiency, and generalization capability across all the primary pain estimation tasks.
arXiv Detail & Related papers (2024-12-19T17:45:08Z) - Towards Synthetic Data Generation for Improved Pain Recognition in Videos under Patient Constraints [11.515273901289472]
This study introduces a novel approach that leverages synthetic data to enhance video-based pain recognition models.
We present a pipeline that synthesizes realistic 3D facial models by capturing nuanced facial movements from a small participant pool.
This process generates 8,600 synthetic faces, accurately reflecting genuine pain expressions from varied angles and perspectives.
arXiv Detail & Related papers (2024-09-24T18:33:57Z) - MedFLIP: Medical Vision-and-Language Self-supervised Fast Pre-Training with Masked Autoencoder [26.830574964308962]
We introduce MedFLIP, a Fast Language-Image Pre-training method for Medical analysis.
We explore MAEs for zero-shot learning with crossed domains, which enhances the model's ability to learn from limited data.
Lastly, we validate that using language improves zero-shot performance for medical image analysis.
arXiv Detail & Related papers (2024-03-07T16:11:43Z) - Pain Analysis using Adaptive Hierarchical Spatiotemporal Dynamic Imaging [16.146223377936035]
We introduce the Adaptive temporal Dynamic Image (AHDI) technique.
AHDI encodes temporal changes in facial videos into a single RGB image, permitting the application of simpler 2D models for video representation.
Within this framework, we employ a residual network to derive generalized facial representations.
These representations are optimized for two tasks: estimating pain intensity and differentiating between genuine and simulated pain expressions.
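The core idea of collapsing a video into one RGB image can be sketched with approximate rank-pooling weights (the "approximate dynamic image" of Bilen et al.); note this is a simplified stand-in, since AHDI's weighting is adaptive and hierarchical rather than the fixed linear ramp used here.

```python
import numpy as np

def dynamic_image(frames):
    """Collapse a (T, H, W, 3) video into one (H, W, 3) RGB image using
    approximate rank-pooling weights alpha_t = 2t - T - 1, which emphasize
    later frames and thus encode temporal evolution."""
    T = frames.shape[0]
    t = np.arange(1, T + 1)
    alpha = 2 * t - T - 1                          # fixed linear ramp weights
    img = np.tensordot(alpha, frames, axes=(0, 0))  # weighted frame sum
    # normalize into a displayable 8-bit RGB range
    img = (img - img.min()) / (np.ptp(img) + 1e-8)
    return (img * 255).astype(np.uint8)

video = np.random.default_rng(1).random((30, 64, 64, 3))
rgb = dynamic_image(video)   # single image; can now feed any 2D CNN
```

Once the video is a single image, an ordinary 2D residual network can be trained on it, which is what makes this family of methods cheap compared with 3D or sequence models.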
arXiv Detail & Related papers (2023-12-12T01:23:05Z) - Automatic diagnosis of knee osteoarthritis severity using Swin transformer [55.01037422579516]
Knee osteoarthritis (KOA) is a widespread condition that can cause chronic pain and stiffness in the knee joint.
We propose an automated approach that employs the Swin Transformer to predict the severity of KOA.
arXiv Detail & Related papers (2023-07-10T09:49:30Z) - A Transformer-based representation-learning model with unified processing of multimodal input for clinical diagnostics [63.106382317917344]
We report a Transformer-based representation-learning model as a clinical diagnostic aid that processes multimodal input in a unified manner.
The unified model outperformed an image-only model and non-unified multimodal diagnosis models in the identification of pulmonary diseases.
arXiv Detail & Related papers (2023-06-01T16:23:47Z) - Cross-modal Clinical Graph Transformer for Ophthalmic Report Generation [116.87918100031153]
We propose a Cross-modal clinical Graph Transformer (CGT) for ophthalmic report generation (ORG).
CGT injects clinical relation triples into the visual features as prior knowledge to drive the decoding procedure.
Experiments on the large-scale FFA-IR benchmark demonstrate that the proposed CGT is able to outperform previous benchmark methods.
arXiv Detail & Related papers (2022-06-04T13:16:30Z) - Non-contact Pain Recognition from Video Sequences with Remote Physiological Measurements Prediction [53.03469655641418]
We present a novel multi-task learning framework which encodes both appearance changes and physiological cues in a non-contact manner for pain recognition.
We establish the state-of-the-art performance of non-contact pain recognition on publicly available pain databases.
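A multi-task setup of this kind typically shares one encoder between a pain-classification head and a physiological-signal (e.g. rPPG) regression head. The sketch below is a hypothetical minimal version; the dimensions, head designs, and the use of precomputed appearance features are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(2)

def multitask_forward(frame_feats, n_pain_levels=4):
    """Shared encoder with two heads: clip-level pain logits and a
    per-frame remote physiological signal estimate."""
    Wenc = rng.standard_normal((frame_feats.shape[1], 32)) / 8.0
    shared = np.tanh(frame_feats @ Wenc)        # (T, 32) shared features
    Wcls = rng.standard_normal((32, n_pain_levels))
    logits = shared.mean(axis=0) @ Wcls         # pooled -> pain logits
    Wsig = rng.standard_normal((32, 1))
    rppg = (shared @ Wsig).ravel()              # per-frame signal trace
    return logits, rppg

feats = rng.random((50, 16))    # 50 frames of toy appearance features
logits, rppg = multitask_forward(feats)
```

Jointly predicting the physiological signal acts as an auxiliary supervision signal: the shared encoder is pushed to retain subtle skin-color and motion cues that also help discriminate pain from no-pain clips.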
arXiv Detail & Related papers (2021-05-18T20:47:45Z) - One-shot action recognition towards novel assistive therapies [63.23654147345168]
This work is motivated by the automated analysis of medical therapies that involve action imitation games.
The presented approach incorporates a pre-processing step that standardizes heterogeneous motion data conditions.
We evaluate the approach on a real use-case of automated video analysis for therapy support with autistic people.
arXiv Detail & Related papers (2021-02-17T19:41:37Z) - Continuous Emotion Recognition via Deep Convolutional Autoencoder and Support Vector Regressor [70.2226417364135]
It is crucial that the machine be able to recognize the emotional state of the user with high accuracy.
Deep neural networks have been used with great success in recognizing emotions.
We present a new model for continuous emotion recognition based on facial expression recognition.
arXiv Detail & Related papers (2020-01-31T17:47:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.