A Full Transformer-based Framework for Automatic Pain Estimation using Videos
- URL: http://arxiv.org/abs/2412.15095v1
- Date: Thu, 19 Dec 2024 17:45:08 GMT
- Title: A Full Transformer-based Framework for Automatic Pain Estimation using Videos
- Authors: Stefanos Gkikas, Manolis Tsiknakis
- Abstract summary: We present a novel full transformer-based framework consisting of a Transformer in Transformer (TNT) model and a Transformer leveraging cross-attention and self-attention blocks.
We demonstrate state-of-the-art performances, showing the efficacy, efficiency, and generalization capability across all the primary pain estimation tasks.
- Score: 0.9668407688201359
- Abstract: The automatic estimation of pain is essential in designing an optimal pain management system offering reliable assessment and reducing the suffering of patients. In this study, we present a novel full transformer-based framework consisting of a Transformer in Transformer (TNT) model and a Transformer leveraging cross-attention and self-attention blocks. Elaborating on videos from the BioVid database, we demonstrate state-of-the-art performances, showing the efficacy, efficiency, and generalization capability across all the primary pain estimation tasks.
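The framework above combines a Transformer in Transformer (TNT) video backbone with a transformer head built from cross-attention and self-attention blocks. As an illustration only, here is a minimal NumPy sketch of the scaled dot-product cross-attention at the core of such a block; the function names, shapes, and the absence of learned projections are simplifying assumptions, not details from the paper:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values, d_k):
    """Scaled dot-product cross-attention: tokens from one stream
    (queries) attend to tokens from another (keys/values)."""
    scores = queries @ keys_values.T / np.sqrt(d_k)  # (n_q, n_kv)
    weights = softmax(scores, axis=-1)               # rows sum to 1
    return weights @ keys_values                     # (n_q, d_k)

# Toy example: 4 query tokens attend to 6 key/value tokens, dim 8.
rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))
kv = rng.standard_normal((6, 8))
out = cross_attention(q, kv, d_k=8)
print(out.shape)  # (4, 8)
```

Setting `keys_values = queries` recovers plain self-attention, which is why the two block types can share one implementation.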
Related papers
- Transformer with Leveraged Masked Autoencoder for video-based Pain Assessment [11.016004057765185]
We enhance pain recognition by employing facial video analysis within a Transformer-based deep learning model.
By combining a powerful Masked Autoencoder with a Transformers-based classifier, our model effectively captures pain level indicators through both expressions and micro-expressions.
arXiv Detail & Related papers (2024-09-08T13:14:03Z)
- Synthetic Thermal and RGB Videos for Automatic Pain Assessment utilizing a Vision-MLP Architecture [0.9668407688201359]
This study presents synthetic thermal videos generated by Generative Adversarial Networks integrated into the pain recognition pipeline.
A framework consisting of a Vision-MLP and a Transformer-based module is utilized, employing RGB and synthetic thermal videos in unimodal and multimodal settings.
arXiv Detail & Related papers (2024-07-29T09:04:11Z)
- D-STGCNT: A Dense Spatio-Temporal Graph Conv-GRU Network based on transformer for assessment of patient physical rehabilitation [0.3626013617212666]
This paper introduces a new graph-based model for assessing rehabilitation exercises.
Dense connections and GRU mechanisms are used to rapidly process large 3D skeleton inputs.
The evaluation of our proposed approach on the KIMORE and UI-PRMD datasets highlighted its potential.
arXiv Detail & Related papers (2023-12-21T00:38:31Z)
- Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation [73.31524865643709]
We present a plug-and-play pruning-and-recovering framework, called Hourglass Tokenizer (HoT), for efficient transformer-based 3D pose estimation from videos.
Our HoT begins with pruning pose tokens of redundant frames and ends with recovering full-length tokens, resulting in a few pose tokens in the intermediate transformer blocks.
Our method can achieve both high efficiency and estimation accuracy compared to the original VPT models.
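The prune-then-recover idea can be sketched as follows; the strided selection and nearest-neighbor recovery below are deliberately simplified stand-ins for HoT's learned token selection and recovery, and all names are hypothetical:

```python
import numpy as np

def prune_tokens(tokens, keep):
    """Keep a strided subset of frame tokens (a stand-in for a
    learned, content-aware token selection)."""
    idx = np.linspace(0, len(tokens) - 1, keep).round().astype(int)
    return tokens[idx], idx

def recover_tokens(pruned, idx, full_len):
    """Recover full-length tokens by assigning each original frame
    the nearest retained token (a stand-in for attention-based
    token recovery)."""
    positions = np.arange(full_len)
    nearest = np.abs(positions[:, None] - idx[None, :]).argmin(axis=1)
    return pruned[nearest]

rng = np.random.default_rng(1)
seq = rng.standard_normal((81, 16))   # 81 frames of 16-dim pose tokens
pruned, idx = prune_tokens(seq, keep=9)
recovered = recover_tokens(pruned, idx, full_len=81)
print(pruned.shape, recovered.shape)  # (9, 16) (81, 16)
```

The intermediate blocks only ever see the 9 retained tokens, which is where the efficiency gain comes from; the recovery step restores a per-frame output at the end.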
arXiv Detail & Related papers (2023-11-20T18:59:51Z)
- DA-TransUNet: Integrating Spatial and Channel Dual Attention with Transformer U-Net for Medical Image Segmentation [5.5582646801199225]
This study proposes a novel deep medical image segmentation framework, called DA-TransUNet.
It aims to integrate the Transformer and dual attention block (DA-Block) into the traditional U-shaped architecture.
Unlike earlier transformer-based U-net models, DA-TransUNet utilizes Transformers and DA-Block to integrate not only global and local features, but also image-specific positional and channel features.
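The spatial and channel halves of a dual attention block can be sketched in miniature; the parameter-free affinity maps below are simplified stand-ins (the names and shapes are hypothetical, not taken from DA-TransUNet):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatial_attention(feat):
    """Position attention: each of the H*W spatial locations
    attends to every other location."""
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)            # (C, N)
    attn = softmax(x.T @ x, axis=-1)      # (N, N) position affinities
    return (x @ attn.T).reshape(c, h, w)

def channel_attention(feat):
    """Channel attention: a channel-by-channel affinity map
    reweights the feature channels."""
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)
    attn = softmax(x @ x.T, axis=-1)      # (C, C) channel affinities
    return (attn @ x).reshape(c, h, w)

rng = np.random.default_rng(2)
f = rng.standard_normal((4, 8, 8))        # (C, H, W) feature map
out = spatial_attention(f) + channel_attention(f)  # fused dual attention
print(out.shape)  # (4, 8, 8)
```

Summing the two branches is the simplest fusion choice; the point is that one branch mixes information across positions while the other mixes it across channels.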
arXiv Detail & Related papers (2023-10-19T08:25:03Z)
- A Transformer-based representation-learning model with unified processing of multimodal input for clinical diagnostics [63.106382317917344]
We report a Transformer-based representation-learning model as a clinical diagnostic aid that processes multimodal input in a unified manner.
The unified model outperformed an image-only model and non-unified multimodal diagnosis models in the identification of pulmonary diseases.
arXiv Detail & Related papers (2023-06-01T16:23:47Z)
- End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures.
We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z)
- Transformer Lesion Tracker [12.066026343488453]
We propose a transformer-based approach, termed Transformer Lesion Tracker (TLT).
We design a Cross Attention-based Transformer (CAT) to capture and combine both global and local information to enhance feature extraction.
We conduct experiments on a public dataset to show the superiority of our method, finding that our model reduces the average Euclidean center error by at least 14.3%.
arXiv Detail & Related papers (2022-06-13T15:35:24Z)
- Can Transformers be Strong Treatment Effect Estimators? [86.32484218657166]
We develop a general framework based on the Transformer architecture to address a variety of treatment effect estimation problems.
Our methods apply to discrete, continuous, structured, and dosage-associated treatments.
Our experiments with Transformers as Treatment Effect Estimators (TransTEE) demonstrate that these inductive biases are also effective on the sorts of estimation problems and datasets that arise in research aimed at estimating causal effects.
arXiv Detail & Related papers (2022-02-02T23:56:42Z)
- Transformers in Medical Imaging: A Survey [88.03790310594533]
Transformers have been successfully applied to several computer vision problems, achieving state-of-the-art results.
Medical imaging has also seen growing interest in Transformers, which can capture global context, in contrast to CNNs with their local receptive fields.
We provide a review of the applications of Transformers in medical imaging covering various aspects, ranging from recently proposed architectural designs to unsolved issues.
arXiv Detail & Related papers (2022-01-24T18:50:18Z)
- Efficient pre-training objectives for Transformers [84.64393460397471]
We study several efficient pre-training objectives for Transformer-based models.
We show that eliminating the MASK token and computing the loss over the whole output are essential choices for improving performance.
arXiv Detail & Related papers (2021-04-20T00:09:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.