Transfer Learning for Keypoint Detection in Low-Resolution Thermal TUG Test Images
- URL: http://arxiv.org/abs/2501.18453v1
- Date: Thu, 30 Jan 2025 16:05:40 GMT
- Title: Transfer Learning for Keypoint Detection in Low-Resolution Thermal TUG Test Images
- Authors: Wei-Lun Chen, Chia-Yeh Hsieh, Yu-Hsiang Kao, Kai-Chun Liu, Sheng-Yu Peng, Yu Tsao
- Abstract summary: This study presents a novel approach to human keypoint detection in low-resolution thermal images using transfer learning techniques.
We introduce the first application of the Timed Up and Go (TUG) test in thermal image computer vision.
- Score: 13.445499725722438
- Abstract: This study presents a novel approach to human keypoint detection in low-resolution thermal images using transfer learning techniques. We introduce the first application of the Timed Up and Go (TUG) test in thermal image computer vision, establishing a new paradigm for mobility assessment. Our method leverages a MobileNetV3-Small encoder and a ViTPose decoder, trained using a composite loss function that balances latent representation alignment and heatmap accuracy. The model was evaluated using the Object Keypoint Similarity (OKS) metric from the COCO Keypoint Detection Challenge. The proposed model achieves better performance with AP, AP50, and AP75 scores of 0.861, 0.942, and 0.887 respectively, outperforming traditional supervised learning approaches like Mask R-CNN and ViTPose-Base. Moreover, our model demonstrates superior computational efficiency in terms of parameter count and FLOPS. This research lays a solid foundation for future clinical applications of thermal imaging in mobility assessment and rehabilitation monitoring.
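As a rough illustration of the OKS metric mentioned in the abstract, the following sketch computes Object Keypoint Similarity for a single person using the standard COCO formulation. The function name `oks` and the per-keypoint `sigmas` passed in are assumptions for illustration; the abstract does not state which keypoint set or constants the authors used.

```python
import numpy as np


def oks(pred, gt, visible, area, sigmas):
    """Object Keypoint Similarity for one person (COCO-style formulation).

    pred, gt : (K, 2) arrays of predicted / ground-truth (x, y) coordinates
    visible  : (K,) visibility flags; keypoints with flag <= 0 are ignored
    area     : ground-truth object area in pixels (the scale term s^2)
    sigmas   : (K,) per-keypoint falloff constants (assumed values here)
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    visible = np.asarray(visible) > 0
    if not visible.any():
        return 0.0  # no labelled keypoints, so similarity is undefined; use 0

    # COCO uses k_i = 2 * sigma_i, so the variance term is (2 * sigma_i)^2.
    k2 = (2.0 * np.asarray(sigmas, dtype=float)) ** 2
    # Squared Euclidean distance between each predicted and true keypoint.
    d2 = np.sum((pred - gt) ** 2, axis=1)
    # Normalise by object scale and per-keypoint constant.
    e = d2 / (2.0 * k2 * (area + np.spacing(1)))
    # Average the Gaussian similarity over labelled keypoints only.
    return float(np.mean(np.exp(-e[visible])))
```

Under the COCO protocol, AP50 and AP75 count a prediction as correct when its OKS is at least 0.5 or 0.75 respectively, and AP averages precision over OKS thresholds from 0.50 to 0.95 in steps of 0.05.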
Related papers
- vHeat: Building Vision Models upon Heat Conduction [63.00030330898876]
vHeat is a novel vision backbone model that simultaneously achieves both high computational efficiency and global receptive field.
The essential idea is to conceptualize image patches as heat sources and model the calculation of their correlations as the diffusion of thermal energy.
arXiv Detail & Related papers (2024-05-26T12:58:04Z) - InfRS: Incremental Few-Shot Object Detection in Remote Sensing Images [11.916941756499435]
In this paper, we explore the intricate task of incremental few-shot object detection in remote sensing images.
We introduce a pioneering fine-tuning-based technique, termed InfRS, designed to facilitate the incremental learning of novel classes.
We develop a prototypical calibration strategy based on the Wasserstein distance to mitigate the catastrophic forgetting problem.
arXiv Detail & Related papers (2024-05-18T13:39:50Z) - Learning Structure-Guided Diffusion Model for 2D Human Pose Estimation [71.24808323646167]
We propose DiffusionPose, a new scheme for learning keypoint heatmaps with a neural network.
During training, the keypoints are diffused to a random distribution by adding noise, and the diffusion model learns to recover ground-truth heatmaps from the noised heatmaps.
Experiments show the prowess of our scheme with improvements of 1.6, 1.2, and 1.2 mAP on widely-used COCO, CrowdPose, and AI Challenge datasets.
arXiv Detail & Related papers (2023-06-29T16:24:32Z) - Pit-Pattern Classification of Colorectal Cancer Polyps Using a Hyper Sensitive Vision-Based Tactile Sensor and Dilated Residual Networks [4.056583163276972]
We propose utilizing a hyper-sensitive vision-based tactile sensor called HySenSe and a complementary and novel machine learning architecture.
The proposed architecture was compared with the state-of-the-art ML models (e.g., AlexNet and DenseNet) and proved to be superior in terms of performance and complexity.
arXiv Detail & Related papers (2022-11-13T04:42:10Z) - Lightweight Human Pose Estimation Using Heatmap-Weighting Loss [7.830376406370752]
We introduce an attention mechanism that utilizes original, inter-level, and intra-level information to improve accuracy.
We also propose a novel loss function called heatmap weighting loss, which generates a weight for each pixel of the heatmap so that the model focuses more on keypoints.
arXiv Detail & Related papers (2022-05-21T14:26:14Z) - Benchmarking Detection Transfer Learning with Vision Transformers [60.97703494764904]
The complexity of object detection methods can make benchmarking non-trivial when new architectures, such as Vision Transformer (ViT) models, arrive.
We present training techniques that overcome these challenges, enabling the use of standard ViT models as the backbone of Mask R-CNN.
Our results show that recent masking-based unsupervised learning methods may, for the first time, provide convincing transfer learning improvements on COCO.
arXiv Detail & Related papers (2021-11-22T18:59:15Z) - Rethinking Keypoint Representations: Modeling Keypoints and Poses as Objects for Multi-Person Human Pose Estimation [79.78017059539526]
We propose a new heatmap-free keypoint estimation method in which individual keypoints and sets of spatially related keypoints (i.e., poses) are modeled as objects within a dense single-stage anchor-based detection framework.
In experiments, we observe that KAPAO is significantly faster and more accurate than previous methods, which suffer greatly from heatmap post-processing.
Our large model, KAPAO-L, achieves an AP of 70.6 on the Microsoft COCO Keypoints validation set without test-time augmentation.
arXiv Detail & Related papers (2021-11-16T15:36:44Z) - Cross-modal Knowledge Distillation for Vision-to-Sensor Action Recognition [12.682984063354748]
This study introduces an end-to-end Vision-to-Sensor Knowledge Distillation (VSKD) framework.
In this VSKD framework, only time-series data, i.e., accelerometer data, is needed from wearable devices during the testing phase.
This framework will not only reduce the computational demands on edge devices, but also produce a learning model that closely matches the performance of the computationally expensive multi-modal approach.
arXiv Detail & Related papers (2021-10-08T15:06:38Z) - Low-resolution Human Pose Estimation [49.531572116079026]
We propose a novel Confidence-Aware Learning (CAL) method for low-resolution pose estimation.
CAL addresses two fundamental limitations of existing offset learning methods: inconsistent training and testing, and decoupled heatmap and offset learning.
Our method significantly outperforms the state-of-the-art methods for low-resolution human pose estimation.
arXiv Detail & Related papers (2021-09-19T09:13:57Z) - When Liebig's Barrel Meets Facial Landmark Detection: A Practical Model [87.25037167380522]
We propose a model that is accurate, robust, efficient, generalizable, and end-to-end trainable.
To achieve better accuracy, we propose two lightweight modules.
DQInit dynamically initializes the decoder queries from the inputs, enabling the model to match the accuracy of models with multiple decoder layers.
QAMem is designed to enhance the discriminative ability of queries on low-resolution feature maps by assigning separate memory values to each query rather than a shared one.
arXiv Detail & Related papers (2021-05-27T13:51:42Z) - Distillation of neural network models for detection and description of key points of images [0.0]
The aim of this study is to obtain a more compact model of detection and description of key points.
A new data set has been introduced for testing key point detection methods and a new quality indicator of the allocated key points.
A new model with a significantly smaller number of parameters shows the accuracy of point matching close to the accuracy of the original model.
arXiv Detail & Related papers (2020-05-18T18:59:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.