Comparison of marker-less 2D image-based methods for infant pose estimation
- URL: http://arxiv.org/abs/2410.04980v1
- Date: Mon, 7 Oct 2024 12:21:49 GMT
- Title: Comparison of marker-less 2D image-based methods for infant pose estimation
- Authors: Lennart Jahn, Sarah Flügge, Dajie Zhang, Luise Poustka, Sven Bölte, Florentin Wörgötter, Peter B Marschik, Tomas Kulvicius
- Abstract summary: The General Movement Assessment (GMA) is a video-based tool to classify infant motor functioning.
We compare the performance of available generic- and infant-pose estimators, and the choice of viewing angle for optimal recordings.
The results show that the best performing generic model trained on adults, ViTPose, also performs best on infants.
- Score: 2.7726930707973048
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There are increasing efforts to automate clinical methods for early diagnosis of developmental disorders, among them the General Movement Assessment (GMA), a video-based tool to classify infant motor functioning. Optimal pose estimation is a crucial part of the automated GMA. In this study we compare the performance of available generic- and infant-pose estimators, and the choice of viewing angle for optimal recordings, i.e., the conventional diagonal view used in GMA vs. a top-down view. For this study, we used 4500 annotated video-frames from 75 recordings of infant spontaneous motor functions from 4 to 26 weeks. To determine which available pose estimation method and camera angle yield the best pose estimation accuracy on infants in a GMA-related setting, the distance to human annotations as well as the percentage of correct key-points (PCK) were computed and compared. The results show that the best performing generic model trained on adults, ViTPose, also performs best on infants. We see no improvement from using specialized infant-pose estimators over the generic pose estimators on our own infant dataset. However, when retraining a generic model on our data, there is a significant improvement in pose estimation accuracy. The pose estimation accuracy obtained from the top-down view is significantly better than that obtained from the diagonal view, especially for the detection of the hip key-points. The results also indicate only limited generalization capabilities of infant-pose estimators to other infant datasets, which hints that one should be careful when choosing infant pose estimators and using them on infant datasets which they were not trained on. While the standard GMA method uses a diagonal view for assessment, pose estimation accuracy significantly improves using a top-down view. This suggests that a top-down view should be included in recording setups for automated GMA research.
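The two accuracy measures named in the abstract, distance to human annotations and PCK, can be illustrated with a minimal sketch. The abstract does not state how the PCK threshold is normalized, so the reference length `ref_length` and the factor `alpha` below are assumptions chosen for illustration, as are the function names and the sample coordinates; this is not the authors' evaluation code.

```python
import math

def mean_distance(pred, truth):
    """Mean Euclidean distance (in pixels) between predicted and annotated key-points."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("pred and truth must be non-empty and of equal length")
    total = sum(math.hypot(px - tx, py - ty)
                for (px, py), (tx, ty) in zip(pred, truth))
    return total / len(pred)

def pck(pred, truth, ref_length, alpha=0.2):
    """Fraction of key-points predicted within alpha * ref_length of the annotation.

    ref_length is a per-image reference size (for example a torso length);
    which reference the paper uses is not given in the abstract, so this
    normalization is an assumption.
    """
    if len(pred) != len(truth) or not pred:
        raise ValueError("pred and truth must be non-empty and of equal length")
    threshold = alpha * ref_length
    hits = sum(1 for (px, py), (tx, ty) in zip(pred, truth)
               if math.hypot(px - tx, py - ty) <= threshold)
    return hits / len(pred)

# Hypothetical annotations and predictions for four key-points of one frame.
truth = [(100.0, 100.0), (150.0, 100.0), (120.0, 200.0), (140.0, 200.0)]
pred = [(102.0, 101.0), (149.0, 97.0), (135.0, 210.0), (141.0, 199.0)]

print(pck(pred, truth, ref_length=50.0, alpha=0.2))  # 3 of 4 points within 10 px -> 0.75
print(round(mean_distance(pred, truth), 2))
```

In this example the third key-point is about 18 pixels off and falls outside the 10-pixel threshold, so it lowers PCK while the other three count as correct. Reporting the mean distance alongside PCK shows how far off the misses are, which PCK alone hides.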
Related papers
- Advancing Newborn Care: Precise Birth Time Detection Using AI-Driven Thermal Imaging with Adaptive Normalization [1.101731711817642]
We investigate the fusion of Artificial Intelligence (AI) and thermal imaging to develop the first AI-driven Time of Birth detector.
Our methodology involves a three-step process: first, we propose an adaptive normalization method based on Gaussian mixture models (GMM) to mitigate issues related to temperature variations.
A precision of 88.1% and a recall of 89.3% are reported in the detection of the newborn within thermal frames during performance evaluation.
arXiv Detail & Related papers (2024-10-14T13:20:51Z)
- Opinion-Unaware Blind Image Quality Assessment using Multi-Scale Deep Feature Statistics [54.08757792080732]
We propose integrating deep features from pre-trained visual models with a statistical analysis model to achieve opinion-unaware BIQA (OU-BIQA).
Our proposed model exhibits superior consistency with human visual perception compared to state-of-the-art BIQA models.
arXiv Detail & Related papers (2024-05-29T06:09:34Z)
- Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images [68.42215385041114]
This paper introduces a novel lightweight multi-level adaptation and comparison framework to repurpose the CLIP model for medical anomaly detection.
Our approach integrates multiple residual adapters into the pre-trained visual encoder, enabling a stepwise enhancement of visual features across different levels.
Our experiments on medical anomaly detection benchmarks demonstrate that our method significantly surpasses current state-of-the-art models.
arXiv Detail & Related papers (2024-03-19T09:28:19Z)
- RIDE: Self-Supervised Learning of Rotation-Equivariant Keypoint Detection and Invariant Description for Endoscopy [83.4885991036141]
RIDE is a learning-based method for rotation-equivariant detection and invariant description.
It is trained in a self-supervised manner on a large curation of endoscopic images.
It sets a new state-of-the-art performance on matching and relative pose estimation tasks.
arXiv Detail & Related papers (2023-09-18T08:16:30Z)
- An Improved Model Ensembled of Different Hyper-parameter Tuned Machine Learning Algorithms for Fetal Health Prediction [1.332560004325655]
We propose a robust ensemble model called ensemble of tuned Support Vector Machine and ExtraTrees for predicting fetal health.
Our proposed ETSE model outperformed the other models with 100% precision, 100% recall, 100% F1-score, and 99.66% accuracy.
arXiv Detail & Related papers (2023-05-26T16:40:44Z)
- Localizing Scan Targets from Human Pose for Autonomous Lung Ultrasound Imaging [61.60067283680348]
With the advent of the COVID-19 global pandemic, there is a need to fully automate ultrasound imaging.
We propose a vision-based, data driven method that incorporates learning-based computer vision techniques.
Our method attains an accuracy level of 15.52 (9.47) mm for probe positioning and 4.32 (3.69) deg for probe orientation, with a success rate above 80% under an error threshold of 25 mm for all scan targets.
arXiv Detail & Related papers (2022-12-15T14:34:12Z)
- Bottom-Up 2D Pose Estimation via Dual Anatomical Centers for Small-Scale Persons [75.86463396561744]
In multi-person 2D pose estimation, the bottom-up methods simultaneously predict poses for all persons.
Our method achieves a 38.4% improvement in bounding box precision and a 39.1% improvement in bounding box recall over the state of the art (SOTA).
For the human pose AP evaluation, we achieve a new SOTA (71.0 AP) on the COCO test-dev set with single-scale testing.
arXiv Detail & Related papers (2022-08-25T10:09:10Z)
- Automated Classification of General Movements in Infants Using a Two-stream Spatiotemporal Fusion Network [5.541644538483947]
The assessment of general movements (GMs) in infants is a useful tool in the early diagnosis of neurodevelopmental disorders.
Recent video-based GMs classification has attracted attention, but this approach can be strongly affected by irrelevant information.
We propose an automated GMs classification method, which consists of preprocessing networks that remove unnecessary background information.
arXiv Detail & Related papers (2022-07-04T05:21:09Z)
- AggPose: Deep Aggregation Vision Transformer for Infant Pose Estimation [6.9000851935487075]
We propose an infant pose dataset and a Deep Aggregation Vision Transformer for human pose estimation.
AggPose is a fast trained full transformer framework without using convolution operations to extract features in the early stages.
We show that AggPose could effectively learn the multi-scale features among different resolutions and significantly improve the performance of infant pose estimation.
arXiv Detail & Related papers (2022-05-11T05:34:14Z)
- Enabling faster and more reliable sonographic assessment of gestational age through machine learning [1.3238745915345225]
Fetal ultrasounds are an essential part of prenatal care and can be used to estimate gestational age (GA).
We developed three AI models: an image model using standard plane images, a video model using fly-to videos, and an ensemble model (combining both image and video).
All three were statistically superior to standard fetal biometry-based GA estimates derived by expert sonographers.
arXiv Detail & Related papers (2022-03-22T17:15:56Z)
- Hybrid Attention for Automatic Segmentation of Whole Fetal Head in Prenatal Ultrasound Volumes [52.53375964591765]
We propose the first fully-automated solution to segment the whole fetal head in US volumes.
The segmentation task is first formulated as an end-to-end volumetric mapping under an encoder-decoder deep architecture.
We then combine the segmentor with a proposed hybrid attention scheme (HAS) to select discriminative features and suppress the non-informative volumetric features.
arXiv Detail & Related papers (2020-04-28T14:43:05Z)