MedSapiens: Taking a Pose to Rethink Medical Imaging Landmark Detection
- URL: http://arxiv.org/abs/2511.04255v1
- Date: Thu, 06 Nov 2025 10:45:49 GMT
- Title: MedSapiens: Taking a Pose to Rethink Medical Imaging Landmark Detection
- Authors: Marawan Elbatel, Anbang Wang, Keyuan Liu, Kaouther Mouheb, Enrique Almar-Munoz, Lizhuo Lin, Yanqi Yang, Karim Lekadir, Xiaomeng Li,
- Abstract summary: This paper revisits a fundamental yet overlooked baseline: adapting human-centric foundation models for anatomical landmark detection in medical imaging. We investigate the adaptation of Sapiens, a human-centric foundation model designed for pose estimation, to medical imaging through multi-dataset pretraining. Our proposed model, MedSapiens, demonstrates that human-centric foundation models, inherently optimized for spatial pose localization, provide strong priors for anatomical landmark detection.
- Score: 9.248236271870558
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper does not introduce a novel architecture; instead, it revisits a fundamental yet overlooked baseline: adapting human-centric foundation models for anatomical landmark detection in medical imaging. While landmark detection has traditionally relied on domain-specific models, the emergence of large-scale pre-trained vision models presents new opportunities. In this study, we investigate the adaptation of Sapiens, a human-centric foundation model designed for pose estimation, to medical imaging through multi-dataset pretraining, establishing a new state of the art across multiple datasets. Our proposed model, MedSapiens, demonstrates that human-centric foundation models, inherently optimized for spatial pose localization, provide strong priors for anatomical landmark detection, yet this potential has remained largely untapped. We benchmark MedSapiens against existing state-of-the-art models, achieving up to 5.26% improvement over generalist models and up to 21.81% improvement over specialist models in the average success detection rate (SDR). To further assess MedSapiens' adaptability to novel downstream tasks with few annotations, we evaluate its performance in limited-data settings, achieving 2.69% improvement over the few-shot state of the art in SDR. Code and model weights are available at https://github.com/xmed-lab/MedSapiens.
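As context for the reported gains, the success detection rate (SDR) is the fraction of predicted landmarks that fall within a given radius of their ground-truth positions. A minimal sketch of the metric (the coordinates, threshold, and helper name below are illustrative, not taken from the paper):

```python
import numpy as np

def success_detection_rate(pred, gt, radius_mm, spacing_mm=1.0):
    """Fraction of landmarks whose prediction lies within `radius_mm`
    of the ground truth. `pred` and `gt` are (N, 2) arrays of landmark
    coordinates in pixels; `spacing_mm` converts pixels to millimetres."""
    dist_mm = np.linalg.norm((pred - gt) * spacing_mm, axis=1)
    return float(np.mean(dist_mm <= radius_mm))

# Example: 3 of 4 landmarks lie within a 2 mm radius.
pred = np.array([[10.0, 10.0], [20.0, 21.5], [30.0, 30.0], [45.0, 40.0]])
gt   = np.array([[10.5, 10.0], [20.0, 20.0], [30.0, 30.0], [40.0, 40.0]])
print(success_detection_rate(pred, gt, radius_mm=2.0))  # 0.75
```

Benchmarks typically report SDR at several radii (e.g. 2 mm, 2.5 mm, 3 mm, 4 mm) and average over them.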
Related papers
- Investigating the Impact of Histopathological Foundation Models on Regressive Prediction of Homologous Recombination Deficiency [52.50039435394964]
We systematically evaluate foundation models for regression-based tasks. We extract patch-level features from whole slide images (WSI) using five state-of-the-art foundation models. Models are trained to predict continuous HRD scores based on these extracted features across breast, endometrial, and lung cancer cohorts.
arXiv Detail & Related papers (2026-01-29T14:06:50Z) - MedROV: Towards Real-Time Open-Vocabulary Detection Across Diverse Medical Imaging Modalities [89.81463562506637]
We introduce MedROV, the first real-time open-vocabulary detection model for medical imaging. By leveraging contrastive learning and cross-modal representations, MedROV effectively detects both known and novel structures.
arXiv Detail & Related papers (2025-11-25T18:59:53Z) - MedDChest: A Content-Aware Multimodal Foundational Vision Model for Thoracic Imaging [3.0332210076508326]
We propose MedDChest, a new foundational Vision Transformer (ViT) model optimized specifically for thoracic imaging. We pre-trained MedDChest from scratch on a massive, curated, multimodal dataset of over 1.2 million images. We validate our model's effectiveness by fine-tuning it on a diverse set of downstream diagnostic tasks.
arXiv Detail & Related papers (2025-11-06T03:28:56Z) - Region-Aware Reconstruction Strategy for Pre-training fMRI Foundation Model [0.7771985426812056]
We introduce an ROI-guided masking strategy to selectively mask semantically coherent brain regions during self-supervised pretraining. We show that our method achieves a 4.23% improvement in classification accuracy for distinguishing healthy controls from individuals diagnosed with ADHD. Our results demonstrate that masking anatomical regions during model pretraining not only enhances interpretability but also yields more robust and discriminative representations.
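The region-level masking idea above can be illustrated with a generic sketch: given a volume and a voxel-to-ROI label map, zero out all voxels belonging to a random subset of regions. This is only an illustration of the general technique, not the paper's exact procedure; the function name and mask ratio are assumptions.

```python
import numpy as np

def mask_random_rois(volume, roi_labels, mask_ratio=0.5, rng=None):
    """Zero out all voxels of a random subset of ROIs.

    `volume` and `roi_labels` share the same shape; `roi_labels` assigns
    each voxel an integer region id (0 = background, never masked).
    Returns the masked volume and the boolean mask of hidden voxels.
    """
    rng = np.random.default_rng(rng)
    rois = np.unique(roi_labels)
    rois = rois[rois != 0]                       # background is never masked
    n_mask = max(1, int(round(mask_ratio * len(rois))))
    masked_rois = rng.choice(rois, size=n_mask, replace=False)
    mask = np.isin(roi_labels, masked_rois)      # voxels of the chosen ROIs
    out = volume.copy()
    out[mask] = 0.0
    return out, mask
```

In masked-reconstruction pretraining, the model would then be trained to recover `volume` from `out` on the voxels where `mask` is true.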
arXiv Detail & Related papers (2025-11-01T08:12:00Z) - Atlas: A Novel Pathology Foundation Model by Mayo Clinic, Charité, and Aignostics [61.0008867391683]
We present Atlas, a novel vision foundation model based on the RudolfV approach. Our model was trained on a dataset comprising 1.2 million histopathology whole slide images.
arXiv Detail & Related papers (2025-01-09T18:06:45Z) - Synthetic Augmentation for Anatomical Landmark Localization using DDPMs [0.22499166814992436]
Diffusion-based generative models have recently started to gain attention for their ability to generate high-quality synthetic images.
We propose a novel way to assess the quality of the generated images using a Markov Random Field (MRF) model for landmark matching and a Statistical Shape Model (SSM) to check landmark plausibility.
arXiv Detail & Related papers (2024-10-16T12:09:38Z) - Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images [68.42215385041114]
This paper introduces a novel lightweight multi-level adaptation and comparison framework to repurpose the CLIP model for medical anomaly detection.
Our approach integrates multiple residual adapters into the pre-trained visual encoder, enabling a stepwise enhancement of visual features across different levels.
Our experiments on medical anomaly detection benchmarks demonstrate that our method significantly surpasses current state-of-the-art models.
arXiv Detail & Related papers (2024-03-19T09:28:19Z) - Certification of Deep Learning Models for Medical Image Segmentation [44.177565298565966]
We present for the first time a certified segmentation baseline for medical imaging based on randomized smoothing and diffusion models.
Our results show that leveraging the power of denoising diffusion probabilistic models helps us overcome the limits of randomized smoothing.
arXiv Detail & Related papers (2023-10-05T16:40:33Z) - LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z) - MedFMC: A Real-world Dataset and Benchmark For Foundation Model Adaptation in Medical Image Classification [41.16626194300303]
Foundation models, often pre-trained with large-scale data, have achieved paramount success in jump-starting various vision and language applications.
Recent advances further enable adapting foundation models in downstream tasks efficiently using only a few training samples.
Yet, the application of such learning paradigms in medical image analysis remains scarce due to the shortage of publicly accessible data and benchmarks.
arXiv Detail & Related papers (2023-06-16T01:46:07Z) - Anatomy-guided domain adaptation for 3D in-bed human pose estimation [62.3463429269385]
3D human pose estimation is a key component of clinical monitoring systems.
We present a novel domain adaptation method, adapting a model from a labeled source to a shifted unlabeled target domain.
Our method consistently outperforms various state-of-the-art domain adaptation methods.
arXiv Detail & Related papers (2022-11-22T11:34:51Z) - Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for Thoracic Disease Identification [83.6017225363714]
Deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance.
For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming.
In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
arXiv Detail & Related papers (2021-02-26T02:29:30Z)
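The k-nearest-neighbor smoothing idea from the entry above can be sketched generically: blend each sample's predicted class probabilities with the mean prediction of its nearest neighbors in feature space. This is a generic sketch of k-NN prediction smoothing; the blending weight `alpha`, the Euclidean metric, and the function name are illustrative choices, not the paper's exact KNNS formulation.

```python
import numpy as np

def knn_smooth(features, probs, k=3, alpha=0.5):
    """Blend each sample's class probabilities with the mean prediction
    of its k nearest neighbours in feature space.

    `features` is (N, D); `probs` is (N, C) with rows summing to 1.
    The output is a convex combination, so rows still sum to 1.
    """
    # Pairwise Euclidean distances; exclude self-matches via the diagonal.
    d = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    idx = np.argsort(d, axis=1)[:, :k]         # indices of k nearest neighbours
    neighbour_mean = probs[idx].mean(axis=1)   # average neighbour predictions
    return alpha * probs + (1 - alpha) * neighbour_mean
```

For large cohorts one would replace the dense distance matrix with an approximate nearest-neighbor index, but the blending step stays the same.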
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.