Related papers: Lamps: Learning Anatomy from Multiple Perspectives via Self-supervision in Chest Radiographs

URL: http://arxiv.org/abs/2512.22872v2
Date: Fri, 02 Jan 2026 10:52:32 GMT
Title: Lamps: Learning Anatomy from Multiple Perspectives via Self-supervision in Chest Radiographs
Authors: Ziyu Zhou, Haozhe Luo, Mohammad Reza Hosseinzadeh Taher, Jiaxuan Pang, Xiaowei Ding, Michael B. Gotway, Jianming Liang
Abstract summary: We build Lamps (learning anatomy from multiple perspectives via self-supervision) pre-trained on large-scale chest radiographs. Experiments across 10 datasets evaluated through fine-tuning and emergent property analysis demonstrate Lamps' superior robustness, transferability, and clinical potential.
Score: 11.1577654135003
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Foundation models have been successful in natural language processing and computer vision because they are capable of capturing the underlying structures (foundation) of natural languages. However, in medical imaging, the key foundation lies in human anatomy, as these images directly represent the internal structures of the body, reflecting the consistency, coherence, and hierarchy of human anatomy. Yet, existing self-supervised learning (SSL) methods often overlook these perspectives, limiting their ability to effectively learn anatomical features. To overcome the limitation, we built Lamps (learning anatomy from multiple perspectives via self-supervision) pre-trained on large-scale chest radiographs by harmoniously utilizing the consistency, coherence, and hierarchy of human anatomy as the supervision signal. Extensive experiments across 10 datasets evaluated through fine-tuning and emergent property analysis demonstrate Lamps' superior robustness, transferability, and clinical potential when compared to 10 baseline models. By learning from multiple perspectives, Lamps presents a unique opportunity for foundation models to develop meaningful, robust representations that are aligned with the structure of human anatomy.

Related papers

Human-level 3D shape perception emerges from multi-view learning [63.048728487674815]
We develop a modeling framework that predicts human 3D shape inferences for arbitrary objects. We achieve this with a novel class of neural networks trained using a visual-spatial objective over naturalistic sensory data. We find that human-level 3D perception can emerge from a simple, scalable learning objective over naturalistic visual-spatial data.
arXiv Detail & Related papers (2026-02-19T18:56:05Z)
Toward Cognitive Supersensing in Multimodal Large Language Model [67.15559571626747]
We introduce Cognitive Supersensing, a training paradigm that endows MLLMs with human-like visual imagery capabilities. In experiments, MLLMs trained with Cognitive Supersensing significantly outperform state-of-the-art baselines on CogSense-Bench. We will open-source the CogSense-Bench and our model weights.
arXiv Detail & Related papers (2026-02-02T02:19:50Z)
CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning [76.98039909663756]
We present CheXWorld, the first effort towards a self-supervised world model for radiographic images. Our work develops a unified framework that simultaneously models three aspects of medical knowledge essential for qualified radiologists.
arXiv Detail & Related papers (2025-04-18T17:50:43Z)
Language-Guided Trajectory Traversal in Disentangled Stable Diffusion Latent Space for Factorized Medical Image Generation [0.8397730500554048]
We present the first investigation of the power of pre-trained vision-language foundation models, once fine-tuned on medical image datasets, to perform latent disentanglement. We demonstrate that language-guided Stable Diffusion inherently learns to factorize key attributes for image generation. We devise a framework to identify, isolate, and manipulate key attributes through latent space trajectory of generative models, facilitating precise control over medical image synthesis.
arXiv Detail & Related papers (2025-03-30T23:15:52Z)
Representing Part-Whole Hierarchies in Foundation Models by Learning Localizability, Composability, and Decomposability from Anatomy via Self-Supervision [7.869873154804936]
We introduce Adam-v2, a new self-supervised learning framework extending Adam [79]. Adam-v2 explicitly incorporates part-whole hierarchies into its learning objectives through three key branches. Experimental results across 10 tasks, compared to 11 baselines in zero-shot, few-shot transfer, and full fine-tuning settings, showcase Adam-v2's superior performance.
arXiv Detail & Related papers (2024-04-24T06:02:59Z)
Psychometry: An Omnifit Model for Image Reconstruction from Human Brain Activity [60.983327742457995]
Reconstructing the viewed images from human brain activity bridges human and computer vision through the Brain-Computer Interface. We devise Psychometry, an omnifit model for reconstructing images from functional Magnetic Resonance Imaging (fMRI) obtained from different subjects.
arXiv Detail & Related papers (2024-03-29T07:16:34Z)
Visual Grounding Helps Learn Word Meanings in Low-Data Regimes [47.7950860342515]
Modern neural language models (LMs) are powerful tools for modeling human sentence production and comprehension. But to achieve these results, LMs must be trained in distinctly un-human-like ways. Do models trained more naturalistically -- with grounded supervision -- exhibit more humanlike language learning? We investigate this question in the context of word learning, a key sub-task in language acquisition.
arXiv Detail & Related papers (2023-10-20T03:33:36Z)
Towards Foundation Models Learned from Anatomy in Medical Imaging via Self-Supervision [8.84494874768244]
We envision a foundation model for medical imaging that is consciously and purposefully developed upon human anatomy. We devise a novel self-supervised learning (SSL) strategy that exploits the hierarchical nature of human anatomy.
arXiv Detail & Related papers (2023-09-27T01:53:45Z)
Clinically Plausible Pathology-Anatomy Disentanglement in Patient Brain MRI with Structured Variational Priors [11.74918328561702]
We propose a hierarchically structured variational inference model for accurately disentangling observable evidence of disease from subject-specific anatomy in brain MRIs. With flexible, partially autoregressive priors, our model addresses the subtle and fine-grained dependencies that typically exist between anatomical and pathological generating factors of an MRI.
arXiv Detail & Related papers (2022-11-15T00:53:00Z)
Towards Trustworthy Healthcare AI: Attention-Based Feature Learning for COVID-19 Screening With Chest Radiography [70.37371604119826]
Building AI models with trustworthiness is important, especially in regulated areas such as healthcare. Previous work uses convolutional neural networks as the backbone architecture, which has been shown to be prone to over-caution and overconfidence in making decisions. We propose a feature learning approach using Vision Transformers, which use an attention-based mechanism.
arXiv Detail & Related papers (2022-07-19T14:55:42Z)
LEAP: Learning Articulated Occupancy of People [56.35797895609303]
We introduce LEAP (LEarning Articulated occupancy of People), a novel neural occupancy representation of the human body. Given a set of bone transformations and a query point in space, LEAP first maps the query point to a canonical space via learned linear blend skinning (LBS) functions. LEAP efficiently queries the occupancy value via an occupancy network that models accurate identity- and pose-dependent deformations in the canonical space.
arXiv Detail & Related papers (2021-04-14T13:41:56Z)
Learning Semantics-enriched Representation via Self-discovery, Self-classification, and Self-restoration [12.609383051645887]
We train deep models to learn semantically enriched visual representation by self-discovery, self-classification, and self-restoration of the anatomy underneath medical images. We examine our Semantic Genesis with all the publicly available pre-trained models, by either self-supervision or full supervision, on the six distinct target tasks. Our experiments demonstrate that Semantic Genesis significantly exceeds all of its 3D counterparts as well as the de facto ImageNet-based transfer learning in 2D.
arXiv Detail & Related papers (2020-07-14T10:36:10Z)
DeepRetinotopy: Predicting the Functional Organization of Human Visual Cortex from Structural MRI Data using Geometric Deep Learning [125.99533416395765]
We developed a deep learning model capable of exploiting the structure of the cortex to learn the complex relationship between brain function and anatomy from structural and functional MRI data. Our model was able to predict the functional organization of human visual cortex from anatomical properties alone, and it was also able to predict nuanced variations across individuals.
arXiv Detail & Related papers (2020-05-26T04:54:31Z)

This list is automatically generated from the titles and abstracts of the papers on this site.