DinoBloom: A Foundation Model for Generalizable Cell Embeddings in Hematology
- URL: http://arxiv.org/abs/2404.05022v1
- Date: Sun, 7 Apr 2024 17:25:52 GMT
- Title: DinoBloom: A Foundation Model for Generalizable Cell Embeddings in Hematology
- Authors: Valentin Koch, Sophia J. Wagner, Salome Kazeminia, Ece Sancar, Matthias Hehr, Julia Schnabel, Tingying Peng, Carsten Marr
- Abstract summary: We introduce DinoBloom, the first foundation model for single cell images in hematology.
Our model is built upon an extensive collection of 13 diverse, publicly available datasets of peripheral blood and bone marrow smears.
A family of four DinoBloom models can be adapted for a wide range of downstream applications.
- Score: 1.3551232282678036
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In hematology, computational models offer significant potential to improve diagnostic accuracy, streamline workflows, and reduce the tedious work of analyzing single cells in peripheral blood or bone marrow smears. However, clinical adoption of computational models has been hampered by the lack of generalization due to large batch effects, small dataset sizes, and poor performance in transfer learning from natural images. To address these challenges, we introduce DinoBloom, the first foundation model for single cell images in hematology, utilizing a tailored DINOv2 pipeline. Our model is built upon an extensive collection of 13 diverse, publicly available datasets of peripheral blood and bone marrow smears, the most substantial open-source cohort in hematology so far, comprising over 380,000 white blood cell images. To assess its generalization capability, we evaluate it on an external dataset with a challenging domain shift. We show that our model outperforms existing medical and non-medical vision models in (i) linear probing and k-nearest neighbor evaluations for cell-type classification on blood and bone marrow smears and (ii) weakly supervised multiple instance learning for acute myeloid leukemia subtyping by a large margin. A family of four DinoBloom models (small, base, large, and giant) can be adapted for a wide range of downstream applications, be a strong baseline for classification problems, and facilitate the assessment of batch effects in new datasets. All models are available at github.com/marrlab/DinoBloom.
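The cell-type classification evaluations mentioned in the abstract (linear probing and k-nearest neighbors on frozen embeddings) can be reproduced with a few lines of scikit-learn once embeddings have been extracted with the released backbone. The sketch below is illustrative only: the file names, the choice of k, and the weighted-F1 metric are assumptions rather than details taken from the paper; the actual extraction and evaluation code lives in the linked repository (github.com/marrlab/DinoBloom).
```python
# Minimal sketch of a linear-probe / k-NN evaluation on frozen single-cell
# embeddings. Assumes features were already extracted with a frozen backbone
# and saved as .npy files (hypothetical file names and shapes).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import f1_score

# Hypothetical pre-computed data: (n_cells, d) embeddings, integer cell-type labels.
X_train, y_train = np.load("train_feats.npy"), np.load("train_labels.npy")
X_test, y_test = np.load("test_feats.npy"), np.load("test_labels.npy")

# Linear probe: a logistic-regression head on top of the frozen embeddings.
probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)
print("linear probe weighted F1:",
      f1_score(y_test, probe.predict(X_test), average="weighted"))

# k-nearest-neighbor classification on the same frozen embeddings.
knn = KNeighborsClassifier(n_neighbors=20, metric="cosine")
knn.fit(X_train, y_train)
print("20-NN weighted F1:",
      f1_score(y_test, knn.predict(X_test), average="weighted"))
```
Because both evaluations keep the backbone frozen, they measure the quality of the embeddings themselves rather than the capacity of a fine-tuned head, which is why the paper uses them to compare against other vision models.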
Related papers
- Assessment of Cell Nuclei AI Foundation Models in Kidney Pathology [10.574005822664034]
To our knowledge, this study is the largest-scale evaluation of its kind to date.
Among the evaluated models, CellViT demonstrated superior performance in segmenting nuclei in kidney pathology.
However, none of the foundation models are perfect; a performance gap remains in general nuclei segmentation for kidney pathology.
arXiv Detail & Related papers (2024-08-09T22:34:13Z) - Histopathological Image Classification with Cell Morphology Aware Deep Neural Networks [11.749248917866915]
We propose a novel DeepCMorph model pre-trained to learn cell morphology and identify a large number of different cancer types.
We pretrained this module on the Pan-Cancer TCGA dataset consisting of over 270K tissue patches extracted from 8736 diagnostic slides from 7175 patients.
The proposed solution achieved a new state-of-the-art performance on the dataset under consideration, detecting 32 cancer types with over 82% accuracy and outperforming all previously proposed solutions by more than 4%.
arXiv Detail & Related papers (2024-07-11T16:03:59Z) - Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation [113.5002649181103]
We train open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology.
For training, we assemble a large dataset of over 697 thousand radiology image-text pairs.
For evaluation, we propose CheXprompt, a GPT-4-based metric for factuality evaluation, and demonstrate its parity with expert evaluation.
Inference with LLaVA-Rad is fast and can be performed on a single V100 GPU in private settings, offering a promising state-of-the-art tool for real-world clinical applications.
arXiv Detail & Related papers (2024-03-12T18:12:02Z) - A Continual Learning Approach for Cross-Domain White Blood Cell Classification [36.482007703764154]
We propose a rehearsal-based continual learning approach for class incremental and domain incremental scenarios in white blood cell classification.
To choose representative samples from previous tasks, we employ a set-selection strategy based on the model's predictions.
We thoroughly evaluated our proposed approach on three white blood cell classification datasets that differ in color, resolution, and class composition.
arXiv Detail & Related papers (2023-08-24T09:38:54Z) - Deep CNNs for Peripheral Blood Cell Classification [0.0]
We benchmark 27 popular deep convolutional neural network architectures on the microscopic peripheral blood cell images dataset.
We fine-tune the state-of-the-art image classification models pre-trained on the ImageNet dataset for blood cell classification.
arXiv Detail & Related papers (2021-10-18T17:56:07Z) - Medulloblastoma Tumor Classification using Deep Transfer Learning with Multi-Scale EfficientNets [63.62764375279861]
We propose an end-to-end MB tumor classification approach and explore transfer learning with various input sizes and matching network dimensions.
Using a data set with 161 cases, we demonstrate that pre-trained EfficientNets with larger input resolutions lead to significant performance improvements.
arXiv Detail & Related papers (2021-09-10T13:07:11Z) - Relational Subsets Knowledge Distillation for Long-tailed Retinal Diseases Recognition [65.77962788209103]
We propose class subset learning by dividing the long-tailed data into multiple class subsets according to prior knowledge.
It enforces the model to focus on learning the subset-specific knowledge.
The proposed framework proved to be effective for the long-tailed retinal diseases recognition task.
arXiv Detail & Related papers (2021-04-22T13:39:33Z) - Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for Thoracic Disease Identification [83.6017225363714]
Deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance.
For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming.
In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
arXiv Detail & Related papers (2021-02-26T02:29:30Z) - Single Model Deep Learning on Imbalanced Small Datasets for Skin Lesion Classification [5.642359877598896]
This paper proposes a novel data augmentation strategy for single model classification of skin lesions based on a small and imbalanced dataset.
Various DCNNs are trained on this dataset to show that the models with moderate complexity outperform the larger models.
By combining Modified RandAugment and Multi-weighted Focal Loss in a single DCNN model, we achieve classification accuracy comparable to that of multi-model ensembles on the ISIC 2018 challenge test dataset (a weighted focal loss sketch follows this list).
arXiv Detail & Related papers (2021-02-02T03:48:55Z) - Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem, i.e., identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, which is a simple yet effective meta-learning method for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z) - Predicting Clinical Diagnosis from Patients Electronic Health Records Using BERT-based Neural Networks [62.9447303059342]
We show the importance of this problem in the medical community.
We present a modification of the Bidirectional Encoder Representations from Transformers (BERT) model for sequence classification.
We use a large-scale Russian EHR dataset consisting of about 4 million unique patient visits.
arXiv Detail & Related papers (2020-07-15T09:22:55Z)
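As referenced in the skin-lesion entry above, a per-class weighted focal loss is a common way to train a single model on an imbalanced dataset. The sketch below is a generic PyTorch implementation of that idea, not the exact "Multi-weighted Focal Loss" of that paper; the class counts, the weighting scheme, and gamma are illustrative assumptions.
```python
# Generic per-class weighted focal loss in PyTorch; a sketch of the idea behind
# a "multi-weighted" focal loss, not the cited paper's exact formulation.
import torch
import torch.nn.functional as F

def weighted_focal_loss(logits, targets, class_weights, gamma=2.0):
    """logits: (N, C), targets: (N,) int64, class_weights: (C,) per-class alpha."""
    log_probs = F.log_softmax(logits, dim=-1)                        # (N, C)
    log_pt = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)    # (N,)
    pt = log_pt.exp()
    alpha = class_weights[targets]                                   # per-sample weight
    # FL = -alpha * (1 - p_t)^gamma * log(p_t), averaged over the batch
    return (-alpha * (1.0 - pt) ** gamma * log_pt).mean()

# Example with 7 classes and illustrative (hypothetical) class counts,
# weighted inversely to class frequency.
counts = torch.tensor([6700., 1100., 1000., 500., 330., 140., 115.])
weights = counts.sum() / (len(counts) * counts)
loss = weighted_focal_loss(torch.randn(8, 7), torch.randint(0, 7, (8,)), weights)
```
The (1 - p_t)^gamma factor down-weights well-classified examples, while the per-class alpha counteracts class imbalance, which is why this family of losses is popular for small, skewed medical datasets.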