Foundation X: Integrating Classification, Localization, and Segmentation through Lock-Release Pretraining Strategy for Chest X-ray Analysis
- URL: http://arxiv.org/abs/2503.09860v1
- Date: Wed, 12 Mar 2025 21:45:13 GMT
- Title: Foundation X: Integrating Classification, Localization, and Segmentation through Lock-Release Pretraining Strategy for Chest X-ray Analysis
- Authors: Nahid Ul Islam, DongAo Ma, Jiaxuan Pang, Shivasakthi Senthil Velan, Michael Gotway, Jianming Liang,
- Abstract summary: nFoundation X is an end-to-end framework that utilizes diverse expert-level annotations to train a foundation model.<n>We trained a model using 11 chest X-ray datasets, covering annotations for classification, localization, and segmentation tasks.<n>Our experimental results show that Foundation X achieves notable performance gains through extensive annotation utilization.
- Score: 3.874753046352665
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Developing robust and versatile deep-learning models is essential for enhancing diagnostic accuracy and guiding clinical interventions in medical imaging, but it requires a large amount of annotated data. The advancement of deep learning has facilitated the creation of numerous medical datasets with diverse expert-level annotations. Aggregating these datasets can maximize data utilization and address the inadequacy of labeled data. However, the heterogeneity of expert-level annotations across tasks such as classification, localization, and segmentation presents a significant challenge for learning from these datasets. To this end, we introduce nFoundation X, an end-to-end framework that utilizes diverse expert-level annotations from numerous public datasets to train a foundation model capable of multiple tasks including classification, localization, and segmentation. To address the challenges of annotation and task heterogeneity, we propose a Lock-Release pretraining strategy to enhance the cyclic learning from multiple datasets, combined with the student-teacher learning paradigm, ensuring the model retains general knowledge for all tasks while preventing overfitting to any single task. To demonstrate the effectiveness of Foundation X, we trained a model using 11 chest X-ray datasets, covering annotations for classification, localization, and segmentation tasks. Our experimental results show that Foundation X achieves notable performance gains through extensive annotation utilization, excels in cross-dataset and cross-task learning, and further enhances performance in organ localization and segmentation tasks. All code and pretrained models are publicly accessible at https://github.com/jlianglab/Foundation_X.
Related papers
- Learning from Neighbors: Category Extrapolation for Long-Tail Learning [62.30734737735273]
We offer a novel perspective on long-tail learning, inspired by an observation: datasets with finer granularity tend to be less affected by data imbalance.<n>We introduce open-set auxiliary classes that are visually similar to existing ones, aiming to enhance representation learning for both head and tail classes.<n>To prevent the overwhelming presence of auxiliary classes from disrupting training, we introduce a neighbor-silencing loss.
arXiv Detail & Related papers (2024-10-21T13:06:21Z) - Few-Shot Learning for Annotation-Efficient Nucleus Instance Segmentation [50.407071700154674]
We propose to formulate annotation-efficient nucleus instance segmentation from the perspective of few-shot learning (FSL)
Our work was motivated by that, with the prosperity of computational pathology, an increasing number of fully-annotated datasets are publicly accessible.
Extensive experiments on a couple of publicly accessible datasets demonstrate that SGFSIS can outperform other annotation-efficient learning baselines.
arXiv Detail & Related papers (2024-02-26T03:49:18Z) - Prospector Heads: Generalized Feature Attribution for Large Models & Data [82.02696069543454]
We introduce prospector heads, an efficient and interpretable alternative to explanation-based attribution methods.
We demonstrate how prospector heads enable improved interpretation and discovery of class-specific patterns in input data.
arXiv Detail & Related papers (2024-02-18T23:01:28Z) - Segment Together: A Versatile Paradigm for Semi-Supervised Medical Image
Segmentation [17.69933345468061]
scarcity has become a major obstacle for training powerful deep-learning models for medical image segmentation.
We introduce a textbfVersatile textbfSemi-supervised framework to exploit more unlabeled data for semi-supervised medical image segmentation.
arXiv Detail & Related papers (2023-11-20T11:35:52Z) - infoVerse: A Universal Framework for Dataset Characterization with
Multidimensional Meta-information [68.76707843019886]
infoVerse is a universal framework for dataset characterization.
infoVerse captures multidimensional characteristics of datasets by incorporating various model-driven meta-information.
In three real-world applications (data pruning, active learning, and data annotation), the samples chosen on infoVerse space consistently outperform strong baselines.
arXiv Detail & Related papers (2023-05-30T18:12:48Z) - MultiTalent: A Multi-Dataset Approach to Medical Image Segmentation [1.146419670457951]
Current practices limit model training and supervised pre-training to one or a few similar datasets.
We propose MultiTalent, a method that leverages multiple CT datasets with diverse and conflicting class definitions.
arXiv Detail & Related papers (2023-03-25T11:37:16Z) - Coreset Sampling from Open-Set for Fine-Grained Self-Supervised Learning [10.57079240576682]
We introduce a novel Open-Set Self-Supervised Learning problem under the assumption that a large-scale unlabeled open-set is available.
In our problem setup, it is crucial to consider the distribution mismatch between the open-set and target dataset.
We demonstrate that SimCore significantly improves representation learning performance through extensive experimental settings.
arXiv Detail & Related papers (2023-03-20T13:38:29Z) - Association Graph Learning for Multi-Task Classification with Category
Shifts [68.58829338426712]
We focus on multi-task classification, where related classification tasks share the same label space and are learned simultaneously.
We learn an association graph to transfer knowledge among tasks for missing classes.
Our method consistently performs better than representative baselines.
arXiv Detail & Related papers (2022-10-10T12:37:41Z) - X-Learner: Learning Cross Sources and Tasks for Universal Visual
Representation [71.51719469058666]
We propose a representation learning framework called X-Learner.
X-Learner learns the universal feature of multiple vision tasks supervised by various sources.
X-Learner achieves strong performance on different tasks without extra annotations, modalities and computational costs.
arXiv Detail & Related papers (2022-03-16T17:23:26Z) - Hierarchical Self-Supervised Learning for Medical Image Segmentation
Based on Multi-Domain Data Aggregation [23.616336382437275]
We propose Hierarchical Self-Supervised Learning (HSSL) for medical image segmentation.
We first aggregate a dataset from several medical challenges, then pre-train the network in a self-supervised manner, and finally fine-tune on labeled data.
Compared to learning from scratch, our new method yields better performance on various tasks.
arXiv Detail & Related papers (2021-07-10T18:17:57Z) - Dual-Teacher: Integrating Intra-domain and Inter-domain Teachers for
Annotation-efficient Cardiac Segmentation [65.81546955181781]
We propose a novel semi-supervised domain adaptation approach, namely Dual-Teacher.
The student model learns the knowledge of unlabeled target data and labeled source data by two teacher models.
We demonstrate that our approach is able to concurrently utilize unlabeled data and cross-modality data with superior performance.
arXiv Detail & Related papers (2020-07-13T10:00:44Z) - Learning Cross-domain Generalizable Features by Representation
Disentanglement [11.74643883335152]
Deep learning models exhibit limited generalizability across different domains.
We propose Mutual-Information-based Disentangled Neural Networks (MIDNet) to extract generalizable features that enable transferring knowledge to unseen categorical features in target domains.
We demonstrate our method on handwritten digits datasets and a fetal ultrasound dataset for image classification tasks.
arXiv Detail & Related papers (2020-02-29T17:53:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.