Related papers: Models Genesis

Models Genesis

URL: http://arxiv.org/abs/2004.07882v4
Date: Wed, 16 Dec 2020 19:58:08 GMT
Title: Models Genesis
Authors: Zongwei Zhou, Vatsal Sodha, Jiaxuan Pang, Michael B. Gotway, Jianming Liang
Abstract summary: Transfer learning from natural images to medical images has been established as one of the most practical paradigms in deep learning for medical image analysis. To overcome this limitation, we have built a set of models, called Generic Autodidactic Models, nicknamed Models Genesis. Our experiments demonstrate that our Models Genesis significantly outperform learning from scratch and existing pre-trained 3D models in all five target 3D applications.
Score: 10.929445262793116
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Transfer learning from natural images to medical images has been established as one of the most practical paradigms in deep learning for medical image analysis. To fit this paradigm, however, 3D imaging tasks in the most prominent imaging modalities (e.g., CT and MRI) have to be reformulated and solved in 2D, losing rich 3D anatomical information, thereby inevitably compromising its performance. To overcome this limitation, we have built a set of models, called Generic Autodidactic Models, nicknamed Models Genesis, because they are created ex nihilo (with no manual labeling), self-taught (learnt by self-supervision), and generic (served as source models for generating application-specific target models). Our extensive experiments demonstrate that our Models Genesis significantly outperform learning from scratch and existing pre-trained 3D models in all five target 3D applications covering both segmentation and classification. More importantly, learning a model from scratch simply in 3D may not necessarily yield performance better than transfer learning from ImageNet in 2D, but our Models Genesis consistently top any 2D/2.5D approaches including fine-tuning the models pre-trained from ImageNet as well as fine-tuning the 2D versions of our Models Genesis, confirming the importance of 3D anatomical information and significance of Models Genesis for 3D medical imaging. This performance is attributed to our unified self-supervised learning framework, built on a simple yet powerful observation: the sophisticated and recurrent anatomy in medical images can serve as strong yet free supervision signals for deep models to learn common anatomical representation automatically via self-supervision. As open science, all codes and pre-trained Models Genesis are available at https://github.com/MrGiovanni/ModelsGenesis.

Related papers

RadSAM: Segmenting 3D radiological images with a 2D promptable model [4.9000940389224885]
We propose RadSAM, a novel method for segmenting 3D objects with a 2D model from a single prompt. We introduce a benchmark to evaluate the model's ability to segment 3D objects in CT images from a single prompt.
arXiv Detail & Related papers (2025-04-29T15:00:25Z)
How Well Do Supervised 3D Models Transfer to Medical Imaging Tasks? [8.82276658079814]
We develop a suite of models that are pre-trained on our AbdomenAtlas 1.1 for transfer learning. Our preliminary analyses indicate that the model trained only with 21 CT volumes, 672 masks, and 40 GPU hours has a transfer learning ability similar to the model trained with 5,050 (unlabeled) CT volumes and 1,152 GPU hours.
arXiv Detail & Related papers (2025-01-20T03:34:49Z)
Zero-1-to-G: Taming Pretrained 2D Diffusion Model for Direct 3D Generation [66.75243908044538]
We introduce Zero-1-to-G, a novel approach to direct 3D generation on Gaussian splats using pretrained 2D diffusion models. To incorporate 3D awareness, we introduce cross-view and cross-attribute attention layers, which capture complex correlations and enforce 3D consistency across generated splats. This makes Zero-1-to-G the first direct image-to-3D generative model to effectively utilize pretrained 2D diffusion priors, enabling efficient training and improved generalization to unseen objects.
arXiv Detail & Related papers (2025-01-09T18:37:35Z)
Deep Convolutional Neural Networks on Multiclass Classification of Three-Dimensional Brain Images for Parkinson's Disease Stage Prediction [2.931680194227131]
We developed a model capable of accurately predicting Parkinson's disease stages. We used the entire three-dimensional (3D) brain images as input. We incorporated an attention mechanism to account for the varying importance of different slices in the prediction process.
arXiv Detail & Related papers (2024-10-31T05:40:08Z)
3D-VirtFusion: Synthetic 3D Data Augmentation through Generative Diffusion Models and Controllable Editing [52.68314936128752]
We propose a new paradigm to automatically generate 3D labeled training data by harnessing the power of pretrained large foundation models. For each target semantic class, we first generate 2D images of a single object in various structure and appearance via diffusion models and chatGPT generated text prompts. We transform these augmented images into 3D objects and construct virtual scenes by random composition.
arXiv Detail & Related papers (2024-08-25T09:31:22Z)
VISTA3D: A Unified Segmentation Foundation Model For 3D Medical Imaging [18.111368889931885]
We present VISTA3D, Versatile Imaging SegmenTation and voxel model. It is built on top of the well-established 3D segmentation pipeline. It is the first model to achieve state-of-the-art performance in both 3D automatic (supporting 127 classes) and 3D interactive segmentation.
arXiv Detail & Related papers (2024-06-07T22:41:39Z)
Leveraging Large-Scale Pretrained Vision Foundation Models for Label-Efficient 3D Point Cloud Segmentation [67.07112533415116]
We present a novel framework that adapts various foundational models for the 3D point cloud segmentation task. Our approach involves making initial predictions of 2D semantic masks using different large vision models. To generate robust 3D semantic pseudo labels, we introduce a semantic label fusion strategy that effectively combines all the results via voting.
arXiv Detail & Related papers (2023-11-03T15:41:15Z)
PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm [114.47216525866435]
We introduce a novel universal 3D pre-training framework designed to facilitate the acquisition of efficient 3D representation. For the first time, PonderV2 achieves state-of-the-art performance on 11 indoor and outdoor benchmarks, implying its effectiveness.
arXiv Detail & Related papers (2023-10-12T17:59:57Z)
Take-A-Photo: 3D-to-2D Generative Pre-training of Point Cloud Models [97.58685709663287]
generative pre-training can boost the performance of fundamental models in 2D vision. In 3D vision, the over-reliance on Transformer-based backbones and the unordered nature of point clouds have restricted the further development of generative pre-training. We propose a novel 3D-to-2D generative pre-training method that is adaptable to any point cloud model.
arXiv Detail & Related papers (2023-07-27T16:07:03Z)
AG3D: Learning to Generate 3D Avatars from 2D Image Collections [96.28021214088746]
We propose a new adversarial generative model of realistic 3D people from 2D images. Our method captures shape and deformation of the body and loose clothing by adopting a holistic 3D generator. We experimentally find that our method outperforms previous 3D- and articulation-aware methods in terms of geometry and appearance.
arXiv Detail & Related papers (2023-05-03T17:56:24Z)
HoloDiffusion: Training a 3D Diffusion Model using 2D Images [71.1144397510333]
We introduce a new diffusion setup that can be trained, end-to-end, with only posed 2D images for supervision. We show that our diffusion models are scalable, train robustly, and are competitive in terms of sample quality and fidelity to existing approaches for 3D generative modeling.
arXiv Detail & Related papers (2023-03-29T07:35:56Z)
Oral-3Dv2: 3D Oral Reconstruction from Panoramic X-Ray Imaging with Implicit Neural Representation [3.8215162658168524]
Oral-3Dv2 is a non-adversarial-learning-based model in 3D radiology reconstruction from a single panoramic X-ray image. Our model learns to represent the 3D oral structure in an implicit way by mapping 2D coordinates into density values of voxels in the 3D space. To the best of our knowledge, this is the first work of a non-adversarial-learning-based model in 3D radiology reconstruction from a single panoramic X-ray image.
arXiv Detail & Related papers (2023-03-21T18:17:27Z)
Transferring Models Trained on Natural Images to 3D MRI via Position Encoded Slice Models [14.42534860640976]
2D-Slice-CNN architecture embeds all the MRI slices with 2D encoders that take 2D image input and combines them via permutation-invariant layers. With the insight that pretrained model can serve as the 2D encoder, we initialize the 2D encoder with ImageNet pretrained weights that outperform those and trained from scratch on two neuroimaging tasks.
arXiv Detail & Related papers (2023-03-02T18:52:31Z)

This list is automatically generated from the titles and abstracts of the papers in this site.