Joint Self-Supervised Image-Volume Representation Learning with
Intra-Inter Contrastive Clustering
- URL: http://arxiv.org/abs/2212.01893v1
- Date: Sun, 4 Dec 2022 18:57:44 GMT
- Title: Joint Self-Supervised Image-Volume Representation Learning with
Intra-Inter Contrastive Clustering
- Authors: Duy M. H. Nguyen, Hoang Nguyen, Mai T. N. Truong, Tri Cao, Binh T.
Nguyen, Nhat Ho, Paul Swoboda, Shadi Albarqouni, Pengtao Xie, Daniel Sonntag
- Abstract summary: Self-supervised learning can overcome the lack of labeled training samples by learning feature representations from unlabeled data.
Most current SSL techniques in the medical field have been designed for either 2D images or 3D volumes.
We propose a novel framework for unsupervised joint learning on 2D and 3D data modalities.
- Score: 31.52291149830299
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Collecting large-scale medical datasets with fully annotated samples for
training of deep networks is prohibitively expensive, especially for 3D volume
data. Recent breakthroughs in self-supervised learning (SSL) offer the ability
to overcome the lack of labeled training samples by learning feature
representations from unlabeled data. However, most current SSL techniques in
the medical field have been designed for either 2D images or 3D volumes. In
practice, this restricts the capability to fully leverage unlabeled data from
numerous sources, which may include both 2D and 3D data. Additionally, the use
of these pre-trained networks is constrained to downstream tasks with
compatible data dimensions. In this paper, we propose a novel framework for
unsupervised joint learning on 2D and 3D data modalities. Given a set of 2D
images or 2D slices extracted from 3D volumes, we construct an SSL task based
on a 2D contrastive clustering problem for distinct classes. The 3D volumes are
exploited by computing vectored embedding at each slice and then assembling a
holistic feature through deformable self-attention mechanisms in Transformer,
allowing incorporating long-range dependencies between slices inside 3D
volumes. These holistic features are further utilized to define a novel 3D
clustering agreement-based SSL task and masking embedding prediction inspired
by pre-trained language models. Experiments on downstream tasks, such as 3D
brain segmentation, lung nodule detection, 3D heart structures segmentation,
and abnormal chest X-ray detection, demonstrate the effectiveness of our joint
2D and 3D SSL approach. We improve plain 2D Deep-ClusterV2 and SwAV by a
significant margin and also surpass various modern 2D and 3D SSL approaches.
Related papers
- GEAL: Generalizable 3D Affordance Learning with Cross-Modal Consistency [50.11520458252128]
Existing 3D affordance learning methods struggle with generalization and robustness due to limited annotated data.
We propose GEAL, a novel framework designed to enhance the generalization and robustness of 3D affordance learning by leveraging large-scale pre-trained 2D models.
GEAL consistently outperforms existing methods across seen and novel object categories, as well as corrupted data.
arXiv Detail & Related papers (2024-12-12T17:59:03Z) - Cross-D Conv: Cross-Dimensional Transferable Knowledge Base via Fourier Shifting Operation [3.69758875412828]
Cross-D Conv operation bridges the dimensional gap by learning the phase shifting in the Fourier domain.
Our method enables seamless weight transfer between 2D and 3D convolution operations, effectively facilitating cross-dimensional learning.
arXiv Detail & Related papers (2024-11-02T13:03:44Z) - Cross-Dimensional Medical Self-Supervised Representation Learning Based on a Pseudo-3D Transformation [68.60747298865394]
We propose a new cross-dimensional SSL framework based on a pseudo-3D transformation (CDSSL-P3D)
Specifically, we introduce an image transformation based on the im2col algorithm, which converts 2D images into a format consistent with 3D data.
This transformation enables seamless integration of 2D and 3D data, and facilitates cross-dimensional self-supervised learning for 3D medical image analysis.
arXiv Detail & Related papers (2024-06-03T02:57:25Z) - Simultaneous Alignment and Surface Regression Using Hybrid 2D-3D
Networks for 3D Coherent Layer Segmentation of Retinal OCT Images with Full
and Sparse Annotations [32.69359482975795]
This work presents a novel framework based on hybrid 2D-3D convolutional neural networks (CNNs) to obtain continuous 3D retinal layer surfaces from OCT volumes.
Experiments on a synthetic dataset and three public clinical datasets show that our framework can effectively align the B-scans for potential motion correction.
arXiv Detail & Related papers (2023-12-04T08:32:31Z) - Leveraging Large-Scale Pretrained Vision Foundation Models for
Label-Efficient 3D Point Cloud Segmentation [67.07112533415116]
We present a novel framework that adapts various foundational models for the 3D point cloud segmentation task.
Our approach involves making initial predictions of 2D semantic masks using different large vision models.
To generate robust 3D semantic pseudo labels, we introduce a semantic label fusion strategy that effectively combines all the results via voting.
arXiv Detail & Related papers (2023-11-03T15:41:15Z) - Cross-modal & Cross-domain Learning for Unsupervised LiDAR Semantic
Segmentation [82.47872784972861]
Cross-modal domain adaptation has been studied on the paired 2D image and 3D LiDAR data to ease the labeling costs for 3D LiDAR semantic segmentation (3DLSS) in the target domain.
This paper studies a new 3DLSS setting where a 2D dataset with semantic annotations and a paired but unannotated 2D image and 3D LiDAR data (target) are available.
To achieve 3DLSS in this scenario, we propose Cross-Modal and Cross-Domain Learning (CoMoDaL)
arXiv Detail & Related papers (2023-08-05T14:00:05Z) - Self-supervised learning via inter-modal reconstruction and feature
projection networks for label-efficient 3D-to-2D segmentation [4.5206601127476445]
We propose a novel convolutional neural network (CNN) and self-supervised learning (SSL) method for label-efficient 3D-to-2D segmentation.
Results on different datasets demonstrate that the proposed CNN significantly improves the state of the art in scenarios with limited labeled data by up to 8% in Dice score.
arXiv Detail & Related papers (2023-07-06T14:16:25Z) - Joint-MAE: 2D-3D Joint Masked Autoencoders for 3D Point Cloud
Pre-training [65.75399500494343]
Masked Autoencoders (MAE) have shown promising performance in self-supervised learning for 2D and 3D computer vision.
We propose Joint-MAE, a 2D-3D joint MAE framework for self-supervised 3D point cloud pre-training.
arXiv Detail & Related papers (2023-02-27T17:56:18Z) - Super Images -- A New 2D Perspective on 3D Medical Imaging Analysis [0.0]
We present a simple yet effective 2D method to handle 3D data while efficiently embedding the 3D knowledge during training.
Our method generates a super-resolution image by stitching slices side by side in the 3D image.
While attaining equal, if not superior, results to 3D networks utilizing only 2D counterparts, the model complexity is reduced by around threefold.
arXiv Detail & Related papers (2022-05-05T09:59:03Z) - 3D-to-2D Distillation for Indoor Scene Parsing [78.36781565047656]
We present a new approach that enables us to leverage 3D features extracted from large-scale 3D data repository to enhance 2D features extracted from RGB images.
First, we distill 3D knowledge from a pretrained 3D network to supervise a 2D network to learn simulated 3D features from 2D features during the training.
Second, we design a two-stage dimension normalization scheme to calibrate the 2D and 3D features for better integration.
Third, we design a semantic-aware adversarial training model to extend our framework for training with unpaired 3D data.
arXiv Detail & Related papers (2021-04-06T02:22:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.