Related papers: Cross-D Conv: Cross-Dimensional Transferable Knowledge Base via Fourier Shifting Operation

URL: http://arxiv.org/abs/2411.02441v5
Date: Fri, 24 Jan 2025 23:14:06 GMT
Title: Cross-D Conv: Cross-Dimensional Transferable Knowledge Base via Fourier Shifting Operation
Authors: Mehmet Can Yavuz, Yang Yang
Abstract summary: The Cross-D Conv operation bridges the dimensional gap by learning the phase shifting in the Fourier domain. Our method enables seamless weight transfer between 2D and 3D convolution operations, effectively facilitating cross-dimensional learning.
Score: 3.69758875412828
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: In biomedical imaging analysis, the dichotomy between 2D and 3D data presents a significant challenge. While 3D volumes offer superior real-world applicability, they are less available for each modality and harder to train on at large scale, whereas 2D samples are abundant but less comprehensive. This paper introduces the Cross-D Conv operation, a novel approach that bridges the dimensional gap by learning the phase shifting in the Fourier domain. Our method enables seamless weight transfer between 2D and 3D convolution operations, effectively facilitating cross-dimensional learning. The proposed architecture leverages the abundance of 2D training data to enhance 3D model performance, offering a practical solution to the multimodal data scarcity challenge in 3D medical model pretraining. Experimental validation on RadImageNet (2D) and multimodal volumetric sets demonstrates that our approach achieves comparable or superior performance in feature quality assessment. The enhanced convolution operation presents new opportunities for developing efficient classification and segmentation models in medical imaging. This work represents an advancement in cross-dimensional and multimodal medical image analysis, offering a robust framework for utilizing 2D priors in 3D model pretraining while maintaining the computational efficiency of 2D training. The code is available at https://github.com/convergedmachine/Cross-D-Conv.
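The central mechanism in the abstract, deriving 3D kernels from 2D weights through phase shifts in the Fourier domain, rests on the Fourier shift theorem: translating a signal by (dy, dx) multiplies its spectrum by exp(-2πi(f_y·dy + f_x·dx)). The sketch below illustrates only that idea under stated assumptions; it is not the authors' Cross-D Conv implementation (the linked repository is the authority). The names `fourier_shift_2d` and `lift_2d_to_3d` are hypothetical, the per-slice shifts are fixed numbers standing in for learned parameters, and the shifts wrap around circularly.

```python
import numpy as np


def fourier_shift_2d(kernel, dy, dx):
    """Shift a 2D array by (dy, dx) samples, possibly fractional, by applying a
    linear phase ramp to its spectrum (Fourier shift theorem). Shifts are circular."""
    h, w = kernel.shape
    fy = np.fft.fftfreq(h)[:, None]  # frequencies in cycles/sample along rows
    fx = np.fft.fftfreq(w)[None, :]  # frequencies in cycles/sample along columns
    phase = np.exp(-2j * np.pi * (fy * dy + fx * dx))
    # For fractional shifts of even-sized arrays the result can carry a small
    # imaginary part from the Nyquist bin; keeping the real part is a common choice.
    return np.fft.ifft2(np.fft.fft2(kernel) * phase).real


def lift_2d_to_3d(kernel2d, shifts):
    """Hypothetical helper: build a (depth, h, w) kernel whose depth slices are
    phase-shifted copies of one 2D kernel. In a learned setting, `shifts`
    would be trainable parameters; here they are plain numbers (assumption)."""
    return np.stack([fourier_shift_2d(kernel2d, dy, dx) for dy, dx in shifts], axis=0)


if __name__ == "__main__":
    k2d = np.arange(9.0).reshape(3, 3)
    k3d = lift_2d_to_3d(k2d, [(-0.5, 0.0), (0.0, 0.0), (0.5, 0.0)])
    print(k3d.shape)  # (3, 3, 3)
```

Because the zero-frequency term is multiplied by 1, every depth slice keeps the same weight sum as the original 2D kernel, and an integer shift reproduces `np.roll` exactly; only the sub-pixel shifts produce genuinely new, interpolated slices.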

Related papers

Unifying 2D and 3D Vision-Language Understanding [85.84054120018625]
We introduce UniVLG, a unified architecture for 2D and 3D vision-language learning. UniVLG bridges the gap between existing 2D-centric models and the rich 3D sensory data available in embodied systems.
Date: 2025-03-13T17:56:22Z
Introducing 3D Representation for Medical Image Volume-to-Volume Translation via Score Fusion [3.3559609260669303]
We present Score-Fusion, a novel volumetric translation model that effectively learns 3D representations by ensembling perpendicularly trained 2D diffusion models in score function space. We show that Score-Fusion achieves superior accuracy and volumetric fidelity in 3D medical image super-resolution and modality translation.
Date: 2025-01-13T15:54:21Z
Enhancing Single-Slice Segmentation with 3D-to-2D Unpaired Scan Distillation [21.69523493833432]
We propose a novel 3D-to-2D distillation framework, leveraging pre-trained 3D models to enhance 2D single-slice segmentation. Unlike traditional knowledge distillation methods that require the same data input, our approach employs unpaired 3D CT scans with any contrast to guide the 2D student model.
Date: 2024-06-18T04:06:02Z
PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm [114.47216525866435]
We introduce a novel universal 3D pre-training framework designed to facilitate the acquisition of efficient 3D representation. For the first time, PonderV2 achieves state-of-the-art performance on 11 indoor and outdoor benchmarks, implying its effectiveness.
Date: 2023-10-12T17:59:57Z
Invariant Training 2D-3D Joint Hard Samples for Few-Shot Point Cloud Recognition [108.07591240357306]
We tackle the data scarcity challenge in few-shot point cloud recognition of 3D objects by using a joint prediction from a conventional 3D model and a well-trained 2D model. We find that the crux is the less effective training on "joint hard samples", which receive high-confidence predictions for different wrong labels. Our proposed invariant training strategy, called InvJoint, not only emphasizes training on the hard samples but also seeks invariance between the conflicting 2D and 3D ambiguous predictions.
Date: 2023-08-18T17:43:12Z
Spatiotemporal Modeling Encounters 3D Medical Image Analysis: Slice-Shift UNet with Multi-View Fusion [0.0]
We propose a new 2D-based model dubbed Slice SHift UNet, which encodes three-dimensional features at the complexity of a 2D CNN. More precisely, multi-view features are collaboratively learned by performing 2D convolutions along the three planes of a volume. The effectiveness of our approach is validated on the Multi-Modality Abdominal Multi-Organ Segmentation (AMOS) and Multi-Atlas Labeling Beyond the Cranial Vault (BTCV) datasets.
Date: 2023-07-24T14:53:23Z
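The key operation in the summary above, 2D convolutions along the three planes of a volume, can be sketched in NumPy. This is a hedged illustration, not the Slice SHift UNet implementation: it omits the slice-shift mechanism entirely, assumes zero "same" padding and odd kernel sizes, fuses the three views by simple averaging (the summary does not state the fusion rule), and its axial/coronal/sagittal labels assume the volume is stored as (z, y, x). The names `conv2d_same`, `conv_along_plane`, and `multiview_conv` are hypothetical.

```python
import numpy as np


def conv2d_same(img, k):
    """2D cross-correlation of `img` with an odd-sized kernel `k`,
    zero-padded so the output has the same shape as the input."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))  # zero padding by default
    out = np.zeros(img.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += k[i, j] * padded[i:i + img.shape[0], j:j + img.shape[1]]
    return out


def conv_along_plane(volume, k, axis):
    """Apply `conv2d_same` to every 2D slice perpendicular to `axis`."""
    moved = np.moveaxis(volume, axis, 0)
    out = np.stack([conv2d_same(s, k) for s in moved])
    return np.moveaxis(out, 0, axis)


def multiview_conv(volume, k_axial, k_coronal, k_sagittal):
    """Convolve in the three orthogonal planes of a (z, y, x) volume and
    fuse the views by averaging (an assumed fusion rule)."""
    views = [
        conv_along_plane(volume, k_axial, axis=0),     # y-x planes
        conv_along_plane(volume, k_coronal, axis=1),   # z-x planes
        conv_along_plane(volume, k_sagittal, axis=2),  # z-y planes
    ]
    return np.mean(views, axis=0)
```

Each view needs only 2D kernels, so the parameter count grows with three k×k kernels rather than one k×k×k kernel, while the three views together still mix information along all three axes.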
Interpretable 2D Vision Models for 3D Medical Images [47.75089895500738]
This study proposes a simple approach for adapting 2D networks with an intermediate feature representation to process 3D images. On all 3D MedMNIST benchmark datasets and on two real-world datasets consisting of several hundred high-resolution CT or MRI scans, we show that our approach performs on par with existing methods.
Date: 2023-07-13T08:27:09Z
Video Pretraining Advances 3D Deep Learning on Chest CT Tasks [63.879848037679224]
Pretraining on large natural image classification datasets has aided model development on data-scarce 2D medical tasks. These 2D models have been surpassed by 3D models on 3D computer vision benchmarks. We show video pretraining for 3D models can enable higher performance on smaller datasets for 3D medical tasks.
Date: 2023-04-02T14:46:58Z
Improving 3D Imaging with Pre-Trained Perpendicular 2D Diffusion Models [52.529394863331326]
We propose a novel approach using two perpendicular pre-trained 2D diffusion models to solve the 3D inverse problem. Our method is highly effective for 3D medical image reconstruction tasks, including MRI Z-axis super-resolution, compressed sensing MRI, and sparse-view CT.
Date: 2023-03-15T08:28:06Z
Adapting Pre-trained Vision Transformers from 2D to 3D through Weight Inflation Improves Medical Image Segmentation [19.693778706169752]
We use a weight inflation strategy to adapt pre-trained Transformers from 2D to 3D, retaining the benefits of both transfer learning and depth information. Our approach achieves state-of-the-art performance across a broad range of 3D medical image datasets.
Date: 2023-02-08T19:38:13Z
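A common way to inflate 2D weights to 3D, popularized by I3D, repeats each 2D kernel along a new depth axis and divides by the depth, so that a volume made of identical slices produces the same response as the original 2D network. The summary above does not say which inflation variant the paper uses; the sketch below assumes this repeat-and-average scheme, and `inflate_kernel` is a hypothetical name.

```python
import numpy as np


def inflate_kernel(w2d, depth):
    """Inflate conv weights from (out_ch, in_ch, h, w) to (out_ch, in_ch, depth, h, w)
    by repeating along a new depth axis and dividing by `depth`."""
    w3d = np.repeat(w2d[:, :, None, :, :], depth, axis=2)
    return w3d / depth


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w2d = rng.normal(size=(4, 2, 3, 3))               # 4 filters, 2 input channels
    w3d = inflate_kernel(w2d, depth=5)
    patch2d = rng.normal(size=(2, 3, 3))              # one 2D receptive field
    patch3d = np.repeat(patch2d[:, None], 5, axis=1)  # same slice stacked 5 times
    # Response of each filter at one location: 2D original vs. inflated 3D.
    r2d = np.einsum("oihw,ihw->o", w2d, patch2d)
    r3d = np.einsum("oidhw,idhw->o", w3d, patch3d)
    print(np.allclose(r2d, r3d))  # True
```

The division by `depth` is what keeps activation magnitudes unchanged on depth-constant inputs; without it, every response would be scaled by the depth and the pre-trained normalization statistics would no longer match.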
Joint Self-Supervised Image-Volume Representation Learning with Intra-Inter Contrastive Clustering [31.52291149830299]
Self-supervised learning can overcome the lack of labeled training samples by learning feature representations from unlabeled data. Most current SSL techniques in the medical field have been designed for either 2D images or 3D volumes. We propose a novel framework for unsupervised joint learning on 2D and 3D data modalities.
Date: 2022-12-04T18:57:44Z
RiCS: A 2D Self-Occlusion Map for Harmonizing Volumetric Objects [68.85305626324694]
Ray-marching in Camera Space (RiCS) is a new method that encodes the self-occlusions of 3D foreground objects into a 2D self-occlusion map. We show that our representation map allows us not only to enhance image quality but also to model temporally coherent complex shadow effects.
Date: 2022-05-14T05:35:35Z
Super Images -- A New 2D Perspective on 3D Medical Imaging Analysis [0.0]
We present a simple yet effective 2D method to handle 3D data while efficiently embedding the 3D knowledge during training. Our method generates a super-resolution image by stitching slices of the 3D image side by side. Using only 2D counterparts of 3D networks, it attains results equal, if not superior, to those of 3D networks while reducing model complexity by around threefold.
Date: 2022-05-05T09:59:03Z
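The slice-stitching step described above can be sketched as tiling the depth slices of a volume into a single 2D grid. This is an illustration under assumptions, not the paper's code: the row-major grid order, the `cols` parameter, and zero-filling of unused cells are choices made here, and `make_super_image` is a hypothetical name.

```python
import numpy as np


def make_super_image(volume, cols):
    """Tile the depth slices of a (depth, h, w) volume into one 2D image,
    left to right and top to bottom, with `cols` slices per row.
    Unused grid cells are left as zeros."""
    depth, h, w = volume.shape
    rows = -(-depth // cols)  # ceiling division
    canvas = np.zeros((rows * h, cols * w), dtype=volume.dtype)
    for i in range(depth):
        r, c = divmod(i, cols)
        canvas[r * h:(r + 1) * h, c * w:(c + 1) * w] = volume[i]
    return canvas


if __name__ == "__main__":
    vol = np.arange(5 * 2 * 3).reshape(5, 2, 3)  # 5 slices of 2x3
    sup = make_super_image(vol, cols=3)
    print(sup.shape)  # (4, 9)
```

The resulting image can be fed to any 2D network; adjacent slices end up as neighboring tiles, which is one way a 2D receptive field can span several depth positions.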
Advancing 3D Medical Image Analysis with Variable Dimension Transform based Supervised 3D Pre-training [45.90045513731704]
This paper revisits an innovative yet simple fully-supervised 3D network pre-training framework. With a redesigned 3D network architecture, reformulated natural images are used to address the problem of data scarcity. Comprehensive experiments on four benchmark datasets demonstrate that the proposed pre-trained models can effectively accelerate convergence.
Date: 2022-01-05T03:11:21Z
2.75D: Boosting learning by representing 3D Medical imaging to 2D features for small data [54.223614679807994]
3D convolutional neural networks (CNNs) have started to outperform 2D CNNs in numerous deep learning tasks. Applying transfer learning to 3D CNNs is challenging due to a lack of publicly available pre-trained 3D models. In this work, we propose a novel 2D strategic representation of volumetric data, namely 2.75D. As a result, 2D CNNs can also be used to learn volumetric information.
Date: 2020-02-11T08:24:19Z

This list is automatically generated from the titles and abstracts of the papers on this site.