Scale-Equivariant Deep Learning for 3D Data
- URL: http://arxiv.org/abs/2304.05864v1
- Date: Wed, 12 Apr 2023 13:56:12 GMT
- Title: Scale-Equivariant Deep Learning for 3D Data
- Authors: Thomas Wimmer, Vladimir Golkov, Hoai Nam Dang, Moritz Zaiss, Andreas
Maier, Daniel Cremers
- Abstract summary: Convolutional neural networks (CNNs) recognize objects regardless of their position in the image.
We propose a scale-equivariant convolutional network layer for three-dimensional data.
Our experiments demonstrate the effectiveness of the proposed method in achieving scale-equivariant for 3D medical image analysis.
- Score: 44.52688267348063
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The ability of convolutional neural networks (CNNs) to recognize objects
regardless of their position in the image is due to the
translation-equivariance of the convolutional operation. Group-equivariant CNNs
transfer this equivariance to other transformations of the input. Dealing
appropriately with objects and object parts of different scale is challenging,
and scale can vary for multiple reasons such as the underlying object size or
the resolution of the imaging modality. In this paper, we propose a
scale-equivariant convolutional network layer for three-dimensional data that
guarantees scale-equivariance in 3D CNNs. Scale-equivariance lifts the burden
of having to learn each possible scale separately, allowing the neural network
to focus on higher-level learning goals, which leads to better results and
better data-efficiency. We provide an overview of the theoretical foundations
and scientific work on scale-equivariant neural networks in the two-dimensional
domain. We then transfer the concepts from 2D to the three-dimensional space
and create a scale-equivariant convolutional layer for 3D data. Using the
proposed scale-equivariant layer, we create a scale-equivariant U-Net for
medical image segmentation and compare it with a non-scale-equivariant baseline
method. Our experiments demonstrate the effectiveness of the proposed method in
achieving scale-equivariance for 3D medical image analysis. We publish our code
at https://github.com/wimmerth/scale-equivariant-3d-convnet for further
research and application.
Related papers
- Improved Generalization of Weight Space Networks via Augmentations [56.571475005291035]
Learning in deep weight spaces (DWS) is an emerging research direction, with applications to 2D and 3D neural fields (INRs, NeRFs)
We empirically analyze the reasons for this overfitting and find that a key reason is the lack of diversity in DWS datasets.
To address this, we explore strategies for data augmentation in weight spaces and propose a MixUp method adapted for weight spaces.
arXiv Detail & Related papers (2024-02-06T15:34:44Z) - Scale-Equivariant UNet for Histopathology Image Segmentation [1.213915839836187]
Convolutional Neural Networks (CNNs) trained on such images at a given scale fail to generalise to those at different scales.
We propose the Scale-Equivariant UNet (SEUNet) for image segmentation by building on scale-space theory.
arXiv Detail & Related papers (2023-04-10T14:03:08Z) - Leveraging SO(3)-steerable convolutions for pose-robust semantic segmentation in 3D medical data [2.207533492015563]
We present a new family of segmentation networks that use equivariant voxel convolutions based on spherical harmonics.
These networks are robust to data poses not seen during training, and do not require rotation-based data augmentation during training.
We demonstrate improved segmentation performance in MRI brain tumor and healthy brain structure segmentation tasks.
arXiv Detail & Related papers (2023-03-01T09:27:08Z) - Decomposing 3D Neuroimaging into 2+1D Processing for Schizophrenia
Recognition [25.80846093248797]
We propose to process the 3D data by a 2+1D framework so that we can exploit the powerful deep 2D Convolutional Neural Network (CNN) networks pre-trained on the huge ImageNet dataset for 3D neuroimaging recognition.
Specifically, 3D volumes of Magnetic Resonance Imaging (MRI) metrics are decomposed to 2D slices according to neighboring voxel positions.
Global pooling is applied to remove redundant information as the activation patterns are sparsely distributed over feature maps.
Channel-wise and slice-wise convolutions are proposed to aggregate the contextual information in the third dimension unprocessed by the 2D CNN model.
arXiv Detail & Related papers (2022-11-21T15:22:59Z) - Moving Frame Net: SE(3)-Equivariant Network for Volumes [0.0]
A rotation and translation equivariant neural network for image data was proposed based on the moving frames approach.
We significantly improve that approach by reducing the computation of moving frames to only one, at the input stage.
Our trained model overperforms the benchmarks in the medical volume classification of most of the tested datasets from MedMNIST3D.
arXiv Detail & Related papers (2022-11-07T10:25:38Z) - Towards Model Generalization for Monocular 3D Object Detection [57.25828870799331]
We present an effective unified camera-generalized paradigm (CGP) for Mono3D object detection.
We also propose the 2D-3D geometry-consistent object scaling strategy (GCOS) to bridge the gap via an instance-level augment.
Our method called DGMono3D achieves remarkable performance on all evaluated datasets and surpasses the SoTA unsupervised domain adaptation scheme.
arXiv Detail & Related papers (2022-05-23T23:05:07Z) - Two-Stream Graph Convolutional Network for Intra-oral Scanner Image
Segmentation [133.02190910009384]
We propose a two-stream graph convolutional network (i.e., TSGCN) to handle inter-view confusion between different raw attributes.
Our TSGCN significantly outperforms state-of-the-art methods in 3D tooth (surface) segmentation.
arXiv Detail & Related papers (2022-04-19T10:41:09Z) - FS-Net: Fast Shape-based Network for Category-Level 6D Object Pose
Estimation with Decoupled Rotation Mechanism [49.89268018642999]
We propose a fast shape-based network (FS-Net) with efficient category-level feature extraction for 6D pose estimation.
The proposed method achieves state-of-the-art performance in both category- and instance-level 6D object pose estimation.
arXiv Detail & Related papers (2021-03-12T03:07:24Z) - TSGCNet: Discriminative Geometric Feature Learning with Two-Stream
GraphConvolutional Network for 3D Dental Model Segmentation [141.2690520327948]
We propose a two-stream graph convolutional network (TSGCNet) to learn multi-view information from different geometric attributes.
We evaluate our proposed TSGCNet on a real-patient dataset of dental models acquired by 3D intraoral scanners.
arXiv Detail & Related papers (2020-12-26T08:02:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.