HartleyMHA: Self-Attention in Frequency Domain for Resolution-Robust and
Parameter-Efficient 3D Image Segmentation
- URL: http://arxiv.org/abs/2310.04466v1
- Date: Thu, 5 Oct 2023 18:44:41 GMT
- Authors: Ken C. L. Wong, Hongzhi Wang, Tanveer Syeda-Mahmood
- Abstract summary: We introduce HartleyMHA, a model with efficient self-attention that is robust to training image resolution.
We modify the FNO by using the Hartley transform with shared parameters to reduce the model size by orders of magnitude.
When tested on the BraTS'19 dataset, it was more robust to training image resolution than the other tested models while using less than 1% of their model parameters.
- Score: 4.48473804240016
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the introduction of Transformers, different attention-based models have
been proposed for image segmentation with promising results. Although
self-attention allows capturing of long-range dependencies, it suffers from a
quadratic complexity in the image size especially in 3D. To avoid the
out-of-memory error during training, input size reduction is usually required
for 3D segmentation, but the accuracy can be suboptimal when the trained models
are applied on the original image size. To address this limitation, inspired by
the Fourier neural operator (FNO), we introduce the HartleyMHA model which is
robust to training image resolution with efficient self-attention. FNO is a
deep learning framework for learning mappings between functions in partial
differential equations, which has the appealing properties of zero-shot
super-resolution and global receptive field. We modify the FNO by using the
Hartley transform with shared parameters to reduce the model size by orders of
magnitude, and this allows us to further apply self-attention in the frequency
domain for more expressive high-order feature combination with improved
efficiency. When tested on the BraTS'19 dataset, it was more robust to training
image resolution than the other tested models while using less than 1% of their
model parameters.
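The Hartley transform named in the abstract can be illustrated in one dimension. The sketch below is not the HartleyMHA implementation. It is a naive O(N^2) pure-Python discrete Hartley transform, with hypothetical helper names (dht, idht, dft), that checks two standard properties: real input gives real output equal to Re(F) - Im(F) of the discrete Fourier transform F, and the transform is its own inverse up to a factor of 1/N.

```python
import cmath
import math


def dht(x):
    """Naive discrete Hartley transform: H[k] = sum_j x[j] * cas(2*pi*j*k/N)."""
    n = len(x)
    out = []
    for k in range(n):
        total = 0.0
        for j in range(n):
            angle = 2 * math.pi * j * k / n
            # cas(t) = cos(t) + sin(t) is the real-valued Hartley kernel.
            total += x[j] * (math.cos(angle) + math.sin(angle))
        out.append(total)
    return out


def idht(h):
    """Inverse DHT: the same transform, scaled by 1/N."""
    n = len(h)
    return [v / n for v in dht(h)]


def dft(x):
    """Naive discrete Fourier transform, used here only for comparison."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * math.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


# Arbitrary illustrative signal (not data from the paper).
signal = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0, -2.5, 1.5]

hartley = dht(signal)                                # real coefficients
from_fourier = [c.real - c.imag for c in dft(signal)]  # Re(F) - Im(F)
recovered = idht(hartley)                            # should match the signal

print(max(abs(a - b) for a, b in zip(hartley, from_fourier)))
print(max(abs(a - b) for a, b in zip(recovered, signal)))
```

Both printed differences should be at floating-point rounding level. A practical 3D model would presumably derive the Hartley coefficients from an FFT routine rather than from this quadratic loop.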
Related papers
- 3D Equivariant Pose Regression via Direct Wigner-D Harmonics Prediction [50.07071392673984]
Existing methods learn 3D rotations parametrized in the spatial domain using angles or quaternions.
We propose a frequency-domain approach that directly predicts Wigner-D coefficients for 3D rotation regression.
Our method achieves state-of-the-art results on benchmarks such as ModelNet10-SO(3) and PASCAL3D+.
arXiv Detail & Related papers (2024-11-01T12:50:38Z)
- Calibrated Cache Model for Few-Shot Vision-Language Model Adaptation [36.45488536471859]
Similarity refines the image-image similarity by using unlabeled images.
Weight introduces a precision matrix into the weight function to adequately model the relation between training samples.
To reduce the high complexity of GPs, we propose a group-based learning strategy.
arXiv Detail & Related papers (2024-10-11T15:12:30Z)
- LeRF: Learning Resampling Function for Adaptive and Efficient Image Interpolation [64.34935748707673]
Recent deep neural networks (DNNs) have made impressive progress in performance by introducing learned data priors.
We propose a novel method of Learning Resampling (termed LeRF) which takes advantage of both the structural priors learned by DNNs and the locally continuous assumption.
LeRF assigns spatially varying resampling functions to input image pixels and learns to predict the shapes of these resampling functions with a neural network.
arXiv Detail & Related papers (2024-07-13T16:09:45Z)
- FNOSeg3D: Resolution-Robust 3D Image Segmentation with Fourier Neural Operator [4.48473804240016]
We introduce FNOSeg3D, a 3D segmentation model robust to training image resolution based on the Fourier neural operator (FNO).
When tested on the BraTS'19 dataset, it was more robust to training image resolution than the other tested models while using less than 1% of their model parameters.
arXiv Detail & Related papers (2023-10-05T19:58:36Z)
- Generative Multiplane Neural Radiance for 3D-Aware Image Generation [102.15322193381617]
We present a method to efficiently generate 3D-aware high-resolution images that are view-consistent across multiple target views.
Our GMNR model generates 3D-aware images of 1024 × 1024 pixels at 17.6 FPS on a single V100.
arXiv Detail & Related papers (2023-04-03T17:41:20Z)
- Super-Resolution Based Patch-Free 3D Image Segmentation with High-Frequency Guidance [20.86089285980103]
High-resolution (HR) 3D images, such as medical images from Magnetic Resonance Imaging (MRI) and Computed Tomography (CT), are widely used nowadays.
arXiv Detail & Related papers (2022-10-26T11:46:08Z)
- Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers [55.90468016961356]
We propose an efficient token mixer that learns to mix in the Fourier domain.
AFNO is based on a principled foundation of operator learning.
It can handle a sequence size of 65k and outperforms other efficient self-attention mechanisms.
arXiv Detail & Related papers (2021-11-24T05:44:31Z)
- Global Filter Networks for Image Classification [90.81352483076323]
We present a conceptually simple yet computationally efficient architecture that learns long-term spatial dependencies in the frequency domain with log-linear complexity.
Our results demonstrate that GFNet can be a very competitive alternative to transformer-style models and CNNs in efficiency, generalization ability and robustness.
arXiv Detail & Related papers (2021-07-01T17:58:16Z)
- Deep Learning for Regularization Prediction in Diffeomorphic Image Registration [8.781861951759948]
We introduce a novel framework that automatically determines the parameters controlling the smoothness of diffeomorphic transformations.
We develop a predictive model based on deep convolutional neural networks (CNN) that learns the mapping between pairwise images and the regularization parameter of image registration.
Experimental results show that our model not only predicts appropriate regularization parameters for image registration, but also improves network training in terms of time and memory efficiency.
arXiv Detail & Related papers (2020-11-28T22:56:44Z)
- PaMIR: Parametric Model-Conditioned Implicit Representation for Image-based Human Reconstruction [67.08350202974434]
We propose Parametric Model-Conditioned Implicit Representation (PaMIR), which combines the parametric body model with the free-form deep implicit function.
We show that our method achieves state-of-the-art performance for image-based 3D human reconstruction in the cases of challenging poses and clothing types.
arXiv Detail & Related papers (2020-07-08T02:26:19Z)