Fourier Image Transformer
- URL: http://arxiv.org/abs/2104.02555v1
- Date: Tue, 6 Apr 2021 14:48:57 GMT
- Title: Fourier Image Transformer
- Authors: Tim-Oliver Buchholz and Florian Jug
- Abstract summary: We show that an auto-regressive image completion task is equivalent to predicting a higher resolution output given a low-resolution input.
We demonstrate the practicality of this approach in the context of computed tomography (CT) image reconstruction.
- Score: 10.315102237565734
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Transformer architectures show spectacular performance on NLP tasks and have
recently also been used for tasks such as image completion or image
classification. Here we propose to use a sequential image representation, where
each prefix of the complete sequence describes the whole image at reduced
resolution. Using such Fourier Domain Encodings (FDEs), an auto-regressive
image completion task is equivalent to predicting a higher resolution output
given a low-resolution input. Additionally, we show that an encoder-decoder
setup can be used to query arbitrary Fourier coefficients given a set of
Fourier domain observations. We demonstrate the practicality of this approach
in the context of computed tomography (CT) image reconstruction. In summary, we
show that Fourier Image Transformer (FIT) can be used to solve relevant image
analysis tasks in Fourier space, a domain inherently inaccessible to
convolutional architectures.
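The core FDE idea can be sketched numerically: order the 2D DFT coefficients of an image by radial frequency, so that any prefix of the resulting sequence, inverse-transformed with the remaining coefficients zeroed, reconstructs the whole image at reduced resolution. The sketch below is an illustrative assumption of how such an encoding could work, not the paper's exact normalization or ordering:

```python
import numpy as np

def fourier_domain_encoding(img):
    """Order 2D DFT coefficients by radial frequency (low to high).

    Sketch of the FDE idea: each prefix of the returned sequence
    describes the whole image at reduced resolution.
    """
    F = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.hypot(yy - h // 2, xx - w // 2)        # radial distance from DC
    order = np.argsort(r.ravel(), kind="stable")  # low frequencies first
    return F.ravel()[order], order

def decode_prefix(coeffs, order, shape, k):
    """Reconstruct an image from only the first k coefficients."""
    F = np.zeros(int(np.prod(shape)), dtype=complex)
    F[order[:k]] = coeffs[:k]                     # keep the low-frequency prefix
    return np.fft.ifft2(np.fft.ifftshift(F.reshape(shape))).real

img = np.random.rand(32, 32)
coeffs, order = fourier_domain_encoding(img)
full = decode_prefix(coeffs, order, img.shape, coeffs.size)      # all coefficients
low = decode_prefix(coeffs, order, img.shape, coeffs.size // 8)  # short prefix
assert np.allclose(full, img)   # the full prefix recovers the image exactly
```

Under this view, auto-regressively extending the coefficient sequence is the same as predicting a higher-resolution output from a low-resolution input.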
Related papers
- A Fourier Transform Framework for Domain Adaptation [8.997055928719515]
Unsupervised domain adaptation (UDA) transfers knowledge from a label-rich source domain to a target domain that lacks labels.
Many existing UDA algorithms suffer from using raw images directly as input.
We employ a Fourier transform framework (FTF) to incorporate low-level information from the target domain into the source domain.
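One common way to move low-level Fourier information between domains is to swap low-frequency amplitude spectra while keeping the source phase, as in Fourier Domain Adaptation. The sketch below illustrates that general trick; it is an assumption for illustration, not necessarily the FTF method's exact procedure:

```python
import numpy as np

def fourier_amplitude_transfer(src, tgt, beta=0.1):
    """Replace the lowest-frequency amplitudes of the source image with
    those of the target, keeping the source phase (illustrative sketch)."""
    Fs = np.fft.fftshift(np.fft.fft2(src))
    Ft = np.fft.fftshift(np.fft.fft2(tgt))
    amp, pha = np.abs(Fs), np.angle(Fs)
    h, w = src.shape
    bh, bw = int(h * beta), int(w * beta)         # half-width of the swapped band
    cy, cx = h // 2, w // 2
    amp[cy - bh:cy + bh + 1, cx - bw:cx + bw + 1] = \
        np.abs(Ft)[cy - bh:cy + bh + 1, cx - bw:cx + bw + 1]
    mixed = amp * np.exp(1j * pha)                # target amplitude, source phase
    return np.fft.ifft2(np.fft.ifftshift(mixed)).real

src = np.random.rand(32, 32)
tgt = np.random.rand(32, 32)
out = fourier_amplitude_transfer(src, tgt)
assert out.shape == src.shape
```

The phase carries the source image's structure, while the swapped low-frequency amplitudes carry the target's low-level "style".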
arXiv Detail & Related papers (2024-03-12T16:35:32Z)
- Misalignment-Robust Frequency Distribution Loss for Image Transformation [51.0462138717502]
This paper aims to address a common challenge in deep learning-based image transformation methods, such as image enhancement and super-resolution.
We introduce a novel and simple Frequency Distribution Loss (FDL) for computing distribution distance within the frequency domain.
Our method is empirically proven effective as a training constraint due to the thoughtful utilization of global information in the frequency domain.
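A toy illustration of why frequency-domain statistics tolerate misalignment: a circular shift changes only the phase of the DFT, so any loss built on magnitude distributions ignores it. The sketch below uses sorted magnitudes as a simple 1-D distribution distance; this is a hypothetical simplification, not the paper's exact FDL formulation:

```python
import numpy as np

def frequency_distribution_loss(pred, target):
    """Distance between sorted DFT magnitude samples (sketch).

    Sorting makes this a 1-D Wasserstein-style comparison of the two
    magnitude distributions, insensitive to spatial misalignment.
    """
    mp = np.sort(np.abs(np.fft.fft2(pred)).ravel())
    mt = np.sort(np.abs(np.fft.fft2(target)).ravel())
    return np.mean(np.abs(mp - mt))

img = np.random.rand(16, 16)
shifted = np.roll(img, shift=3, axis=1)           # circularly misaligned copy
assert frequency_distribution_loss(img, shifted) < 1e-9  # magnitudes are shift-invariant
```

A pixel-wise L1 or L2 loss would penalize the shifted copy heavily; the magnitude-based distance does not.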
arXiv Detail & Related papers (2024-02-28T09:27:41Z)
- Fourier-Net+: Leveraging Band-Limited Representation for Efficient 3D Medical Image Registration [62.53130123397081]
U-Net style networks are commonly utilized in unsupervised image registration to predict dense displacement fields.
We first propose Fourier-Net, which replaces the costly U-Net style expansive path with a parameter-free model-driven decoder.
We then introduce Fourier-Net+, which additionally takes the band-limited spatial representation of the images as input and further reduces the number of convolutional layers in the U-Net style network's contracting path.
arXiv Detail & Related papers (2023-07-06T13:57:12Z)
- Fourier-Net: Fast Image Registration with Band-limited Deformation [16.894559169947055]
Unsupervised image registration commonly adopts U-Net style networks to predict dense displacement fields in the full-resolution spatial domain.
We propose the Fourier-Net, replacing the expansive path in a U-Net style network with a parameter-free model-driven decoder.
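A parameter-free, model-driven decoder for a band-limited field can plausibly be pictured as zero-padding the low-frequency Fourier representation back to full resolution and inverse-transforming. The sketch below works under that assumption; the centring and amplitude scaling choices are illustrative, not Fourier-Net's exact design:

```python
import numpy as np

def band_limited_decoder(low_freq, full_shape):
    """Embed centred low-frequency DFT coefficients in a zero-padded
    spectrum and inverse-transform to a full-resolution field."""
    H, W = full_shape
    h, w = low_freq.shape
    F = np.zeros((H, W), dtype=complex)
    y0, x0 = (H - h) // 2, (W - w) // 2
    F[y0:y0 + h, x0:x0 + w] = low_freq            # band-limited assumption:
                                                  # everything outside is zero
    scale = (H * W) / (h * w)                     # keep amplitudes comparable
    return np.fft.ifft2(np.fft.ifftshift(F)).real * scale

small = np.fft.fftshift(np.fft.fft2(np.random.rand(8, 8)))
field = band_limited_decoder(small, (32, 32))
assert field.shape == (32, 32)
```

Because the decoder is a fixed inverse transform, the network only needs to predict the small low-frequency block, which is what removes the costly expansive path.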
arXiv Detail & Related papers (2022-11-29T16:24:06Z)
- Contextual Learning in Fourier Complex Field for VHR Remote Sensing Images [64.84260544255477]
Transformer-based models have demonstrated outstanding potential for learning high-order contextual relationships from natural images at common resolutions (224x224 pixels).
We propose a complex self-attention (CSA) mechanism to model high-order contextual information with less than half the computation of naive self-attention (SA).
By stacking various layers of CSA blocks, we propose the Fourier Complex Transformer (FCT) model to learn global contextual information from VHR aerial images.
arXiv Detail & Related papers (2022-10-28T08:13:33Z)
- Deep Fourier Up-Sampling [100.59885545206744]
Unlike spatial up-sampling, up-sampling in the Fourier domain is more challenging because it does not follow the local property of spatial interpolation.
We propose a theoretically sound Deep Fourier Up-Sampling (FourierUp) to solve these issues.
arXiv Detail & Related papers (2022-10-11T06:17:31Z)
- Seeing Implicit Neural Representations as Fourier Series [13.216389226310987]
Implicit Neural Representations (INR) use multilayer perceptrons to represent high-frequency functions in low-dimensional problem domains.
These representations achieved state-of-the-art results on tasks related to complex 3D objects and scenes.
This work analyzes the connection between the two methods and shows that a Fourier-mapped perceptron is structurally similar to a one-hidden-layer SIREN.
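The structural correspondence can be checked numerically: a Fourier feature mapping [sin(2πBx), cos(2πBx)] equals one hidden sin() layer with the frequency matrix stacked twice and a π/2 phase shift on the cosine half, since cos(z) = sin(z + π/2). A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.normal(size=(16, 2))                      # random Fourier frequencies
x = rng.normal(size=(5, 2))                       # 2-D input coordinates

# Fourier feature mapping: gamma(x) = [sin(2*pi*Bx), cos(2*pi*Bx)]
proj = 2 * np.pi * x @ B.T
gamma = np.concatenate([np.sin(proj), np.cos(proj)], axis=1)

# The same features as a single hidden sin() layer: stack the weights
# and give the cosine half a pi/2 phase shift (cos z = sin(z + pi/2)).
W = np.concatenate([B, B], axis=0)
b = np.concatenate([np.zeros(16), np.full(16, np.pi / 2)])
siren_hidden = np.sin(2 * np.pi * x @ W.T + b)

assert np.allclose(gamma, siren_hidden)
```

The two representations differ only in how the frequencies and phases are parameterized and initialized, which is the structural point the paper develops.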
arXiv Detail & Related papers (2021-09-01T08:40:20Z)
- Learnable Fourier Features for Multi-Dimensional Spatial Positional Encoding [96.9752763607738]
We propose a novel positional encoding method based on learnable Fourier features.
Our experiments show that our learnable feature representation for multi-dimensional positional encoding outperforms existing methods.
arXiv Detail & Related papers (2021-06-05T04:40:18Z)
- Transformer-Based Deep Image Matching for Generalizable Person Re-identification [114.56752624945142]
We investigate the possibility of applying Transformers for image matching and metric learning given pairs of images.
We find that the Vision Transformer (ViT) and the vanilla Transformer with decoders are not adequate for image matching due to their lack of image-to-image attention.
We propose a new simplified decoder, which drops the full attention implementation with the softmax weighting, keeping only the query-key similarity.
arXiv Detail & Related papers (2021-05-30T05:38:33Z)
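Dropping softmax from an attention-style decoder, as the last entry describes, leaves the raw scaled query-key similarity as the matching score. A hypothetical sketch of that simplification (not the paper's exact decoder):

```python
import numpy as np

def simplified_matching_score(q, k):
    """Softmax-free decoder step: use scaled query-key similarity
    directly instead of normalized attention weights."""
    return q @ k.T / np.sqrt(q.shape[-1])         # scaled dot product, no softmax

rng = np.random.default_rng(1)
q = rng.normal(size=(4, 8))                       # query-image token features
k = rng.normal(size=(6, 8))                       # gallery-image token features
scores = simplified_matching_score(q, k)
assert scores.shape == (4, 6)                     # one score per token pair
```

Keeping the similarities unnormalized preserves explicit image-to-image matching scores, which full softmax attention would blend into a weighted average.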
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.