Fourier Image Transformer
- URL: http://arxiv.org/abs/2104.02555v1
- Date: Tue, 6 Apr 2021 14:48:57 GMT
- Title: Fourier Image Transformer
- Authors: Tim-Oliver Buchholz and Florian Jug
- Abstract summary: We show that an auto-regressive image completion task is equivalent to predicting a higher resolution output given a low-resolution input.
We demonstrate the practicality of this approach in the context of computed tomography (CT) image reconstruction.
- Score: 10.315102237565734
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Transformer architectures show spectacular performance on NLP tasks and have
recently also been used for tasks such as image completion or image
classification. Here we propose to use a sequential image representation, where
each prefix of the complete sequence describes the whole image at reduced
resolution. Using such Fourier Domain Encodings (FDEs), an auto-regressive
image completion task is equivalent to predicting a higher resolution output
given a low-resolution input. Additionally, we show that an encoder-decoder
setup can be used to query arbitrary Fourier coefficients given a set of
Fourier domain observations. We demonstrate the practicality of this approach
in the context of computed tomography (CT) image reconstruction. In summary, we
show that Fourier Image Transformer (FIT) can be used to solve relevant image
analysis tasks in Fourier space, a domain inherently inaccessible to
convolutional architectures.
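The core FDE idea can be sketched numerically: order the 2D DFT coefficients of an image by radial frequency, so that any prefix of the resulting sequence, inverse-transformed with the remaining coefficients zeroed, reconstructs the whole image at reduced resolution. The sketch below is an illustrative assumption of how such an encoding could work, not the paper's exact normalization or ordering:

```python
import numpy as np

def fourier_domain_encoding(img):
    """Order 2D DFT coefficients by radial frequency (low to high).

    Sketch of the FDE idea: each prefix of the returned sequence
    describes the whole image at reduced resolution.
    """
    F = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.hypot(yy - h // 2, xx - w // 2)        # radial distance from DC
    order = np.argsort(r.ravel(), kind="stable")  # low frequencies first
    return F.ravel()[order], order

def decode_prefix(coeffs, order, shape, k):
    """Reconstruct an image from only the first k coefficients."""
    F = np.zeros(int(np.prod(shape)), dtype=complex)
    F[order[:k]] = coeffs[:k]                     # keep the low-frequency prefix
    return np.fft.ifft2(np.fft.ifftshift(F.reshape(shape))).real

img = np.random.rand(32, 32)
coeffs, order = fourier_domain_encoding(img)
full = decode_prefix(coeffs, order, img.shape, coeffs.size)      # all coefficients
low = decode_prefix(coeffs, order, img.shape, coeffs.size // 8)  # short prefix
assert np.allclose(full, img)   # the full prefix recovers the image exactly
```

Under this view, auto-regressively extending the coefficient sequence is the same as predicting a higher-resolution output from a low-resolution input.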
Related papers
- A Fourier Transform Framework for Domain Adaptation [8.997055928719515]
Unsupervised domain adaptation (UDA) transfers knowledge from a label-rich source domain to a target domain that lacks labels.
Many existing UDA algorithms suffer from using raw images directly as input.
We employ a Fourier transform framework (FTF) to incorporate low-level information from the target domain into the source domain.
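One common way to move low-level Fourier information between domains is to swap low-frequency amplitude spectra while keeping the source phase, as in Fourier Domain Adaptation. The sketch below illustrates that general trick; it is an assumption for illustration, not necessarily the FTF method's exact procedure:

```python
import numpy as np

def fourier_amplitude_transfer(src, tgt, beta=0.1):
    """Replace the lowest-frequency amplitudes of the source image with
    those of the target, keeping the source phase (illustrative sketch)."""
    Fs = np.fft.fftshift(np.fft.fft2(src))
    Ft = np.fft.fftshift(np.fft.fft2(tgt))
    amp, pha = np.abs(Fs), np.angle(Fs)
    h, w = src.shape
    bh, bw = int(h * beta), int(w * beta)         # half-width of the swapped band
    cy, cx = h // 2, w // 2
    amp[cy - bh:cy + bh + 1, cx - bw:cx + bw + 1] = \
        np.abs(Ft)[cy - bh:cy + bh + 1, cx - bw:cx + bw + 1]
    mixed = amp * np.exp(1j * pha)                # target amplitude, source phase
    return np.fft.ifft2(np.fft.ifftshift(mixed)).real

src = np.random.rand(32, 32)
tgt = np.random.rand(32, 32)
out = fourier_amplitude_transfer(src, tgt)
assert out.shape == src.shape
```

The phase carries the source image's structure, while the swapped low-frequency amplitudes carry the target's low-level "style".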
arXiv Detail & Related papers (2024-03-12T16:35:32Z)
- Misalignment-Robust Frequency Distribution Loss for Image Transformation [51.0462138717502]
This paper aims to address a common challenge in deep learning-based image transformation methods, such as image enhancement and super-resolution.
We introduce a novel and simple Frequency Distribution Loss (FDL) for computing distribution distance within the frequency domain.
Our method is empirically proven effective as a training constraint due to the thoughtful utilization of global information in the frequency domain.
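A toy illustration of why frequency-domain statistics tolerate misalignment: a circular shift changes only the phase of the DFT, so any loss built on magnitude distributions ignores it. The sketch below uses sorted magnitudes as a simple 1-D distribution distance; this is a hypothetical simplification, not the paper's exact FDL formulation:

```python
import numpy as np

def frequency_distribution_loss(pred, target):
    """Distance between sorted DFT magnitude samples (sketch).

    Sorting makes this a 1-D Wasserstein-style comparison of the two
    magnitude distributions, insensitive to spatial misalignment.
    """
    mp = np.sort(np.abs(np.fft.fft2(pred)).ravel())
    mt = np.sort(np.abs(np.fft.fft2(target)).ravel())
    return np.mean(np.abs(mp - mt))

img = np.random.rand(16, 16)
shifted = np.roll(img, shift=3, axis=1)           # circularly misaligned copy
assert frequency_distribution_loss(img, shifted) < 1e-9  # magnitudes are shift-invariant
```

A pixel-wise L1 or L2 loss would penalize the shifted copy heavily; the magnitude-based distance does not.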
arXiv Detail & Related papers (2024-02-28T09:27:41Z)
- Fourier-Net+: Leveraging Band-Limited Representation for Efficient 3D Medical Image Registration [62.53130123397081]
U-Net style networks are commonly utilized in unsupervised image registration to predict dense displacement fields.
We first propose Fourier-Net, which replaces the costly U-Net style expansive path with a parameter-free model-driven decoder.
We then introduce Fourier-Net+, which additionally takes the band-limited spatial representation of the images as input and further reduces the number of convolutional layers in the U-Net style network's contracting path.
arXiv Detail & Related papers (2023-07-06T13:57:12Z)
- Fourier-Net: Fast Image Registration with Band-limited Deformation [16.894559169947055]
Unsupervised image registration commonly adopts U-Net style networks to predict dense displacement fields in the full-resolution spatial domain.
We propose the Fourier-Net, replacing the expansive path in a U-Net style network with a parameter-free model-driven decoder.
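A parameter-free, model-driven decoder for a band-limited field can plausibly be pictured as zero-padding the low-frequency Fourier representation back to full resolution and inverse-transforming. The sketch below works under that assumption; the centring and amplitude scaling choices are illustrative, not Fourier-Net's exact design:

```python
import numpy as np

def band_limited_decoder(low_freq, full_shape):
    """Embed centred low-frequency DFT coefficients in a zero-padded
    spectrum and inverse-transform to a full-resolution field."""
    H, W = full_shape
    h, w = low_freq.shape
    F = np.zeros((H, W), dtype=complex)
    y0, x0 = (H - h) // 2, (W - w) // 2
    F[y0:y0 + h, x0:x0 + w] = low_freq            # band-limited assumption:
                                                  # everything outside is zero
    scale = (H * W) / (h * w)                     # keep amplitudes comparable
    return np.fft.ifft2(np.fft.ifftshift(F)).real * scale

small = np.fft.fftshift(np.fft.fft2(np.random.rand(8, 8)))
field = band_limited_decoder(small, (32, 32))
assert field.shape == (32, 32)
```

Because the decoder is a fixed inverse transform, the network only needs to predict the small low-frequency block, which is what removes the costly expansive path.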
arXiv Detail & Related papers (2022-11-29T16:24:06Z)
- Contextual Learning in Fourier Complex Field for VHR Remote Sensing Images [64.84260544255477]
Transformer-based models have demonstrated outstanding potential for learning high-order contextual relationships from natural images at common resolutions (224x224 pixels).
We propose a complex self-attention (CSA) mechanism to model high-order contextual information with less than half the computation of naive self-attention (SA).
By stacking various layers of CSA blocks, we propose the Fourier Complex Transformer (FCT) model to learn global contextual information from VHR aerial images.
arXiv Detail & Related papers (2022-10-28T08:13:33Z)
- Deep Fourier Up-Sampling [100.59885545206744]
Unlike spatial up-sampling, up-sampling in the Fourier domain is more challenging because it does not follow the local property of spatial interpolation.
We propose a theoretically sound Deep Fourier Up-Sampling (FourierUp) to solve these issues.
arXiv Detail & Related papers (2022-10-11T06:17:31Z)
- Seeing Implicit Neural Representations as Fourier Series [13.216389226310987]
Implicit Neural Representations (INR) use multilayer perceptrons to represent high-frequency functions in low-dimensional problem domains.
These representations achieved state-of-the-art results on tasks related to complex 3D objects and scenes.
This work analyzes the connection between the two methods and shows that a Fourier-mapped perceptron is structurally similar to a one-hidden-layer SIREN.
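The structural correspondence can be checked numerically: a Fourier feature mapping [sin(2πBx), cos(2πBx)] equals one hidden sin() layer with the frequency matrix stacked twice and a π/2 phase shift on the cosine half, since cos(z) = sin(z + π/2). A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.normal(size=(16, 2))                      # random Fourier frequencies
x = rng.normal(size=(5, 2))                       # 2-D input coordinates

# Fourier feature mapping: gamma(x) = [sin(2*pi*Bx), cos(2*pi*Bx)]
proj = 2 * np.pi * x @ B.T
gamma = np.concatenate([np.sin(proj), np.cos(proj)], axis=1)

# The same features as a single hidden sin() layer: stack the weights
# and give the cosine half a pi/2 phase shift (cos z = sin(z + pi/2)).
W = np.concatenate([B, B], axis=0)
b = np.concatenate([np.zeros(16), np.full(16, np.pi / 2)])
siren_hidden = np.sin(2 * np.pi * x @ W.T + b)

assert np.allclose(gamma, siren_hidden)
```

The two representations differ only in how the frequencies and phases are parameterized and initialized, which is the structural point the paper develops.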
arXiv Detail & Related papers (2021-09-01T08:40:20Z)
- Learnable Fourier Features for Multi-Dimensional Spatial Positional Encoding [96.9752763607738]
We propose a novel positional encoding method based on learnable Fourier features.
Our experiments show that our learnable feature representation for multi-dimensional positional encoding outperforms existing methods.
arXiv Detail & Related papers (2021-06-05T04:40:18Z)
- Transformer-Based Deep Image Matching for Generalizable Person Re-identification [114.56752624945142]
We investigate the possibility of applying Transformers for image matching and metric learning given pairs of images.
We find that the Vision Transformer (ViT) and the vanilla Transformer with decoders are not adequate for image matching due to their lack of image-to-image attention.
We propose a new simplified decoder, which drops the full attention implementation with the softmax weighting, keeping only the query-key similarity.
arXiv Detail & Related papers (2021-05-30T05:38:33Z)
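Dropping softmax from an attention-style decoder, as the last entry describes, leaves the raw scaled query-key similarity as the matching score. A hypothetical sketch of that simplification (not the paper's exact decoder):

```python
import numpy as np

def simplified_matching_score(q, k):
    """Softmax-free decoder step: use scaled query-key similarity
    directly instead of normalized attention weights."""
    return q @ k.T / np.sqrt(q.shape[-1])         # scaled dot product, no softmax

rng = np.random.default_rng(1)
q = rng.normal(size=(4, 8))                       # query-image token features
k = rng.normal(size=(6, 8))                       # gallery-image token features
scores = simplified_matching_score(q, k)
assert scores.shape == (4, 6)                     # one score per token pair
```

Keeping the similarities unnormalized preserves explicit image-to-image matching scores, which full softmax attention would blend into a weighted average.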
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.