Building Blocks for a Complex-Valued Transformer Architecture
- URL: http://arxiv.org/abs/2306.09827v1
- Date: Fri, 16 Jun 2023 13:11:15 GMT
- Title: Building Blocks for a Complex-Valued Transformer Architecture
- Authors: Florian Eilers and Xiaoyi Jiang
- Abstract summary: We aim to make deep learning applicable to complex-valued signals without using projections into $\mathbb{R}^2$.
We present multiple versions of a complex-valued Scaled Dot-Product Attention mechanism as well as a complex-valued layer normalization.
We test on a classification and a sequence generation task on the MusicNet dataset and show improved robustness to overfitting while maintaining on-par performance when compared to the real-valued transformer architecture.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most deep learning pipelines are built on real-valued operations to deal with
real-valued inputs such as images, speech or music signals. However, many
applications naturally involve complex-valued signals or images, such as
MRI or remote sensing. Additionally, the Fourier transform of signals is
complex-valued and has numerous applications. We aim to make deep learning
directly applicable to these complex-valued signals without using projections
into $\mathbb{R}^2$. Thus we add to the recent developments of complex-valued
neural networks by presenting building blocks to transfer the transformer
architecture to the complex domain. We present multiple versions of a
complex-valued Scaled Dot-Product Attention mechanism as well as a
complex-valued layer normalization. We test on a classification and a sequence
generation task on the MusicNet dataset and show improved robustness to
overfitting while maintaining on-par performance when compared to the
real-valued transformer architecture.
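The abstract names two building blocks (complex-valued scaled dot-product attention and complex-valued layer normalization) but does not spell out their formulations here, so the sketch below is only one plausible reading in PyTorch, not the paper's definitive method: an attention variant that uses the real part of the Hermitian product $QK^H$ as softmax logits, and a layer normalization that whitens real and imaginary parts via the 2x2 covariance trick known from complex batch normalization (Trabelsi et al.). All function names and shapes are illustrative assumptions.
```python
import torch
import torch.nn.functional as F

def complex_scaled_dot_product_attention(q, k, v):
    # One plausible variant (an assumption, not necessarily the paper's):
    # use the real part of the Hermitian product Q K^H as attention logits,
    # so the softmax stays real-valued while q, k, v remain complex.
    d = q.shape[-1]
    logits = torch.matmul(q, k.conj().transpose(-2, -1)).real  # (B, T, T)
    weights = F.softmax(logits / d ** 0.5, dim=-1)             # real-valued
    # Cast the real weights to the complex dtype before mixing the values.
    return torch.matmul(weights.to(v.dtype), v)

def complex_layer_norm(x, eps=1e-5):
    # Whitening-style normalization over the last (feature) dimension:
    # treat each entry as the 2-vector (Re, Im), center it, and multiply by
    # the inverse square root of the 2x2 covariance of real/imaginary parts
    # (adapted from the complex batch norm of Trabelsi et al.).
    x = x - x.mean(dim=-1, keepdim=True)
    xr, xi = x.real, x.imag
    vrr = xr.pow(2).mean(dim=-1, keepdim=True) + eps
    vii = xi.pow(2).mean(dim=-1, keepdim=True) + eps
    vri = (xr * xi).mean(dim=-1, keepdim=True)
    # Closed-form inverse square root of [[vrr, vri], [vri, vii]].
    s = (vrr * vii - vri.pow(2)).sqrt()   # sqrt of the determinant
    t = (vrr + vii + 2 * s).sqrt()        # sqrt of (trace + 2 * sqrt(det))
    inv = 1.0 / (s * t)
    yr = (vii + s) * inv * xr - vri * inv * xi
    yi = -vri * inv * xr + (vrr + s) * inv * xi
    return torch.complex(yr, yi)

# Quick shape check on random complex tokens.
q = k = v = torch.randn(2, 10, 8, dtype=torch.complex64)
out = complex_scaled_dot_product_attention(q, k, complex_layer_norm(v))
print(out.shape, out.dtype)  # torch.Size([2, 10, 8]) torch.complex64
```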
Related papers
- Complex-valued Adaptive System Identification via Low-Rank Tensor Decomposition
In this work we derive two new architectures to allow the processing of complex-valued signals.
We show that these extensions are able to surpass the trivial, complex-valued extension of the original architecture in terms of performance.
arXiv Detail & Related papers (2023-06-28T07:01:08Z)
- Contextual Learning in Fourier Complex Field for VHR Remote Sensing Images
Transformer-based models have demonstrated outstanding potential for learning high-order contextual relationships from natural images of general resolution (224x224 pixels).
We propose a complex self-attention (CSA) mechanism to model the high-order contextual information with less than half the computation of naive SA.
By stacking various layers of CSA blocks, we propose the Fourier Complex Transformer (FCT) model to learn global contextual information from VHR aerial images (an illustrative Fourier-domain tokenization sketch follows this entry).
arXiv Detail & Related papers (2022-10-28T08:13:33Z)
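The CSA formulation itself is not detailed in this summary. Purely as an illustration of the kind of input such Fourier-domain models consume, the following sketch maps a real-valued image to complex-valued tokens via a 2-D FFT; the image size and patching scheme are arbitrary assumptions, not the FCT configuration.
```python
import torch

# Illustrative only: map a real-valued aerial image to complex-valued
# tokens in the Fourier domain, the kind of input a complex attention
# block (like the sketch above) could process.
img = torch.randn(1, 3, 512, 512)        # real-valued input image
spec = torch.fft.fft2(img)               # complex64 frequency spectrum
# Split the spectrum into a 32x32 grid of 16x16 patches and flatten each
# patch (with its 3 channels) into one complex token of dimension 768.
patches = spec.reshape(1, 3, 32, 16, 32, 16)
tokens = patches.permute(0, 2, 4, 1, 3, 5).reshape(1, 32 * 32, -1)
print(tokens.shape, tokens.dtype)        # (1, 1024, 768), torch.complex64
```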
- Rich CNN-Transformer Feature Aggregation Networks for Super-Resolution
Recent vision transformers along with self-attention have achieved promising results on various computer vision tasks.
We introduce an effective hybrid architecture for super-resolution (SR) tasks, which leverages local features from CNNs and long-range dependencies captured by transformers.
Our proposed method achieves state-of-the-art SR results on numerous benchmark datasets.
arXiv Detail & Related papers (2022-03-15T06:52:25Z)
- CSformer: Bridging Convolution and Transformer for Compressive Sensing
This paper proposes a hybrid framework that integrates the detailed spatial information captured by CNNs with the global context provided by transformers for enhanced representation learning.
The proposed approach is an end-to-end compressive image sensing method, composed of adaptive sampling and recovery.
The experimental results demonstrate the effectiveness of the dedicated transformer-based architecture for compressive sensing.
arXiv Detail & Related papers (2021-12-31T04:37:11Z)
- TransCMD: Cross-Modal Decoder Equipped with Transformer for RGB-D Salient Object Detection
In this work, we rethink this task from the perspective of global information alignment and transformation.
Specifically, the proposed method (TransCMD) cascades several cross-modal integration units to construct a top-down transformer-based information propagation path.
Experimental results on seven RGB-D SOD benchmark datasets demonstrate that a simple two-stream encoder-decoder framework can surpass the state-of-the-art purely CNN-based methods.
arXiv Detail & Related papers (2021-12-04T15:45:34Z)
- Multi-Exit Vision Transformer for Dynamic Inference
We propose seven different architectures for early exit branches that can be used for dynamic inference in Vision Transformer backbones.
We show that each one of our proposed architectures could prove useful in the trade-off between accuracy and speed.
arXiv Detail & Related papers (2021-06-29T09:01:13Z)
- Transformers Solve the Limited Receptive Field for Monocular Depth Prediction
We propose TransDepth, an architecture which benefits from both convolutional neural networks and transformers.
This is the first paper to apply transformers to pixel-wise prediction problems involving continuous labels.
arXiv Detail & Related papers (2021-03-22T18:00:13Z)
- Modulation Pattern Detection Using Complex Convolutions in Deep Learning
Classifying modulation patterns is challenging because noise and channel impairments affect the signals.
We study the implementation and use of complex convolutions in a series of convolutional neural network architectures (a minimal sketch of such a layer follows this entry).
arXiv Detail & Related papers (2020-10-14T02:43:11Z)
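Complex convolutions like those studied in the entry above (and in the MRI reconstruction work that follows) are commonly realized with two real-valued kernels combined by the complex multiplication rule. The sketch below shows that standard construction; the class name and configuration are assumptions, and the cited papers may differ in details such as bias handling and initialization.
```python
import torch
import torch.nn as nn

class ComplexConv1d(nn.Module):
    # (W_r + i W_i) * (x_r + i x_i)
    #   = (W_r * x_r - W_i * x_i) + i (W_r * x_i + W_i * x_r)
    def __init__(self, in_ch, out_ch, kernel_size, **kwargs):
        super().__init__()
        # Two real-valued convolutions hold the real and imaginary kernels.
        # Bias is disabled here; a complex bias could be added separately.
        self.conv_r = nn.Conv1d(in_ch, out_ch, kernel_size, bias=False, **kwargs)
        self.conv_i = nn.Conv1d(in_ch, out_ch, kernel_size, bias=False, **kwargs)

    def forward(self, x):
        # x: complex tensor of shape (batch, channels, time), e.g. an IQ signal.
        xr, xi = x.real, x.imag
        yr = self.conv_r(xr) - self.conv_i(xi)
        yi = self.conv_r(xi) + self.conv_i(xr)
        return torch.complex(yr, yi)

# Example: a complex-valued IQ signal with 2 channels and 128 samples.
layer = ComplexConv1d(2, 16, kernel_size=3, padding=1)
signal = torch.randn(4, 2, 128, dtype=torch.complex64)
print(layer(signal).shape)  # torch.Size([4, 16, 128])
```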
- Analysis of Deep Complex-Valued Convolutional Neural Networks for MRI Reconstruction
We investigate end-to-end complex-valued convolutional neural networks for image reconstruction in lieu of two-channel real-valued networks.
We find that complex-valued CNNs with complex-valued convolutions provide superior reconstructions compared to real-valued convolutions with the same number of trainable parameters.
arXiv Detail & Related papers (2020-04-03T19:00:23Z)
- Co-VeGAN: Complex-Valued Generative Adversarial Network for Compressive Sensing MR Image Reconstruction
We propose a novel framework based on a complex-valued generative adversarial network (Co-VeGAN). Processing the complex-valued input directly enables high-quality reconstruction of the CS-MR images.
arXiv Detail & Related papers (2020-02-24T20:28:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.