New Approaches to Long Document Summarization: Fourier Transform Based
Attention in a Transformer Model
- URL: http://arxiv.org/abs/2111.15473v1
- Date: Thu, 25 Nov 2021 18:03:41 GMT
- Title: New Approaches to Long Document Summarization: Fourier Transform Based
Attention in a Transformer Model
- Authors: Andrew Kiruluta, Andreas Lemos and Eric Lundy
- Abstract summary: We extensively redesign the newly introduced method of token mixing using Fourier Transforms (FNET) to replace the computationally expensive self-attention mechanism.
We also carry out long document summarization using established methods that are capable of processing over 8000 tokens.
All modifications showed better performance on the summarization task than when using the original FNET encoder in a transformer architecture.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In this work, we extensively redesign the newly introduced method of token
mixing using Fourier Transforms (FNET) to replace the computationally expensive
self-attention mechanism in a full transformer implementation on a long
document summarization task (> 512 tokens). As a baseline, we also carried out
long document summarization using established methods such as Longformer and
Big Bird transformer models that are capable of processing over 8000 tokens and
are currently the state-of-the-art methods for these types of problems. The
original FNET paper implemented this in an encoder-only architecture, while
abstractive summarization requires both an encoder and a decoder. Since such a
pretrained transformer model does not currently exist in the public domain, we
decided to implement a full transformer based on this Fourier token mixing
approach in an encoder/decoder architecture, which we trained starting with
GloVe embeddings for the individual words in the corpus. We investigated a
number of different extensions to the original FNET architecture and evaluated
them on their ROUGE F1-score performance on a summarization task. All
modifications showed better performance on the summarization task than when
using the original FNET encoder in a transformer architecture.
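The core idea above is to replace each self-attention sublayer in the encoder with a parameter-free 2D Fourier transform over the sequence and hidden dimensions, keeping only the real part, as in FNet. The following is a minimal PyTorch sketch of such an encoder layer; the class and argument names (FourierMixingEncoderLayer, d_ff) and the normalization placement are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class FourierMixingEncoderLayer(nn.Module):
    """Encoder layer with FNet-style token mixing instead of self-attention.

    The mixing step is a 2D FFT over the (sequence, hidden) dimensions; only
    the real part is kept, so it adds no learned parameters. Names and the
    exact normalization placement here are illustrative assumptions.
    """

    def __init__(self, d_model: int, d_ff: int, dropout: float = 0.1):
        super().__init__()
        self.mixing_norm = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.GELU(),
            nn.Linear(d_ff, d_model),
            nn.Dropout(dropout),
        )
        self.ff_norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        # Mix tokens with a 2D FFT over the last two dims and keep the real part.
        mixed = torch.fft.fft2(x.float()).real.type_as(x)
        x = self.mixing_norm(x + mixed)   # residual + layer norm around the mixing step
        x = self.ff_norm(x + self.ff(x))  # residual + layer norm around the feed-forward
        return x

# Example: long inputs are cheap because there is no O(n^2) attention.
layer = FourierMixingEncoderLayer(d_model=256, d_ff=1024)
out = layer(torch.randn(2, 4096, 256))   # (2, 4096, 256)
```

Note that this covers only encoder-side mixing; the encoder/decoder summarizer described in the abstract still needs a decoder with causal and cross-attention (or a causal mixing variant), which is outside this sketch.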
Related papers
- iTransformer: Inverted Transformers Are Effective for Time Series Forecasting [62.40166958002558]
We propose iTransformer, which simply applies the attention and feed-forward network on the inverted dimensions.
The iTransformer model achieves state-of-the-art performance on challenging real-world datasets.
arXiv Detail & Related papers (2023-10-10T13:44:09Z)
- Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT Operator [24.690247474891958]
Fourier Transformer is able to significantly reduce computational costs while retaining the ability to inherit from various large pretrained models.
Our model achieves state-of-the-art performances among all transformer-based models on the long-range modeling benchmark LRA.
For generative seq-to-seq tasks including CNN/DailyMail and ELI5, by inheriting the BART weights our model outperforms the standard BART.
arXiv Detail & Related papers (2023-05-24T12:33:06Z)
- Revisiting Transformer-based Models for Long Document Classification [31.60414185940218]
In real-world applications, multi-page multi-paragraph documents are common and cannot be efficiently encoded by vanilla Transformer-based models.
We compare different Transformer-based Long Document Classification (TrLDC) approaches that aim to mitigate the computational overhead of vanilla transformers.
We observe a clear benefit from being able to process longer text, and, based on our results, we derive practical advice on applying Transformer-based models to long document classification tasks.
arXiv Detail & Related papers (2022-04-14T00:44:36Z)
- FNetAR: Mixing Tokens with Autoregressive Fourier Transforms [0.0]
We show that FNetAR retains state-of-the-art performance (25.8 ppl) on the task of causal language modeling.
The autoregressive Fourier transform could likely be used for parameter reduction on most Transformer-based time-series prediction models.
arXiv Detail & Related papers (2021-07-22T21:24:02Z)
- Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding [63.539333383965726]
We propose a novel way to accelerate attention calculation for Transformers with relative positional encoding (RPE).
Based upon the observation that relative positional encoding forms a Toeplitz matrix, we mathematically show that kernelized attention with RPE can be calculated efficiently using the Fast Fourier Transform (FFT); a sketch of the underlying Toeplitz-via-FFT trick is given after this list.
arXiv Detail & Related papers (2021-06-23T17:51:26Z)
- Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation [63.46694853953092]
Swin-Unet is an Unet-like pure Transformer for medical image segmentation.
Tokenized image patches are fed into the Transformer-based U-shaped Encoder-Decoder architecture.
arXiv Detail & Related papers (2021-05-12T09:30:26Z)
- FNet: Mixing Tokens with Fourier Transforms [0.578717214982749]
We show that Transformer encoder architectures can be massively sped up with limited accuracy costs.
We replace the self-attention sublayers with simple linear transformations that "mix" input tokens.
The resulting model, which we name FNet, scales very efficiently to long inputs.
arXiv Detail & Related papers (2021-05-09T03:32:48Z)
- Visual Saliency Transformer [127.33678448761599]
We develop a novel unified model based on a pure transformer, Visual Saliency Transformer (VST), for both RGB and RGB-D salient object detection (SOD).
It takes image patches as inputs and leverages the transformer to propagate global contexts among image patches.
Experimental results show that our model outperforms existing state-of-the-art results on both RGB and RGB-D SOD benchmark datasets.
arXiv Detail & Related papers (2021-04-25T08:24:06Z)
- Transformers Solve the Limited Receptive Field for Monocular Depth Prediction [82.90445525977904]
We propose TransDepth, an architecture which benefits from both convolutional neural networks and transformers.
This is the first paper that applies transformers to pixel-wise prediction problems involving continuous labels.
arXiv Detail & Related papers (2021-03-22T18:00:13Z)
- Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers [149.78470371525754]
We treat semantic segmentation as a sequence-to-sequence prediction task. Specifically, we deploy a pure transformer to encode an image as a sequence of patches.
With the global context modeled in every layer of the transformer, this encoder can be combined with a simple decoder to provide a powerful segmentation model, termed SEgmentation TRansformer (SETR).
SETR achieves new state of the art on ADE20K (50.28% mIoU), Pascal Context (55.83% mIoU) and competitive results on Cityscapes.
arXiv Detail & Related papers (2020-12-31T18:55:57Z)
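The entry on kernelized attention with relative positional encoding above relies on the fact that a Toeplitz matrix-vector product can be computed in O(n log n) by circulant embedding and the FFT. Below is a minimal NumPy sketch of just that step, with illustrative function names; it does not reproduce that paper's full kernelized attention.

```python
import numpy as np

def toeplitz_matvec_fft(first_col, first_row, v):
    """Compute T @ v in O(n log n), where T is the Toeplitz matrix with the
    given first column and first row (first_row[0] must equal first_col[0]).

    T is embedded in a circulant matrix of size 2n - 1, whose matrix-vector
    product is a circular convolution and so diagonalizes under the FFT.
    """
    n = len(v)
    # First column of the circulant embedding: [c0 .. c_{n-1}, r_{n-1} .. r_1].
    c = np.concatenate([first_col, first_row[:0:-1]])
    v_pad = np.concatenate([v, np.zeros(n - 1)])
    prod = np.fft.ifft(np.fft.fft(c) * np.fft.fft(v_pad))
    return prod[:n].real

# Quick check against the dense product.
n = 6
col = np.random.randn(n)
row = np.concatenate([[col[0]], np.random.randn(n - 1)])
T = np.array([[col[i - j] if i >= j else row[j - i] for j in range(n)]
              for i in range(n)])
v = np.random.randn(n)
assert np.allclose(T @ v, toeplitz_matvec_fft(col, row, v))
```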
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.