Related papers: WaveFormer: Frequency-Time Decoupled Vision Modeling with Wave Equation

URL: http://arxiv.org/abs/2601.08602v1
Date: Tue, 13 Jan 2026 14:47:22 GMT
Title: WaveFormer: Frequency-Time Decoupled Vision Modeling with Wave Equation
Authors: Zishan Shu, Juntong Wu, Wei Yan, Xudong Liu, Hongyu Zhang, Chang Liu, Youdong Mao, Jie Chen
Abstract summary: Vision modeling has advanced rapidly with Transformers, whose attention mechanisms capture visual dependencies but lack a principled account of how semantic information propagates spatially. We revisit this problem from a wave-based perspective, treating feature maps as spatial signals whose evolution over an internal propagation time is governed by an underdamped wave equation. We propose a family of WaveFormer models as drop-in replacements for standard ViTs and CNNs, achieving competitive accuracy across image classification, object detection, and semantic segmentation.
Score: 24.13944601660532
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Vision modeling has advanced rapidly with Transformers, whose attention mechanisms capture visual dependencies but lack a principled account of how semantic information propagates spatially. We revisit this problem from a wave-based perspective: feature maps are treated as spatial signals whose evolution over an internal propagation time (aligned with network depth) is governed by an underdamped wave equation. In this formulation, spatial frequency, from low-frequency global layout to high-frequency edges and textures, is modeled explicitly, and its interaction with propagation time is controlled rather than implicitly fixed. We derive a closed-form, frequency-time decoupled solution and implement it as the Wave Propagation Operator (WPO), a lightweight module that models global interactions in O(N log N) time, far lower than attention. Building on WPO, we propose a family of WaveFormer models as drop-in replacements for standard ViTs and CNNs, achieving competitive accuracy across image classification, object detection, and semantic segmentation, while delivering up to 1.6x higher throughput and 30% fewer FLOPs than attention-based alternatives. Furthermore, our results demonstrate that wave propagation introduces a complementary modeling bias to heat-based methods, effectively capturing both global coherence and high-frequency details essential for rich visual semantics. Code is available at: https://github.com/ZishanShu/WaveFormer.
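The abstract's idea of a closed-form, frequency-time decoupled solution can be illustrated with a generic textbook sketch. It is not the paper's WPO. It assumes a damped wave equation of the form u_tt + 2*gamma*u_t = c^2 * Laplacian(u) with periodic boundaries, solved per spatial frequency with an FFT, which is where the O(N log N) cost comes from. The function name damped_wave_propagate, the parameters c and gamma, the initial velocity v0, and the choice of FFT over any other transform are all assumptions made for illustration; the abstract specifies none of them.

```python
import numpy as np

def damped_wave_propagate(u0, v0, t, c=1.0, gamma=0.1):
    """Evolve a 2D field under u_tt + 2*gamma*u_t = c^2 * laplacian(u).

    Hypothetical illustration only: u0 is the initial field, v0 the initial
    time derivative, t the propagation time. Periodic boundaries are assumed.
    """
    h, w = u0.shape
    # Angular spatial frequencies of the FFT grid.
    ky = 2 * np.pi * np.fft.fftfreq(h)[:, None]
    kx = 2 * np.pi * np.fft.fftfreq(w)[None, :]
    k2 = kx**2 + ky**2

    U0 = np.fft.fft2(u0)
    V0 = np.fft.fft2(v0)

    # Each frequency is an independent damped oscillator. A complex square
    # root covers both the oscillating case (c|k| > gamma) and the
    # non-oscillating case (c|k| < gamma, where cos/sin become cosh/sinh).
    omega = np.sqrt((c**2 * k2 - gamma**2).astype(complex))
    cos_term = np.cos(omega * t)

    # sin(omega*t)/omega, with its limit t where omega is (numerically) zero.
    tiny = np.abs(omega) < 1e-12
    safe = np.where(tiny, 1.0, omega)
    sinc_term = np.where(tiny, t, np.sin(safe * t) / safe)

    # Closed form: the time dependence (cos_term, sinc_term, decay) multiplies
    # the spectrum elementwise, so frequency and time factor apart.
    Ut = np.exp(-gamma * t) * (U0 * cos_term + (V0 + gamma * U0) * sinc_term)
    return np.fft.ifft2(Ut).real


# Usage: propagate a random "feature map" for a short time.
rng = np.random.default_rng(0)
feature_map = rng.standard_normal((32, 32))
evolved = damped_wave_propagate(feature_map, np.zeros_like(feature_map), t=1.0)
```

In this sketch, high spatial frequencies oscillate while decaying instead of being smoothed away monotonically as in a heat-equation filter. This matches the abstract's contrast with heat-based methods only at a qualitative level.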

Related papers

AWEMixer: Adaptive Wavelet-Enhanced Mixer Network for Long-Term Time Series Forecasting [12.450099337354017]
We propose AWEMixer, an Adaptive Wavelet-Enhanced Mixer Network. A Frequency Router is designed to use the global periodicity pattern obtained by the Fast Fourier Transform to adaptively weight localized wavelet subbands. A Coherent Gated Fusion Block selectively integrates prominent frequency features with multi-scale temporal representations.
arXiv Detail & Related papers (2025-11-06T11:27:12Z)
Freqformer: Image-Demoiréing Transformer via Efficient Frequency Decomposition [83.40450475728792]
We present Freqformer, a Transformer-based framework specifically designed for image demoiréing through targeted frequency separation. Our method performs an effective frequency decomposition that explicitly splits moiré patterns into high-frequency spatially-localized textures and low-frequency scale-robust color distortions. Experiments on various demoiréing benchmarks demonstrate that Freqformer achieves state-of-the-art performance with a compact model size.
arXiv Detail & Related papers (2025-05-25T12:23:10Z)
A Causality- and Frequency-Aware Deep Learning Framework for Wave Elevation Prediction Behind Floating Breakwaters [7.667077185318874]
Existing deep learning approaches exhibit limited generalization capability under unseen operating conditions. E2E-FANet is a novel end-to-end neural network designed to model relationships between waves and structures. It achieves superior predictive accuracy and robust generalization compared to mainstream models.
arXiv Detail & Related papers (2025-05-10T16:28:48Z)
3D Wavelet Convolutions with Extended Receptive Fields for Hyperspectral Image Classification [12.168520751389622]
Deep neural networks face numerous challenges in hyperspectral image classification. This paper proposes WCNet, an improved 3D-DenseNet model integrated with wavelet transforms. Experimental results demonstrate superior performance on the IN, UP, and KSC datasets.
arXiv Detail & Related papers (2025-04-15T01:39:42Z)
Enhancing Foundation Models for Time Series Forecasting via Wavelet-based Tokenization [74.3339999119713]
We develop a wavelet-based tokenizer that allows models to learn complex representations directly in the space of time-localized frequencies. Our method first scales and decomposes the input time series, then thresholds and quantizes the wavelet coefficients, and finally pre-trains an autoregressive model to forecast coefficients for the forecast horizon.
arXiv Detail & Related papers (2024-12-06T18:22:59Z)
PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation [37.35829410807451]
We propose PeriodWave, a novel universal waveform generation model. We introduce a period-aware flow matching estimator that can capture the periodic features of the waveform signal. We also propose a single period-conditional universal estimator that can feed-forward parallel by period-wise batch inference.
arXiv Detail & Related papers (2024-08-14T13:36:17Z)
WaveNeRF: Wavelet-based Generalizable Neural Radiance Fields [149.2296890464997]
We design WaveNeRF, which integrates wavelet frequency decomposition into MVS and NeRF. WaveNeRF achieves superior generalizable radiance field modeling when only given three images as input.
arXiv Detail & Related papers (2023-08-09T09:24:56Z)
Spatial-Frequency U-Net for Denoising Diffusion Probabilistic Models [89.76587063609806]
We study the denoising diffusion probabilistic model (DDPM) in wavelet space, instead of pixel space, for visual synthesis. By explicitly modeling the wavelet signals, we find our model is able to generate images with higher quality on several datasets.
arXiv Detail & Related papers (2023-07-27T06:53:16Z)
Machine learning for phase-resolved reconstruction of nonlinear ocean wave surface elevations from sparse remote sensing data [37.69303106863453]
We propose a novel approach for phase-resolved wave surface reconstruction using neural networks. Our approach utilizes synthetic yet highly realistic training data on uniform one-dimensional grids.
arXiv Detail & Related papers (2023-05-18T12:30:26Z)
Learning Wave Propagation with Attention-Based Convolutional Recurrent Autoencoder Net [0.0]
We present an end-to-end attention-based convolutional recurrent autoencoder (AB-CRAN) network for data-driven modeling of wave propagation phenomena. We employ a denoising-based convolutional autoencoder from the full-order snapshots given by time-dependent hyperbolic partial differential equations for wave propagation. The attention-based sequence-to-sequence network increases the time-horizon of prediction by five times compared to the plain RNN-LSTM.
arXiv Detail & Related papers (2022-01-17T20:51:59Z)
Real Time Speech Enhancement in the Waveform Domain [99.02180506016721]
We present a causal speech enhancement model working on the raw waveform that runs in real-time on a laptop CPU. The proposed model is based on an encoder-decoder architecture with skip-connections. It is capable of removing various kinds of background noise including stationary and non-stationary noises.
arXiv Detail & Related papers (2020-06-23T09:19:13Z)
Temporal-Spatial Neural Filter: Direction Informed End-to-End Multi-channel Target Speech Separation [66.46123655365113]
Target speech separation refers to extracting the target speaker's speech from mixed signals. Two main challenges are the complex acoustic environment and the real-time processing requirement. We propose a temporal-spatial neural filter, which directly estimates the target speech waveform from multi-speaker mixture.
arXiv Detail & Related papers (2020-01-02T11:12:50Z)