WaveSeg: Enhancing Segmentation Precision via High-Frequency Prior and Mamba-Driven Spectrum Decomposition
- URL: http://arxiv.org/abs/2510.21079v1
- Date: Fri, 24 Oct 2025 01:41:31 GMT
- Title: WaveSeg: Enhancing Segmentation Precision via High-Frequency Prior and Mamba-Driven Spectrum Decomposition
- Authors: Guoan Xu, Yang Xiao, Wenjing Jia, Guangwei Gao, Guo-Jun Qi, Chia-Wen Lin,
- Abstract summary: We propose a novel decoder architecture, WaveSeg, which jointly optimize feature refinement in spatial and wavelet domains.<n>High-frequency components are first learned from input images as explicit priors to reinforce boundary details.<n>Experiments on standard benchmarks demonstrate that WaveSeg, leveraging wavelet-domain frequency prior with Mamba-based attention, consistently outperforms state-of-the-art approaches.
- Score: 61.3530659856013
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While recent semantic segmentation networks heavily rely on powerful pretrained encoders, most employ simplistic decoders, leading to suboptimal trade-offs between semantic context and fine-grained detail preservation. To address this, we propose a novel decoder architecture, WaveSeg, which jointly optimizes feature refinement in spatial and wavelet domains. Specifically, high-frequency components are first learned from input images as explicit priors to reinforce boundary details at early stages. A multi-scale fusion mechanism, Dual Domain Operation (DDO), is then applied, and the novel Spectrum Decomposition Attention (SDA) block is proposed, which is developed to leverage Mamba's linear-complexity long-range modeling to enhance high-frequency structural details. Meanwhile, reparameterized convolutions are applied to preserve low-frequency semantic integrity in the wavelet domain. Finally, a residual-guided fusion integrates multi-scale features with boundary-aware representations at native resolution, producing semantically and structurally rich feature maps. Extensive experiments on standard benchmarks demonstrate that WaveSeg, leveraging wavelet-domain frequency prior with Mamba-based attention, consistently outperforms state-of-the-art approaches both quantitatively and qualitatively, achieving efficient and precise segmentation.
Related papers
- WaveRNet: Wavelet-Guided Frequency Learning for Multi-Source Domain-Generalized Retinal Vessel Segmentation [4.23704854635294]
Domain-generalized retinal vessel segmentation is critical for automated ophthalmic diagnosis.<n>We propose WaveRNet, a wavelet-guided frequency learning framework for robust domain-generalized retinal vessel segmentation.
arXiv Detail & Related papers (2026-01-09T16:58:29Z) - WaveMAE: Wavelet decomposition Masked Auto-Encoder for Remote Sensing [5.65492058135409]
WaveMAE is a masked autoencoding framework tailored for multispectral satellite imagery.<n>To ensure fairness in evaluation, all methods are pretrained on the same dataset (fMoW-S2)<n>WaveMAE achieves consistent improvements over prior state-of-the-art approaches.
arXiv Detail & Related papers (2025-10-26T14:45:30Z) - A Spatial-Spectral-Frequency Interactive Network for Multimodal Remote Sensing Classification [45.80836671298513]
This paper introduces the spatial-spectral-frequency interaction network (S$2$Fin), which integrates pairwise fusion modules across the spatial, spectral, and frequency domains.<n> Experiments on four benchmark multimodal datasets with limited labeled data demonstrate that S$2$Fin performs superior classification, outperforming state-of-the-art methods.
arXiv Detail & Related papers (2025-10-06T09:33:35Z) - Missing Fine Details in Images: Last Seen in High Frequencies [17.95197409468585]
We propose a wavelet-based, frequency-aware variational autoencoder (FA-VAE) framework that explicitly decouples the optimization of low- and high-frequency components.<n>Our approach bridges the fidelity gap in current latent tokenizers and emphasizes the importance of frequency-aware optimization for realistic image synthesis.
arXiv Detail & Related papers (2025-09-05T18:49:08Z) - Wavelet-Guided Dual-Frequency Encoding for Remote Sensing Change Detection [67.84730634802204]
Change detection in remote sensing imagery plays a vital role in various engineering applications, such as natural disaster monitoring, urban expansion tracking, and infrastructure management.<n>Most existing methods still rely on spatial-domain modeling, where the limited diversity of feature representations hinders the detection of subtle change regions.<n>We observe that frequency-domain feature modeling particularly in the wavelet domain amplify fine-grained differences in frequency components, enhancing the perception of edge changes that are challenging to capture in the spatial domain.
arXiv Detail & Related papers (2025-08-07T11:14:16Z) - Localizing Audio-Visual Deepfakes via Hierarchical Boundary Modeling [50.8215545241128]
We propose a.<n> Boundary Modeling Network (HBMNet), which includes three modules: an Audio-Visual Feature, a.<n> Coarse Proposal Generator and a Fine-Hierarchical Probabilities Generator.<n>From the modality perspective, we enhance audio-visual encoding and fusion, reinforced by frame-level supervision.<n>Experiments show that encoding and fusion primarily improve precision, while frame-level supervision recall.
arXiv Detail & Related papers (2025-08-04T02:41:09Z) - Efficient Dual-domain Image Dehazing with Haze Prior Perception [26.57698394898644]
Transformer-based models exhibit strong global modeling capabilities in single-image dehazing, but their high computational cost limits real-time applicability.<n>We propose the Dark Channel Guided Frequency-aware Dehazing Network (DGFDNet), a novel dual-domain framework that performs physically guided degradation alignment.<n>Experiments on four benchmark haze datasets demonstrate that DGFDNet achieves state-of-the-art performance with superior robustness and real-time efficiency.
arXiv Detail & Related papers (2025-07-15T06:56:56Z) - PAD: Phase-Amplitude Decoupling Fusion for Multi-Modal Land Cover Classification [49.37555541088792]
Phase-Amplitude Decoupling (PAD) is a frequency-aware framework that separates phase (modality-shared) and amplitude (modality-complementary) components.<n>This work establishes a new paradigm for physics-aware multi-modal fusion in remote sensing.
arXiv Detail & Related papers (2025-04-27T07:21:42Z) - Meta-Exploiting Frequency Prior for Cross-Domain Few-Shot Learning [86.99944014645322]
We introduce a novel framework, Meta-Exploiting Frequency Prior for Cross-Domain Few-Shot Learning.
We decompose each query image into its high-frequency and low-frequency components, and parallel incorporate them into the feature embedding network.
Our framework establishes new state-of-the-art results on multiple cross-domain few-shot learning benchmarks.
arXiv Detail & Related papers (2024-11-03T04:02:35Z) - TFill: Image Completion via a Transformer-Based Architecture [69.62228639870114]
We propose treating image completion as a directionless sequence-to-sequence prediction task.
We employ a restrictive CNN with small and non-overlapping RF for token representation.
In a second phase, to improve appearance consistency between visible and generated regions, a novel attention-aware layer (AAL) is introduced.
arXiv Detail & Related papers (2021-04-02T01:42:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.