Related papers: WaveSeg: Enhancing Segmentation Precision via High-Frequency Prior and Mamba-Driven Spectrum Decomposition

URL: http://arxiv.org/abs/2510.21079v1
Date: Fri, 24 Oct 2025 01:41:31 GMT
Title: WaveSeg: Enhancing Segmentation Precision via High-Frequency Prior and Mamba-Driven Spectrum Decomposition
Authors: Guoan Xu, Yang Xiao, Wenjing Jia, Guangwei Gao, Guo-Jun Qi, Chia-Wen Lin
Abstract summary: We propose a novel decoder architecture, WaveSeg, which jointly optimizes feature refinement in spatial and wavelet domains. High-frequency components are first learned from input images as explicit priors to reinforce boundary details. Experiments on standard benchmarks demonstrate that WaveSeg, leveraging wavelet-domain frequency prior with Mamba-based attention, consistently outperforms state-of-the-art approaches.
Score: 61.3530659856013
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: While recent semantic segmentation networks heavily rely on powerful pretrained encoders, most employ simplistic decoders, leading to suboptimal trade-offs between semantic context and fine-grained detail preservation. To address this, we propose a novel decoder architecture, WaveSeg, which jointly optimizes feature refinement in spatial and wavelet domains. Specifically, high-frequency components are first learned from input images as explicit priors to reinforce boundary details at early stages. A multi-scale fusion mechanism, Dual Domain Operation (DDO), is then applied, together with a novel Spectrum Decomposition Attention (SDA) block that leverages Mamba's linear-complexity long-range modeling to enhance high-frequency structural details. Meanwhile, reparameterized convolutions are applied to preserve low-frequency semantic integrity in the wavelet domain. Finally, a residual-guided fusion integrates multi-scale features with boundary-aware representations at native resolution, producing semantically and structurally rich feature maps. Extensive experiments on standard benchmarks demonstrate that WaveSeg, leveraging wavelet-domain frequency prior with Mamba-based attention, consistently outperforms state-of-the-art approaches both quantitatively and qualitatively, achieving efficient and precise segmentation.
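
As a rough illustration of what "high-frequency components" in the wavelet domain look like, the sketch below applies a single-level 2D Haar transform to a toy image containing one vertical boundary. The choice of the Haar wavelet, the function names (haar_dwt2, haar_idwt2), and the toy image are assumptions made for this example only; the abstract does not say which wavelet WaveSeg uses or how its priors are learned, so this is not the paper's implementation.

```python
import numpy as np

def haar_dwt2(x):
    """Single-level 2D Haar transform (illustrative; assumes even height and width)."""
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0           # low-frequency approximation
    col_detail = (a - b + c - d) / 2.0   # difference between adjacent columns
    row_detail = (a + b - c - d) / 2.0   # difference between adjacent rows
    diag_detail = (a - b - c + d) / 2.0  # diagonal difference
    return ll, col_detail, row_detail, diag_detail

def haar_idwt2(ll, col_detail, row_detail, diag_detail):
    """Inverse of haar_dwt2: rebuilds the original array from the four subbands."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=ll.dtype)
    x[0::2, 0::2] = (ll + col_detail + row_detail + diag_detail) / 2.0
    x[0::2, 1::2] = (ll - col_detail + row_detail - diag_detail) / 2.0
    x[1::2, 0::2] = (ll + col_detail - row_detail - diag_detail) / 2.0
    x[1::2, 1::2] = (ll - col_detail - row_detail + diag_detail) / 2.0
    return x

# Toy 8x8 "image": zeros on the left, ones from column 3 onward (a vertical boundary).
img = np.zeros((8, 8), dtype=np.float64)
img[:, 3:] = 1.0

ll, col_detail, row_detail, diag_detail = haar_dwt2(img)
restored = haar_idwt2(ll, col_detail, row_detail, diag_detail)

# The column-difference subband is nonzero only where the boundary falls.
boundary_columns = np.flatnonzero(np.abs(col_detail).sum(axis=0))
```

In this toy case the boundary shows up only in the column-difference subband, while the approximation subband keeps the coarse layout. That split is the general property that motivates handling high-frequency detail and low-frequency semantics separately.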

Related papers

WaveRNet: Wavelet-Guided Frequency Learning for Multi-Source Domain-Generalized Retinal Vessel Segmentation [4.23704854635294]
Domain-generalized retinal vessel segmentation is critical for automated ophthalmic diagnosis. We propose WaveRNet, a wavelet-guided frequency learning framework for robust domain-generalized retinal vessel segmentation.
arXiv Detail & Related papers (2026-01-09T16:58:29Z)
WaveMAE: Wavelet decomposition Masked Auto-Encoder for Remote Sensing [5.65492058135409]
WaveMAE is a masked autoencoding framework tailored for multispectral satellite imagery. To ensure fairness in evaluation, all methods are pretrained on the same dataset (fMoW-S2). WaveMAE achieves consistent improvements over prior state-of-the-art approaches.
arXiv Detail & Related papers (2025-10-26T14:45:30Z)
A Spatial-Spectral-Frequency Interactive Network for Multimodal Remote Sensing Classification [45.80836671298513]
This paper introduces the spatial-spectral-frequency interaction network (S$^2$Fin), which integrates pairwise fusion modules across the spatial, spectral, and frequency domains. Experiments on four benchmark multimodal datasets with limited labeled data demonstrate that S$^2$Fin achieves superior classification performance, outperforming state-of-the-art methods.
arXiv Detail & Related papers (2025-10-06T09:33:35Z)
Missing Fine Details in Images: Last Seen in High Frequencies [17.95197409468585]
We propose a wavelet-based, frequency-aware variational autoencoder (FA-VAE) framework that explicitly decouples the optimization of low- and high-frequency components. Our approach bridges the fidelity gap in current latent tokenizers and emphasizes the importance of frequency-aware optimization for realistic image synthesis.
arXiv Detail & Related papers (2025-09-05T18:49:08Z)
Wavelet-Guided Dual-Frequency Encoding for Remote Sensing Change Detection [67.84730634802204]
Change detection in remote sensing imagery plays a vital role in various engineering applications, such as natural disaster monitoring, urban expansion tracking, and infrastructure management. Most existing methods still rely on spatial-domain modeling, where the limited diversity of feature representations hinders the detection of subtle change regions. We observe that frequency-domain feature modeling, particularly in the wavelet domain, amplifies fine-grained differences in frequency components, enhancing the perception of edge changes that are challenging to capture in the spatial domain.
arXiv Detail & Related papers (2025-08-07T11:14:16Z)
Localizing Audio-Visual Deepfakes via Hierarchical Boundary Modeling [50.8215545241128]
We propose a Hierarchical Boundary Modeling Network (HBMNet), which includes three modules: an Audio-Visual Feature, a Coarse Proposal Generator and a Fine-Hierarchical Probabilities Generator. From the modality perspective, we enhance audio-visual encoding and fusion, reinforced by frame-level supervision. Experiments show that encoding and fusion primarily improve precision, while frame-level supervision improves recall.
arXiv Detail & Related papers (2025-08-04T02:41:09Z)
Efficient Dual-domain Image Dehazing with Haze Prior Perception [26.57698394898644]
Transformer-based models exhibit strong global modeling capabilities in single-image dehazing, but their high computational cost limits real-time applicability. We propose the Dark Channel Guided Frequency-aware Dehazing Network (DGFDNet), a novel dual-domain framework that performs physically guided degradation alignment. Experiments on four benchmark haze datasets demonstrate that DGFDNet achieves state-of-the-art performance with superior robustness and real-time efficiency.
arXiv Detail & Related papers (2025-07-15T06:56:56Z)
PAD: Phase-Amplitude Decoupling Fusion for Multi-Modal Land Cover Classification [49.37555541088792]
Phase-Amplitude Decoupling (PAD) is a frequency-aware framework that separates phase (modality-shared) and amplitude (modality-complementary) components. This work establishes a new paradigm for physics-aware multi-modal fusion in remote sensing.
arXiv Detail & Related papers (2025-04-27T07:21:42Z)
Meta-Exploiting Frequency Prior for Cross-Domain Few-Shot Learning [86.99944014645322]
We introduce a novel framework, Meta-Exploiting Frequency Prior for Cross-Domain Few-Shot Learning. We decompose each query image into its high-frequency and low-frequency components and incorporate them in parallel into the feature embedding network. Our framework establishes new state-of-the-art results on multiple cross-domain few-shot learning benchmarks.
arXiv Detail & Related papers (2024-11-03T04:02:35Z)
TFill: Image Completion via a Transformer-Based Architecture [69.62228639870114]
We propose treating image completion as a directionless sequence-to-sequence prediction task. We employ a restrictive CNN with small, non-overlapping receptive fields (RF) for token representation. In a second phase, to improve appearance consistency between visible and generated regions, a novel attention-aware layer (AAL) is introduced.
arXiv Detail & Related papers (2021-04-02T01:42:01Z)

This list is automatically generated from the titles and abstracts of the papers on this site.