Related papers: SIEFormer: Spectral-Interpretable and -Enhanced Transformer for Generalized Category Discovery

URL: http://arxiv.org/abs/2602.13067v1
Date: Fri, 13 Feb 2026 16:22:31 GMT
Title: SIEFormer: Spectral-Interpretable and -Enhanced Transformer for Generalized Category Discovery
Authors: Chunming Li, Shidong Wang, Tong Xin, Haofeng Zhang
Abstract summary: SIEFormer is composed of two main branches, each corresponding to an implicit and explicit spectral perspective of the ViT. The implicit branch realizes the use of different types of graph Laplacians to model the local structure correlations of tokens. The explicit branch, on the other hand, introduces a Maneuverable Filtering Layer (MFL) that learns global dependencies among tokens.
Score: 14.288193104482986
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper presents a novel approach, Spectral-Interpretable and -Enhanced Transformer (SIEFormer), which leverages spectral analysis to reinterpret the attention mechanism within Vision Transformer (ViT) and enhance feature adaptability, with particular emphasis on challenging Generalized Category Discovery (GCD) tasks. The proposed SIEFormer is composed of two main branches, each corresponding to an implicit and explicit spectral perspective of the ViT, enabling joint optimization. The implicit branch realizes the use of different types of graph Laplacians to model the local structure correlations of tokens, along with a novel Band-adaptive Filter (BaF) layer that can flexibly perform both band-pass and band-reject filtering. The explicit branch, on the other hand, introduces a Maneuverable Filtering Layer (MFL) that learns global dependencies among tokens by applying the Fourier transform to the input "value" features, modulating the transformed signal with a set of learnable parameters in the frequency domain, and then performing an inverse Fourier transform to obtain the enhanced features. Extensive experiments reveal state-of-the-art performance on multiple image recognition datasets, reaffirming the superiority of our approach through ablation studies and visualizations.
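
The transform, modulate, inverse-transform pattern that the abstract attributes to the MFL can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the function name `frequency_filter`, the choice of a 1-D real FFT along the token axis, and the per-frequency, per-channel complex `weights` (standing in for the learnable parameters) are all hypothetical.

```python
import numpy as np


def frequency_filter(values, weights):
    """Filter token features in the frequency domain (illustrative sketch).

    values:  real array of shape (num_tokens, dim), standing in for "value" features.
    weights: complex array of shape (num_tokens // 2 + 1, dim); a hypothetical
             stand-in for learnable frequency-domain parameters.
    """
    # Forward real FFT along the token axis (an assumption; the paper may differ).
    spectrum = np.fft.rfft(values, axis=0)
    # Element-wise modulation of every frequency bin and channel.
    modulated = spectrum * weights
    # Inverse FFT back to the token domain, restoring the original length.
    return np.fft.irfft(modulated, n=values.shape[0], axis=0)


rng = np.random.default_rng(0)
num_tokens, dim = 8, 4
values = rng.standard_normal((num_tokens, dim))

# All-ones weights leave the signal unchanged.
identity = np.ones((num_tokens // 2 + 1, dim), dtype=complex)
restored = frequency_filter(values, identity)

# Keeping only the zero-frequency bin replaces each token by the per-channel mean.
dc_only = np.zeros((num_tokens // 2 + 1, dim), dtype=complex)
dc_only[0] = 1.0
smoothed = frequency_filter(values, dc_only)
```

Because every frequency bin aggregates all tokens, a single element-wise multiplication in this domain mixes information globally, which is one way to read the abstract's claim that the layer learns global dependencies among tokens.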

Related papers

Data-Driven Graph Filters via Adaptive Spectral Shaping [10.449640808601199]
We introduce Adaptive Spectral Shaping, a data-driven framework for graph filtering. The framework provides compact spectral modules that plug into graph signal processing pipelines and graph neural networks.
arXiv Detail & Related papers (2026-02-03T16:20:49Z)
Hyb-KAN ViT: Hybrid Kolmogorov-Arnold Networks Augmented Vision Transformer [0.0]
This study introduces the Hybrid Kolmogorov-Arnold Networks Augmented Vision Transformer (Hyb-KAN ViT) to address the inherent limitations of Multi-Layer Perceptrons (MLPs) in Vision Transformers (ViTs). Hyb-KAN ViT is a novel framework that integrates wavelet-based spectral decomposition and spline-optimized activation functions.
arXiv Detail & Related papers (2025-05-07T19:13:17Z)
DiffFormer: a Differential Spatial-Spectral Transformer for Hyperspectral Image Classification [3.271106943956333]
Hyperspectral image classification (HSIC) has gained significant attention because of its potential in analyzing high-dimensional data with rich spectral and spatial information. We propose the Differential Spatial-Spectral Transformer (DiffFormer) to address the inherent challenges of HSIC, such as spectral redundancy and spatial discontinuity. Experiments on benchmark hyperspectral datasets demonstrate the superiority of DiffFormer in terms of classification accuracy, computational efficiency, and generalizability.
arXiv Detail & Related papers (2024-12-23T07:21:41Z)
A Hybrid Transformer-Mamba Network for Single Image Deraining [70.64069487982916]
Existing deraining Transformers employ self-attention mechanisms with fixed-range windows or along channel dimensions. We introduce a novel dual-branch hybrid Transformer-Mamba network, denoted as TransMamba, aimed at effectively capturing long-range rain-related dependencies.
arXiv Detail & Related papers (2024-08-31T10:03:19Z)
HoloNets: Spectral Convolutions do extend to Directed Graphs [59.851175771106625]
Conventional wisdom dictates that spectral convolutional networks may only be deployed on undirected graphs. Here we show this traditional reliance on the graph Fourier transform to be superfluous. We provide a frequency-response interpretation of newly developed filters, investigate the influence of the basis used to express filters and discuss the interplay with characteristic operators on which networks are based.
arXiv Detail & Related papers (2023-10-03T17:42:09Z)
Unified Frequency-Assisted Transformer Framework for Detecting and Grounding Multi-Modal Manipulation [109.1912721224697]
We present the Unified Frequency-Assisted transFormer framework, named UFAFormer, to address the DGM4 problem. By leveraging the discrete wavelet transform, we decompose images into several frequency sub-bands, capturing rich face forgery artifacts. Our proposed frequency encoder, incorporating intra-band and inter-band self-attentions, explicitly aggregates forgery features within and across diverse sub-bands.
arXiv Detail & Related papers (2023-09-18T11:06:42Z)
Fourier Test-time Adaptation with Multi-level Consistency for Robust Classification [10.291631977766672]
We propose a novel approach called Fourier Test-time Adaptation (FTTA) to integrate input and model tuning. FTTA builds a reliable multi-level consistency measurement of paired inputs to achieve self-supervision of predictions. It was extensively validated on three large classification datasets with different modalities and organs.
arXiv Detail & Related papers (2023-06-05T02:29:38Z)
Adaptive Frequency Learning in Two-branch Face Forgery Detection [66.91715092251258]
We propose to Adaptively learn Frequency information in a two-branch Detection framework, dubbed AFD. We liberate our network from the fixed frequency transforms, and achieve better performance with our data- and task-dependent transform layers.
arXiv Detail & Related papers (2022-03-27T14:25:52Z)
FAMLP: A Frequency-Aware MLP-Like Architecture For Domain Generalization [73.41395947275473]
We propose a novel frequency-aware architecture, in which the domain-specific features are filtered out in the transformed frequency domain. Experiments on three benchmarks demonstrate significant performance, outperforming the state-of-the-art methods by a margin of 3%, 4% and 9%, respectively.
arXiv Detail & Related papers (2022-03-24T07:26:29Z)
Investigating Expressiveness of Transformer in Spectral Domain for Graphs [6.092217185687028]
We study and prove the link between the spatial and spectral domain in the realm of the transformer. We propose FeTA, a framework that aims to perform attention over the entire graph spectrum analogous to the attention in spatial space.
arXiv Detail & Related papers (2022-01-23T18:03:22Z)
SpectralFormer: Rethinking Hyperspectral Image Classification with Transformers [91.09957836250209]
Hyperspectral (HS) images are characterized by approximately contiguous spectral information. CNNs have been proven to be powerful feature extractors in HS image classification. We propose a novel backbone network called SpectralFormer for HS image classification.
arXiv Detail & Related papers (2021-07-07T02:59:21Z)

This list is automatically generated from the titles and abstracts of the papers on this site.