Frequency Guidance Matters: Skeletal Action Recognition by Frequency-Aware Mixed Transformer
- URL: http://arxiv.org/abs/2407.12322v3
- Date: Mon, 29 Jul 2024 18:03:50 GMT
- Title: Frequency Guidance Matters: Skeletal Action Recognition by Frequency-Aware Mixed Transformer
- Authors: Wenhan Wu, Ce Zheng, Zihao Yang, Chen Chen, Srijan Das, Aidong Lu
- Abstract summary: We introduce a frequency-aware attention module to unweave skeleton frequency representations.
We also develop a mixed transformer architecture to incorporate spatial features with frequency features.
Experiments show that FreqMixFormer outperforms SOTA on 3 popular skeleton action recognition datasets.
- Score: 18.459822172890473
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, transformers have demonstrated great potential for modeling long-term dependencies from skeleton sequences and thereby gained ever-increasing attention in skeleton action recognition. However, the existing transformer-based approaches heavily rely on the naive attention mechanism for capturing the spatiotemporal features, which falls short in learning discriminative representations for actions that exhibit similar motion patterns. To address this challenge, we introduce the Frequency-aware Mixed Transformer (FreqMixFormer), specifically designed for recognizing similar skeletal actions with subtle discriminative motions. First, we introduce a frequency-aware attention module to unweave skeleton frequency representations by embedding joint features into frequency attention maps, aiming to distinguish the discriminative movements based on their frequency coefficients. Subsequently, we develop a mixed transformer architecture to incorporate spatial features with frequency features to model the comprehensive frequency-spatial patterns. Additionally, a temporal transformer is proposed to extract the global correlations across frames. Extensive experiments show that FreqMixFormer outperforms SOTA on 3 popular skeleton action recognition datasets, including NTU RGB+D, NTU RGB+D 120, and NW-UCLA datasets.
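As a rough, non-authoritative sketch of what a frequency-aware attention step can look like (not the FreqMixFormer module itself), the snippet below applies a temporal DCT to skeleton joint features and runs plain scaled dot-product attention over joints in the frequency domain; the function name, tensor shapes, and random projections are illustrative assumptions.

```python
# Rough illustration only: a toy "frequency-aware" joint attention built from a
# temporal DCT followed by scaled dot-product attention over joints. The actual
# FreqMixFormer module (its attention maps, mixing design, and learned
# projections) is defined in the paper; shapes and names here are assumptions.
import numpy as np
from scipy.fft import dct

def frequency_aware_attention(x, num_freq=16, rng=None):
    """x: (T, J, C) joint features -> (J, C) frequency-attended joint features."""
    T, J, C = x.shape
    rng = rng or np.random.default_rng(0)

    # 1) Temporal DCT per joint/channel: low coefficients capture slow motion,
    #    higher coefficients capture the subtle, faster motion cues.
    freq = dct(x, axis=0, norm="ortho")[:num_freq]            # (num_freq, J, C)
    tokens = freq.transpose(1, 0, 2).reshape(J, -1)           # one token per joint

    # 2) Plain scaled dot-product attention over joints in the frequency domain.
    d = C
    Wq, Wk, Wv = (rng.standard_normal((tokens.shape[1], d)) / np.sqrt(tokens.shape[1])
                  for _ in range(3))
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = (Q @ K.T) / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)              # numerically stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)                  # (J, J) frequency attention map
    return attn @ V

out = frequency_aware_attention(np.random.default_rng(1).standard_normal((64, 25, 8)))
print(out.shape)  # (25, 8) -- 25 joints, 8 channels
```

The intuition behind the DCT step is the one stated in the abstract: actions with similar trajectories can still differ in how their motion energy is distributed across frequency coefficients.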
Related papers
- CATCH: Channel-Aware Multivariate Time Series Anomaly Detection via Frequency Patching [24.927390742543707]
We introduce CATCH, a framework based on frequency patching.
We propose a Channel Fusion Module (CFM) which features a patch-wise mask generator and a masked-attention mechanism.
Experiments on 9 real-world datasets and 12 synthetic datasets demonstrate that CATCH achieves state-of-the-art performance.
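As a loose sketch of the frequency-patching and masked channel-attention idea summarized above (not the CATCH implementation), the snippet below FFTs each channel, cuts the amplitude spectrum into patches, and fuses channels inside each patch with masked attention; the random mask stands in for the learned patch-wise mask generator, and all shapes are assumptions.

```python
# Loose sketch of the frequency-patching idea summarized above, not the CATCH
# implementation: FFT each channel, cut the amplitude spectrum into patches,
# and fuse channels inside each patch with masked attention. CATCH learns its
# patch-wise mask generator; the random mask and all shapes are assumptions.
import numpy as np

def channel_fused_frequency_patches(x, patch_len=8, keep_prob=0.7, rng=None):
    """x: (T, D) multivariate series -> (num_patches, D, patch_len) fused patches."""
    rng = rng or np.random.default_rng(0)
    amp = np.abs(np.fft.rfft(x, axis=0))                      # (F, D) amplitude spectrum
    n = amp.shape[0] // patch_len
    patches = amp[: n * patch_len].T.reshape(x.shape[1], n, patch_len).transpose(1, 0, 2)

    fused = np.empty_like(patches)
    for p, patch in enumerate(patches):                       # patch: (D, patch_len)
        scores = patch @ patch.T / np.sqrt(patch_len)         # channel-to-channel affinity
        mask = rng.random(scores.shape) < keep_prob           # stand-in for a learned mask
        np.fill_diagonal(mask, True)                          # never mask a channel from itself
        scores = np.where(mask, scores, -np.inf)
        scores -= scores.max(axis=-1, keepdims=True)
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)        # masked attention over channels
        fused[p] = weights @ patch
    return fused

out = channel_fused_frequency_patches(np.random.default_rng(1).standard_normal((256, 6)))
print(out.shape)  # (num_patches, channels, patch_len)
```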
arXiv Detail & Related papers (2024-10-16T05:58:55Z) - Frequency-Aware Deepfake Detection: Improving Generalizability through Frequency Space Learning [81.98675881423131]
This research addresses the challenge of developing a universal deepfake detector that can effectively identify unseen deepfake images.
Existing frequency-based paradigms have relied on frequency-level artifacts introduced during the up-sampling in GAN pipelines to detect forgeries.
We introduce a novel frequency-aware approach called FreqNet, centered around frequency domain learning, specifically designed to enhance the generalizability of deepfake detectors.
arXiv Detail & Related papers (2024-03-12T01:28:00Z) - Frequency-Adaptive Pan-Sharpening with Mixture of Experts [22.28680499480492]
We propose a novel Frequency Adaptive Mixture of Experts (FAME) learning framework for pan-sharpening.
Our method performs the best against other state-of-the-art methods and demonstrates a strong generalization ability for real-world scenes.
arXiv Detail & Related papers (2024-01-04T08:58:25Z) - Correlated Attention in Transformers for Multivariate Time Series [22.542109523780333]
We propose a novel correlated attention mechanism, which efficiently captures feature-wise dependencies, and can be seamlessly integrated within the encoder blocks of existing Transformers.
In particular, correlated attention operates across feature channels to compute cross-covariance matrices between queries and keys with different lag values, and selectively aggregate representations at the sub-series level.
This architecture facilitates automated discovery and representation learning of not only instantaneous but also lagged cross-correlations, while inherently capturing time series auto-correlation.
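A hedged sketch of the lagged cross-covariance idea described above (not the authors' code): attention scores are channel-to-channel covariances between queries and time-shifted keys, averaged over a small set of lags; the lag set, scaling, and final aggregation are illustrative assumptions.

```python
# Hedged sketch of the lagged cross-covariance idea described above, not the
# authors' implementation: scores are channel-to-channel covariances between
# queries and time-shifted keys, averaged over a small set of lags.
# The lag set, scaling, and aggregation are assumptions for illustration.
import numpy as np

def correlated_attention(q, k, v, lags=(0, 1, 2)):
    """q, k, v: (T, D) series representations -> (T, D) with mixed feature channels."""
    T, D = q.shape
    scores = np.zeros((D, D))
    for lag in lags:
        k_shifted = np.roll(k, shift=lag, axis=0)             # lagged keys
        scores += (q.T @ k_shifted) / T                       # cross-covariance per channel pair
    scores /= len(lags)
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)            # (D, D) feature-wise attention
    return v @ weights.T                                      # mix value channels, keep time axis

T, D = 96, 8
rng = np.random.default_rng(0)
out = correlated_attention(*(rng.standard_normal((T, D)) for _ in range(3)))
print(out.shape)  # (96, 8)
```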
arXiv Detail & Related papers (2023-11-20T17:35:44Z) - Frequency Perception Network for Camouflaged Object Detection [51.26386921922031]
We propose a novel learnable and separable frequency perception mechanism driven by the semantic hierarchy in the frequency domain.
Our entire network adopts a two-stage model, including a frequency-guided coarse localization stage and a detail-preserving fine localization stage.
Compared with the currently existing models, our proposed method achieves competitive performance in three popular benchmark datasets.
arXiv Detail & Related papers (2023-08-17T11:30:46Z) - MultiWave: Multiresolution Deep Architectures through Wavelet Decomposition for Multivariate Time Series Prediction [6.980076213134384]
MultiWave is a novel framework that enhances deep learning time series models by incorporating components that operate at the intrinsic frequencies of signals.
We show that MultiWave consistently identifies critical features and their frequency components, thus providing valuable insights into the applications studied.
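The following is a minimal sketch of the wavelet-decomposition step this summary refers to, using PyWavelets; the wavelet, decomposition level, and the notion of one sub-model per band are assumptions here, not the paper's exact configuration.

```python
# Minimal sketch of the wavelet-decomposition idea behind the MultiWave summary
# above: split each channel of a multivariate series into frequency bands with a
# discrete wavelet transform so that separate sub-models can operate at each
# intrinsic frequency. The wavelet ('db4'), level, and per-band models are
# assumptions here, not the paper's choices.
import numpy as np
import pywt  # PyWavelets

def split_into_bands(x, wavelet="db4", level=3):
    """x: (T, D) multivariate series -> list of (coeff_len, D) frequency bands."""
    per_channel = [pywt.wavedec(x[:, d], wavelet, level=level) for d in range(x.shape[1])]
    # per_channel[d] = [cA_level, cD_level, ..., cD_1]; regroup by band across channels.
    return [np.stack(band, axis=1) for band in zip(*per_channel)]

bands = split_into_bands(np.random.default_rng(0).standard_normal((128, 4)))
print([b.shape for b in bands])  # coarsest approximation first, finest detail last;
                                 # in a MultiWave-style setup each band feeds its own sub-network
```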
arXiv Detail & Related papers (2023-06-16T20:07:15Z) - STMT: A Spatial-Temporal Mesh Transformer for MoCap-Based Action Recognition [50.064502884594376]
We study the problem of human action recognition using motion capture (MoCap) sequences.
We propose a novel Spatial-Temporal Mesh Transformer (STMT) to directly model the mesh sequences.
The proposed method achieves state-of-the-art performance compared to skeleton-based and point-cloud-based models.
arXiv Detail & Related papers (2023-03-31T16:19:27Z) - Masked Frequency Modeling for Self-Supervised Visual Pre-Training [102.89756957704138]
We present Masked Frequency Modeling (MFM), a unified frequency-domain-based approach for self-supervised pre-training of visual models.
MFM first masks out a portion of frequency components of the input image and then predicts the missing frequencies on the frequency spectrum.
For the first time, MFM demonstrates that, for both ViT and CNN, a simple non-Siamese framework can learn meaningful representations even using none of the following: (i) extra data, (ii) extra model, (iii) mask token.
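A toy illustration of the masked-frequency-modeling recipe summarized above (not the MFM training pipeline): FFT the image, drop a random subset of frequency components, and keep the removed coefficients as the prediction target; the mask ratio and random mask are assumptions.

```python
# Toy illustration of the masked-frequency-modeling recipe summarized above:
# FFT the image, zero out a random subset of frequency components, and treat the
# removed coefficients as the prediction target. MFM's actual masking strategy
# and loss are defined in the paper; the ratio and random mask are assumptions.
import numpy as np

def mask_frequencies(img, mask_ratio=0.3, rng=None):
    """img: (H, W) array -> (corrupted image, boolean mask, full target spectrum)."""
    rng = rng or np.random.default_rng(0)
    spectrum = np.fft.fft2(img)
    mask = rng.random(img.shape) < mask_ratio                   # True = frequency removed
    corrupted = np.fft.ifft2(np.where(mask, 0, spectrum)).real  # image with masked frequencies
    return corrupted, mask, spectrum

img = np.random.default_rng(1).random((32, 32))
corrupted, mask, target = mask_frequencies(img)
# A model would take `corrupted` (or its spectrum) and regress the masked entries of `target`.
print(corrupted.shape, round(float(mask.mean()), 2))
```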
arXiv Detail & Related papers (2022-06-15T17:58:30Z) - Adaptive Frequency Learning in Two-branch Face Forgery Detection [66.91715092251258]
We propose to Adaptively learn Frequency information in the two-branch Detection framework, dubbed AFD.
We liberate our network from the fixed frequency transforms, and achieve better performance with our data- and task-dependent transform layers.
arXiv Detail & Related papers (2022-03-27T14:25:52Z) - Space-time Mixing Attention for Video Transformer [55.50839896863275]
We propose a Video Transformer model whose complexity scales linearly with the number of frames in the video sequence.
We demonstrate that our model produces very high recognition accuracy on the most popular video recognition datasets.
arXiv Detail & Related papers (2021-06-10T17:59:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.