Related papers: SONAR: Spectral-Contrastive Audio Residuals for Generalizable Deepfake Detection

URL: http://arxiv.org/abs/2511.21325v1
Date: Wed, 26 Nov 2025 12:16:38 GMT
Title: SONAR: Spectral-Contrastive Audio Residuals for Generalizable Deepfake Detection
Authors: Ido Nitzan Hidekel, Gal Lifshitz, Khen Cohen, Dan Raviv
Abstract summary: Spectral-cONtrastive Audio Residuals (SONAR) is a frequency-guided framework for deepfake audio detectors. SONAR disentangles an audio signal into complementary representations. It is evaluated on the ASVspoof 2021 and in-the-wild benchmarks.
Score: 6.042897432654865
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Deepfake (DF) audio detectors still struggle to generalize to out-of-distribution inputs. A central reason is spectral bias, the tendency of neural networks to learn low-frequency structure before high-frequency (HF) details, which both causes DF generators to leave HF artifacts and leaves those same artifacts under-exploited by common detectors. To address this gap, we propose Spectral-cONtrastive Audio Residuals (SONAR), a frequency-guided framework that explicitly disentangles an audio signal into complementary representations. An XLSR encoder captures the dominant low-frequency content, while the same cloned path, preceded by learnable SRM, value-constrained high-pass filters, distills faint HF residuals. Frequency cross-attention reunites the two views for long- and short-range frequency dependencies, and a frequency-aware Jensen-Shannon contrastive loss pulls real content-noise pairs together while pushing fake embeddings apart, accelerating optimization and sharpening decision boundaries. Evaluated on the ASVspoof 2021 and in-the-wild benchmarks, SONAR attains state-of-the-art performance and converges four times faster than strong baselines. By elevating faint high-frequency residuals to first-class learning signals, SONAR unveils a fully data-driven, frequency-guided contrastive framework that splits the latent space into two disjoint manifolds: natural-HF for genuine audio and distorted-HF for synthetic audio, thereby sharpening decision boundaries. Because the scheme operates purely at the representation level, it is architecture-agnostic and, in future work, can be seamlessly integrated into any model or modality where subtle high-frequency cues are decisive.
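A minimal sketch of two ideas named in the abstract: a value-constrained high-pass filter that isolates HF residuals, and the Jensen-Shannon divergence underlying the contrastive loss. This is not the authors' implementation; the listing does not specify SONAR's filters or loss. The constraint used here (center tap fixed at -1, remaining taps normalized to sum to 1) is an assumption borrowed from constrained convolutions in image forensics, and the function names `constrain_high_pass`, `hf_residual`, and `js_divergence` are hypothetical.

```python
import numpy as np

def constrain_high_pass(kernel):
    """Project a 1-D odd-length kernel onto an assumed high-pass constraint.

    Center tap is forced to -1 and the other taps are rescaled to sum to 1,
    so all taps sum to 0 and the filter has zero gain at DC.
    """
    k = np.asarray(kernel, dtype=float).copy()
    c = len(k) // 2
    k[c] = 0.0
    k /= k.sum()   # off-center taps now sum to 1 (assumes a nonzero sum)
    k[c] = -1.0    # total sum is 0: constant (low-frequency) input is removed
    return k

def hf_residual(x, kernel):
    """Return the high-frequency residual of signal x under the constrained kernel."""
    return np.convolve(x, constrain_high_pass(kernel), mode="same")

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence (natural log) between two nonnegative vectors."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        return np.sum(a * np.log((a + eps) / (b + eps)))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

In a trainable setting the projection in `constrain_high_pass` would typically be reapplied after each optimizer step so the kernel stays high-pass while its off-center taps are learned; that training loop is outside this sketch.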

Related papers

DuFal: Dual-Frequency-Aware Learning for High-Fidelity Extremely Sparse-view CBCT Reconstruction [9.883167817281313]
Sparse-view Cone-Beam Computed Tomography reconstruction from limited X-ray projections remains a challenging problem in medical imaging. This paper presents DuFal, a novel framework that integrates frequency-domain and spatial-domain processing via a dual-path architecture. Experimental results on the LUNA16 and ToothFairy datasets demonstrate that DuFal significantly outperforms existing state-of-the-art methods in preserving high-frequency anatomical features.
arXiv Detail & Related papers (2026-01-21T19:27:47Z)
FRAMER: Frequency-Aligned Self-Distillation with Adaptive Modulation Leveraging Diffusion Priors for Real-World Image Super-Resolution [6.767948729335409]
Real-image super-resolution (Real-ISR) seeks to recover HR images from LR inputs with mixed, unknown degradations. We introduce FRAMER, a plug-and-play training scheme that exploits diffusion priors without changing the backbone or inference.
arXiv Detail & Related papers (2025-12-01T08:09:05Z)
Toward Diffusible High-Dimensional Latent Spaces: A Frequency Perspective [73.86108756585857]
We analyze encoder/decoder behaviors and find that decoders depend strongly on high-frequency latent components to recover details. We introduce FreqWarm, a plug-and-play frequency warm-up curriculum that increases early-stage exposure to high-frequency latent signals.
arXiv Detail & Related papers (2025-11-27T09:20:36Z)
Towards Frequency-Adaptive Learning for SAR Despeckling [10.764049665817629]
We propose a frequency-adaptive heterogeneous despeckling model based on a divide-and-conquer architecture. Inspired by their differing noise characteristics, we design specialized sub-networks for different frequency components. For high-frequency sub-bands rich in edges and textures, we introduce an enhanced U-Net with deformable convolutions for noise suppression and enhanced features.
arXiv Detail & Related papers (2025-11-08T07:08:22Z)
ECHO: Frequency-aware Hierarchical Encoding for Variable-length Signals [8.411477071838592]
We propose a novel foundation model ECHO that integrates an advanced band-split architecture with frequency positional embeddings. We evaluate our method on various kinds of machine signal datasets.
arXiv Detail & Related papers (2025-08-20T13:10:44Z)
DiffPR: Diffusion-Based Phase Reconstruction via Frequency-Decoupled Learning [4.560284382063488]
Oversmoothing remains a persistent problem when applying deep learning to off-axis quantitative phase imaging (QPI). We trace this issue to spectral bias and show that the bias is reinforced by high-level skip connections. We introduce DiffPR, a two-stage frequency-decoupled framework.
arXiv Detail & Related papers (2025-06-12T17:08:45Z)
F2Net: A Frequency-Fused Network for Ultra-High Resolution Remote Sensing Segmentation [10.67983913373955]
F2Net is a frequency-aware framework that decomposes UHR images into high- and low-frequency components for specialized processing. A Hybrid-Frequency Fusion module integrates these observations, guided by two novel objectives. F2Net achieves state-of-the-art performance with mIoU of 80.22 and 83.39, respectively.
arXiv Detail & Related papers (2025-06-09T15:09:49Z)
Freqformer: Image-Demoiréing Transformer via Efficient Frequency Decomposition [83.40450475728792]
We present Freqformer, a Transformer-based framework specifically designed for image demoiréing through targeted frequency separation. Our method performs an effective frequency decomposition that explicitly splits moiré patterns into high-frequency spatially-localized textures and low-frequency scale-robust color distortions. Experiments on various demoiréing benchmarks demonstrate that Freqformer achieves state-of-the-art performance with a compact model size.
arXiv Detail & Related papers (2025-05-25T12:23:10Z)
A Wavelet-based Stereo Matching Framework for Solving Frequency Convergence Inconsistency [9.668149257194887]
We propose a wavelet-based stereo matching framework (Wavelet-Stereo) for solving frequency convergence inconsistency. By processing high- and low-frequency components separately, our framework can simultaneously refine high-frequency information in edges and low-frequency information in smooth regions.
arXiv Detail & Related papers (2025-05-23T15:28:03Z)
FUSE: Label-Free Image-Event Joint Monocular Depth Estimation via Frequency-Decoupled Alignment and Degradation-Robust Fusion [92.4205087439928]
Image-event joint depth estimation methods leverage complementary modalities for robust perception, yet face challenges in generalizability. We propose the Self-supervised Transfer (PST) and the Frequency-Decoupled Fusion module (FreDF). PST establishes cross-modal knowledge transfer through latent space alignment with image foundation models, effectively mitigating data scarcity. FreDF explicitly decouples high-frequency edge features from low-frequency structural components, resolving modality-specific frequency mismatches. This combined approach enables FUSE to construct a universal image-event that only requires lightweight decoder adaptation for target datasets.
arXiv Detail & Related papers (2025-03-25T15:04:53Z)
Sharpening Neural Implicit Functions with Frequency Consolidation Priors [53.6277160912059]
Signed Distance Functions (SDFs) are vital implicit representations for high-fidelity 3D surfaces. Current methods mainly leverage a neural network to learn an SDF from various supervisions including signed distances, 3D point clouds, or multi-view images. We introduce a method to sharpen a low-frequency SDF observation by recovering its high-frequency components, pursuing a sharper and more complete surface.
arXiv Detail & Related papers (2024-12-27T16:18:46Z)
Frequency-Aware Deepfake Detection: Improving Generalizability through Frequency Space Learning [81.98675881423131]
This research addresses the challenge of developing a universal deepfake detector that can effectively identify unseen deepfake images. Existing frequency-based paradigms have relied on frequency-level artifacts introduced during the up-sampling in GAN pipelines to detect forgeries. We introduce a novel frequency-aware approach called FreqNet, centered around frequency domain learning, specifically designed to enhance the generalizability of deepfake detectors.
arXiv Detail & Related papers (2024-03-12T01:28:00Z)
Adaptive Frequency Learning in Two-branch Face Forgery Detection [66.91715092251258]
We propose Adaptively learn Frequency information in the two-branch Detection framework, dubbed AFD. We liberate our network from the fixed frequency transforms, and achieve better performance with our data- and task-dependent transform layers.
arXiv Detail & Related papers (2022-03-27T14:25:52Z)

This list is automatically generated from the titles and abstracts of the papers on this site.