TLDiffGAN: A Latent Diffusion-GAN Framework with Temporal Information Fusion for Anomalous Sound Detection
- URL: http://arxiv.org/abs/2602.01060v1
- Date: Sun, 01 Feb 2026 07:04:30 GMT
- Title: TLDiffGAN: A Latent Diffusion-GAN Framework with Temporal Information Fusion for Anomalous Sound Detection
- Authors: Chengyuan Ma, Peng Jia, Hongyue Guo, Wenming Yang,
- Abstract summary: We propose a novel framework, TLDiffGAN, which consists of two complementary branches.<n>One branch incorporates a latent diffusion model into the GAN generator for adversarial training, thereby making the discriminator's task more challenging and improving the quality of generated samples.<n>We also introduce a TMixup spectrogram augmentation technique to enhance sensitivity to subtle and localized temporal patterns that are often overlooked.
- Score: 39.234515088121086
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Existing generative models for unsupervised anomalous sound detection are limited by their inability to fully capture the complex feature distribution of normal sounds, while the potential of powerful diffusion models in this domain remains largely unexplored. To address this challenge, we propose a novel framework, TLDiffGAN, which consists of two complementary branches. One branch incorporates a latent diffusion model into the GAN generator for adversarial training, thereby making the discriminator's task more challenging and improving the quality of generated samples. The other branch leverages pretrained audio model encoders to extract features directly from raw audio waveforms for auxiliary discrimination. This framework effectively captures feature representations of normal sounds from both raw audio and Mel spectrograms. Moreover, we introduce a TMixup spectrogram augmentation technique to enhance sensitivity to subtle and localized temporal patterns that are often overlooked. Extensive experiments on the DCASE 2020 Challenge Task 2 dataset demonstrate the superior detection performance of TLDiffGAN, as well as its strong capability in anomalous time-frequency localization.
Related papers
- Prior-Guided DETR for Ultrasound Nodule Detection [12.28367495765275]
We propose a prior-guided DETR framework specifically designed for ultrasound detection.<n>Instead of relying on purely data-driven feature learning, the proposed framework progressively incorporates different prior knowledge at multiple stages of the network.<n> Experiments conducted on two clinically collected thyroid ultrasound datasets demonstrate that the proposed method achieves superior accuracy compared with 18 detection methods.
arXiv Detail & Related papers (2026-01-05T15:32:58Z) - Noise-to-Notes: Diffusion-based Generation and Refinement for Automatic Drum Transcription [6.453619274330351]
Automatic drum transcription (ADT) is traditionally formulated as a discriminative task to predict drum events from audio spectrograms.<n>Noise-to-Notes (N2N) is a framework leveraging diffusion modeling to transform audio-conditioned Gaussian noise into drum events with associated velocities.<n>N2N establishes a new state-of-the-art performance across multiple ADT benchmarks.
arXiv Detail & Related papers (2025-09-26T01:12:43Z) - Frequency Domain-Based Diffusion Model for Unpaired Image Dehazing [92.61216319417208]
We propose a novel frequency domain-based diffusion model, named ours, for fully exploiting the beneficial knowledge in unpaired clear data.<n>Inspired by the strong generative ability shown by Diffusion Models (DMs), we tackle the dehazing task from the perspective of frequency domain reconstruction.
arXiv Detail & Related papers (2025-07-02T01:22:46Z) - Frequency-Guided Diffusion Model with Perturbation Training for Skeleton-Based Video Anomaly Detection [43.49146665908238]
Video anomaly detection (VAD) is a vital yet complex open-set task in computer vision.<n>We introduce a novel frequency-guided diffusion model with perturbation training.<n>We employ the 2D Discrete Cosine Transform (DCT) to separate high-frequency (local) and low-frequency (global) motion components.
arXiv Detail & Related papers (2024-12-04T05:43:53Z) - ASD-Diffusion: Anomalous Sound Detection with Diffusion Models [6.659078422704148]
Anomalous Sound Detection based on Diffusion Models (ASD-Diffusion) is proposed for ASD in real-world factories.
Post-processing anomalies filter algorithm is proposed to detect anomalies that exhibit significant deviation from the original input after reconstruction.
Denoising diffusion implicit model is introduced to accelerate the inference speed.
arXiv Detail & Related papers (2024-09-24T10:42:23Z) - DiffATR: Diffusion-based Generative Modeling for Audio-Text Retrieval [49.076590578101985]
We present a diffusion-based ATR framework (DiffATR) that generates joint distribution from noise.
Experiments on the AudioCaps and Clotho datasets with superior performances, verify the effectiveness of our approach.
arXiv Detail & Related papers (2024-09-16T06:33:26Z) - SpecDiff-GAN: A Spectrally-Shaped Noise Diffusion GAN for Speech and
Music Synthesis [0.0]
We introduce SpecDiff-GAN, a neural vocoder based on HiFi-GAN.
We show the merits of our proposed model for speech and music synthesis on several datasets.
arXiv Detail & Related papers (2024-01-30T09:17:57Z) - High-Fidelity Speech Synthesis with Minimal Supervision: All Using
Diffusion Models [56.00939852727501]
Minimally-supervised speech synthesis decouples TTS by combining two types of discrete speech representations.
Non-autoregressive framework enhances controllability, and duration diffusion model enables diversified prosodic expression.
arXiv Detail & Related papers (2023-09-27T09:27:03Z) - DiffusionAD: Norm-guided One-step Denoising Diffusion for Anomaly Detection [80.20339155618612]
DiffusionAD is a novel anomaly detection pipeline comprising a reconstruction sub-network and a segmentation sub-network.<n>A rapid one-step denoising paradigm achieves hundreds of times acceleration while preserving comparable reconstruction quality.<n>Considering the diversity in the manifestation of anomalies, we propose a norm-guided paradigm to integrate the benefits of multiple noise scales.
arXiv Detail & Related papers (2023-03-15T16:14:06Z) - On the Frequency Bias of Generative Models [61.60834513380388]
We analyze proposed measures against high-frequency artifacts in state-of-the-art GAN training.
We find that none of the existing approaches can fully resolve spectral artifacts yet.
Our results suggest that there is great potential in improving the discriminator.
arXiv Detail & Related papers (2021-11-03T18:12:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.