Related papers: ESTM: An Enhanced Dual-Branch Spectral-Temporal Mamba for Anomalous Sound Detection

URL: http://arxiv.org/abs/2509.02471v1
Date: Tue, 02 Sep 2025 16:23:49 GMT
Title: ESTM: An Enhanced Dual-Branch Spectral-Temporal Mamba for Anomalous Sound Detection
Authors: Chengyuan Ma, Peng Jia, Hongyue Guo, Wenming Yang
Abstract summary: We propose a novel framework, ESTM, which is based on a dual-path Mamba architecture with time-frequency decoupled modeling. ESTM extracts rich feature representations from different time segments and frequency bands by fusing enhanced Mel spectrograms and raw audio features. Our experiments demonstrate that ESTM improves anomalous sound detection performance on the DCASE 2020 Task 2 dataset.
Score: 39.234515088121086
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: The core challenge in industrial equipment anomalous sound detection (ASD) lies in modeling the time-frequency coupling characteristics of acoustic features. Existing modeling methods are limited by local receptive fields, making it difficult to capture long-range temporal patterns and cross-band dynamic coupling effects in machine acoustic features. In this paper, we propose a novel framework, ESTM, which is based on a dual-path Mamba architecture with time-frequency decoupled modeling and utilizes Selective State-Space Models (SSMs) for long-range sequence modeling. ESTM extracts rich feature representations from different time segments and frequency bands by fusing enhanced Mel spectrograms and raw audio features, while further improving sensitivity to anomalous patterns through the TriStat-Gating (TSG) module. Our experiments demonstrate that ESTM improves anomalous sound detection performance on the DCASE 2020 Task 2 dataset, further validating the effectiveness of the proposed method.
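
Illustrative sketch (not the authors' implementation): the abstract describes time-frequency decoupled modeling with selective state-space models, and the idea can be approximated in a few lines of NumPy. Everything below is an assumption made for illustration: the function names (make_params, selective_scan, dual_branch), the random untrained parameters, the diagonal recurrence, and the plain averaging used as fusion. The TSG module, Mel-spectrogram enhancement, and raw-audio fusion mentioned in the abstract are not modeled.

```python
import numpy as np


def make_params(channels, state_dim, rng):
    """Random, untrained parameters for one toy selective-SSM branch (hypothetical)."""
    return {
        "a_log": rng.standard_normal((channels, state_dim)),      # A = -exp(a_log) < 0
        "w_b": rng.standard_normal((channels, state_dim)) * 0.1,  # maps x_t to B_t
        "w_c": rng.standard_normal((channels, state_dim)) * 0.1,  # maps x_t to C_t
        "w_dt": rng.standard_normal(channels) * 0.1,              # maps x_t to step size
    }


def selective_scan(x, p):
    """Causal scan over x of shape (length, channels).

    h_t = exp(dt_t * A) * h_{t-1} + dt_t * B_t * x_t,   y_t = h_t @ C_t
    where dt_t, B_t and C_t are computed from x_t (the "selective" part).
    """
    x = np.asarray(x, dtype=float)
    a = -np.exp(p["a_log"])                    # (channels, state_dim), decaying dynamics
    h = np.zeros_like(a)                       # hidden state per channel
    y = np.empty_like(x)
    for t in range(x.shape[0]):
        u = x[t]                               # (channels,)
        dt = np.logaddexp(0.0, u * p["w_dt"])  # softplus keeps the step positive
        b_t = u @ p["w_b"]                     # (state_dim,)
        c_t = u @ p["w_c"]                     # (state_dim,)
        h = np.exp(dt[:, None] * a) * h + dt[:, None] * b_t[None, :] * u[:, None]
        y[t] = h @ c_t                         # (channels,)
    return y


def dual_branch(spec, state_dim=4, seed=0):
    """spec: (n_mels, n_frames). Returns a fused map of the same shape."""
    spec = np.asarray(spec, dtype=float)
    n_mels, n_frames = spec.shape
    rng = np.random.default_rng(seed)
    # Temporal branch: scan across frames, with mel bands acting as channels.
    temporal = selective_scan(spec.T, make_params(n_mels, state_dim, rng)).T
    # Spectral branch: scan across mel bands, with frames acting as channels.
    spectral = selective_scan(spec, make_params(n_frames, state_dim, rng))
    # Plain averaging is a stand-in for whatever fusion the paper uses.
    return 0.5 * (temporal + spectral)
```

Calling dual_branch on an array of shape (n_mels, n_frames) returns an array of the same shape. The point of the sketch is only that one branch carries state along the time axis and the other along the frequency axis, so each can pick up long-range structure in its own dimension before the two are combined.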

Related papers

Contextual and Seasonal LSTMs for Time Series Anomaly Detection [49.50689313712684]
We propose a novel prediction-based framework named Contextual and Seasonal LSTMs (CS-LSTMs). CS-LSTMs are built upon a noise decomposition strategy and jointly leverage contextual dependencies and seasonal patterns. They consistently outperform state-of-the-art methods, highlighting their effectiveness and practical value in robust time series anomaly detection.
arXiv Detail & Related papers (2026-02-10T11:46:15Z)
Learning to Separate RF Signals Under Uncertainty: Detect-Then-Separate vs. Unified Joint Models [53.79667447811139]
We show that a single deep neural architecture learns to jointly detect and separate when applied directly to the received signal. These findings highlight UJM as a scalable and practical alternative to DTS, while opening new directions for unified separation under broader estimation.
arXiv Detail & Related papers (2026-02-04T15:25:02Z)
TLDiffGAN: A Latent Diffusion-GAN Framework with Temporal Information Fusion for Anomalous Sound Detection [39.234515088121086]
We propose a novel framework, TLDiffGAN, which consists of two complementary branches. One branch incorporates a latent diffusion model into the GAN generator for adversarial training, thereby making the discriminator's task more challenging and improving the quality of generated samples. We also introduce a TMixup spectrogram augmentation technique to enhance sensitivity to subtle and localized temporal patterns that are often overlooked.
arXiv Detail & Related papers (2026-02-01T07:04:30Z)
TSkel-Mamba: Temporal Dynamic Modeling via State Space Model for Human Skeleton-based Action Recognition [59.99922360648663]
TSkel-Mamba is a hybrid Transformer-Mamba framework that effectively captures both spatial and temporal dynamics. The MTI module employs multi-scale Cycle operators to capture cross-channel temporal interactions, a critical factor in action recognition.
arXiv Detail & Related papers (2025-12-12T11:55:16Z)
Fourier-KAN-Mamba: A Novel State-Space Equation Approach for Time-Series Anomaly Detection [12.167081924571951]
Mamba-based state-space models have shown remarkable efficiency in long-sequence modeling. We propose a novel hybrid architecture that integrates a Fourier layer, Kolmogorov-Arnold Networks (KAN), and the Mamba selective state-space model. Our method significantly outperforms existing state-of-the-art approaches.
arXiv Detail & Related papers (2025-11-19T03:45:06Z)
ECHO: Frequency-aware Hierarchical Encoding for Variable-length Signals [8.411477071838592]
We propose a novel foundation model ECHO that integrates an advanced band-split architecture with frequency positional embeddings. We evaluate our method on various kinds of machine signal datasets.
arXiv Detail & Related papers (2025-08-20T13:10:44Z)
Artificial Intelligence-Based Multiscale Temporal Modeling for Anomaly Detection in Cloud Services [10.421371572062595]
This study proposes an anomaly detection method based on the Transformer architecture with integrated multiscale feature perception. The proposed method outperforms mainstream baseline models in key metrics, including precision, recall, AUC, and F1-score.
arXiv Detail & Related papers (2025-08-20T07:52:36Z)
MR-EEGWaveNet: Multiresolutional EEGWaveNet for Seizure Detection from Long EEG Recordings [7.9595266728435545]
We propose a novel end-to-end model, "Multiresolution EEGWaveNet (MR-EEGWaveNet)," which efficiently distinguishes seizure events from background electroencephalogram (EEG) artifacts/noise. The model has three modules: convolution, feature extraction, and predictor. The proposed MR-EEGWaveNet significantly outperformed the conventional non-multiresolution approach.
arXiv Detail & Related papers (2025-05-23T14:40:50Z)
Electromyography-Based Gesture Recognition: Hierarchical Feature Extraction for Enhanced Spatial-Temporal Dynamics [0.7083699704958353]
We propose a lightweight squeeze-excitation deep learning-based multi-stream spatial-temporal dynamics time-varying feature extraction approach. The proposed model was tested on the Ninapro DB2, DB4, and DB5 datasets, achieving accuracy rates of 96.41%, 92.40%, and 93.34%, respectively.
arXiv Detail & Related papers (2025-04-04T07:11:12Z)
FreSca: Scaling in Frequency Space Enhances Diffusion Models [55.75504192166779]
This paper explores frequency-based control within latent diffusion models. We introduce FreSca, a novel framework that decomposes noise difference into low- and high-frequency components. FreSca operates without any model retraining or architectural change, offering model- and task-agnostic control.
arXiv Detail & Related papers (2025-04-02T22:03:11Z)
Multimodal Attention-Enhanced Feature Fusion-based Weekly Supervised Anomaly Violence Detection [1.9223495770071632]
This system uses three feature streams: RGB video, optical flow, and audio signals, where each stream extracts complementary spatial and temporal features. The system significantly improves anomaly detection accuracy and robustness across three datasets.
arXiv Detail & Related papers (2024-09-17T14:17:52Z)
DA-Flow: Dual Attention Normalizing Flow for Skeleton-based Video Anomaly Detection [52.74152717667157]
We propose a lightweight module called Dual Attention Module (DAM) for capturing cross-dimension interaction relationships in spatio-temporal skeletal data. It employs the frame attention mechanism to identify the most significant frames and the skeleton attention mechanism to capture broader relationships across fixed partitions with minimal parameters and FLOPs.
arXiv Detail & Related papers (2024-06-05T06:18:03Z)
Gait Recognition in the Wild with Multi-hop Temporal Switch [81.35245014397759]
Gait recognition in the wild is a more practical problem that has attracted the attention of the multimedia and computer vision communities. This paper presents a novel multi-hop temporal switch method to achieve effective temporal modeling of gait patterns in real-world scenes.
arXiv Detail & Related papers (2022-09-01T10:46:09Z)
Anomaly Detection of Time Series with Smoothness-Inducing Sequential Variational Auto-Encoder [59.69303945834122]
We present a Smoothness-Inducing Sequential Variational Auto-Encoder (SISVAE) model for robust estimation and anomaly detection of time series. Our model parameterizes mean and variance for each time-stamp with flexible neural networks. We show the effectiveness of our model on both synthetic datasets and public real-world benchmarks.
arXiv Detail & Related papers (2021-02-02T06:15:15Z)

This list is automatically generated from the titles and abstracts of the papers on this site.