An Enhanced Audio Feature Tailored for Anomalous Sound Detection Based on Pre-trained Models
- URL: http://arxiv.org/abs/2508.15334v1
- Date: Thu, 21 Aug 2025 08:04:08 GMT
- Title: An Enhanced Audio Feature Tailored for Anomalous Sound Detection Based on Pre-trained Models
- Authors: Guirui Zhong, Qing Wang, Jun Du, Lei Wang, Mingqi Cai, Xin Fang
- Abstract summary: Anomalous Sound Detection (ASD) aims at identifying anomalous sounds from machines. Uncertainty of anomaly location and much redundant information such as noise in machine sounds hinder the improvement of ASD system performance. This paper proposes a novel audio feature of filter banks with evenly distributed intervals, ensuring equal attention to all frequency ranges in the audio.
- Score: 34.59032968400701
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Anomalous Sound Detection (ASD) aims at identifying anomalous sounds from machines and has gained extensive research interests from both academia and industry. However, the uncertainty of anomaly location and much redundant information such as noise in machine sounds hinder the improvement of ASD system performance. This paper proposes a novel audio feature of filter banks with evenly distributed intervals, ensuring equal attention to all frequency ranges in the audio, which enhances the detection of anomalies in machine sounds. Moreover, based on pre-trained models, this paper presents a parameter-free feature enhancement approach to remove redundant information in machine audio. It is believed that this parameter-free strategy facilitates the effective transfer of universal knowledge from pre-trained tasks to the ASD task during model fine-tuning. Evaluation results on the Detection and Classification of Acoustic Scenes and Events (DCASE) 2024 Challenge dataset demonstrate significant improvements in ASD performance with our proposed methods.
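To illustrate what "filter banks with evenly distributed intervals" could look like, here is a minimal sketch of triangular filters whose center frequencies are spaced linearly in Hz, in contrast to mel filter banks, which place filters more densely at low frequencies. The abstract does not give the paper's exact design, so the triangular shape, the function names `linear_filterbank` and `log_filterbank_energies`, and all parameter values are assumptions made for illustration only.

```python
import math


def linear_filterbank(n_filters, n_fft, sample_rate):
    """Build triangular filters with centers evenly spaced in Hz (assumed design)."""
    n_bins = n_fft // 2 + 1
    nyquist = sample_rate / 2.0
    # Frequency (Hz) of each FFT bin from 0 up to Nyquist.
    bin_freqs = [k * nyquist / (n_bins - 1) for k in range(n_bins)]
    # n_filters + 2 evenly spaced edge points; filter i spans edges[i]..edges[i + 2].
    edges = [j * nyquist / (n_filters + 1) for j in range(n_filters + 2)]
    bank = []
    for i in range(n_filters):
        lo, center, hi = edges[i], edges[i + 1], edges[i + 2]
        row = []
        for f in bin_freqs:
            rising = (f - lo) / (center - lo)
            falling = (hi - f) / (hi - center)
            # Triangle: zero outside [lo, hi], peak value 1 at the center.
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def log_filterbank_energies(power_frame, bank, eps=1e-10):
    """Apply the filter bank to one power-spectrum frame and take the log."""
    return [
        math.log(sum(w * p for w, p in zip(row, power_frame)) + eps)
        for row in bank
    ]


# Hypothetical settings: 8 filters over a 64-point FFT at 16 kHz.
bank = linear_filterbank(n_filters=8, n_fft=64, sample_rate=16000)
frame = [1.0] * 33  # flat power spectrum with 64 // 2 + 1 bins
features = log_filterbank_energies(frame, bank)
print(len(bank), len(bank[0]), len(features))
```

Because every filter has the same bandwidth in Hz, high-frequency regions get the same resolution as low-frequency ones, which matches the "equal attention to all frequency ranges" motivation stated in the abstract.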
Related papers
- Training-Free Intelligibility-Guided Observation Addition for Noisy ASR [57.74127683005929]
This paper proposes an intelligibility-guided observation addition (OA) method to improve speech recognition in noisy environments. Experiments across diverse SE-ASR combinations and datasets demonstrate strong robustness and improvements over existing OA baselines.
arXiv Detail & Related papers (2026-02-24T14:46:54Z)
- MIMII-Agent: Leveraging LLMs with Function Calling for Relative Evaluation of Anomalous Sound Detection [5.578413517654703]
We propose a method for generating machine-type-specific anomalies to evaluate the relative performance of unsupervised anomalous sound detection systems. We use large language models (LLMs) to interpret textual descriptions of faults and automatically select audio transformation functions.
arXiv Detail & Related papers (2025-07-28T09:42:41Z)
- Exploring the Frontiers of kNN Noisy Feature Detection and Recovery for Self-Driving Labs [0.49478969093606673]
This study develops an automated workflow to detect noisy features, determine sample-feature pairings that can be corrected, and finally recover the correct feature values. A systematic study is then performed to examine how dataset size, noise intensity, and feature value distribution affect both the detectability and recoverability of noisy features.
arXiv Detail & Related papers (2025-07-15T03:35:56Z)
- Towards Robust Transcription: Exploring Noise Injection Strategies for Training Data Augmentation [55.752737615873464]
This study investigates the impact of white noise at various Signal-to-Noise Ratio (SNR) levels on state-of-the-art APT models.
We hope this research provides valuable insights as preliminary work toward developing transcription models that maintain consistent performance across a range of acoustic conditions.
arXiv Detail & Related papers (2024-10-18T02:31:36Z)
- ASD-Diffusion: Anomalous Sound Detection with Diffusion Models [6.659078422704148]
Anomalous Sound Detection based on Diffusion Models (ASD-Diffusion) is proposed for ASD in real-world factories.
A post-processing anomaly filter algorithm is proposed to detect anomalies that deviate significantly from the original input after reconstruction.
A denoising diffusion implicit model is introduced to accelerate inference.
arXiv Detail & Related papers (2024-09-24T10:42:23Z)
- Improving Anomalous Sound Detection via Low-Rank Adaptation Fine-Tuning of Pre-Trained Audio Models [45.90037602677841]
This paper introduces a robust Anomalous Sound Detection (ASD) model that leverages audio pre-trained models.
We fine-tune these models using machine operation data, employing SpecAug as a data augmentation strategy.
Our experiments establish a new benchmark of 77.75% on the evaluation set, with a significant improvement of 6.48% compared with previous state-of-the-art (SOTA) models.
arXiv Detail & Related papers (2024-09-11T05:19:38Z)
- Anomalous Sound Detection using Audio Representation with Machine ID based Contrastive Learning Pretraining [52.191658157204856]
This paper uses contrastive learning to refine audio representations for each machine ID, rather than for each audio sample.
The proposed two-stage method uses contrastive learning to pretrain the audio representation model.
Experiments show that our method outperforms the state-of-the-art methods using contrastive learning or self-supervised classification.
arXiv Detail & Related papers (2023-04-07T11:08:31Z)
- Improving the Robustness of Summarization Models by Detecting and Removing Input Noise [50.27105057899601]
We present a large empirical study quantifying the sometimes severe loss in performance from different types of input noise for a range of datasets and model sizes.
We propose a light-weight method for detecting and removing such noise in the input during model inference without requiring any training, auxiliary models, or even prior knowledge of the type of noise.
arXiv Detail & Related papers (2022-12-20T00:33:11Z)
- Inference and Denoise: Causal Inference-based Neural Speech Enhancement [83.4641575757706]
This study addresses the speech enhancement (SE) task within the causal inference paradigm by modeling the noise presence as an intervention.
The proposed causal inference-based speech enhancement (CISE) separates clean and noisy frames in an intervened noisy speech using a noise detector and assigns both sets of frames to two mask-based enhancement modules (EMs) to perform noise-conditional SE.
arXiv Detail & Related papers (2022-11-02T15:03:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.