Related papers: Learning Normal Patterns in Musical Loops

URL: http://arxiv.org/abs/2505.23784v1
Date: Thu, 22 May 2025 19:52:00 GMT
Title: Learning Normal Patterns in Musical Loops
Authors: Shayan Dadman, Bernt Arild Bremdal, Børre Bang, Rune Dalmo
Abstract summary: This paper introduces an unsupervised framework for detecting audio patterns in musical samples (loops) through anomaly detection techniques. We address these limitations through an architecture combining deep feature extraction with unsupervised anomaly detection.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper introduces an unsupervised framework for detecting audio patterns in musical samples (loops) through anomaly detection techniques, addressing challenges in music information retrieval (MIR). Existing methods are often constrained by reliance on handcrafted features, domain-specific limitations, or dependence on iterative user interaction. We address these limitations through an architecture combining deep feature extraction with unsupervised anomaly detection. Our approach leverages a pre-trained Hierarchical Token-semantic Audio Transformer (HTS-AT), paired with a Feature Fusion Mechanism (FFM), to generate representations from variable-length audio loops. These embeddings are processed using one-class Deep Support Vector Data Description (Deep SVDD), which learns normative audio patterns by mapping them to a compact latent hypersphere. Evaluations on curated bass and guitar datasets compare standard and residual autoencoder variants against baselines like Isolation Forest (IF) and principal component analysis (PCA) methods. Results show our Deep SVDD models, especially the residual autoencoder variant, deliver improved anomaly separation, particularly for larger variations. This research contributes a flexible, fully unsupervised solution for processing diverse audio samples, overcoming previous structural and input limitations while enabling effective pattern identification through distance-based latent space scoring.
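
The distance-based latent space scoring mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it skips the HTS-AT encoder, the FFM, and the trained Deep SVDD network entirely, and assumes the embeddings are already available as plain vectors. The function names (center, anomaly_score, fit_threshold), the 95% quantile threshold, and the toy data are all hypothetical choices made for illustration. The center is taken as the mean of the normal embeddings, which is one common way to fix it in one-class Deep SVDD.

```python
import random

def center(embeddings):
    """Hypersphere center c, taken here as the mean of the normal embeddings."""
    n = len(embeddings)
    dim = len(embeddings[0])
    return [sum(e[i] for e in embeddings) / n for i in range(dim)]

def anomaly_score(z, c):
    """Squared Euclidean distance from embedding z to the center c."""
    return sum((zi - ci) ** 2 for zi, ci in zip(z, c))

def fit_threshold(embeddings, c, quantile=0.95):
    """Score below which roughly `quantile` of the normal embeddings fall (assumed rule)."""
    scores = sorted(anomaly_score(e, c) for e in embeddings)
    idx = min(len(scores) - 1, int(quantile * len(scores)))
    return scores[idx]

# Toy stand-ins for loop embeddings: 200 "normal" vectors clustered near the origin.
rng = random.Random(0)
normal = [[rng.gauss(0.0, 0.1) for _ in range(4)] for _ in range(200)]

c = center(normal)
threshold = fit_threshold(normal, c)

typical = [0.0, 0.0, 0.0, 0.0]   # close to the learned center
outlier = [2.0, 2.0, 2.0, 2.0]   # far from the learned center

print("typical flagged:", anomaly_score(typical, c) > threshold)
print("outlier flagged:", anomaly_score(outlier, c) > threshold)
```

In this sketch a loop is flagged as anomalous when its score exceeds the threshold. In the full method the embeddings would first pass through a trained network that pulls normal samples toward the center, so the scores would be computed in that learned space instead of on raw vectors.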

Related papers

MIMII-Agent: Leveraging LLMs with Function Calling for Relative Evaluation of Anomalous Sound Detection [5.578413517654703]
We propose a method for generating machine-type-specific anomalies to evaluate the relative performance of unsupervised anomalous sound detection systems. We use large language models (LLMs) to interpret textual descriptions of faults and automatically select audio transformation functions.
arXiv Detail & Related papers (2025-07-28T09:42:41Z)
STOPA: A Database of Systematic VariaTion Of DeePfake Audio for Open-Set Source Tracing and Attribution [6.860131654491485]
STOPA is a dataset for deepfake speech source tracing covering 8 AMs, 6 settings, and 700k samples from 13 synthesisers. STOPA provides a systematically controlled framework covering a broader range of generative factors, such as the choice of the vocoder model, acoustic model, or pretrained weights. This control improves attribution accuracy, aiding forensic analysis, deepfake detection, and generative model transparency.
arXiv Detail & Related papers (2025-05-26T08:00:30Z)
Unified AI for Accurate Audio Anomaly Detection [0.0]
This paper presents a unified AI framework for high-accuracy audio anomaly detection. It integrates advanced noise reduction, feature extraction, and machine learning modeling techniques. The framework is evaluated on benchmark datasets including TORGO and LibriSpeech.
arXiv Detail & Related papers (2025-05-20T16:56:08Z)
TSLANet: Rethinking Transformers for Time Series Representation Learning [19.795353886621715]
Time series data is characterized by its intrinsic long and short-range dependencies. We introduce a novel Time Series Lightweight Network (TSLANet) as a universal convolutional model for diverse time series tasks. Our experiments demonstrate that TSLANet outperforms state-of-the-art models in various tasks spanning classification, forecasting, and anomaly detection.
arXiv Detail & Related papers (2024-04-12T13:41:29Z)
Unraveling the "Anomaly" in Time Series Anomaly Detection: A Self-supervised Tri-domain Solution [89.16750999704969]
Anomaly labels hinder traditional supervised models in time series anomaly detection. Various SOTA deep learning techniques, such as self-supervised learning, have been introduced to tackle this issue. We propose a novel self-supervised learning based Tri-domain Anomaly Detector (TriAD).
arXiv Detail & Related papers (2023-11-19T05:37:18Z)
Beyond the Benchmark: Detecting Diverse Anomalies in Videos [0.6993026261767287]
Video Anomaly Detection (VAD) plays a crucial role in modern surveillance systems, aiming to identify various anomalies in real-world situations. Current benchmark datasets predominantly emphasize simple, single-frame anomalies such as novel object detection. We advocate for an expansion of VAD investigations to encompass intricate anomalies that extend beyond conventional benchmark boundaries.
arXiv Detail & Related papers (2023-10-03T09:22:06Z)
Leveraging Foundation models for Unsupervised Audio-Visual Segmentation [49.94366155560371]
Audio-Visual Segmentation (AVS) aims to precisely outline audible objects in a visual scene at the pixel level. Existing AVS methods require fine-grained annotations of audio-mask pairs in a supervised learning fashion. We introduce unsupervised audio-visual segmentation with no need for task-specific data annotations and model training.
arXiv Detail & Related papers (2023-09-13T05:05:47Z)
Frequency Perception Network for Camouflaged Object Detection [51.26386921922031]
We propose a novel learnable and separable frequency perception mechanism driven by the semantic hierarchy in the frequency domain. Our entire network adopts a two-stage model, including a frequency-guided coarse localization stage and a detail-preserving fine localization stage. Compared with the currently existing models, our proposed method achieves competitive performance in three popular benchmark datasets.
arXiv Detail & Related papers (2023-08-17T11:30:46Z)
Do You Remember? Overcoming Catastrophic Forgetting for Fake Audio Detection [54.20974251478516]
We propose a continual learning algorithm for fake audio detection to overcome catastrophic forgetting. When fine-tuning a detection network, our approach adaptively computes the direction of weight modification according to the ratio of genuine utterances and fake utterances. Our method can easily be generalized to related fields, like speech emotion recognition.
arXiv Detail & Related papers (2023-08-07T05:05:49Z)
Self-Supervised Training with Autoencoders for Visual Anomaly Detection [61.62861063776813]
We focus on a specific use case in anomaly detection where the distribution of normal samples is supported by a lower-dimensional manifold. We adapt a self-supervised learning regime that exploits discriminative information during training but focuses on the submanifold of normal examples. We achieve a new state-of-the-art result on the MVTec AD dataset -- a challenging benchmark for visual anomaly detection in the manufacturing domain.
arXiv Detail & Related papers (2022-06-23T14:16:30Z)
Unsupervised Anomaly Detection with Adversarial Mirrored AutoEncoders [51.691585766702744]
We propose a variant of Adversarial Autoencoder which uses a mirrored Wasserstein loss in the discriminator to enforce better semantic-level reconstruction. We put forward an alternative measure of anomaly score to replace the reconstruction-based metric. Our method outperforms the current state-of-the-art methods for anomaly detection on several OOD detection benchmarks.
arXiv Detail & Related papers (2020-03-24T08:26:58Z)

This list is automatically generated from the titles and abstracts of the papers on this site.