Exploring the Potential of SSL Models for Sound Event Detection
- URL: http://arxiv.org/abs/2505.11889v1
- Date: Sat, 17 May 2025 07:54:31 GMT
- Title: Exploring the Potential of SSL Models for Sound Event Detection
- Authors: Hanfang Cui, Longfei Song, Li Li, Dongxing Xu, Yanhua Long
- Abstract summary: Self-supervised learning (SSL) models offer powerful representations for sound event detection (SED). This study systematically evaluates state-of-the-art SSL models to guide optimal model selection and integration for SED. We propose a framework that combines heterogeneous SSL representations through three fusion strategies: individual SSL embedding integration, dual-modal fusion, and full aggregation.
- Score: 6.6731129629430725
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised learning (SSL) models offer powerful representations for sound event detection (SED), yet their synergistic potential remains underexplored. This study systematically evaluates state-of-the-art SSL models to guide optimal model selection and integration for SED. We propose a framework that combines heterogeneous SSL representations (e.g., BEATs, HuBERT, WavLM) through three fusion strategies: individual SSL embedding integration, dual-modal fusion, and full aggregation. Experiments on the DCASE 2023 Task 4 Challenge reveal that dual-modal fusion (e.g., CRNN+BEATs+WavLM) achieves complementary performance gains, while CRNN+BEATs alone delivers the best results among individual SSL models. We further introduce normalized sound event bounding boxes (nSEBBs), an adaptive post-processing method that dynamically adjusts event boundary predictions, improving PSDS1 by up to 4% for standalone SSL models. These findings highlight the compatibility and complementarity of SSL architectures, providing guidance for task-specific fusion and robust SED system design.
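The abstract gives no implementation details, so the following is only a minimal PyTorch-style sketch of what the dual-modal fusion strategy (e.g., CRNN+BEATs+WavLM) could look like: pre-extracted frame-level embeddings from two SSL front-ends are time-aligned, concatenated, and fed to a small recurrent SED head. The class name, dimensions, and the GRU stand-in for the full CRNN are assumptions for illustration, not the authors' code.

```python
# Illustrative sketch only -- not the paper's released code. Embeddings from the two
# SSL front-ends (e.g. BEATs, WavLM) are assumed to be pre-extracted frame-level
# tensors; all names, dimensions, and layer choices here are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DualModalFusionSED(nn.Module):
    """Simple SED head over two concatenated SSL embedding streams."""

    def __init__(self, dim_a: int, dim_b: int, n_classes: int, hidden: int = 256):
        super().__init__()
        self.proj = nn.Linear(dim_a + dim_b, hidden)             # fuse the two streams
        self.rnn = nn.GRU(hidden, hidden, num_layers=2,
                          batch_first=True, bidirectional=True)  # stand-in for the CRNN's recurrent part
        self.strong_head = nn.Linear(2 * hidden, n_classes)      # frame-level (strong) predictions

    def forward(self, emb_a: torch.Tensor, emb_b: torch.Tensor) -> torch.Tensor:
        # emb_a: (batch, T_a, dim_a), emb_b: (batch, T_b, dim_b)
        # Resample the second stream to the first stream's frame rate before fusion.
        emb_b = F.interpolate(emb_b.transpose(1, 2), size=emb_a.size(1),
                              mode="linear", align_corners=False).transpose(1, 2)
        fused = torch.cat([emb_a, emb_b], dim=-1)                # dual-modal fusion by concatenation
        x = torch.relu(self.proj(fused))
        x, _ = self.rnn(x)
        return torch.sigmoid(self.strong_head(x))                # per-frame event probabilities


# Dummy tensors standing in for BEATs-like (768-d) and WavLM-like (1024-d) embeddings.
model = DualModalFusionSED(dim_a=768, dim_b=1024, n_classes=10)
probs = model(torch.randn(2, 250, 768), torch.randn(2, 200, 1024))  # -> (2, 250, 10)
```

In this sketch the SSL models act as frozen feature extractors and only the fusion head is trained; whether the paper fine-tunes the front-ends, and how nSEBBs post-processing is applied to the frame-level outputs, is not specified in the abstract.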
Related papers
- Comprehensive Attribute Encoding and Dynamic LSTM HyperModels for Outcome Oriented Predictive Business Process Monitoring [5.634923879819779]
Predictive Business Process Monitoring aims to forecast future outcomes of ongoing business processes. Existing methods often lack the flexibility to handle real-world challenges such as simultaneous events, class imbalance, and multi-level attributes. We propose a suite of dynamic LSTM HyperModels that integrate two-level hierarchical encoding for event and sequence attributes, as well as specialized LSTM variants for simultaneous event modeling, leveraging multidimensional embeddings and time-difference flag augmentation.
arXiv Detail & Related papers (2025-06-04T08:27:58Z) - Latent Stochastic Interpolants [4.674313947272508]
Stochastic Interpolants (SI) are a powerful framework for generative modeling, capable of flexibly transforming between two probability distributions. This work presents Latent Stochastic Interpolants (LSI), enabling joint learning in a latent space with end-to-end optimized models. We demonstrate the efficacy of LSI through comprehensive experiments on the standard large-scale ImageNet generation benchmark.
arXiv Detail & Related papers (2025-06-02T21:34:50Z) - Breaking the SSL-AL Barrier: A Synergistic Semi-Supervised Active Learning Framework for 3D Object Detection [34.049483237480615]
Traditional active learning approaches rely on a small amount of labeled data to train an initial model for data selection. We propose a Synergistic Semi-Supervised Active Learning framework, dubbed S-SSAL. We show that S-SSAL can achieve performance comparable to models trained on the full dataset.
arXiv Detail & Related papers (2025-01-26T08:43:59Z) - Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild [84.57103623507082]
This paper introduces Model-GLUE, a holistic Large Language Model scaling guideline. We benchmark existing scaling techniques, especially selective merging and variants of mixture. We then formulate an optimal strategy for the selection and aggregation of a heterogeneous model zoo. Our methodology involves the clustering of mergeable models, optimal merging strategy selection, and the integration of clusters.
arXiv Detail & Related papers (2024-10-07T15:55:55Z) - An Ensemble Semi-Supervised Adaptive Resonance Theory Model with Explanation Capability for Pattern Classification [41.35711585943589]
This paper proposes a new interpretable SSL model using the supervised and unsupervised Adaptive Resonance Theory (ART) family of networks.
The main advantages of SSL-ART include the capability of performing online learning and reducing the number of redundant prototype nodes.
A weighted voting strategy is introduced to form an ensemble SSL-ART model, which is denoted as WESSL-ART.
arXiv Detail & Related papers (2023-05-19T20:20:44Z) - Collaborative Intelligence Orchestration: Inconsistency-Based Fusion of Semi-Supervised Learning and Active Learning [60.26659373318915]
Active learning (AL) and semi-supervised learning (SSL) are two effective, but often isolated, means to alleviate the data-hungry problem.
We propose an innovative inconsistency-based virtual adversarial algorithm to further investigate SSL-AL's potential superiority.
Two real-world case studies visualize the practical industrial value of applying and deploying the proposed data sampling algorithm.
arXiv Detail & Related papers (2022-06-07T13:28:43Z) - Style-Hallucinated Dual Consistency Learning for Domain Generalized Semantic Segmentation [117.3856882511919]
We propose the Style-HAllucinated Dual consistEncy learning (SHADE) framework to handle domain shift.
Our SHADE yields significant improvement and outperforms state-of-the-art methods by 5.07% and 8.35% on the average mIoU of three real-world datasets.
arXiv Detail & Related papers (2022-04-06T02:49:06Z) - Boosting Discriminative Visual Representation Learning with Scenario-Agnostic Mixup [54.09898347820941]
We propose Scenario-Agnostic Mixup (SAMix) for both self-supervised learning (SSL) and supervised learning (SL) scenarios.
Specifically, we hypothesize and verify the objective function of mixup generation as optimizing local smoothness between two mixed classes (the generic mixup formulation it builds on is sketched after this list).
A label-free generation sub-network is designed, which effectively provides non-trivial mixup samples and improves transferable abilities.
arXiv Detail & Related papers (2021-11-30T14:49:59Z) - Trash to Treasure: Harvesting OOD Data with Cross-Modal Matching for Open-Set Semi-Supervised Learning [101.28281124670647]
Open-set semi-supervised learning (open-set SSL) investigates a challenging but practical scenario where out-of-distribution (OOD) samples are contained in the unlabeled data.
We propose a novel training mechanism that could effectively exploit the presence of OOD data for enhanced feature learning.
Our approach substantially lifts the performance on open-set SSL and outperforms the state-of-the-art by a large margin.
arXiv Detail & Related papers (2021-08-12T09:14:44Z) - SemiNLL: A Framework of Noisy-Label Learning by Semi-Supervised Learning [58.26384597768118]
SemiNLL is a versatile framework that combines sample selection (SS) strategies and SSL models in an end-to-end manner.
Our framework can absorb various SS strategies and SSL backbones, utilizing their power to achieve promising performance.
arXiv Detail & Related papers (2020-12-02T01:49:47Z)
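As a side note to the SAMix entry above: the generic mixup interpolation it builds on is the classic formulation of Zhang et al. (2018), written below. SAMix itself replaces this fixed interpolation with a learned, label-free generation sub-network that produces non-trivial mixed samples.

```latex
% Classic mixup: convex combination of two training samples and their labels,
% with the mixing ratio drawn from a Beta distribution.
\begin{align}
  \tilde{x} &= \lambda x_i + (1 - \lambda)\, x_j, \\
  \tilde{y} &= \lambda y_i + (1 - \lambda)\, y_j, \qquad \lambda \sim \mathrm{Beta}(\alpha, \alpha).
\end{align}
```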
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.