Exploring the Potential of SSL Models for Sound Event Detection
- URL: http://arxiv.org/abs/2505.11889v1
- Date: Sat, 17 May 2025 07:54:31 GMT
- Title: Exploring the Potential of SSL Models for Sound Event Detection
- Authors: Hanfang Cui, Longfei Song, Li Li, Dongxing Xu, Yanhua Long
- Abstract summary: Self-supervised learning (SSL) models offer powerful representations for sound event detection (SED). This study systematically evaluates state-of-the-art SSL models to guide optimal model selection and integration for SED. We propose a framework that combines heterogeneous SSL representations through three fusion strategies: individual SSL embedding integration, dual-modal fusion, and full aggregation.
- Score: 6.6731129629430725
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised learning (SSL) models offer powerful representations for sound event detection (SED), yet their synergistic potential remains underexplored. This study systematically evaluates state-of-the-art SSL models to guide optimal model selection and integration for SED. We propose a framework that combines heterogeneous SSL representations (e.g., BEATs, HuBERT, WavLM) through three fusion strategies: individual SSL embedding integration, dual-modal fusion, and full aggregation. Experiments on the DCASE 2023 Task 4 Challenge reveal that dual-modal fusion (e.g., CRNN+BEATs+WavLM) achieves complementary performance gains, while CRNN+BEATs alone delivers the best results among individual SSL models. We further introduce normalized sound event bounding boxes (nSEBBs), an adaptive post-processing method that dynamically adjusts event boundary predictions, improving PSDS1 by up to 4% for standalone SSL models. These findings highlight the compatibility and complementarity of SSL architectures, providing guidance for task-specific fusion and robust SED system design.
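The abstract gives no implementation details, so the following is only a minimal PyTorch-style sketch of what the dual-modal fusion strategy (e.g., CRNN+BEATs+WavLM) could look like: pre-extracted frame-level embeddings from two SSL front-ends are time-aligned, concatenated, and fed to a small recurrent SED head. The class name, dimensions, and the GRU stand-in for the full CRNN are assumptions for illustration, not the authors' code.

```python
# Illustrative sketch only -- not the paper's released code. Embeddings from the two
# SSL front-ends (e.g. BEATs, WavLM) are assumed to be pre-extracted frame-level
# tensors; all names, dimensions, and layer choices here are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DualModalFusionSED(nn.Module):
    """Simple SED head over two concatenated SSL embedding streams."""

    def __init__(self, dim_a: int, dim_b: int, n_classes: int, hidden: int = 256):
        super().__init__()
        self.proj = nn.Linear(dim_a + dim_b, hidden)             # fuse the two streams
        self.rnn = nn.GRU(hidden, hidden, num_layers=2,
                          batch_first=True, bidirectional=True)  # stand-in for the CRNN's recurrent part
        self.strong_head = nn.Linear(2 * hidden, n_classes)      # frame-level (strong) predictions

    def forward(self, emb_a: torch.Tensor, emb_b: torch.Tensor) -> torch.Tensor:
        # emb_a: (batch, T_a, dim_a), emb_b: (batch, T_b, dim_b)
        # Resample the second stream to the first stream's frame rate before fusion.
        emb_b = F.interpolate(emb_b.transpose(1, 2), size=emb_a.size(1),
                              mode="linear", align_corners=False).transpose(1, 2)
        fused = torch.cat([emb_a, emb_b], dim=-1)                # dual-modal fusion by concatenation
        x = torch.relu(self.proj(fused))
        x, _ = self.rnn(x)
        return torch.sigmoid(self.strong_head(x))                # per-frame event probabilities


# Dummy tensors standing in for BEATs-like (768-d) and WavLM-like (1024-d) embeddings.
model = DualModalFusionSED(dim_a=768, dim_b=1024, n_classes=10)
probs = model(torch.randn(2, 250, 768), torch.randn(2, 200, 1024))  # -> (2, 250, 10)
```

In this sketch the SSL models act as frozen feature extractors and only the fusion head is trained; whether the paper fine-tunes the front-ends, and how nSEBBs post-processing is applied to the frame-level outputs, is not specified in the abstract.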
Related papers
- Comprehensive Attribute Encoding and Dynamic LSTM HyperModels for Outcome Oriented Predictive Business Process Monitoring [5.634923879819779]
Predictive Business Process Monitoring aims to forecast future outcomes of ongoing business processes. Existing methods often lack the flexibility to handle real-world challenges such as simultaneous events, class imbalance, and multi-level attributes. We propose a suite of dynamic LSTM HyperModels that integrate two-level hierarchical encoding for event and sequence attributes, as well as specialized LSTM variants for simultaneous event modeling, leveraging multidimensional embeddings and time-difference flag augmentation.
arXiv Detail & Related papers (2025-06-04T08:27:58Z) - Latent Stochastic Interpolants [4.674313947272508]
Stochastic Interpolants (SI) are a powerful framework for generative modeling, capable of flexibly transforming between two probability distributions. This work presents Latent Stochastic Interpolants (LSI), enabling joint learning in a latent space with end-to-end optimized models. We demonstrate the efficacy of LSI through comprehensive experiments on the standard large-scale ImageNet generation benchmark.
arXiv Detail & Related papers (2025-06-02T21:34:50Z) - Breaking the SSL-AL Barrier: A Synergistic Semi-Supervised Active Learning Framework for 3D Object Detection [34.049483237480615]
Traditional active learning approaches rely on a small amount of labeled data to train an initial model for data selection. We propose a Synergistic Semi-Supervised Active Learning framework, dubbed S-SSAL. We show that S-SSAL can achieve performance comparable to models trained on the full dataset.
arXiv Detail & Related papers (2025-01-26T08:43:59Z) - Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild [84.57103623507082]
This paper introduces Model-GLUE, a holistic Large Language Model scaling guideline. We benchmark existing scaling techniques, especially selective merging and variants of mixture. We then formulate an optimal strategy for the selection and aggregation of a heterogeneous model zoo. Our methodology involves the clustering of mergeable models, optimal merging strategy selection, and the integration of clusters.
arXiv Detail & Related papers (2024-10-07T15:55:55Z) - An Ensemble Semi-Supervised Adaptive Resonance Theory Model with Explanation Capability for Pattern Classification [41.35711585943589]
This paper proposes a new interpretable SSL model using the supervised and unsupervised Adaptive Resonance Theory (ART) family of networks.
The main advantages of SSL-ART include the capability of performing online learning and reducing the number of redundant prototype nodes.
A weighted voting strategy is introduced to form an ensemble SSL-ART model, which is denoted as WESSL-ART.
arXiv Detail & Related papers (2023-05-19T20:20:44Z) - Collaborative Intelligence Orchestration: Inconsistency-Based Fusion of Semi-Supervised Learning and Active Learning [60.26659373318915]
Active learning (AL) and semi-supervised learning (SSL) are two effective, but often isolated, means to alleviate the data-hungry problem.
We propose an innovative inconsistency-based virtual adversarial algorithm to further investigate SSL-AL's potential superiority.
Two real-world case studies visualize the practical industrial value of applying and deploying the proposed data sampling algorithm.
arXiv Detail & Related papers (2022-06-07T13:28:43Z) - Style-Hallucinated Dual Consistency Learning for Domain Generalized Semantic Segmentation [117.3856882511919]
We propose the Style-HAllucinated Dual consistEncy learning (SHADE) framework to handle domain shift.
Our SHADE yields significant improvement and outperforms state-of-the-art methods by 5.07% and 8.35% on the average mIoU of three real-world datasets.
arXiv Detail & Related papers (2022-04-06T02:49:06Z) - Boosting Discriminative Visual Representation Learning with Scenario-Agnostic Mixup [54.09898347820941]
We propose Scenario-Agnostic Mixup (SAMix) for both self-supervised learning (SSL) and supervised learning (SL) scenarios.
Specifically, we hypothesize and verify the objective function of mixup generation as optimizing local smoothness between two mixed classes (the generic mixup formulation it builds on is sketched after this list).
A label-free generation sub-network is designed, which effectively provides non-trivial mixup samples and improves transferable abilities.
arXiv Detail & Related papers (2021-11-30T14:49:59Z) - Trash to Treasure: Harvesting OOD Data with Cross-Modal Matching for Open-Set Semi-Supervised Learning [101.28281124670647]
Open-set semi-supervised learning (open-set SSL) investigates a challenging but practical scenario where out-of-distribution (OOD) samples are contained in the unlabeled data.
We propose a novel training mechanism that could effectively exploit the presence of OOD data for enhanced feature learning.
Our approach substantially lifts the performance on open-set SSL and outperforms the state-of-the-art by a large margin.
arXiv Detail & Related papers (2021-08-12T09:14:44Z) - SemiNLL: A Framework of Noisy-Label Learning by Semi-Supervised Learning [58.26384597768118]
SemiNLL is a versatile framework that combines sample selection (SS) strategies and SSL models in an end-to-end manner.
Our framework can absorb various SS strategies and SSL backbones, utilizing their power to achieve promising performance.
arXiv Detail & Related papers (2020-12-02T01:49:47Z)
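As a side note to the SAMix entry above: the generic mixup interpolation it builds on is the classic formulation of Zhang et al. (2018), written below. SAMix itself replaces this fixed interpolation with a learned, label-free generation sub-network that produces non-trivial mixed samples.

```latex
% Classic mixup: convex combination of two training samples and their labels,
% with the mixing ratio drawn from a Beta distribution.
\begin{align}
  \tilde{x} &= \lambda x_i + (1 - \lambda)\, x_j, \\
  \tilde{y} &= \lambda y_i + (1 - \lambda)\, y_j, \qquad \lambda \sim \mathrm{Beta}(\alpha, \alpha).
\end{align}
```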
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.