Related papers: Unmute the Patch Tokens: Rethinking Probing in Multi-Label Audio Classification

URL: http://arxiv.org/abs/2509.24901v2
Date: Thu, 02 Oct 2025 11:39:06 GMT
Title: Unmute the Patch Tokens: Rethinking Probing in Multi-Label Audio Classification
Authors: Lukas Rauch, René Heinrich, Houtan Ghaffari, Lukas Miklautz, Ilyass Moummad, Bernhard Sick, Christoph Scholz
Abstract summary: Self-supervised learning in audio defaults to fine-tuning. We introduce binarized probes: a lightweight and simple pooling method that learns prototypes to perform class-wise information aggregation. Our work establishes probing as a competitive and efficient paradigm for evaluating audio SSL models, challenging the reliance on costly fine-tuning.
Score: 8.07177858013243
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Although probing frozen models has become a standard evaluation paradigm, self-supervised learning in audio defaults to fine-tuning. A key reason is that global pooling creates an information bottleneck causing linear probes to misrepresent the embedding quality: The $\texttt{cls}$-token discards crucial token information about dispersed, localized events in multi-label audio. This weakness is rooted in the mismatch between the pretraining objective (operating globally) and the downstream task (localized events). Across a comprehensive benchmark of 13 datasets and 6 spectrogram-based encoders, we first investigate the global pooling bottleneck. We then introduce binarized prototypical probes: a lightweight and simple pooling method that learns prototypes to perform class-wise information aggregation. Despite its simplicity, our method notably outperforms linear and attentive probing. Our work establishes probing as a competitive and efficient paradigm for evaluating audio SSL models, challenging the reliance on costly fine-tuning.

Related papers

Combating Noisy Labels through Fostering Self- and Neighbor-Consistency [120.4394402099635]
Label noise is pervasive in various real-world scenarios, posing challenges in supervised deep learning. We propose a noise-robust method named Jo-SNC (Joint sample selection and model regularization based on Self- and Neighbor-Consistency). We design a self-adaptive, data-driven thresholding scheme to adjust per-class selection thresholds.
arXiv Detail & Related papers (2026-01-19T07:55:29Z)
Foster Adaptivity and Balance in Learning with Noisy Labels [26.309508654960354]
We propose a novel approach named SED to deal with label noise in a Self-adaptivE and class-balanceD manner. A mean-teacher model is then employed to correct labels of noisy samples. We additionally propose a self-adaptive and class-balanced sample re-weighting mechanism to assign different weights to detected noisy samples.
arXiv Detail & Related papers (2024-07-03T03:10:24Z)
Combating Label Noise With A General Surrogate Model For Sample Selection [77.45468386115306]
We propose to leverage the vision-language surrogate model CLIP to filter noisy samples automatically. We validate the effectiveness of our proposed method on both real-world and synthetic noisy datasets.
arXiv Detail & Related papers (2023-10-16T14:43:27Z)
Class Prototype-based Cleaner for Label Noise Learning [73.007001454085]
Semi-supervised learning methods are the current SOTA solutions to the noisy-label learning problem. We propose a simple yet effective solution, named Class Prototype-based label noise Cleaner.
arXiv Detail & Related papers (2022-12-21T04:56:41Z)
SLICER: Learning universal audio representations using low-resource self-supervised pre-training [53.06337011259031]
We present a new Self-Supervised Learning approach to pre-train encoders on unlabeled audio data. Our primary aim is to learn audio representations that can generalize across a large variety of speech and non-speech tasks.
arXiv Detail & Related papers (2022-11-02T23:45:33Z)
Robust Training under Label Noise by Over-parameterization [41.03008228953627]
We propose a principled approach for robust training of over-parameterized deep networks in classification tasks where a proportion of training labels are corrupted. The main idea is very simple: label noise is sparse and incoherent with the network learned from clean data, so we model the noise and learn to separate it from the data. Remarkably, when trained using such a simple method in practice, we demonstrate state-of-the-art test accuracy against label noise on a variety of real datasets.
arXiv Detail & Related papers (2022-02-28T18:50:10Z)
Prototypical Classifier for Robust Class-Imbalanced Learning [64.96088324684683]
We propose Prototypical, which does not require fitting additional parameters given the embedding network. Prototypical produces balanced and comparable predictions for all classes even though the training set is class-imbalanced. We test our method on CIFAR-10LT, CIFAR-100LT and Webvision datasets, observing that Prototypical obtains substantial improvements compared with the state of the art.
arXiv Detail & Related papers (2021-10-22T01:55:01Z)
Tackling Instance-Dependent Label Noise via a Universal Probabilistic Model [80.91927573604438]
This paper proposes a simple yet universal probabilistic model, which explicitly relates noisy labels to their instances. Experiments on datasets with both synthetic and real-world label noise verify that the proposed method yields significant improvements on robustness.
arXiv Detail & Related papers (2021-01-14T05:43:51Z)
Multi-Objective Interpolation Training for Robustness to Label Noise [17.264550056296915]
We show that standard supervised contrastive learning degrades in the presence of label noise. We propose a novel label noise detection method that exploits the robust feature representations learned via contrastive learning. Experiments on synthetic and real-world noise benchmarks demonstrate that MOIT/MOIT+ achieves state-of-the-art results.
arXiv Detail & Related papers (2020-12-08T15:01:54Z)

This list is automatically generated from the titles and abstracts of the papers on this site.