Exploring Federated Self-Supervised Learning for General Purpose Audio Understanding
- URL: http://arxiv.org/abs/2402.02889v1
- Date: Mon, 5 Feb 2024 10:57:48 GMT
- Title: Exploring Federated Self-Supervised Learning for General Purpose Audio Understanding
- Authors: Yasar Abbas Ur Rehman, Kin Wai Lau, Yuyang Xie, Lan Ma, Jiajun Shen
- Abstract summary: We propose a novel Federated SSL (F-SSL) framework, dubbed FASSL, that enables learning intermediate feature representations from large-scale decentralized heterogeneous clients.
Our study has found that audio F-SSL approaches perform on par with the centralized audio-SSL approaches on the audio-retrieval task.
- Score: 14.468870364990291
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The integration of Federated Learning (FL) and Self-supervised Learning (SSL)
offers a unique and synergistic combination to exploit audio data for
general-purpose audio understanding without compromising user data privacy.
However, few efforts have been made to investigate SSL models in the FL
regime for general-purpose audio understanding, especially when the training
data is generated by large-scale heterogeneous audio sources. In this paper, we
evaluate the performance of feature-matching and predictive audio-SSL
techniques when integrated into large-scale FL settings simulated with
non-independent and identically distributed (non-iid) data. We propose a novel
Federated SSL (F-SSL) framework, dubbed FASSL, that enables learning
intermediate feature representations from large-scale decentralized
heterogeneous clients holding unlabelled audio data. Our study has found that
audio F-SSL approaches perform on par with the centralized audio-SSL approaches
on the audio-retrieval task. Extensive experiments demonstrate the
effectiveness and significance of FASSL as it assists in obtaining the optimal
global model for state-of-the-art FL aggregation methods.
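The FL aggregation step the abstract refers to can be illustrated with a minimal federated averaging (FedAvg) sketch. This is a generic illustration of weighted model aggregation, not the paper's FASSL algorithm; the `fedavg` function and the dict-of-lists weight format are assumptions made for the example.

```python
# Minimal FedAvg sketch: average per-client model weights, weighted by
# each client's local data size. Weights are plain dicts of float lists
# standing in for model tensors. Hypothetical helper, not the paper's method.

def fedavg(client_weights, client_sizes):
    """Return the size-weighted average of per-client parameter dicts."""
    total = sum(client_sizes)
    global_weights = {}
    for key in client_weights[0]:
        n_params = len(client_weights[0][key])
        global_weights[key] = [
            sum(w[key][i] * size / total
                for w, size in zip(client_weights, client_sizes))
            for i in range(n_params)
        ]
    return global_weights

# Two clients with unequal data sizes: the larger client pulls the
# global model toward its local weights.
clients = [{"layer": [1.0, 2.0]}, {"layer": [3.0, 4.0]}]
agg = fedavg(clients, client_sizes=[1, 3])
print(agg["layer"])  # [2.5, 3.5]
```

Non-iid client data, as simulated in the paper, makes the local updates being averaged diverge from one another, which is why the choice of aggregation method matters for the quality of the resulting global model.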
Related papers
- Universal Sound Separation with Self-Supervised Audio Masked Autoencoder [35.560261097213846]
We propose integrating a self-supervised pre-trained model, namely the audio masked autoencoder (A-MAE), into a universal sound separation system.
The proposed methods successfully enhance the separation performance of a state-of-the-art ResUNet-based USS model.
arXiv Detail & Related papers (2024-07-16T14:11:44Z)
- SLICER: Learning universal audio representations using low-resource self-supervised pre-training [53.06337011259031]
We present a new Self-Supervised Learning approach to pre-train encoders on unlabeled audio data.
Our primary aim is to learn audio representations that can generalize across a large variety of speech and non-speech tasks.
arXiv Detail & Related papers (2022-11-02T23:45:33Z)
- Collaborative Intelligence Orchestration: Inconsistency-Based Fusion of Semi-Supervised Learning and Active Learning [60.26659373318915]
Active learning (AL) and semi-supervised learning (SSL) are two effective, but often isolated, means to alleviate the data-hungry problem.
We propose an innovative Inconsistency-based virtual aDvErsarial algorithm to further investigate the potential superiority of combined SSL-AL.
Two real-world case studies visualize the practical industrial value of applying and deploying the proposed data sampling algorithm.
arXiv Detail & Related papers (2022-06-07T13:28:43Z)
- Deploying self-supervised learning in the wild for hybrid automatic speech recognition [20.03807843795386]
Self-supervised learning (SSL) methods have proven very successful in automatic speech recognition (ASR).
We show how to utilize untranscribed audio data in SSL, from data pre-processing to deploying a streaming hybrid ASR model.
arXiv Detail & Related papers (2022-05-17T19:37:40Z)
- Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation [27.857955394020475]
Self-Supervised Learning (SSL) models have been successfully applied in various deep learning-based speech tasks.
The quality of SSL representations depends highly on the relatedness between the SSL training domain(s) and the target data domain.
We propose a learnable and interpretable framework to combine spectral features (SF) and SSL representations.
arXiv Detail & Related papers (2022-04-05T20:09:15Z)
- Audio Self-supervised Learning: A Survey [60.41768569891083]
Self-Supervised Learning (SSL) aims to discover general representations from large-scale data without requiring human annotations.
Its success in the fields of computer vision and natural language processing has prompted its recent adoption in the field of audio and speech processing.
arXiv Detail & Related papers (2022-03-02T15:58:29Z)
- Semantics-driven Attentive Few-shot Learning over Clean and Noisy Samples [0.0]
We aim to train meta-learner models that can leverage prior semantic knowledge about novel classes to guide the classifier synthesis process.
In particular, we propose semantically-conditioned feature attention and sample attention mechanisms that estimate the importance of representation dimensions and training instances.
arXiv Detail & Related papers (2022-01-09T16:16:23Z)
- Sound and Visual Representation Learning with Multiple Pretraining Tasks [104.11800812671953]
Different self-supervised learning (SSL) tasks reveal different features from the data.
This work aims to combine multiple SSL tasks (Multi-SSL) so that the learned representations generalize well across all downstream tasks.
Experiments on sound representations demonstrate that Multi-SSL via incremental learning (IL) of SSL tasks outperforms single SSL task models.
arXiv Detail & Related papers (2022-01-04T09:09:38Z)
- A Strong Baseline for Semi-Supervised Incremental Few-Shot Learning [54.617688468341704]
Few-shot learning aims to learn models that generalize to novel classes with limited training samples.
We propose a novel paradigm containing two parts: (1) a well-designed meta-training algorithm for mitigating ambiguity between base and novel classes caused by unreliable pseudo labels and (2) a model adaptation mechanism to learn discriminative features for novel classes while preserving base knowledge using few labeled and all the unlabeled data.
arXiv Detail & Related papers (2021-10-21T13:25:52Z)
- UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training [72.004873454347]
Two methods are introduced for enhancing the unsupervised speaker information extraction.
Experiment results on SUPERB benchmark show that the proposed system achieves state-of-the-art performance.
We scale up the training dataset to 94 thousand hours of public audio data and achieve further performance improvement.
arXiv Detail & Related papers (2021-10-12T05:43:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.