Multiple Choice Learning for Efficient Speech Separation with Many Speakers
- URL: http://arxiv.org/abs/2411.18497v1
- Date: Wed, 27 Nov 2024 16:38:34 GMT
- Title: Multiple Choice Learning for Efficient Speech Separation with Many Speakers
- Authors: David Perera, François Derrida, Théo Mariotte, Gaël Richard, Slim Essid
- Abstract summary: Training speech separation models in the supervised setting raises a permutation problem.
We consider using the Multiple Choice Learning framework, which was originally introduced to tackle ambiguous tasks.
- Score: 14.259149632246555
- Abstract: Training speech separation models in the supervised setting raises a permutation problem: finding the best assignment between the model predictions and the ground-truth separated signals. This inherently ambiguous task is customarily solved using Permutation Invariant Training (PIT). In this article, we instead consider using the Multiple Choice Learning (MCL) framework, which was originally introduced to tackle ambiguous tasks. We demonstrate experimentally on the popular WSJ0-mix and LibriMix benchmarks that MCL matches the performance of PIT while being computationally advantageous. This opens a promising research direction, as MCL can be naturally extended to handle a variable number of speakers, or to tackle speech separation in the unsupervised setting.
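To make the computational contrast concrete, below is a minimal NumPy sketch of the two training criteria. It is not taken from the paper: the function names are illustrative, plain MSE stands in for the SI-SDR-style objectives used in practice, and the MCL criterion is reduced to its winner-takes-all (WTA) core, where each ground-truth source is scored against its single best prediction instead of searching over permutations.

```python
import itertools
import numpy as np

def pairwise_mse(preds, targets):
    """Pairwise cost matrix: entry [i, j] is the MSE between prediction i
    and ground-truth source j. Both inputs have shape (C, T).
    NOTE: MSE is an illustrative stand-in for the separation loss."""
    diff = preds[:, None, :] - targets[None, :, :]
    return np.mean(diff ** 2, axis=-1)  # shape (C, C)

def pit_loss(preds, targets):
    """Exhaustive PIT: score every one-to-one assignment of predictions
    to sources and keep the cheapest, at O(C!) cost."""
    costs = pairwise_mse(preds, targets)
    C = costs.shape[0]
    return min(
        np.mean([costs[perm[j], j] for j in range(C)])
        for perm in itertools.permutations(range(C))
    )

def mcl_wta_loss(preds, targets):
    """MCL-style winner-takes-all (illustrative reading): each source keeps
    only its best prediction, so there is no permutation search and the cost
    stays at the O(C^2) of the pairwise matrix."""
    costs = pairwise_mse(preds, targets)
    return np.mean(costs.min(axis=0))

rng = np.random.default_rng(0)
preds, targets = rng.standard_normal((2, 3, 16000))  # C=3 speakers, 1 s at 16 kHz
print(pit_loss(preds, targets), mcl_wta_loss(preds, targets))
```

One known quirk visible in this sketch is that plain WTA lets a single prediction win several sources; handling that kind of hypothesis behaviour is a theme of the MCL literature (see the rMCL entry below).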
Related papers
- UCB-driven Utility Function Search for Multi-objective Reinforcement Learning [75.11267478778295]
In Multi-objective Reinforcement Learning (MORL), agents are tasked with optimising decision-making behaviours.
We focus on the case of linear utility functions parameterised by weight vectors w.
We introduce a method based on Upper Confidence Bound to efficiently search for the most promising weight vectors during different stages of the learning process.
arXiv Detail & Related papers (2024-05-01T09:34:42Z)
- Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts [95.09994361995389]
Relative Preference Optimization (RPO) is designed to discern between more and less preferred responses derived from both identical and related prompts.
RPO has demonstrated a superior ability to align large language models with user preferences and to improve their adaptability during the training process.
arXiv Detail & Related papers (2024-02-12T22:47:57Z)
- Resilient Multiple Choice Learning: A learned scoring scheme with application to audio scene analysis [8.896068269039452]
We introduce Resilient Multiple Choice Learning (rMCL) for conditional distribution estimation in regression settings.
rMCL is a simple framework to tackle multimodal density estimation, using the Winner-Takes-All (WTA) loss for a set of hypotheses.
arXiv Detail & Related papers (2023-11-02T07:54:03Z)
- Enhancing Chain-of-Thoughts Prompting with Iterative Bootstrapping in Large Language Models [81.01397924280612]
Large language models (LLMs) can achieve highly effective performance on various reasoning tasks by incorporating step-by-step chain-of-thought (CoT) prompting as demonstrations.
We introduce Iter-CoT (Iterative bootstrapping in Chain-of-Thoughts Prompting), an iterative bootstrapping approach for selecting exemplars and generating reasoning chains.
arXiv Detail & Related papers (2023-04-23T13:54:39Z)
- KECP: Knowledge Enhanced Contrastive Prompting for Few-shot Extractive Question Answering [28.18555591429343]
We propose a novel framework named Knowledge Enhanced Contrastive Prompt-tuning (KECP).
Instead of adding pointer heads to PLMs, we transform the task into a non-autoregressive Masked Language Modeling (MLM) generation problem.
Our method consistently outperforms state-of-the-art approaches in few-shot settings by a large margin.
arXiv Detail & Related papers (2022-05-06T08:31:02Z)
- Single-channel speech separation using Soft-minimum Permutation Invariant Training [60.99112031408449]
A long-standing problem in supervised speech separation is finding the correct label for each separated speech signal.
Permutation Invariant Training (PIT) has been shown to be a promising solution in handling the label ambiguity problem.
In this work, we propose a probabilistic optimization framework to address the inefficiency of PIT in finding the best output-label assignment; a toy sketch of the soft-minimum idea appears after this list.
arXiv Detail & Related papers (2021-11-16T17:25:05Z)
- Many-Speakers Single Channel Speech Separation with Optimal Permutation Training [91.22679787578438]
We present a permutation invariant training that employs the Hungarian algorithm in order to train with an $O(C^3)$ time complexity.
Our approach separates up to $20$ speakers and improves the previous results for large $C$ by a wide margin. A minimal sketch of the Hungarian assignment step is given after this list.
arXiv Detail & Related papers (2021-04-18T20:56:12Z)
- Guided Training: A Simple Method for Single-channel Speaker Separation [40.34570426165019]
We propose a strategy to train a long short-term memory (LSTM) model to solve the permutation problem in speaker separation.
Due to its powerful sequence-modeling capability, an LSTM can use its memory cells to track and separate target speech from interfering speech.
arXiv Detail & Related papers (2021-03-26T08:46:50Z)
- On permutation invariant training for speech source separation [20.82852423999727]
We study permutation invariant training (PIT), which targets the permutation ambiguity problem in speaker-independent source separation models.
First, we look at the two-stage speaker separation and tracking algorithm based on frame-level PIT (tPIT) and clustering, which was originally proposed for the STFT domain.
Second, we extend a recently proposed auxiliary speaker-ID loss with a deep feature loss based on "problem agnostic speech features" to reduce the local permutation errors made by utterance-level PIT (uPIT).
arXiv Detail & Related papers (2021-02-09T16:57:32Z)
- Stabilizing Label Assignment for Speech Separation by Self-supervised Pre-training [58.30339239234169]
We propose to perform self-supervised pre-training to stabilize the label assignment in training the speech separation model.
Experiments over several types of self-supervised approaches, several typical speech separation models, and two different datasets showed that substantial improvements are achievable when a suitable self-supervised approach is chosen.
arXiv Detail & Related papers (2020-10-29T06:07:01Z)
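Two toy sketches referenced above follow. First, for the Soft-minimum PIT entry: this does not reproduce the paper's exact probabilistic formulation, only a generic soft-minimum under my own assumptions, relaxing the hard minimum over permutation costs into a temperature-controlled log-sum-exp so that every assignment receives some gradient.

```python
import itertools
import numpy as np

def softmin_pit_loss(preds, targets, tau=0.1):
    """Soft-minimum PIT (illustrative, not the paper's formulation):
    replace the hard min over the C! permutation costs with
    -tau * log-mean-exp(-cost / tau). As tau -> 0 this recovers hard PIT;
    larger tau spreads gradient over all assignments."""
    C = preds.shape[0]
    diff = preds[:, None, :] - targets[None, :, :]
    costs = np.mean(diff ** 2, axis=-1)  # (C, C) pairwise MSE stand-in
    perm_costs = np.array([
        np.mean([costs[perm[j], j] for j in range(C)])
        for perm in itertools.permutations(range(C))
    ])
    return -tau * np.log(np.mean(np.exp(-perm_costs / tau)))
```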
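Second, for the Many-Speakers entry: the $O(C^3)$ assignment can be delegated to an off-the-shelf Hungarian solver. The sketch below uses scipy.optimize.linear_sum_assignment; again, MSE is only a stand-in for the separation objective actually trained, and the function name is mine.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def hungarian_pit_loss(preds, targets):
    """PIT with the optimal assignment found by the Hungarian algorithm
    in O(C^3), instead of enumerating all C! permutations."""
    diff = preds[:, None, :] - targets[None, :, :]
    costs = np.mean(diff ** 2, axis=-1)        # (C, C) pairwise MSE stand-in
    rows, cols = linear_sum_assignment(costs)  # minimum-cost bijection
    return costs[rows, cols].mean()
```

For C = 20 speakers this evaluates 400 pairwise costs plus one cubic-time assignment, whereas exhaustive PIT would have to enumerate 20! permutations.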