Towards Listening to 10 People Simultaneously: An Efficient Permutation
Invariant Training of Audio Source Separation Using Sinkhorn's Algorithm
- URL: http://arxiv.org/abs/2010.11871v2
- Date: Sun, 16 May 2021 13:40:26 GMT
- Title: Towards Listening to 10 People Simultaneously: An Efficient Permutation
Invariant Training of Audio Source Separation Using Sinkhorn's Algorithm
- Authors: Hideyuki Tachibana
- Abstract summary: In neural network-based monaural speech separation techniques, it has recently become common to evaluate the loss using the permutation invariant training (PIT) loss.
This paper proposes a SinkPIT, a novel variant of the PIT losses, which is much more efficient than the ordinary PIT loss when $N$ is large.
- Score: 9.340611077939828
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In neural network-based monaural speech separation techniques, it has
recently become common to evaluate the loss using the permutation invariant training
(PIT) loss. However, the ordinary PIT requires trying all $N!$ permutations
between $N$ ground truths and $N$ estimates. Since the factorial complexity
explodes very rapidly as $N$ increases, PIT-based training works only when
the number of source signals is small, such as $N = 2$ or $3$. To overcome this
limitation, this paper proposes a SinkPIT, a novel variant of the PIT losses,
which is much more efficient than the ordinary PIT loss when $N$ is large. The
SinkPIT is based on Sinkhorn's matrix balancing algorithm, which efficiently
finds a doubly stochastic matrix that approximates the best permutation in a
differentiable manner. The author conducted an experiment to train a neural
network model to decompose a single-channel mixture into 10 sources using the
SinkPIT, and obtained promising results.
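To make the idea concrete, the following PyTorch sketch illustrates a Sinkhorn-based PIT-style loss. It is a minimal illustration under assumptions, not the author's implementation: negative SI-SDR is assumed as the pairwise cost (a common but not confirmed choice here), and the function names, the inverse temperature `beta`, and the iteration count are illustrative.

```python
import torch


def pairwise_neg_sisdr(est, ref, eps=1e-8):
    """Negative SI-SDR between every (estimate, target) pair.

    est, ref: (batch, N, time) waveforms.
    Returns a (batch, N, N) cost matrix whose [b, i, j] entry is
    -SI-SDR(est[b, i], ref[b, j])."""
    est = est - est.mean(dim=-1, keepdim=True)
    ref = ref - ref.mean(dim=-1, keepdim=True)
    e = est.unsqueeze(2)                      # (B, N, 1, T)
    r = ref.unsqueeze(1)                      # (B, 1, N, T)
    proj = (e * r).sum(-1, keepdim=True) * r / (r.pow(2).sum(-1, keepdim=True) + eps)
    noise = e - proj
    sisdr = 10 * torch.log10(proj.pow(2).sum(-1) / (noise.pow(2).sum(-1) + eps) + eps)
    return -sisdr                             # (B, N, N)


def sinkhorn(log_alpha, n_iters=10):
    """Sinkhorn matrix balancing in the log domain: alternately normalize
    rows and columns so that exp(log_alpha) approaches a doubly
    stochastic matrix."""
    for _ in range(n_iters):
        log_alpha = log_alpha - torch.logsumexp(log_alpha, dim=-1, keepdim=True)
        log_alpha = log_alpha - torch.logsumexp(log_alpha, dim=-2, keepdim=True)
    return log_alpha.exp()


def sinkpit_style_loss(estimates, targets, beta=10.0, n_iters=10):
    """Differentiable PIT-style loss using a soft (doubly stochastic)
    assignment instead of an exhaustive search over N! permutations."""
    cost = pairwise_neg_sisdr(estimates, targets)   # (B, N, N)
    soft_perm = sinkhorn(-beta * cost, n_iters)     # near a permutation for large beta
    n = cost.shape[-1]
    return (soft_perm * cost).sum(dim=(-2, -1)).mean() / n
```

Because the Sinkhorn output is only approximately a permutation, increasing `beta` (for instance, annealing it upward over training) pushes the soft assignment toward a hard one, while the loss remains differentiable throughout.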
Related papers
- Matching the Statistical Query Lower Bound for $k$-Sparse Parity Problems with Sign Stochastic Gradient Descent [83.85536329832722]
We solve the $k$-sparse parity problem with sign stochastic gradient descent (sign SGD) on two-layer fully-connected neural networks.
We show that this approach can efficiently solve the $k$-sparse parity problem on a $d$-dimensional hypercube.
We then demonstrate how a trained neural network with sign SGD can effectively approximate this good network, solving the $k$-parity problem with small statistical errors.
arXiv Detail & Related papers (2024-04-18T17:57:53Z) - Efficiently Learning One-Hidden-Layer ReLU Networks via Schur
Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon > 0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z) - Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
We propose a class of novel distributed Adam-type algorithms (i.e., SketchedAMSGrad) utilizing sketching.
Our new algorithm achieves a fast convergence rate of $O(\frac{1}{\sqrt{nT}} + \frac{1}{(k/d)^2 T})$ with the communication cost of $O(k \log(d))$ at each iteration.
arXiv Detail & Related papers (2022-10-14T01:42:05Z) - Training Overparametrized Neural Networks in Sublinear Time [14.918404733024332]
Deep learning comes at a tremendous computational and energy cost.
We present a new view of neural networks as a small subset of search trees, where each network corresponds to a subset of search trees.
We believe this view would have further applications in the analysis of deep networks.
arXiv Detail & Related papers (2022-08-09T02:29:42Z) - Single-channel speech separation using Soft-minimum Permutation
Invariant Training [60.99112031408449]
A long-lasting problem in supervised speech separation is finding the correct label for each separated speech signal.
Permutation Invariant Training (PIT) has been shown to be a promising solution in handling the label ambiguity problem.
In this work, we propose a probabilistic optimization framework to address the inefficiency of PIT in finding the best output-label assignment.
arXiv Detail & Related papers (2021-11-16T17:25:05Z) - Many-Speakers Single Channel Speech Separation with Optimal Permutation
Training [91.22679787578438]
We present a permutation invariant training that employs the Hungarian algorithm in order to train with an $O(C^3)$ time complexity (see the sketch after this list).
Our approach separates up to $20$ speakers and improves the previous results for large $C$ by a wide margin.
arXiv Detail & Related papers (2021-04-18T20:56:12Z) - Beyond Lazy Training for Over-parameterized Tensor Decomposition [69.4699995828506]
We show that gradient descent on an over-parametrized objective could go beyond the lazy training regime and utilize certain low-rank structure in the data.
arXiv Detail & Related papers (2020-10-22T00:32:12Z) - A Revision of Neural Tangent Kernel-based Approaches for Neural Networks [34.75076385561115]
We use the neural tangent kernel to show that networks can fit any finite training sample perfectly.
A simple and analytic kernel function was derived that is indeed equivalent to a fully-trained network.
Our tighter analysis resolves the scaling problem and enables the validation of the original NTK-based results.
arXiv Detail & Related papers (2020-07-02T05:07:55Z) - Momentum-based variance-reduced proximal stochastic gradient method for
composite nonconvex stochastic optimization [8.014215296763435]
Stochastic gradient methods (SGMs) have been extensively used for solving stochastic problems or large-scale machine learning problems.
We propose a new SGM, PStorm, for solving nonsmooth problems.
arXiv Detail & Related papers (2020-05-31T03:18:45Z)
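As a point of comparison with the exhaustive $N!$ search, below is a minimal sketch of the Hungarian-algorithm assignment referenced in the optimal-permutation-training entry above. It assumes the illustrative `pairwise_neg_sisdr` helper from the earlier sketch and SciPy's `linear_sum_assignment`; it is not the cited paper's implementation.

```python
import torch
from scipy.optimize import linear_sum_assignment


def hungarian_pit_loss(estimates, targets):
    """PIT loss that picks the best estimate-to-target assignment with the
    Hungarian algorithm in O(C^3) time instead of enumerating all C!
    permutations.  estimates, targets: (batch, C, time) waveforms."""
    cost = pairwise_neg_sisdr(estimates, targets)  # (B, C, C), from the earlier sketch
    losses = []
    for b in range(cost.shape[0]):
        # The discrete assignment is not differentiable, so solve it on a
        # detached copy and back-propagate only through the selected entries.
        rows, cols = linear_sum_assignment(cost[b].detach().cpu().numpy())
        idx_r = torch.as_tensor(rows, device=cost.device)
        idx_c = torch.as_tensor(cols, device=cost.device)
        losses.append(cost[b, idx_r, idx_c].mean())
    return torch.stack(losses).mean()
```

Unlike the Sinkhorn relaxation, the assignment here is exact but piecewise constant, so gradients flow only through the selected cost entries.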
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.