Towards Listening to 10 People Simultaneously: An Efficient Permutation
Invariant Training of Audio Source Separation Using Sinkhorn's Algorithm
- URL: http://arxiv.org/abs/2010.11871v2
- Date: Sun, 16 May 2021 13:40:26 GMT
- Title: Towards Listening to 10 People Simultaneously: An Efficient Permutation
Invariant Training of Audio Source Separation Using Sinkhorn's Algorithm
- Authors: Hideyuki Tachibana
- Abstract summary: In neural network-based monaural speech separation techniques, it has recently become common to evaluate the loss using the permutation invariant training (PIT) loss.
This paper proposes a SinkPIT, a novel variant of the PIT losses, which is much more efficient than the ordinary PIT loss when $N$ is large.
- Score: 9.340611077939828
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In neural network-based monaural speech separation techniques, it has
recently become common to evaluate the loss using the permutation invariant training
(PIT) loss. However, the ordinary PIT requires trying all $N!$ permutations
between $N$ ground truths and $N$ estimates. Since the factorial complexity
explodes very rapidly as $N$ increases, PIT-based training works only when
the number of source signals is small, such as $N = 2$ or $3$. To overcome this
limitation, this paper proposes a SinkPIT, a novel variant of the PIT losses,
which is much more efficient than the ordinary PIT loss when $N$ is large. The
SinkPIT is based on Sinkhorn's matrix balancing algorithm, which efficiently
finds a doubly stochastic matrix that approximates the best permutation in a
differentiable manner. The author conducted an experiment to train a neural
network model to decompose a single-channel mixture into 10 sources using the
SinkPIT, and obtained promising results.
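To make the idea concrete, the following PyTorch sketch illustrates a Sinkhorn-based PIT-style loss. It is a minimal illustration under assumptions, not the author's implementation: negative SI-SDR is assumed as the pairwise cost (a common but not confirmed choice here), and the function names, the inverse temperature `beta`, and the iteration count are illustrative.

```python
import torch


def pairwise_neg_sisdr(est, ref, eps=1e-8):
    """Negative SI-SDR between every (estimate, target) pair.

    est, ref: (batch, N, time) waveforms.
    Returns a (batch, N, N) cost matrix whose [b, i, j] entry is
    -SI-SDR(est[b, i], ref[b, j])."""
    est = est - est.mean(dim=-1, keepdim=True)
    ref = ref - ref.mean(dim=-1, keepdim=True)
    e = est.unsqueeze(2)                      # (B, N, 1, T)
    r = ref.unsqueeze(1)                      # (B, 1, N, T)
    proj = (e * r).sum(-1, keepdim=True) * r / (r.pow(2).sum(-1, keepdim=True) + eps)
    noise = e - proj
    sisdr = 10 * torch.log10(proj.pow(2).sum(-1) / (noise.pow(2).sum(-1) + eps) + eps)
    return -sisdr                             # (B, N, N)


def sinkhorn(log_alpha, n_iters=10):
    """Sinkhorn matrix balancing in the log domain: alternately normalize
    rows and columns so that exp(log_alpha) approaches a doubly
    stochastic matrix."""
    for _ in range(n_iters):
        log_alpha = log_alpha - torch.logsumexp(log_alpha, dim=-1, keepdim=True)
        log_alpha = log_alpha - torch.logsumexp(log_alpha, dim=-2, keepdim=True)
    return log_alpha.exp()


def sinkpit_style_loss(estimates, targets, beta=10.0, n_iters=10):
    """Differentiable PIT-style loss using a soft (doubly stochastic)
    assignment instead of an exhaustive search over N! permutations."""
    cost = pairwise_neg_sisdr(estimates, targets)   # (B, N, N)
    soft_perm = sinkhorn(-beta * cost, n_iters)     # near a permutation for large beta
    n = cost.shape[-1]
    return (soft_perm * cost).sum(dim=(-2, -1)).mean() / n
```

Because the Sinkhorn output is only approximately a permutation, increasing `beta` (for instance, annealing it upward over training) pushes the soft assignment toward a hard one, while the loss remains differentiable throughout.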
Related papers
- Matching the Statistical Query Lower Bound for $k$-Sparse Parity Problems with Sign Stochastic Gradient Descent [83.85536329832722]
We solve the $k$-sparse parity problem with sign stochastic gradient descent (sign SGD) on two-layer fully-connected neural networks.
We show that this approach can efficiently solve the $k$-sparse parity problem on a $d$-dimensional hypercube.
We then demonstrate how a trained neural network with sign SGD can effectively approximate this good network, solving the $k$-parity problem with small statistical errors.
arXiv Detail & Related papers (2024-04-18T17:57:53Z) - Efficiently Learning One-Hidden-Layer ReLU Networks via Schur
Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon > 0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z) - Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
We propose a class of novel distributed Adam-type algorithms (i.e., SketchedAMSGrad) utilizing sketching.
Our new algorithm achieves a fast convergence rate of $O(\frac{1}{\sqrt{nT}} + \frac{1}{(k/d)^2 T})$ with the communication cost of $O(k \log(d))$ at each iteration.
arXiv Detail & Related papers (2022-10-14T01:42:05Z) - Training Overparametrized Neural Networks in Sublinear Time [14.918404733024332]
Deep learning comes at a tremendous computational and energy cost.
We present a new view of neural networks as a small subset of search trees, where each network corresponds to a subset of search trees.
We believe this view would have further applications in the analysis of deep networks.
arXiv Detail & Related papers (2022-08-09T02:29:42Z) - Single-channel speech separation using Soft-minimum Permutation
Invariant Training [60.99112031408449]
A long-lasting problem in supervised speech separation is finding the correct label for each separated speech signal.
Permutation Invariant Training (PIT) has been shown to be a promising solution in handling the label ambiguity problem.
In this work, we propose a probabilistic optimization framework to address the inefficiency of PIT in finding the best output-label assignment.
arXiv Detail & Related papers (2021-11-16T17:25:05Z) - Many-Speakers Single Channel Speech Separation with Optimal Permutation
Training [91.22679787578438]
We present a permutation invariant training that employs the Hungarian algorithm in order to train with an $O(C^3)$ time complexity (see the sketch after this list).
Our approach separates up to $20$ speakers and improves the previous results for large $C$ by a wide margin.
arXiv Detail & Related papers (2021-04-18T20:56:12Z) - Beyond Lazy Training for Over-parameterized Tensor Decomposition [69.4699995828506]
We show that gradient descent on an over-parametrized objective could go beyond the lazy training regime and utilize certain low-rank structure in the data.
arXiv Detail & Related papers (2020-10-22T00:32:12Z) - A Revision of Neural Tangent Kernel-based Approaches for Neural Networks [34.75076385561115]
We use the neural tangent kernel to show that networks can fit any finite training sample perfectly.
A simple and analytic kernel function was derived that is indeed equivalent to a fully-trained network.
Our tighter analysis resolves the scaling problem and enables the validation of the original NTK-based results.
arXiv Detail & Related papers (2020-07-02T05:07:55Z) - Momentum-based variance-reduced proximal stochastic gradient method for
composite nonconvex stochastic optimization [8.014215296763435]
Stochastic gradient methods (SGMs) have been extensively used for solving stochastic problems or large-scale machine learning problems.
We propose a new SGM, PStorm, for solving nonsmooth problems.
arXiv Detail & Related papers (2020-05-31T03:18:45Z)
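As a point of comparison with the exhaustive $N!$ search, below is a minimal sketch of the Hungarian-algorithm assignment referenced in the optimal-permutation-training entry above. It assumes the illustrative `pairwise_neg_sisdr` helper from the earlier sketch and SciPy's `linear_sum_assignment`; it is not the cited paper's implementation.

```python
import torch
from scipy.optimize import linear_sum_assignment


def hungarian_pit_loss(estimates, targets):
    """PIT loss that picks the best estimate-to-target assignment with the
    Hungarian algorithm in O(C^3) time instead of enumerating all C!
    permutations.  estimates, targets: (batch, C, time) waveforms."""
    cost = pairwise_neg_sisdr(estimates, targets)  # (B, C, C), from the earlier sketch
    losses = []
    for b in range(cost.shape[0]):
        # The discrete assignment is not differentiable, so solve it on a
        # detached copy and back-propagate only through the selected entries.
        rows, cols = linear_sum_assignment(cost[b].detach().cpu().numpy())
        idx_r = torch.as_tensor(rows, device=cost.device)
        idx_c = torch.as_tensor(cols, device=cost.device)
        losses.append(cost[b, idx_r, idx_c].mean())
    return torch.stack(losses).mean()
```

Unlike the Sinkhorn relaxation, the assignment here is exact but piecewise constant, so gradients flow only through the selected cost entries.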
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.