Sampling-Frequency-Independent Audio Source Separation Using Convolution
Layer Based on Impulse Invariant Method
- URL: http://arxiv.org/abs/2105.04079v1
- Date: Mon, 10 May 2021 02:33:42 GMT
- Title: Sampling-Frequency-Independent Audio Source Separation Using Convolution
Layer Based on Impulse Invariant Method
- Authors: Koichi Saito, Tomohiko Nakamura, Kohei Yatabe, Yuma Koizumi, Hiroshi
Saruwatari
- Abstract summary: We propose a convolution layer capable of handling arbitrary sampling frequencies by a single deep neural network.
We show that the introduction of the proposed layer enables a conventional audio source separation model to consistently work with even unseen sampling frequencies.
- Score: 67.24600975813419
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Audio source separation is often used as preprocessing of various
applications, and one of its ultimate goals is to construct a single versatile
model capable of dealing with the varieties of audio signals. Since sampling
frequency, one of the audio signal varieties, is usually application specific,
the preceding audio source separation model should be able to deal with audio
signals of all sampling frequencies specified in the target applications.
However, conventional models based on deep neural networks (DNNs) are trained
only at the sampling frequency specified by the training data, and there are no
guarantees that they work with unseen sampling frequencies. In this paper, we
propose a convolution layer capable of handling arbitrary sampling frequencies
by a single DNN. Through music source separation experiments, we show that the
introduction of the proposed layer enables a conventional audio source
separation model to consistently work with even unseen sampling frequencies.
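The core idea can be sketched as follows: parametrize the layer with continuous-time (analog) filter parameters, and for whatever sampling frequency the input arrives at, discretize them into a convolution kernel via the impulse invariant method, h[n] = (1/fs) * h_a(n/fs). The sketch below illustrates that mechanism only; the pole/residue values, kernel length, and layer wiring are illustrative assumptions, not the authors' configuration.

```python
import numpy as np

def impulse_invariant_kernel(poles, residues, fs, num_taps):
    """Sample a continuous-time impulse response h_a(t) = sum_k c_k * exp(p_k * t)
    (t >= 0, Re(p_k) < 0 for stability) at rate fs via the impulse invariant
    method: h[n] = (1/fs) * h_a(n / fs)."""
    t = np.arange(num_taps) / fs                       # sample times for this fs
    h = (residues[:, None] * np.exp(np.outer(poles, t))).sum(axis=0) / fs
    return h.real                                      # conjugate pole pairs give a real kernel

# A toy analog prototype: one conjugate pole pair (a damped resonance near 1 kHz).
p = np.array([-200.0 + 2j * np.pi * 1000.0, -200.0 - 2j * np.pi * 1000.0])
c = np.array([0.5 - 0.1j, 0.5 + 0.1j])

# The same analog parameters yield a matched kernel at any sampling frequency,
# which is what makes the layer sampling-frequency independent.
for fs in (8000, 16000, 44100):
    kernel = impulse_invariant_kernel(p, c, fs, num_taps=64)
    x = np.random.randn(fs)                            # one second of audio at rate fs
    y = np.convolve(x, kernel, mode="same")
```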
Related papers
- From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion [84.138804145918]
Deep generative models can generate high-fidelity audio conditioned on various types of representations.
These models are prone to generate audible artifacts when the conditioning is flawed or imperfect.
We propose a high-fidelity multi-band diffusion-based framework that generates any type of audio modality from low-bitrate discrete representations.
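As a rough illustration of the band-splitting idea only (the framework's actual filter bank and diffusion details differ), one can decompose a signal into sub-bands, process each independently, and sum the results; the band edges below are arbitrary placeholders:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def split_bands(x, fs, edges=(0, 1000, 4000, 7000)):
    """Split a signal into adjacent frequency bands (illustrative edges in Hz)."""
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        if lo == 0:
            sos = butter(4, hi, btype="lowpass", fs=fs, output="sos")
        else:
            sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        bands.append(sosfiltfilt(sos, x))
    return bands

fs = 16000
x = np.random.randn(fs)
bands = split_bands(x, fs)
# In a multi-band diffusion framework, each band would be generated/denoised
# by its own diffusion process and the band outputs summed back together.
x_rec = np.sum(bands, axis=0)   # approximate, since these bands only tile part of the spectrum
```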
arXiv Detail & Related papers (2023-08-02T22:14:29Z)
- Score-based Source Separation with Applications to Digital Communication Signals [72.6570125649502]
We propose a new method for separating superimposed sources using diffusion-based generative models.
Motivated by applications in radio-frequency (RF) systems, we are interested in sources with underlying discrete nature.
Our method can be viewed as a multi-source extension to the recently proposed score distillation sampling scheme.
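The structure of the idea can be shown with a toy example in which the learned diffusion score models are replaced by analytic Gaussian scores; everything below is a stand-in for exposition, not the paper's annealed procedure with trained score networks:

```python
import numpy as np

# Toy stand-ins for learned score models: analytic scores of Gaussian priors.
def score1(s):  # grad of log N(s; mean=2.0, var=1.0)
    return (2.0 - s) / 1.0

def score2(s):  # grad of log N(s; mean=-1.0, var=0.25)
    return (-1.0 - s) / 0.25

rng = np.random.default_rng(0)
s1_true = rng.normal(2.0, 1.0, size=1000)
s2_true = rng.normal(-1.0, 0.5, size=1000)
y = s1_true + s2_true            # observed superimposed mixture

# Gradient ascent on log p1(s1) + log p2(y - s1); writing s2 = y - s1 enforces
# the mixture constraint exactly, so only one source is a free variable.
s1 = y / 2.0
for _ in range(500):
    grad = score1(s1) - score2(y - s1)   # the chain rule gives the minus sign
    s1 = s1 + 0.01 * grad
s2 = y - s1
```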
arXiv Detail & Related papers (2023-06-26T04:12:40Z)
- One-Dimensional Deep Image Prior for Curve Fitting of S-Parameters from Electromagnetic Solvers [57.441926088870325]
Deep Image Prior (DIP) is a technique that optimizes the weights of a randomly-initialized convolutional neural network to fit a signal from noisy or under-determined measurements.
Relative to publicly available implementations of Vector Fitting (VF), our method shows superior performance on nearly all test examples.
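A minimal sketch of the DIP idea on a 1-D signal follows; the architecture, input code, and training loop are hypothetical, not the paper's setup:

```python
import math
import torch
import torch.nn as nn

# A small randomly-initialized 1-D convolutional network.
net = nn.Sequential(
    nn.Conv1d(8, 32, 5, padding=2), nn.ReLU(),
    nn.Conv1d(32, 32, 5, padding=2), nn.ReLU(),
    nn.Conv1d(32, 1, 5, padding=2),
)

n = 256
z = torch.randn(1, 8, n)                        # fixed random input code
t = torch.linspace(0, 1, n)
clean = torch.sin(2 * math.pi * 4 * t)          # toy stand-in for an S-parameter curve
noisy = clean + 0.2 * torch.randn(n)            # noisy measurements

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(2000):                        # early stopping is the implicit prior:
    opt.zero_grad()                             # the net fits smooth structure first
    loss = ((net(z).squeeze() - noisy) ** 2).mean()  # and the noise only much later
    loss.backward()
    opt.step()

fitted = net(z).squeeze().detach()              # denoised curve estimate
```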
arXiv Detail & Related papers (2023-06-06T20:28:37Z)
- BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis [129.86743102915986]
We formulate the synthesis process from a different perspective by decomposing the binaural audio into a common part shared by the two channels and a channel-specific part.
We propose BinauralGrad, a novel two-stage framework equipped with diffusion models to synthesize them respectively.
Experiment results show that BinauralGrad outperforms the existing baselines by a large margin in terms of both objective and subjective evaluation metrics.
arXiv Detail & Related papers (2022-05-30T02:09:26Z)
- SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping [51.698273019061645]
SpecGrad adapts the diffusion noise so that its time-varying spectral envelope becomes close to the conditioning log-mel spectrogram.
It is processed in the time-frequency domain to keep the computational cost almost the same as the conventional DDPM-based neural vocoders.
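The noise-shaping step can be illustrated in a few lines: filter white noise in the time-frequency domain so its spectral envelope follows a target. The envelope below is an arbitrary smooth stand-in for the one SpecGrad derives from the conditioning log-mel spectrogram:

```python
import numpy as np
from scipy.signal import stft, istft

fs, n = 16000, 16000
noise = np.random.randn(n)

f, t, N = stft(noise, fs=fs, nperseg=512)       # T-F representation of the noise

# A time-varying spectral envelope (placeholder; SpecGrad computes this from
# the conditioning log-mel spectrogram).
envelope = np.exp(-f[:, None] / 2000.0) * (1.0 + 0.5 * np.sin(2 * np.pi * t[None, :]))

N_shaped = N * envelope                          # filtering = multiplication in the T-F domain
_, shaped_noise = istft(N_shaped, fs=fs, nperseg=512)
```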
arXiv Detail & Related papers (2022-03-31T02:08:27Z)
- Zero-shot Audio Source Separation through Query-based Learning from Weakly-labeled Data [26.058278155958668]
We propose a three-component pipeline to train a universal audio source separator from a large, but weakly-labeled dataset: AudioSet.
Our approach uses a single model for source separation of multiple sound types, and relies solely on weakly-labeled data for training.
The proposed audio separator can be used in a zero-shot setting, learning to separate types of audio sources that were never seen in training.
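In miniature, query-based separation can look like the following sketch, where a query encoder's embedding modulates a mask-predicting separator; all layers and shapes below are hypothetical placeholders, not the proposed pipeline:

```python
import torch
import torch.nn as nn

# Hypothetical query encoder: embeds an example of the target source type.
query_enc = nn.Sequential(nn.Conv1d(1, 32, 16, stride=8), nn.ReLU(),
                          nn.AdaptiveAvgPool1d(1), nn.Flatten())  # -> (B, 32)
film = nn.Linear(32, 2 * 64)                  # scale & shift for 64 channels
body = nn.Conv1d(513, 64, 3, padding=1)       # separator trunk over |STFT| frames
mask_head = nn.Conv1d(64, 513, 3, padding=1)  # predicts a T-F mask

mix_spec = torch.randn(1, 513, 100).abs()     # |STFT| of the mixture
query_wav = torch.randn(1, 1, 16000)          # an example clip of the target type

g, b = film(query_enc(query_wav)).chunk(2, dim=1)
h = torch.relu(body(mix_spec) * g.unsqueeze(-1) + b.unsqueeze(-1))
target_spec = mix_spec * torch.sigmoid(mask_head(h))   # masked target estimate
# Zero-shot use: a query of an unseen source type still yields an embedding,
# so the same separator can be steered toward types never seen in training.
```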
arXiv Detail & Related papers (2021-12-15T05:13:43Z)
- SpecSinGAN: Sound Effect Variation Synthesis Using Single-Image GANs [0.0]
Single-image generative adversarial networks learn from the internal distribution of a single training example to generate variations of it.
SpecSinGAN takes a single one-shot sound effect and produces novel variations of it, as if they were different takes from the same recording session.
arXiv Detail & Related papers (2021-10-14T12:25:52Z)
- Deep Convolutional and Recurrent Networks for Polyphonic Instrument Classification from Monophonic Raw Audio Waveforms [30.3491261167433]
Sound Event Detection and Audio Classification tasks are traditionally addressed through time-frequency representations of audio signals such as spectrograms.
The advent of deep neural networks as efficient feature extractors has enabled the direct use of audio signals for classification purposes.
We attempt to recognize musical instruments in polyphonic audio by only feeding their raw waveforms into deep learning models.
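A compact raw-waveform classifier of this kind might look as follows; the architecture is illustrative, not one of the paper's models:

```python
import torch
import torch.nn as nn

class RawAudioCNN(nn.Module):
    """Multi-label instrument tagger operating directly on waveforms."""
    def __init__(self, num_instruments=11):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=64, stride=4), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=32, stride=2), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(32, 64, kernel_size=16, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),           # global pooling over time
        )
        self.head = nn.Linear(64, num_instruments)

    def forward(self, wav):                    # wav: (batch, samples)
        h = self.features(wav.unsqueeze(1)).squeeze(-1)
        return self.head(h)                    # one logit per instrument

model = RawAudioCNN()
logits = model(torch.randn(2, 44100))          # one second of 44.1 kHz audio
present = torch.sigmoid(logits) > 0.5          # polyphonic: several can be active at once
```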
arXiv Detail & Related papers (2021-02-13T13:44:46Z)
- Multi-stream Convolutional Neural Network with Frequency Selection for Robust Speaker Verification [2.3437178262034095]
We propose a novel framework of multi-stream Convolutional Neural Network (CNN) for speaker verification tasks.
The proposed framework accommodates diverse temporal embeddings generated from multiple streams to enhance the robustness of acoustic modeling.
We conduct extensive experiments on the VoxCeleb dataset, and the experimental results demonstrate that the multi-stream CNN significantly outperforms the single-stream baseline.
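The frequency-selection idea can be sketched by giving each stream a different sub-band of the input spectrogram and fusing the per-stream embeddings; the band choices and layers below are illustrative assumptions:

```python
import torch
import torch.nn as nn

class MultiStreamCNN(nn.Module):
    """Each stream sees only a sub-band of a (batch, mel_bins, frames) input."""
    def __init__(self, mel_bins=80, bands=((0, 40), (20, 60), (40, 80))):
        super().__init__()
        self.bands = bands
        self.streams = nn.ModuleList([
            nn.Sequential(
                nn.Conv1d(hi - lo, 64, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool1d(1),       # temporal pooling per stream
            )
            for lo, hi in bands
        ])

    def forward(self, spec):
        embs = [s(spec[:, lo:hi, :]).squeeze(-1)
                for (lo, hi), s in zip(self.bands, self.streams)]
        return torch.cat(embs, dim=1)          # fused speaker embedding

emb = MultiStreamCNN()(torch.randn(4, 80, 200))  # -> (4, 192)
```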
arXiv Detail & Related papers (2020-12-21T07:23:40Z)
- Choosing a sampling frequency for ECG QRS detection using convolutional networks [1.6822770693792823]
This research investigates the impact of six different sampling frequencies on the generalisability and complexity of four different convolutional network-based models.
Findings reveal that convolutional network-based deep learning models can achieve high detection accuracy on ECG signals sampled at frequencies as low as 100 Hz or 250 Hz.
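Preparing such comparisons mostly amounts to anti-aliased resampling of the recordings, e.g. (rates below follow the study's candidates; the source recording is a placeholder):

```python
import numpy as np
from scipy.signal import resample_poly

fs_orig = 500                                   # a common ECG acquisition rate
ecg = np.random.randn(10 * fs_orig)             # placeholder for a 10 s recording

# Downsample to candidate rates; resample_poly applies an anti-aliasing filter.
ecg_250 = resample_poly(ecg, up=1, down=2)      # 500 Hz -> 250 Hz
ecg_100 = resample_poly(ecg, up=1, down=5)      # 500 Hz -> 100 Hz
# Each version is then fed to a detector trained at that rate, trading input
# length (and hence model complexity) against detection accuracy.
```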
arXiv Detail & Related papers (2020-07-04T09:30:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.